NoSQL - Practical example for each type of database (real cases)
There are several types of database for different purposes, but MySQL is normally used for everything because it is the best-known database. To give an example, in my company a big data application has a MySQL database at an initial stage, which is unbelievable and will bring serious consequences for the company. Why MySQL? Just because no one knows how (and when) to use another DBMS.
So my question is not about vendors but about types of database. Can you give me a practical example of specific situations (or apps) for each type of database where it is highly recommended to use it?
Example:
• A social network should use type X because of Y.
• MongoDB or CouchDB can't support transactions, so a document DB is not a good fit for a bank or auction site app.
And so on...
Relational: MySQL, PostgreSQL, SQLite, Firebird, MariaDB, Oracle DB, SQL Server, IBM DB2, IBM Informix, Teradata
Object: ZODB, db4o, Eloquera, Versant, Objectivity/DB, VelocityDB
Graph databases: AllegroGraph, Neo4j, OrientDB, InfiniteGraph, GraphBase, SparkleDB, FlockDB, BrightstarDB
Key-value stores: Amazon DynamoDB, Redis, Riak, Voldemort, FoundationDB, LevelDB, BangDB, Kai, HamsterDB, Tarantool, MaxTable, HyperDex, Genomu, MemcacheDB
Column family: Bigtable, HBase, Hypertable, Cassandra, Apache Accumulo
RDF stores: Apache Jena, Sesame
Multimodel databases: ArangoDB, Datomic, OrientDB, FatDB, AlchemyDB
Document: MongoDB, CouchDB, RethinkDB, RavenDB, Terrastore, JasDB, RaptorDB, djondb, EJDB, DensoDB, Couchbase
XML databases: BaseX, Sedna, eXist
Hierarchical: InterSystems Caché, GT.M (thanks to @Laurent Parenteau)
I found two impressive articles on this subject.
All credit goes to highscalability.com. The information below is transcribed from these URLs:
http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html
http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
If your application needs...
• complex transactions because you can't afford to lose data, or if you would like a simple transaction programming model, then look at a relational or grid database.
• Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later that they were out of stock. I did not want a compensated transaction. I wanted my item!
• to scale, then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
• to always be able to write to a database because you need high availability, then look at Bigtable clones, which feature eventual consistency.
• to handle lots of small continuous reads and writes that may be volatile, then look at document or key-value databases, or databases offering fast in-memory access. Also consider SSD.
• to implement social network operations, then you may first want a graph database or, second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.
• to operate over a wide variety of access patterns and data types, then look at a document database; they are generally flexible and perform well.
• powerful offline reporting with large datasets, then look at Hadoop first and, second, at products that support MapReduce. Supporting MapReduce isn't the same as being good at it.
• to span multiple data centers, then look at Bigtable clones and other products that offer a distributed option that can handle long latencies and are partition tolerant.
• to build CRUD apps, then look at a document database; they make it easy to access complex data without joins.
• built-in search, then look at Riak.
• to operate on data structures like lists, sets, queues, and publish-subscribe, then look at Redis. Useful for distributed locking, capped logs, and a lot more.
• programmer friendliness in the form of programmer-friendly data types and interfaces like JSON, HTTP, REST, and JavaScript, then look first at document databases and then at key-value databases.
• transactions combined with materialized views for real-time data feeds, then look at VoltDB. Great for data rollups and time windowing.
• enterprise-level support and SLAs, then look for a product that makes a point of catering to that market. Membase is an example.
• to log continuous streams of data that may need no consistency guarantees at all, then look at Bigtable clones, because they generally work on distributed file systems that can handle a lot of writes.
• to be as simple as possible to operate, then look for a hosted or PaaS solution, because they will do all the work for you.
• to be sold to enterprise customers, then consider a relational database, because they are used to relational technology.
• to dynamically build relationships between objects that have dynamic properties, then consider a graph database, because often they will not require a schema and models can be built incrementally through programming.
• to support large media, then look at storage services like S3. NoSQL systems tend not to handle large blobs, though MongoDB has a file service.
• to bulk upload lots of data quickly and efficiently, then look for a product that supports that scenario. Most will not, because they don't support bulk operations.
• an easier upgrade path, then use a fluid schema system like a document database or a key-value database, because it supports optional fields, adding fields, and field deletions without the need to build an entire schema migration framework.
• to implement integrity constraints, then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.
• a very deep join depth, then use a graph database, because they support blisteringly fast navigation between entities.
• to move behavior close to the data so the data doesn't have to be moved over the network, then look at stored procedures of one kind or another. These can be found in relational, grid, document, and even key-value databases.
• to cache or store blob data, then look at a key-value store. Caching can be for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.
• a proven track record, like not corrupting data and just generally working, then pick an established product and, when you hit scaling (or other) issues, use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc.).
• fluid data types because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at document, key-value, and Bigtable clone databases. Each has a lot of flexibility in its data types.
• other business units to run quick relational queries so you don't have to reimplement everything, then use a database that supports SQL.
• to operate in the cloud and automatically take full advantage of cloud features, then we may not be there yet.
• support for secondary indexes so you can look up data by different keys, then look at relational databases and Cassandra's new secondary index support.
• to store an ever-growing set of data (really big data) that rarely gets accessed, then look at a Bigtable clone, which will spread the data over a distributed file system.
• to integrate with other services, then check whether the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.
• fault tolerance, then check how durable writes are in the face of power failures, partitions, and other failure scenarios.
• to push the technological envelope in a direction nobody seems to be going, then build it yourself, because that's what it takes to be great sometimes.
• to work on a mobile platform, then look at CouchDB/Mobile Couchbase.
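The social network bullet above notes that set operations (such as those Redis offers) can cover simple relationship queries. Here is a minimal sketch of that idea in plain Python, using built-in sets as a stand-in for a key-value store's set type. The user names, the `follows` mapping, and both helper functions are made up for illustration and are not taken from the articles.

```python
# Plain-Python stand-in for server-side set operations in a key-value store.
# Hypothetical data: each key is a user, each value is the set of accounts
# that user follows.
follows = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"carol", "dave", "erin"},
    "carol": {"alice"},
}


def mutual_follows(a, b):
    """Accounts that both a and b follow (a set intersection)."""
    return follows.get(a, set()) & follows.get(b, set())


def suggestions(user):
    """Accounts followed by the people `user` follows, minus known ones."""
    already = follows.get(user, set())
    candidates = set()
    for friend in already:
        # Union in everything each followed account follows.
        candidates |= follows.get(friend, set())
    return candidates - already - {user}


if __name__ == "__main__":
    print(sorted(mutual_follows("alice", "bob")))
    print(sorted(suggestions("alice")))
```

This only works while each set fits comfortably in memory on one node and queries stay one or two hops deep. Deeper traversals are where the list above suggests a graph database instead.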
General use cases (NoSQL)
• Bigness. NoSQL is seen as a key part of a new data stack supporting big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. When something becomes so massive that it must become massively distributed, NoSQL is there, though not all NoSQL systems are targeting big. Bigness can be across many different dimensions, not just using a lot of disk space.
• Massive write performance. This is probably the canonical usage, based on Google's influence. High volume. Facebook needs to store 135 billion messages a month. Twitter, for example, has the problem of storing 7 TB of data per day, with the prospect of this requirement doubling multiple times per year. This is the "data is too big to fit on one node" problem. At 80 MB/s it takes a day to store 7 TB, so writes need to be distributed over a cluster, which implies key-value access, MapReduce, replication, fault tolerance, consistency issues, and all the rest. For faster writes, in-memory systems can be used.
• Fast key-value access. This is probably the second most cited virtue of NoSQL in the general mindset. When latency is important, it's hard to beat hashing on a key and reading the value directly from memory or in as little as one disk seek. Not every NoSQL product is about fast access; some are more about reliability, for example. But what people have wanted for a long time is a better memcached, and many NoSQL systems offer that.
• Flexible schema and flexible datatypes. NoSQL products support a whole range of new data types, and this is a major area of innovation in NoSQL. We have column-oriented, graph, advanced data structures, document-oriented, and key-value. Complex objects can be easily stored without a lot of mapping. Developers love avoiding complex schemas and ORM frameworks. Lack of structure allows for much more flexibility. We also have program- and programmer-friendly compatible datatypes like JSON.
• Schema migration. Schemalessness makes it easier to deal with schema migrations without so much worrying. Schemas are in a sense dynamic, because they are imposed by the application at run-time, so different parts of an application can have a different view of the schema.
• Write availability. Do your writes need to succeed no matter what? Then we can get into partitioning, CAP, eventual consistency, and all that jazz.
• Easier maintainability, administration, and operations. This is very product specific, but many NoSQL vendors are trying to gain adoption by making it easy for developers to adopt them. They are spending a lot of effort on ease of use, minimal administration, and automated operations. This can lead to lower operations costs, as special code doesn't have to be written to scale a system that was never intended to be used that way.
• No single point of failure. Not every product is delivering on this, but we are seeing a definite convergence on relatively easy to configure and manage high availability with automatic load balancing and cluster sizing. A perfect cloud partner.
• Generally available parallel computing. We are seeing MapReduce baked into products, which makes parallel computing something that will be a normal part of development in the future.
• Programmer ease of use. Accessing your data should be easy. While the relational model is intuitive for end users, like accountants, it's not very intuitive for developers. Programmers grok keys, values, JSON, JavaScript stored procedures, HTTP, and so on. NoSQL is for programmers. This is a developer-led coup. The response to a database problem can't always be to hire a really knowledgeable DBA, get your schema right, denormalize a little, etc.; programmers would prefer a system that they can make work for themselves. It shouldn't be so hard to make a product perform. Money is part of the issue. If it costs a lot to scale a product, then won't you go with the cheaper product that you control, that's easier to use, and that's easier to scale?
• Use the right data model for the right problem. Different data models are used to solve different problems. Much effort has been put into, for example, wedging graph operations into a relational model, but it doesn't work. Isn't it better to solve a graph problem in a graph database? We are now seeing a general strategy of trying to find the best fit between a problem and a solution.
• Avoid hitting the wall. Many projects hit some type of wall. They've exhausted all options to make their system scale or perform properly and are wondering what's next. It's comforting to select a product and an approach that can jump over the wall by linearly scaling using incrementally added resources. At one time this wasn't possible. It took custom-built everything, but that's changed. We are now seeing usable out-of-the-box products that a project can readily adopt.
• Distributed systems support. Not everyone is worried about scale or performance over and above what can be achieved by non-NoSQL systems. What they need is a distributed system that can span datacenters while handling failure scenarios without a hiccup. NoSQL systems, because they have focused on scale, tend to exploit partitions, tend not to use heavy strict consistency protocols, and so are well positioned to operate in distributed scenarios.
• Tunable CAP tradeoffs. NoSQL systems are generally the only products with a "slider" for choosing where they want to land on the CAP spectrum. Relational databases pick strong consistency, which means they can't tolerate a partition failure. In the end this is a business decision and should be decided on a case-by-case basis. Does your app even care about consistency? Are a few drops OK? Does your app need strong or weak consistency? Is availability more important, or is consistency? Will being down be more costly than being wrong? It's nice to have products that give you a choice.
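The write-performance figure in the list above (7 TB at 80 MB/s takes about a day) can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), perfectly even distribution across nodes, and no replication overhead. The function names are illustrative only.

```python
import math

TB = 10**12  # assumed decimal terabyte
MB = 10**6   # assumed decimal megabyte


def hours_to_write(total_bytes, bytes_per_second):
    """Hours one node needs to write total_bytes at a sustained rate."""
    return total_bytes / bytes_per_second / 3600


def nodes_needed(total_bytes, bytes_per_second, window_hours):
    """Minimum nodes to absorb total_bytes within window_hours,
    assuming writes are spread evenly and there is no replication."""
    capacity_per_node = bytes_per_second * window_hours * 3600
    return math.ceil(total_bytes / capacity_per_node)


if __name__ == "__main__":
    print(f"{hours_to_write(7 * TB, 80 * MB):.1f} hours on a single node")
    print(nodes_needed(7 * TB, 80 * MB, 24), "nodes to keep up with one day's data")
```

Under these assumptions a single node takes slightly more than 24 hours, so it can never catch up with a 7 TB/day stream. That is the arithmetic behind distributing writes over a cluster, before replication and growth are even considered.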
More specific use cases
• Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.
• Syncing online and offline data. This is a niche CouchDB has targeted.
• Fast response times under all loads.
• Avoiding heavy joins when the query load for complex joins becomes too large for an RDBMS.
• Soft real-time systems where low latency is critical. Games are one example.
• Applications where a wide variety of different write, read, query, and consistency patterns need to be supported. There are systems optimized for 50% reads and 50% writes, 95% writes, or 95% reads. Read-only applications needing extreme speed and resiliency, with simple queries, that can tolerate slightly stale data. Applications requiring moderate performance, read/write access, simple queries, and completely authoritative data. Read-only applications with complex query requirements.
• Load balancing to accommodate data and usage concentrations and to help keep microprocessors busy.
• Real-time inserts, updates, and queries.
• Hierarchical data like threaded discussions and parts explosion.
• Dynamic table creation.
• Two-tier applications where low-latency data is made available through a fast NoSQL interface, but the data itself can be calculated and updated by high-latency Hadoop apps or other low-priority apps.
• Sequential data reading. The right underlying data storage model needs to be selected. A B-tree may not be the best model for sequential reads.
• Slicing off part of a service that may need better performance or scalability onto its own system. For example, user logins may need to be high performance, and this feature could use a dedicated service to meet those goals.
• Caching. A high-performance caching tier for web sites and other applications. An example is a cache for the Data Aggregation System used by the Large Hadron Collider.
• Voting.
• Real-time page view counters.
• User registration, profile, and session data.
• Document, catalog management, and content management systems. These are facilitated by the ability to store complex documents as a whole rather than organized as relational tables. Similar logic applies to inventory, shopping carts, and other structured data types.
• Archiving. Storing a large continual stream of data that is still accessible online. Document-oriented databases have a flexible schema that can handle schema changes over time.
• Analytics. Use MapReduce, Hive, or Pig to perform analytical queries, and scale-out systems that support high write loads.
• Working with heterogeneous types of data, for example, different media types at a generic level.
• Embedded systems. They don't want the overhead of SQL and servers, so they use something simpler for storage.
• A "market" game, where you own buildings in a town. You want someone's building list to pop up quickly, so you partition on the owner column of the building table so that the select is single-partitioned. But when someone buys someone else's building, you update the owner column along with the price.
• JPL is using SimpleDB to store rover plan attributes. References are kept to a full plan blob in S3.
• Federal law enforcement agencies tracking Americans in real time using credit cards, loyalty cards, and travel reservations.
• Fraud detection by comparing transactions to known patterns in real time.
• Helping diagnose the typology of tumors by integrating the history of every patient.
• In-memory database for high-update situations, like a web site that displays everyone's "last active" time (for chat, maybe). If users are performing some activity once every 30 seconds, then you will be pretty much at your limit with about 5000 simultaneous users.
• Handling lower-frequency multi-partition queries using materialized views while continuing to process high-frequency streaming data.
• Priority queues.
• Running calculations on cached data, using a program-friendly interface, without having to go through an ORM.
• Deduplicating ("uniquing") a large dataset using simple key-value columns.
• To keep querying fast, values can be rolled up into different time slices.
• Computing the intersection of two massive sets, where a join would be too slow.
• A timeline à la Twitter.
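Two of the items above, real-time page view counters and rolling values up into time slices, combine naturally. What follows is a minimal in-memory sketch of that pattern, not a description of how any particular product does it. The `RollupCounter` class, the slice widths, and the integer Unix-style timestamps are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed slice granularities, in seconds.
SLICES = {"minute": 60, "hour": 3600, "day": 86400}


class RollupCounter:
    """Keeps one counter per (key, slice name, slice start) so that a
    read for any granularity is a single lookup instead of a scan."""

    def __init__(self):
        self.counts = defaultdict(int)

    def record(self, key, timestamp, amount=1):
        # Each write updates every granularity's bucket for that instant.
        for name, width in SLICES.items():
            bucket_start = timestamp - timestamp % width
            self.counts[(key, name, bucket_start)] += amount

    def total(self, key, slice_name, timestamp):
        width = SLICES[slice_name]
        bucket_start = timestamp - timestamp % width
        return self.counts.get((key, slice_name, bucket_start), 0)


if __name__ == "__main__":
    views = RollupCounter()
    for ts in (3600, 3661, 7200):
        views.record("/home", ts)
    print(views.total("/home", "hour", 3600))
    print(views.total("/home", "day", 0))
```

The design trades extra writes (one increment per granularity) for constant-time reads. In a key-value store the same idea maps to one atomic increment per composite key.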