Search This Blog

Friday, August 12, 2011

LevelDB - Fast and Lightweight Key/Value Database From the Authors of MapReduce and BigTable

LevelDB - Fast and Lightweight Key/Value Database From the Authors of MapReduce and BigTable: "


LevelDB is an exciting new entrant into the pantheon of embedded databases, notable both for its pedigree, being authored by the makers of the now mythical Google MapReduce and BigTable products, and for its emphasis on efficient disk based random access using log-structured-merge (LSM) trees. 


The plan is to keep LevelDB fairly low-level. The intention is that it will be a useful building block for higher-level storage systems. Basho is already investigating using LevelDB as one if its storage engines.


In the past many systems were built around embedded databases, though most developers now use database servers connected to via RPCs. An embedded database is a database distributed as a library and linked directly into your application. The application is responsible for providing a service level API, sharding, backups, initiating consistency checking, initiation rollback,  startup, shutdown, queries, etc. Applications become the container for the database and the manager of the database.


Architectures using embedded databases typically never expose a raw database abstraction at all. They have a service API and the services use the embedded database library call transparently behind the scene. Often an embedded database will provide multiple access types, like indexed access for key-value uses and btrees for range queries and cursors. 


BerkelyDB is one well known example of an embedded database, SQLite is another, the file system is perhaps the most commonly used database, and there have been many many other btree libraries in common use. I've used C-tree on several projects. In a battle of old versus new, a user named IM46 compared Leveldb to BerkelyDB and found that LevelDB solidly outperforms Berkeley DB for larger databases.


Programmers usually thought doing this stuff was easy, wrote their own failed on-disk btree library (raises hand), and then look around for a proven product. It's only relatively recently the databases have gone up market and included a network layer and higher level services.


Building a hybrid application/database architecture is still a very viable option when you want everything to be just so. If you are going to load balance requests across sharded application servers anyway, using a heavy weight external database infrastructure may not be necessary.


The LevelDB mailing list started off very active and has died down a bit, but is still nicely active and informative. Here are some excellent FAQish tips, performance suggestions, and porting issues extracted from the list:



"

No comments:

Post a Comment