LevelDB is a file-based key-value store: a library made by Google that reads and writes key => value byte strings at high speed. This tutorial tries out LevelDB from Python using a library called plyvel. (There are other Python LevelDB libraries besides plyvel, but after trying it, plyvel seems like a good choice.)
For this kind of store, Berkeley DB was the well-known choice in the past, and Kyoto Cabinet, created by Mr. Hirabayashi of Japan, is a more modern one. memcached and Redis are server-type stores, whereas LevelDB is not a server but a library that operates on local files. If you want many server processes to share and save key-value data, memcached, Kyoto Tycoon, Redis, and the like are more suitable.
Python has dict as a data structure that associates values with keys, but LevelDB is convenient when you want to handle more data than fits in memory.
- LevelDB is a key-value store that associates a key, which is a byte string, with a value, which is also a byte string. Looking up a value by key is fast.
- A LevelDB database is a directory.
- A LevelDB database can be opened by at most one process at a time. (An error occurs if another process already has it open.)
- Since it is not a server, there is no network access to keys and values. (It is a library that manipulates local files.)
- plyvel is implemented as a C extension that uses the native leveldb library (hence it is fast, but requires installing the library and compiling).
- LevelDB keys and values must be given as byte strings.
- If the value you want to save is not a string, you need to serialize it in some way, such as with pickle or msgpack.
- A value of Python's unicode type (str in Python 3) must be encoded to a byte string, for example with val.encode('utf-8').
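The encoding rule in the list above can be sketched as a pair of small helpers. This is only an illustration assuming Python 3; the names to_bytes and to_text are hypothetical and not part of plyvel.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes: text is encoded, bytes pass through unchanged."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    # Anything else (dict, list, int, ...) needs real serialization first.
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def to_text(raw, encoding="utf-8"):
    """Decode a byte string read back from the store into text."""
    return raw.decode(encoding)


key = to_bytes("Hoge")
value = to_bytes("Hogeバリュー")
print(key)    # a bytes object, ready to be used as a key
print(value)  # the UTF-8 encoded bytes of the text value
```

Keeping the conversion in one place makes it harder to mix encoded and unencoded keys by accident, which would otherwise be stored as different keys.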
First, install the native leveldb library, since plyvel needs it. If you are using Homebrew on a Mac, brew is the easiest way.
$ brew install leveldb
There seems to be a Makefile for other platforms, so I think you can install it with make and sudo make install.
Next, install plyvel. I'm not sure why it's spelled like this, but it's plyvel anyway.
$ pip install plyvel
If Python.h is not in the system's include path, the extension cannot be compiled and an error may occur. On Linux, it's a good idea to check that a package such as python-devel is installed.
Note that the LevelDB database is a directory.
import plyvel
my_db = plyvel.DB('/tmp/test.ldb', create_if_missing=True)  # create the database if it does not exist
my_db.close()
With create_if_missing=False, an exception is raised if the database directory does not exist.
LevelDB can only associate a byte string with a byte string, so the basics are as follows.
my_db.put(b'key1', b'value1')  # keys and values are byte strings
my_db.put(u'Hoge'.encode('utf-8'), u'Hogeバリュー'.encode('utf-8'))  # encode unicode text to a byte string first
To store a complex data structure under a key, use pickle or MessagePack, both described below.
Just call get.
value1 = my_db.get(b'key1')
value2 = my_db.get(u'Unicode key is encoded in byte string'.encode('utf-8'))
Delete a key with the delete method.
my_db.delete(b'key1')
my_db.delete(u'Unicode key is encoded in byte string'.encode('utf-8'))
Sometimes you want to register a lot of keys and values and then write them all out at the end, for example in CSV format. LevelDB can of course retrieve the stored keys and values with an iterator.
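my_db.put(b'key1', b'1')
my_db.put(b'key2', b'2')
my_db.put(b'key3', b'3')
# Iterating over the DB yields (key, value) pairs as byte strings
for key, value in my_db:
    print('%s => %s' % (key.decode('utf-8'), value.decode('utf-8')))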
#output:
# key1 => 1
# key2 => 2
# key3 => 3
You can also restrict the iteration to a key range or to a key prefix. See the iterators section of the plyvel documentation (https://plyvel.readthedocs.org/en/latest/user.html#iterators) for more information.
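The CSV export mentioned earlier could be sketched as follows. This is a minimal sketch, not plyvel's own API: dump_to_csv is a hypothetical helper, it assumes that keys and values are UTF-8 text, and it uses a plain list as a stand-in for the database so that it runs without LevelDB installed.

```python
import csv
import io


def dump_to_csv(pairs, out, encoding="utf-8"):
    """Write (key, value) byte-string pairs to out as CSV rows; return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for key, value in pairs:
        # Decode because the csv module works with text, not bytes.
        writer.writerow([key.decode(encoding), value.decode(encoding)])
        count += 1
    return count


# Stand-in data with the same shape as the pairs yielded when iterating a DB.
sample = [(b"key1", b"1"), (b"key2", b"2"), (b"key3", b"3")]
buf = io.StringIO()
rows = dump_to_csv(sample, buf)
print(rows)
print(buf.getvalue())
```

With plyvel, the opened DB object itself, or a narrower iterator such as my_db.iterator(prefix=b'key'), should be usable as pairs, since both yield (key, value) byte strings.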
As noted at the beginning, values must be byte strings, so to store structured data such as a dict or list under a key in LevelDB, you need to serialize it. Python's standard library includes a serialization module called pickle, which is easy to use. pickle is very powerful because it can serialize not only basic data types but also many other Python objects, such as instances of your own classes. However, pickle data is difficult to deserialize (restore to the original data structure) in languages other than Python, so if you want to use the data from other languages, serialize it in the MessagePack format described later.
import pickle
fukuzatsu1 = dict(a=10, b=20, c=[123, 234, 456])
# my_db.put(b'key1', fukuzatsu1)  # raises an error: the value is not a byte string
serialized1 = pickle.dumps(fukuzatsu1)
my_db.put(b'key1', serialized1)  # OK
# Reading the value back
serialized1 = my_db.get(b'key1')
fukuzatsu1 = pickle.loads(serialized1)
print(fukuzatsu1['a'])  # => 10
For details on how to use pickle, refer to the pickle documentation.
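The dumps/put and get/loads pairs above can be wrapped in two small functions. This is a sketch under assumptions: put_obj and get_obj are hypothetical names, and DictStore is an in-memory stand-in offering the same put and get calls so that the example runs without a LevelDB database.

```python
import pickle


def put_obj(db, key, obj):
    """Pickle obj and store it under key (a byte string)."""
    db.put(key, pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def get_obj(db, key, default=None):
    """Return the unpickled object stored under key, or default if the key is missing."""
    raw = db.get(key)
    if raw is None:
        return default
    return pickle.loads(raw)


class DictStore:
    """In-memory stand-in for a database handle, only for this demo."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes")
        self._data[key] = value

    def get(self, key):
        # Mirrors a store that returns None for a missing key.
        return self._data.get(key)


store = DictStore()
put_obj(store, b"key1", dict(a=10, b=20, c=[123, 234, 456]))
print(get_obj(store, b"key1")["a"])
print(get_obj(store, b"missing", default={}))
```

Passing an opened plyvel DB instead of DictStore should work the same way, since plyvel's get also returns None for a missing key.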
In addition to pickle, we recommend MessagePack, a serialization format from Japan. MessagePack is characterized by compact, fast serialization of data. To use MessagePack with Python, install the msgpack package (formerly published as msgpack-python).
$ pip install msgpack
msgpack uses packb / unpackb instead of pickle's dumps / loads.
import msgpack
fukuzatsu1 = dict(a=10, b=20, c=[123, 234, 456])
# use_bin_type / raw replace the old encoding='utf-8' argument, which msgpack 1.0 removed
serialized1 = msgpack.packb(fukuzatsu1, use_bin_type=True)
my_db.put(b'key1', serialized1)
# Reading the value back
serialized1 = my_db.get(b'key1')
fukuzatsu1 = msgpack.unpackb(serialized1, raw=False)
print(fukuzatsu1['a'])  # => 10
See the msgpack-python API documentation for more information.