100 amateur language processing knocks: 62

This is a record of my attempt at the Language Processing 100 Knocks 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). A list of past knocks is available here (http://qiita.com/segavvy/items/fb50ba8097d59475f760).

Chapter 7: Database

artist.json.gz is a gzip-compressed file of data from the open music database MusicBrainz, converted to JSON format. Each line of the file holds the information for one artist as a JSON object. The outline of the JSON format is as follows.

Field | Contents | Type | Example
--- | --- | --- | ---
id | Unique identifier | Integer | 20660
gid | Global identifier | String | "ecf9f3a3-35e9-4c58-acaa-e707fba45060"
name | Artist name | String | "Oasis"
sort_name | Artist name (for dictionary order) | String | "Oasis"
area | Place of activity | String | "United Kingdom"
aliases | Aliases | List of dictionary objects |
aliases[].name | Alias | String | "oasis"
aliases[].sort_name | Alias (for sorting) | String | "oasis"
begin | Activity start date | Dictionary |
begin.year | Activity start year | Integer | 1991
begin.month | Activity start month | Integer |
begin.date | Activity start day | Integer |
end | Activity end date | Dictionary |
end.year | Activity end year | Integer | 2009
end.month | Activity end month | Integer | 8
end.date | Activity end day | Integer | 28
tags | Tags | List of dictionary objects |
tags[].count | Number of times tagged | Integer | 1
tags[].value | Tag content | String | "rock"
rating | Rating | Dictionary object |
rating.count | Number of rating votes | Integer | 13
rating.value | Rating value (average) | Integer | 86
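
To make the layout above concrete, here is a minimal sketch that parses one line of this shape with Python's standard json module. The record is assembled by hand from the Example column of the table, so it is illustrative and not an actual line from artist.json.gz. The helper name area_of is hypothetical, and returning None for a missing field is a defensive assumption.

```python
import json

# One line in the shape described above, assembled from the table's example values.
# (Illustrative only; not copied from the real file.)
line = json.dumps({
    "id": 20660,
    "gid": "ecf9f3a3-35e9-4c58-acaa-e707fba45060",
    "name": "Oasis",
    "sort_name": "Oasis",
    "area": "United Kingdom",
    "aliases": [{"name": "oasis", "sort_name": "oasis"}],
    "begin": {"year": 1991},
    "end": {"year": 2009, "month": 8, "date": 28},
    "tags": [{"count": 1, "value": "rock"}],
    "rating": {"count": 13, "value": 86},
})


def area_of(json_line):
    """Return the artist's area, or None when the field is absent (hypothetical helper)."""
    return json.loads(json_line).get("area")


# Nested fields are reached through ordinary dict and list access.
artist = json.loads(line)
tag_values = [tag["value"] for tag in artist.get("tags", [])]
print(artist["name"], area_of(line), tag_values)
```

Reading the real file would presumably wrap the same json.loads call in a loop over the lines of the gzip file.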

Consider storing and retrieving the artist.json.gz data in a key-value store (KVS) and a document-oriented database. Use LevelDB, Redis, Kyoto Cabinet, etc. as the KVS. MongoDB is adopted as the document-oriented database, but CouchDB, RethinkDB, etc. may also be used.

62. Iterative processing in KVS

Using the database constructed in 60, find the number of artists whose activity location is "Japan".

The finished code:

main.py


# coding: utf-8
import leveldb

fname_db = 'test_db'

# Open the LevelDB database
db = leveldb.LevelDB(fname_db)

# Enumerate the keys whose value is 'Japan'
clue = 'Japan'.encode()
result = [key.decode() for key, value in db.RangeIter() if value == clue]

# Display the count
print(len(result))

Execution result:


22821

LevelDB enumeration

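The registered entries are enumerated by obtaining an iterator with LevelDB.RangeIter(). The iterator yields (key, value) pairs, so the code keeps the keys whose value equals 'Japan' and counts them.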
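
If only the count is needed, the list of names does not have to be built. The following is a minimal sketch under that assumption. count_by_value is a hypothetical helper that accepts any iterable of (key, value) byte pairs, which is the shape db.RangeIter() yields in the code above. The in-memory list stands in for the database so that the snippet runs without LevelDB, and its entries are made up.

```python
def count_by_value(pairs, target):
    """Count the (key, value) pairs whose value equals target (hypothetical helper)."""
    return sum(1 for _key, value in pairs if value == target)


# Stand-in for db.RangeIter(): made-up (key, value) pairs as bytes.
sample_pairs = [
    (b'artist A', b'Japan'),
    (b'artist B', b'United Kingdom'),
    (b'artist C', b'Japan'),
]

print(count_by_value(sample_pairs, 'Japan'.encode()))
```

With the real database, the call would be count_by_value(db.RangeIter(), clue). The generator expression avoids holding every matching key in memory at once.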

That's all for the 63rd knock (the problems are numbered from 00, so this is problem 62). If you find any mistakes, I would appreciate it if you could point them out.

