Notes on using Datastore from Python

I've been using GCP's Datastore on and off for the past six months.

Here are some miscellaneous impressions from working with it. I would like to compare it with DynamoDB, but most of what follows is characteristic of NoSQL in general. I couldn't find many articles explaining the design guidelines and the shift in thinking needed when moving from an RDB to a KVS, so I've summarized what I learned.

Books such as ["NoSQL Guide for RDB Engineers"](http://www.amazon.co.jp/RDB%E6%8A%80%E8%A1%93%E8%80%85%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AENoSQL%E3%82%AC%E3%82%A4%E3%83%89-%E6%B8%A1%E9%83%A8-%E5%BE%B9%E5%A4%AA%E9%83%8E/dp/479804573X) probably cover this to some extent, but as far as I remember Datastore was not mentioned there.

For reference, I'm accessing Datastore from GAE/py.

Correspondence with RDB concepts

First, let's sort out the basic terms.

According to the article "New basic knowledge of databases: Understanding Google's huge distributed data stores Bigtable and Datastore (4/12)", the terms correspond as follows:

| Datastore | RDB |
|---|---|
| kind | table |
| entity | record |
| property | field |

That seems to be the rough mapping.

Features of Datastore

Here is a summary of what I kept in mind when designing the data model. I think most of it applies not only to Datastore but to schemaless NoSQL in general.

No tables

Datastore has no concept of a table. Entities of every kind are managed together in one place, so the kind is what plays the role of a table.
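To make the "everything in one place" idea concrete, here is a minimal sketch. It is a toy in-memory model, not the Datastore API; the names `store`, `put`, and `entities_of` are made up for illustration.

```python
# Toy in-memory model (NOT the Datastore API): one store holds every kind.
store = {}  # maps (kind, id) -> dict of properties


def put(kind, entity_id, **properties):
    """Save an entity; the kind is simply the first part of its key."""
    store[(kind, entity_id)] = dict(properties)


def entities_of(kind):
    """Pick out the entities of one kind, which is why a kind feels like a table."""
    return {key: props for key, props in store.items() if key[0] == kind}


put("User", 1, name="alice")
put("User", 2, name="bob", age=30)  # schemaless: properties can differ per entity
put("Article", 1, title="hello")

user_keys = sorted(entities_of("User"))
```

Nothing separates "User" from "Article" except the kind stored in the key, and two entities of the same kind do not have to share the same properties.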

By the way, GCP (or is it GAE?) also has a concept called a namespace, which lets you keep independent sets of Datastore data within the same project.

Transactions across multiple kinds

By putting entities of multiple kinds in the same entity group, you can update them together in a single transaction. However, there appears to be a limit of roughly one write per second per entity group.
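An entity group is determined by the root of an entity's key path, regardless of kind. The sketch below models key paths as plain tuples to show which entities could share a transaction; the tuple representation and the helper names are assumptions for illustration, not the real key class.

```python
def entity_group(key_path):
    """Return the root (kind, id) pair of a key path; it identifies the entity group."""
    return key_path[0]


def same_group(*key_paths):
    """True if every key path hangs off the same root entity."""
    return len({entity_group(path) for path in key_paths}) == 1


# Key paths written as tuples of (kind, id) pairs, root first.
user_key = (("User", "alice"),)
profile_key = (("User", "alice"), ("Profile", 1))
order_key = (("User", "alice"), ("Order", 42))
other_order_key = (("User", "bob"), ("Order", 7))

can_update_together = same_group(user_key, profile_key, order_key)
crosses_groups = not same_group(order_key, other_order_key)
```

Here a User, a Profile, and an Order are three different kinds in one group, so the write-rate limit applies to all of them together.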

Gets by key are fast

Getting an entity by its key is very fast. Properties can only be read once the entity itself has been fetched. A query, on the other hand, can return just a list of keys, so when you issue an ordinary query it seems that the entity contents are fetched lazily behind the scenes.
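The two-step behavior described here can be pictured roughly as follows. This is a toy model under my own assumptions (a dict of entities plus a hand-built index on a hypothetical `team` property), not how Datastore is actually implemented.

```python
# Toy model (NOT the Datastore API).
entities = {
    "user:1": {"name": "alice", "team": "red"},
    "user:2": {"name": "bob", "team": "blue"},
    "user:3": {"name": "carol", "team": "red"},
}
# Index on the 'team' property: it stores keys only, no other properties.
team_index = {"red": ["user:1", "user:3"], "blue": ["user:2"]}


def get(key):
    """Fetch by key: a single direct lookup."""
    return entities[key]


def query_keys_only(team):
    """Step 1 of a query: the index scan yields matching keys."""
    return list(team_index.get(team, []))


def query(team):
    """A full query: keys from the index, then one fetch per key."""
    return [get(key) for key in query_keys_only(team)]


red_keys = query_keys_only("red")
red_names = [entity["name"] for entity in query("red")]
```

If you use ndb, the keys-only step corresponds to passing `keys_only=True` to a query's `fetch()`, and the second step to `ndb.get_multi(keys)`.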

Consistency

Consistency comes with a trade-off.

For an ordinary put whose entities are not placed in a shared entity group, only eventual consistency is guaranteed. The result is not reflected immediately, and some queries keep returning old content for a while (perhaps because of how the nodes are replicated?). If you put the entities in an entity group, the update rate is limited, but strong consistency is guaranteed and you can read the new data right away.
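The stale-query behavior can be imitated with a toy store whose query index lags behind its entities. This is only a sketch of the idea; the `ToyStore` class and its `apply_index_updates` step are invented for illustration and say nothing about Datastore's real internals.

```python
class ToyStore:
    """Toy model (NOT the Datastore API): entities update at once, the index lags."""

    def __init__(self):
        self.entities = {}  # key -> properties, always current
        self.index = {}     # key -> properties as seen by queries
        self.pending = []   # index updates that have not been applied yet

    def put(self, key, **properties):
        self.entities[key] = dict(properties)
        self.pending.append((key, dict(properties)))

    def get(self, key):
        """A get by key reads the entity itself, so it sees the latest write."""
        return self.entities.get(key)

    def query(self, **filters):
        """A query reads the index, which may still be stale."""
        return sorted(
            key for key, props in self.index.items()
            if all(props.get(name) == value for name, value in filters.items())
        )

    def apply_index_updates(self):
        """Simulate the index catching up some time after the write."""
        for key, props in self.pending:
            self.index[key] = props
        self.pending = []


toy = ToyStore()
toy.put("user:1", name="alice", active=True)
fresh = toy.get("user:1")              # the write is visible by key
stale_result = toy.query(active=True)  # the index has not caught up yet
toy.apply_index_updates()
later_result = toy.query(active=True)
```

The same write is visible through `get` immediately but only shows up in `query` after the index catches up, which is the gap between strong and eventual consistency in miniature.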

Design notes

Start designing from the view

From a data-management point of view this feels very odd, but when designing for Datastore it seems better to design your objects around the view, that is, around how the data will be displayed and processed.

In other words, you need to anticipate the read and update use cases properly at the design stage. For example: will you fetch a list of users, or the data belonging to a single user?

The reason is related to the denormalization described below: the more queries an API call issues, the longer it takes. That is bad for UX, and on GAE there is also a one-minute request limit. So it seems better to store the things that are displayed together as a single piece of data in the first place. Throw away your RDB design guidelines.

Denormalization is recommended

Unlike an RDB, Datastore can do almost no aggregation. Many articles therefore recommend techniques such as storing totals that are computed ahead of time, or copying information that you know will be referenced into every kind that needs it.
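Both techniques (a stored total and a copied property) can be sketched with plain dicts. This is a toy example, not Datastore code; the `users` and `posts` structures and the property names are hypothetical.

```python
# Toy in-memory entities (NOT the Datastore API).
users = {"u1": {"name": "alice", "post_count": 0}}
posts = {}


def add_post(post_id, user_id, title):
    """Write the post and update the denormalized fields in the same step."""
    author = users[user_id]
    posts[post_id] = {
        "title": title,
        "author_id": user_id,
        # Copied from the user so that a post list needs no extra user lookup.
        "author_name": author["name"],
    }
    # Running total maintained on write instead of counting posts on read.
    author["post_count"] += 1


add_post("p1", "u1", "first")
add_post("p2", "u1", "second")
```

The cost is that every write has to keep the copies in step, so in a real application these updates would presumably belong in one transaction.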

Fetch by key as much as possible

Personally, I think this is the most important point. Searches and filters have to go through a query, but in the end a KVS (though Datastore may not strictly be one) shows its real value when fetching by key. A get by key is also guaranteed to be consistent right after an update. And, as I noticed later, inside a transaction you can essentially only fetch by key anyway (lol).

Fetch only keys for lists

I haven't put this into practice much myself (embarrassingly), but it is faster to fetch the list of keys and then get a limited number of entities than to fetch every property of every entity. If all you need is a name, and the name is part of the key, fetch just the keys with the keys-only option and display those.

Referenced articles

I looked for articles that could help with data design. They are all old, but they still seem useful to some extent.

Note, however, that some of them describe workarounds that later updates may have made unnecessary. One technique in particular seems a bit unusual: if you put part of the property information in the key and fetch the key list, you do not need to read the entity contents at all.

ITPro Cloud Design Patterns [Google App Engine]: Increase search speed by devising the schema design

This one made it easy to understand, from the viewpoint of denormalization, how the design approach differs from an RDB that uses SQL.
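The key trick mentioned above might look something like this. It is only a sketch: the `id:name` key-name layout and both helper functions are my own assumptions, and it presumes the id part never contains a colon.

```python
def make_key_name(user_id, display_name):
    """Build a key name that carries the display name (hypothetical 'id:name' layout)."""
    return "%s:%s" % (user_id, display_name)


def parse_key_name(key_name):
    """Split on the first ':' only, so the display name itself may contain ':'."""
    user_id, display_name = key_name.split(":", 1)
    return user_id, display_name


# Pretend this list came back from a keys-only query.
key_names = [
    make_key_name("u1", "alice"),
    make_key_name("u2", "bob: the builder"),
]

# The listing is built from the keys alone; no entity is fetched.
listing = [parse_key_name(key_name)[1] for key_name in key_names]
```

The obvious drawback is that a key cannot be changed after the entity is created, so this only suits values that never change.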

What is written:

Best Practices on Google App Engine, Part 1: Datastore

Satoshi Nakajima also wrote about Datastore on his blog. Denormalization is recommended here as well, and the explanation of how to use entity groups and of the overall design policy was easy to follow. That said, my impression is that the slow queries and high error rates he describes have improved considerably since the post was written.

What is written:

[gae] Divide the entity into two --How BuddyPoke Scales on Facebook Using Google App Engine

Apparently there is an approach where you split an entity into one for gets and one for puts (I believe the kinds are different too). This may not be very practical.

Concept of data organization in Google Cloud Datastore

This is a post from the official Google blog from last year, and it is very helpful for getting started.
