I've been working with GCP's Datastore on and off for the past six months.

Here are my miscellaneous impressions from using Datastore. I'd like to compare it with DynamoDB at some point, but most of what follows is characteristic of NoSQL in general. I couldn't find many articles on design guidelines and the change in mindset needed when moving from an RDB to a KVS, so I summarized them here.

A book like ["NoSQL Guide for RDB Engineers"](http://www.amazon.co.jp/RDB%E6%8A%80%E8%A1%93%E8%80%85%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AENoSQL%E3%82%AC%E3%82%A4%E3%83%89-%E6%B8%A1%E9%83%A8-%E5%BE%B9%E5%A4%AA%E9%83%8E/dp/479804573X) probably covers some of this, but my impression is that Datastore was not mentioned in it.

By the way, I use it from GAE/py (Google App Engine with Python).

Correspondence with RDB concepts

First, let's sort out the basic terms.

According to the article "New Basic Knowledge of Databases: Understanding Google's Huge Distributed Data Stores, Bigtable and Datastore (4/12)", the terms correspond as follows:

Datastore    RDB
kind         table
entity       record
property     field

That seems to be the mapping.

Features of Datastore

Here is a summary of what I thought about while designing the data model. I think most of these points apply to schemaless NoSQL stores in general, not only to Datastore.

No tables

Datastore has no concept of a table; entities of multiple kinds are managed together in one place. The kind therefore plays the role that a table plays in an RDB.

By the way, GCP (or is it GAE?) also has a concept called a namespace, which lets you create independent Datastore partitions within the same project.

Transactions across multiple kinds

Entities of multiple kinds can be updated together in a single transaction by putting them in the same entity group. However, there seems to be a limit of roughly one write per second per entity group.

Get by key is fast

Getting an entity by its key is very fast. Properties can only be read after the entity itself has been fetched. A query, on the other hand, can only obtain a list of keys, so when you issue an ordinary query it seems that the entity contents are lazily fetched internally.
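The difference between a get by key and a query can be pictured with a small toy model. This is only a sketch in plain Python, not the Datastore API: the class `ToyStore` and its methods `put`, `get`, `query_keys`, and `query` are hypothetical names invented here, and the real index structure is assumed to be far more elaborate.

```python
# Toy in-memory model for illustration only; this is NOT the Datastore API.
class ToyStore:
    def __init__(self):
        self.entities = {}  # key -> dict of properties
        self.index = {}     # (property name, value) -> set of keys

    def put(self, key, props):
        # Remove index entries of a previous version, then re-index.
        for name, value in self.entities.get(key, {}).items():
            self.index[(name, value)].discard(key)
        self.entities[key] = dict(props)
        for name, value in props.items():
            self.index.setdefault((name, value), set()).add(key)

    def get(self, key):
        # Direct lookup by key: a single dictionary access.
        return self.entities.get(key)

    def query_keys(self, name, value):
        # The index only yields keys, never the entity contents.
        return sorted(self.index.get((name, value), set()))

    def query(self, name, value):
        # An "ordinary" query: look up keys, then fetch each entity by key.
        return [self.get(key) for key in self.query_keys(name, value)]


store = ToyStore()
store.put("user:1", {"name": "alice", "team": "red"})
store.put("user:2", {"name": "bob", "team": "blue"})
store.put("user:3", {"name": "carol", "team": "red"})

red_keys = store.query_keys("team", "red")
red_names = [entity["name"] for entity in store.query("team", "red")]
```

In this model `query` always costs an index lookup plus one get per result, while `get` is a single step, which is the intuition behind preferring key access.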

Consistency

Consistency comes with a trade-off.

An ordinary put that is not part of an entity group is only guaranteed eventual consistency. The result is not reflected immediately, and some queries return old content for a while (perhaps because of how the nodes work?). If you put the entities in an entity group, the update frequency is limited, but strong consistency is guaranteed and you can read the new data immediately.
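A toy model may help picture what "some queries return old content for a while" means. The sketch below is plain Python, not the Datastore API: `LaggingIndexStore` and `catch_up` are hypothetical names, and it assumes that the lag sits entirely on the query side. It does not model entity groups at all.

```python
# Toy model of eventual consistency; NOT the Datastore API.
class LaggingIndexStore:
    def __init__(self):
        self.entities = {}  # what a get by key sees
        self.indexed = {}   # snapshot that queries see; it lags behind

    def put(self, key, props):
        # The write is visible to get() at once, but not yet to queries.
        self.entities[key] = dict(props)

    def get(self, key):
        return self.entities.get(key)

    def query_keys(self, name, value):
        # Queries only look at the (possibly stale) snapshot.
        return sorted(
            key for key, props in self.indexed.items()
            if props.get(name) == value
        )

    def catch_up(self):
        # Stands in for the background work that makes queries current.
        self.indexed = {key: dict(props) for key, props in self.entities.items()}


store = LaggingIndexStore()
store.put("task:1", {"status": "done"})

stale = store.query_keys("status", "done")   # query issued right after put
fresh = store.get("task:1")                  # get by key right after put
store.catch_up()
later = store.query_keys("status", "done")   # query after the lag has passed
```

Here the query issued right after the put misses the new entity, while the get by key already sees it.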

Design notes

Start designing from the view

From a data-management viewpoint this feels very strange, but when designing for Datastore it seems better to design objects around the view, that is, around how the data is displayed and processed.

In other words, you need to anticipate the read and update use cases properly at the design stage. For example: will you fetch the user list, or the user data itself?

The reason is related to the denormalization described below: the more queries an API call issues, the longer it takes. That is bad for UX, and on GAE there is also a one-minute request limit. So it seems better to store the things that are displayed together as a single piece of data in the first place. Let's throw away the RDB design guidelines.

Denormalization is recommended

Unlike an RDB, Datastore can hardly do aggregation. Many articles therefore recommended techniques such as computing totals in advance, or copying information that you know will be referenced into every kind that needs it.
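As a rough illustration of both techniques, the sketch below keeps a precomputed total and a copied field on each post. It uses plain Python dictionaries rather than Datastore, and every name in it (`users`, `posts`, `comments`, `add_post`, `add_comment`, `render_post_line`) is a hypothetical example, not something from the referenced articles.

```python
# Denormalization sketch with plain dicts; NOT the Datastore API.
users = {"u1": {"name": "alice"}}
posts = {}
comments = []


def add_post(post_id, author_id, title):
    posts[post_id] = {
        "title": title,
        "author_id": author_id,
        # Copied from the user so that display needs no second lookup.
        "author_name": users[author_id]["name"],
        # Total maintained at write time instead of counted at read time.
        "comment_count": 0,
    }


def add_comment(post_id, text):
    comments.append({"post_id": post_id, "text": text})
    posts[post_id]["comment_count"] += 1


def render_post_line(post_id):
    # Everything needed for display comes from one get by key.
    post = posts[post_id]
    return "%s by %s (%d comments)" % (
        post["title"], post["author_name"], post["comment_count"])


add_post("p1", "u1", "Hello")
add_comment("p1", "first")
add_comment("p1", "second")
line = render_post_line("p1")
```

The price is paid on the write side: if the user is renamed, every copied `author_name` has to be updated as well.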

Get by key as much as possible

I personally think this is the most important point. Searches naturally have to go through a query, but in the end a KVS (although Datastore may not be one in the strict sense) shows its true value when fetching by key. Consistency right after an update is also guaranteed when you get by key. And as I noticed later, inside a transaction you can only get by key anyway (lol).

Fetch only keys for lists

I haven't really put this into practice myself (embarrassingly), but it is faster to get the list of keys first and then fetch only the entities you need, rather than trying to fetch every property of every entity. If all you need is the name, fetch just the keys with the keys-only option and display those.
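The pattern can be sketched as follows. This is a toy in plain Python, not the Datastore API: `entities`, `keys_only`, `get_multi`, and `name_from_key` are hypothetical names, and the key format `kind:name` is an assumption made only for this example.

```python
# Keys-first listing sketch with a plain dict; NOT the Datastore API.
entities = {
    "user:alice": {"email": "alice@example.com", "bio": "likes tea"},
    "user:bob": {"email": "bob@example.com", "bio": "likes coffee"},
    "user:carol": {"email": "carol@example.com", "bio": "likes water"},
    "group:admins": {"members": ["user:alice"]},
}


def keys_only(kind):
    # Returns keys of one kind without touching the entity contents.
    return sorted(key for key in entities if key.startswith(kind + ":"))


def get_multi(keys):
    # Fetches full entities for just the keys that are actually needed.
    return [entities[key] for key in keys]


def name_from_key(key):
    # The name is embedded in the key, so no entity fetch is required.
    return key.split(":", 1)[1]


all_keys = keys_only("user")
names = [name_from_key(key) for key in all_keys]  # list view: keys suffice
first_page = get_multi(all_keys[:2])              # detail view: 2 entities
```

Only the two entities on the first page are read in full; the list of names is built from the keys alone.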

Referenced articles

I looked for articles that could help with data design. All of them are old, but they still seem useful to some extent.

However, some of them describe workarounds that later updates may have made unnecessary. In particular, the trick of embedding part of the property data in the key, so that fetching the key list lets you avoid reading the entity contents, seems a bit hacky.

ITPro cloud design patterns [Google App Engine]: Increase search speed by devising schema design

It explained clearly, from the viewpoint of denormalization, how the design approach differs from that of an SQL-based RDB.

What is written:

Denormalization for inclusion in display
Denormalization for inclusion in search criteria
Denormalization to use the aggregation function
Denormalization to speed up specific queries (create search kind)
Asynchronous update for consistency
A hack that speeds up queries by embedding information in the key

Best Practices on Google App Engine, Part 1: Datastore

Satoshi Nakajima also wrote about Datastore on his blog. Denormalization is recommended here too, and the explanation of how to use entity groups and of the overall design policy was easy to follow. However, my impression is that the query speed problems and the high error rate have improved considerably since that post was written.