I have been using GCP's Datastore on and off for the past six months.
Here are some miscellaneous notes on what I learned from using it. I would have liked to compare it with DynamoDB, but most of what follows is characteristic of NoSQL in general. I could not find many articles explaining the design guidelines and the change in thinking needed when moving from an RDB to a KVS, so I wrote up a summary.
["NoSQL Guide for RDB Engineers"](http://www.amazon.co.jp/RDB%E6%8A%80%E8%A1%93%E8%80%85%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AENoSQL%E3%82%AC%E3%82%A4%E3%83%89-%E6%B8%A1%E9%83%A8-%E5%BE%B9%E5%A4%AA%E9%83%8E/dp/479804573X) probably covers some of this, but I have the impression that it does not mention Datastore.
By the way, I use Datastore from GAE/py.
First, let's sort out the basic terms.
According to the article "New basic knowledge of database: Understanding Google's huge distributed data stores, Bigtable and Datastore (4/12)", the terms correspond as follows:
datastore | RDB |
---|---|
kind | table |
entity | record |
property | field |
That appears to be the correspondence.
Here is a summary of what I kept in mind when designing tables. I think these ideas are basically common to Datastore and other schemaless NoSQL stores.
Datastore has no concept of a table. Entities of multiple kinds are managed together in one place, and the kind plays the role of a table.
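As a rough illustration of "entities of several kinds in one place", here is a toy in-memory model. It is not the Datastore API: the `store` dict and the `put`, `get`, and `entities_of` helpers are hypothetical names made up for this sketch.

```python
# Toy in-memory model, NOT the real Datastore API: one shared store
# holds entities of every kind, addressed by a (kind, id) key.
store = {}


def put(kind, entity_id, **properties):
    """Save an entity. There is no fixed schema, so any properties are allowed."""
    store[(kind, entity_id)] = dict(properties)


def get(kind, entity_id):
    """Fetch one entity by key, or None if it does not exist."""
    return store.get((kind, entity_id))


def entities_of(kind):
    """Filter on the kind part of the key; this is what makes a kind feel like a table."""
    return {key: props for key, props in store.items() if key[0] == kind}


put("User", "alice", name="Alice", age=30)
put("User", "bob", name="Bob")  # no "age" property: entities of one kind may differ
put("Article", 1, title="Hello", author="alice")

print(get("User", "alice"))
print(len(entities_of("User")))
```

The point of the sketch is only that the kind is part of the key rather than a separate container.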
Incidentally, GCP (or perhaps GAE?) also has a concept called namespace, which lets you create independent Datastores within the same project.
Entities of multiple kinds can be updated together in a single transaction by putting them in the same entity group. However, there appears to be a restriction that a single entity group accepts only about one put per second.
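The entity group idea can be sketched with a toy model in which a key is a path of `(kind, id)` pairs and the first pair is the group's root. Everything here is an assumption for illustration: `commit`, `GroupBusy`, and `MIN_INTERVAL` are hypothetical, the one-second interval simply mirrors the "about 1 / sec" figure above, and the toy only models transactions that stay inside one group.

```python
# Toy model of entity groups, NOT the real Datastore API.
# A key is a path of (kind, id) pairs; the first pair (the root) names the group.
MIN_INTERVAL = 1.0  # assumption: mirrors the "about one put per second" figure

store = {}
last_commit = {}  # entity group -> time of its last commit


class GroupBusy(Exception):
    """Raised when one entity group is written to too frequently."""


def entity_group(key_path):
    return key_path[0]


def commit(writes, now):
    """Apply all writes together. This toy only allows a single entity group."""
    groups = {entity_group(key) for key in writes}
    if len(groups) != 1:
        raise ValueError("this toy transaction must stay inside one entity group")
    group = groups.pop()
    if now - last_commit.get(group, float("-inf")) < MIN_INTERVAL:
        raise GroupBusy(group)
    store.update(writes)  # all writes become visible together
    last_commit[group] = now


root = ("User", "alice")
profile_key = (root, ("Profile", "main"))
order_key = (root, ("Order", 1001))

# Two different kinds, one entity group, one commit.
commit({profile_key: {"nickname": "al"}, order_key: {"amount": 500}}, now=0.0)
print(sorted(store))
```

Passing `now` explicitly keeps the sketch deterministic; a second `commit` for the same root before `now=1.0` raises `GroupBusy`.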
Getting an entity by key is very fast. Properties can only be read after the entity itself has been fetched. A query, on the other hand, seems to obtain only a list of keys at first, so when you issue an ordinary query the entity contents appear to be fetched lazily behind the scenes.
There is a trade-off around consistency.
For ordinary puts that are not placed in a shared entity group, only eventual consistency is guaranteed: the result is not reflected immediately, and some queries return old content for a while (perhaps because of how the data is spread across nodes?). If you put entities in an entity group, the update frequency is limited, but strong consistency is guaranteed and you can read the new information immediately.
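The difference can be pictured with a toy model in which a get by key and a query inside an entity group read the current data, while a global query reads an index that lags behind. This is only a simplified illustration of the behaviour described here, not how Datastore is implemented; `ToyStore` and its methods are hypothetical.

```python
# Toy illustration of eventual vs strong consistency, NOT the real Datastore.
class ToyStore:
    def __init__(self):
        self.entities = {}  # key path -> properties, visible immediately
        self.index = {}     # lagging copy used by global queries
        self.pending = []   # index updates that have not been applied yet

    def put(self, key, props):
        self.entities[key] = dict(props)
        self.pending.append((key, dict(props)))

    def get(self, key):
        """Get by key: always sees the latest put."""
        return self.entities.get(key)

    def query(self, kind):
        """Global query: reads only the lagging index, so it may be stale."""
        return [key for key in self.index if key[-1][0] == kind]

    def ancestor_query(self, root, kind):
        """Query inside one entity group: reads the current data."""
        return [k for k in self.entities if k[0] == root and k[-1][0] == kind]

    def catch_up(self):
        """Simulate the index eventually catching up."""
        for key, props in self.pending:
            self.index[key] = props
        self.pending.clear()


s = ToyStore()
root = ("User", "alice")
comment_key = (root, ("Comment", 1))
s.put(comment_key, {"text": "hi"})

print(s.query("Comment"))                 # global query right after the put
print(s.ancestor_query(root, "Comment"))  # query within the entity group
s.catch_up()
print(s.query("Comment"))                 # global query after the index catches up
```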
From a data-management point of view this feels very odd, but when designing for Datastore it seems better to design objects around the view, that is, around how the data will be displayed and processed.
In other words, you need to anticipate the read and update use cases properly at design time: for example, do you need to fetch the list of users, or each user's data?
The reason is related to the denormalization described below: the more queries an API call issues, the longer it takes. That is bad for UX, and on GAE a request also has a one-minute limit. So it seems better to store together, as one piece of data, whatever will be displayed together. Throw away your RDB design guidelines.
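To make the cost difference concrete, here is a toy comparison that counts key lookups for one page. The kinds (`User`, `Article`, `ArticleView`) and all helper names are made up for this sketch, and the dict stands in for Datastore.

```python
# Toy comparison of a normalized vs a view-oriented (denormalized) layout.
# All kinds and helpers are hypothetical; the dict stands in for Datastore.
store = {
    ("User", "alice"): {"name": "Alice"},
    ("User", "bob"): {"name": "Bob"},
    # Normalized, RDB style: the article only references other entities.
    ("Article", 1): {
        "title": "Hello",
        "author_key": ("User", "alice"),
        "commenter_keys": [("User", "bob")],
    },
    # Denormalized: everything the page shows is stored in one entity.
    ("ArticleView", 1): {"title": "Hello", "author": "Alice", "commenters": ["Bob"]},
}
reads = 0


def get(key):
    """Get by key, counting how many round trips a page needs."""
    global reads
    reads += 1
    return store[key]


def page_normalized(article_id):
    article = get(("Article", article_id))
    author = get(article["author_key"])
    commenters = [get(key)["name"] for key in article["commenter_keys"]]
    return {"title": article["title"], "author": author["name"], "commenters": commenters}


def page_denormalized(article_id):
    return dict(get(("ArticleView", article_id)))


def count_reads(render, article_id):
    """Render a page and report how many gets it took."""
    global reads
    reads = 0
    page = render(article_id)
    return page, reads


print(count_reads(page_normalized, 1))
print(count_reads(page_denormalized, 1))
```

Both functions build the same page, but the normalized one needs one get per referenced entity while the denormalized one needs a single get. The price is that `ArticleView` has to be rewritten whenever a user's name changes.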
Unlike an RDB, Datastore can hardly do aggregation at all. Many articles therefore recommend techniques such as computing totals in advance, or copying information that you already know will be referenced into every kind that needs it.
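A minimal sketch of keeping totals up to date at write time instead of aggregating at read time. The `Order` and `UserStats` kinds and the helper names are hypothetical, and the dict stands in for Datastore. In a real application the two writes would presumably need to happen in one transaction.

```python
# Toy sketch of precomputed totals, NOT the real Datastore API.
store = {}


def add_order(user_id, order_id, amount):
    """Store the order and update the per-user totals in the same step."""
    store[("Order", order_id)] = {"user": user_id, "amount": amount}
    stats = store.setdefault(
        ("UserStats", user_id), {"order_count": 0, "total_amount": 0}
    )
    stats["order_count"] += 1
    stats["total_amount"] += amount


def user_stats(user_id):
    """Reading the totals is a single get by key; no COUNT or SUM is needed."""
    return store.get(("UserStats", user_id), {"order_count": 0, "total_amount": 0})


add_order("alice", 1, 500)
add_order("alice", 2, 300)
print(user_stats("alice"))
```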
Personally I think this is the most important point. Searches have to go through a query, but in the end a KVS (Datastore may not strictly be one) shows its true value when fetching by key. A get by key is also guaranteed to be consistent right after an update. And as I noticed later, inside a transaction you can essentially only fetch by key anyway (lol).
I have not put this into practice much myself, but rather than fetching every property of every entity, it is faster to get the list of keys first and then get only the entities you need. If all you need is the name, fetch just the keys with the keys-only option and display those.
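The keys-first pattern can be sketched with a toy store in which the user name is used as the key name. The helpers `query_keys_only` and `get_multi` are hypothetical stand-ins. With the ndb library they would roughly correspond to `query.fetch(keys_only=True)` and `ndb.get_multi(keys)`, but the code below does not call ndb.

```python
# Toy sketch of "keys first, entities only when needed", NOT the real Datastore API.
store = {}


def put(kind, key_name, **properties):
    store[(kind, key_name)] = dict(properties)


def query_keys_only(kind):
    """Return only the keys of a kind, without touching entity contents."""
    return sorted(key for key in store if key[0] == kind)


def get_multi(keys):
    """Fetch the full entities for a chosen subset of keys."""
    return [store[key] for key in keys]


put("User", "alice", bio="Likes tea", icon="a.png")
put("User", "bob", bio="Likes coffee", icon="b.png")
put("User", "carol", bio="Likes juice", icon="c.png")
put("Article", "hello", title="Hello")

keys = query_keys_only("User")
names = [key_name for _kind, key_name in keys]  # the name comes straight from the key
first_page = get_multi(keys[:2])                # full entities only for one page

print(names)
print(first_page)
```

Listing names never reads an entity body, and the more expensive full fetch is limited to the page actually being displayed.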
I looked for articles that would be helpful when designing data. They are all old, but they still seem useful to some extent.
However, some of them describe workarounds that may no longer be needed after later updates. In particular, the trick of putting part of the property information in the key, so that getting the key list saves you from reading the entity contents, seems a bit unusual.
It was easy to understand, from the viewpoint of denormalization, how the design approach differs from that of an SQL-based RDB.
What is written:
Satoshi Nakajima's blog also covers Datastore. It recommends denormalization too, and it explains clearly how to use entity groups and what design policy to follow. However, my impression is that the query speed problems and the high error rate have improved considerably since that post was written.
What is written:
There also seems to be an approach that splits an entity into one for get and one for put (with different kinds as well, I think). This may not be very practical.
This is a post from the official Google blog last year, and it is very helpful for getting started.