I did some research and experimentation when I wanted to access the Datastore from Compute Engine, so here are my notes.
Roughly speaking, there are two options: hit the Cloud Datastore API directly, or go through App Engine.
You might wonder why anyone would put App Engine in between when a Cloud Datastore API exists. The conventional wisdom, though, was that the Cloud Datastore API performed poorly while access from App Engine to the Datastore is fast, so going through App Engine was supposedly the better choice.
But that's an old story, and I had no idea whether it still holds, so I tried it for myself.
I compared the options with a simple test: fetch a single Entity by specifying its Key. The measurements aren't rigorous, so assume each value can be off by about 20 to 30 ms.
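The measurement itself can be as simple as wrapping the fetch in a timer. A minimal sketch of what I mean (the project name, `Task` kind, and key ID in the comment are placeholders, not from the actual test):

```python
import time

def time_calls(fetch, n=10):
    """Call fetch() n times and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

# With gcloud-python it would look roughly like this:
#
#   from gcloud import datastore
#   client = datastore.Client(project='my-project')
#   key = client.key('Task', 1234)
#   ms = time_calls(lambda: client.get(key))
#   print('min %.1f ms / max %.1f ms' % (min(ms), max(ms)))
```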
As a baseline, running the test from my MacBook Pro gave the following results.
| | Cloud Datastore API | Via App Engine | Via App Engine (with memcache) |
|---|---|---|---|
| Local | ~1000 ms | ~200 ms | ~170 ms |
At this point I had all but written off the Cloud Datastore API, but a senior colleague next to me pointed out that I was measuring over the network from my laptop, so I built a Compute Engine instance in the US region [^1] and tried again. Here are the results.
[^1]: The Datastore lives in the US and EU, so build your instance nearby.
| | Cloud Datastore API | Via App Engine | Via App Engine (with memcache) |
|---|---|---|---|
| GCE (US) | ~50–200 ms | ~45–50 ms | ~15–20 ms |
The times are now short enough to be hard to measure precisely, so take this with a grain of salt, but the Cloud Datastore API's speed now feels acceptable.
That said, the Cloud Datastore API's latency isn't stable: at its best it matches going via App Engine, but at its worst it can take around 200 ms, and I don't know why. As for App Engine, memcache is fast because it saves almost the entire time (about 30 ms) that App Engine spends accessing the Datastore.
If many of your requests are cacheable, memcache pulls a lot of weight: fetching the data from the Datastore took about 30 ms from App Engine, while fetching it from the cache took only about 2 ms. There's also no read-operation charge for cache hits.
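The saving comes from the usual cache-aside pattern: check the cache first, and only go to the Datastore on a miss. A minimal sketch of the idea, with a plain dict standing in for App Engine's `google.appengine.api.memcache` (the task naming is illustrative):

```python
# On App Engine, replace _cache with google.appengine.api.memcache.
_cache = {}

def get_task(task_id, fetch_from_datastore):
    """Cache-aside read: memcache hit costs ~2 ms, Datastore read ~30 ms."""
    entity = _cache.get(task_id)
    if entity is None:
        entity = fetch_from_datastore(task_id)  # slow path, billed read op
        _cache[task_id] = entity                # subsequent reads hit the cache
    return entity
```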
On the other hand, if you want to batch-process a large amount of data in parallel, the App Engine route costs more, because you end up scaling out App Engine instances.
Hitting the Cloud Datastore API directly is easy to implement, and of course there's no App Engine instance cost. And accessing it from Compute Engine can be about as fast as going through App Engine.
The downsides: there's no memcache, and the gcloud-py documentation is sparse enough to make you think you're at the dawn of the universe [^2].
[^2]: If you look closely, most of the entries are just links to the source code.
Considering speed alone, going via App Engine is still slightly faster for now, but depending on the situation, hitting the Cloud Datastore API directly from Compute Engine is also a viable option.
When memcache can't help, such as when loading data in a batch job, hitting Cloud Datastore directly seems both easier to implement and cheaper.
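For that kind of bulk load, writes can be grouped so each commit stays within the Datastore's per-commit mutation limit (500 entities, as I understand it). A sketch, with the kind and field names as placeholders:

```python
def chunks(items, size=500):
    """Split items into batches; Datastore caps entities per commit (~500)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# With gcloud-python, the loading loop would look roughly like:
#
#   from gcloud import datastore
#   client = datastore.Client(project='my-project')
#   for batch in chunks(rows):
#       entities = []
#       for row in batch:
#           entity = datastore.Entity(key=client.key('Task', row['id']))
#           entity.update(row)
#           entities.append(entity)
#       client.put_multi(entities)
```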
Installing gcloud-python is basically just:

```
pip install gcloud
```

That works, but a newer version lives on the master branch on GitHub.
Looking at the code on the master branch, a new code path has been added: if gRPC is available, use it [^3].
[^3]: As of 2016-08-28, this feature had not yet landed in v0.18.1, the version you get from pip.
So I installed gcloud-python directly from the master branch:

```
pip install git+https://github.com/GoogleCloudPlatform/gcloud-python
```

With this version, fetches were up to 20 ms faster, making it a pretty even match with going via App Engine.
The latency still didn't feel very stable, but the performance work is clearly ongoing, so I'm optimistic.