Cloud Datalab looks good but hasn't had its day in the sun. Timed with Google Cloud NEXT, it moved from beta to GA on March 8, 2017, and v1.0 was released.
It should be pretty impressive, but it hasn't received much attention yet, so I'll try to convey the appeal of Datalab little by little. (I will probably split this into about three articles.) First, here is an overview of Cloud Datalab.
- An interactive analysis environment for data analysis, visualization, and machine learning on GCP
- Since it is developed based on Jupyter, which is popular with data analysts, it is a nice tool that lets users who already use Jupyter make a smooth transition
- The advantage of running on GCP is that it is integrated with BigQuery, GCS, and Cloud ML Engine, so you can work with large data seamlessly
- Datalab itself is published on GitHub as a Docker image, on the assumption that it will run on GCE
- There is no charge for Datalab itself
- However, since Datalab is meant to run on GCE, you pay for the GCE resources you use
- In addition, you pay for the other GCP components you use
  - BigQuery, GCS, and so on
- Also, by default a disk is created for persistence and data is kept in GCS as a backup, so those costs are incurred by default
The steps here are almost the same as Cloud Datalab's Quick Start.
Assuming the Google Cloud SDK is already installed, install the additional datalab command:
$ gcloud components install datalab
If you set the project and zone in advance, you don't have to pass them as command options, which makes things easier. As mentioned above, Datalab runs on GCE, so these are GCE-related settings.
$ gcloud config set core/project ${PROJECT_ID}
$ gcloud config set compute/zone ${ZONE}
$ datalab create ${INSTANCE_NAME}
This launches an instance for Datalab, creates and configures a network for it, then opens a browser and connects to Datalab. It's easy.
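Since the commands above depend on three shell variables, a small pre-flight check can catch an empty one before anything gets created. This is only a sketch: `missing_datalab_vars` is a hypothetical helper name, and the values assigned below are placeholders, not real resources.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report which of the variables used in the
# commands above are still unset or empty.
missing_datalab_vars() {
  missing=""
  [ -n "${PROJECT_ID:-}" ] || missing="$missing PROJECT_ID"
  [ -n "${ZONE:-}" ] || missing="$missing ZONE"
  [ -n "${INSTANCE_NAME:-}" ] || missing="$missing INSTANCE_NAME"
  # Strip the leading space; an empty line means every variable is set.
  echo "${missing# }"
}

# Placeholder values for illustration only.
PROJECT_ID=my-project
ZONE=us-central1-a
INSTANCE_NAME=
missing_datalab_vars
```

With the placeholder values above, the only name reported is INSTANCE_NAME, because it was left empty on purpose.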
Anyone who has used Jupyter will recognize this screen and know what to do.
Sign in with your account from the browser. This is needed because Datalab uses a service account to access other GCP services. It's easy because it is done from the GUI.
If you are familiar with Jupyter, you can start analyzing however you like. Even if you are not, a nice README is included, and following it should show you how to use notebooks and how Datalab works with BigQuery and GCS. For now, the easy part is that this page is right there when you launch Datalab.
The good thing about the cloud is that once you have used something as much as you want, you can say goodbye to it. Let's say goodbye.
$ datalab delete ${INSTANCE_NAME}
Saying goodbye is easy too. However, this only deletes the instance, so if you don't want to be charged any further, don't forget to also delete the default disk (the notebook content itself is stored on it) and the backup in GCS.
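The extra cleanup could look roughly like the sketch below, written as a dry run that only prints the commands so they can be reviewed first. The disk name `${INSTANCE_NAME}-pd` and the backup location `gs://${PROJECT_ID}.appspot.com/datalab_backups` are assumptions about Datalab's defaults, and `datalab_cleanup_commands` is a hypothetical helper; check the real names with `gcloud compute disks list` and `gsutil ls` before deleting anything.

```shell
#!/bin/sh
# Hypothetical helper: prints the cleanup commands instead of running them.
# ASSUMPTION: the persistent disk is named "<instance>-pd" and backups are
# kept under "gs://<project>.appspot.com/datalab_backups". Verify both in
# your own project before running any of the printed commands.
datalab_cleanup_commands() {
  project_id=$1
  zone=$2
  instance_name=$3
  # An empty value would produce a broken (or dangerous) command line.
  if [ -z "$project_id" ] || [ -z "$zone" ] || [ -z "$instance_name" ]; then
    echo "usage: datalab_cleanup_commands PROJECT_ID ZONE INSTANCE_NAME" >&2
    return 1
  fi
  echo "datalab delete $instance_name"
  echo "gcloud compute disks delete ${instance_name}-pd --zone $zone"
  echo "gsutil rm -r gs://${project_id}.appspot.com/datalab_backups"
}

# Placeholder values for illustration only.
datalab_cleanup_commands my-project us-central1-a my-datalab
```

Note that, if the assumed layout holds, the last printed command would remove the backups of every Datalab instance in the project, so narrow the path if you still need some of them.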
- It's nice to be able to build a Jupyter environment so easily.
- Being integrated with other cloud services seems better than running Jupyter on its own.
- There seem to be benefits unique to the cloud, so it looks worth digging a little deeper.
Next time, we'll take a closer look at Datalab itself.