The Japan DataScientist Society has released "Data Science 100 Knocks (Structured Data Processing)", a set of data analysis exercises. Running it as intended requires Docker, so for those who just want a quick first look, here is a way to run it in Colaboratory instead.
First, create a new notebook and open it in Colaboratory. Then run the following commands to mount Google Drive and download the data to it.
from google.colab import drive
drive.mount('/content/drive')
!git clone https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess.git 'drive/My Drive/100knocks-preprocess'
If you are mounting Drive for the first time, a prompt appears below the cell you ran. Click the URL and grant Google Colaboratory access to your Drive. At the end, the message "Please copy this code, switch to the application and paste it." is displayed. Paste the copied code into the "Enter your authorization code:" field and press Enter. When you go back to My Drive, you will see a folder named "100knocks-preprocess". Once that has worked, you will not need this notebook any more.
The notebook files are stored inside the cloned folder. Open preprocess_knock_Python.ipynb in Google Colaboratory.
If you run the first cell as it is, an error will occur. Run only the library imports from that cell, then load the data with the following code.
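To confirm that the clone actually landed in Drive without switching to the Drive UI, a small check like the following can be run in the same notebook. It is only a sketch: `find_missing` is a hypothetical helper name, and the checklist contains just the data directory that the loading code in this article reads from.

```python
import os

# Destination used by the git clone command above
# (relative to Colab's default working directory).
REPO_DIR = 'drive/My Drive/100knocks-preprocess'


def find_missing(repo_dir, relative_paths):
    """Return the entries of relative_paths that do not exist under repo_dir."""
    return [p for p in relative_paths
            if not os.path.exists(os.path.join(repo_dir, p))]


# Assumption: checking the data directory is enough to tell that the clone worked.
missing = find_missing(REPO_DIR, ['docker/work/data'])
print('Clone looks complete' if not missing else f'Missing: {missing}')
```

If the output lists a missing path, rerunning the mount and clone cell is the first thing to try.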
import os
import pandas as pd

def get_df(filename):
    # Folder that holds the CSV files inside the cloned repository
    path = 'drive/My Drive/100knocks-preprocess/docker/work/data'
    return pd.read_csv(os.path.join(path, filename))
df_customer = get_df('customer.csv')
df_category = get_df('category.csv')
df_geocode = get_df('geocode.csv')
df_product = get_df('product.csv')
df_receipt = get_df('receipt.csv')
df_store = get_df('store.csv')
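As a quick sanity check after loading, it can help to look at the shape of every table in one go. The sketch below is not part of the original notebook: `load_tables`, `TABLES`, and `DATA_DIR` are hypothetical names, and the helper simply reports any file it cannot find instead of raising an error.

```python
import os
import pandas as pd

# Same data folder as in get_df above.
DATA_DIR = 'drive/My Drive/100knocks-preprocess/docker/work/data'
# The six tables loaded in this article.
TABLES = ['customer', 'category', 'geocode', 'product', 'receipt', 'store']


def load_tables(data_dir, names):
    """Read <name>.csv for each name and return {name: DataFrame}.

    Files that do not exist are reported and skipped.
    """
    tables = {}
    for name in names:
        path = os.path.join(data_dir, f'{name}.csv')
        if os.path.exists(path):
            tables[name] = pd.read_csv(path)
        else:
            print(f'not found: {path}')
    return tables


tables = load_tables(DATA_DIR, TABLES)
for name, df in tables.items():
    print(name, df.shape)  # (rows, columns) of each table
```

The notebook itself still expects the `df_customer`, `df_category`, ... variables, so this is meant as a check alongside the loading code, not a replacement for it.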
By the way, a PDF that explains the aims of this content is in the following folder, so it is worth reading before you start.
100knocks-preprocess/docker/dock
Now you are ready. If you come back and run the notebook after a while, the connection to Drive may have been lost (probably). In that case, run the following code again, or mount Drive from the sidebar, and then load the data again.
from google.colab import drive
drive.mount('/content/drive')
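If you would rather not remount by hand every time, the check and the remount can be wrapped in one function. This is a minimal sketch under the assumption that a missing data directory means the Drive connection was lost; `ensure_drive` is a hypothetical name, and `force_remount=True` is passed so that a stale mount is replaced.

```python
import os

# Same data folder as in get_df above.
DATA_DIR = 'drive/My Drive/100knocks-preprocess/docker/work/data'


def ensure_drive(data_dir=DATA_DIR, mount_point='/content/drive'):
    """Remount Drive only when data_dir is no longer reachable.

    Returns True if a remount was performed, False if nothing had to be done.
    """
    if os.path.isdir(data_dir):
        return False  # data is still reachable, nothing to do
    # Imported here because google.colab exists only inside Colaboratory.
    from google.colab import drive
    drive.mount(mount_point, force_remount=True)
    return True
```

Calling `ensure_drive()` at the top of a cell and then rerunning the `get_df` lines brings the DataFrames back after a disconnect.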
Having written all this, I should add that building the environment with Docker is not that difficult, and being able to do it is often useful, so I think this is a good opportunity to give it a try. The article here looks helpful for how to set it up on a Mac. Once you have the environment, you can practice SQL as well!