This article covers the following steps.
1. Enable direct download from a Google Drive sharing link
2. Download the data from that URL with Python, or with wget or curl on the command line
3. Precautions when running on Google Colaboratory
A sharing link created on Google Drive opens a preview page, and the file has to be downloaded manually from there.
To download the file directly, convert the URL. URL conversion tools exist, but simply rewriting the URL as follows is enough.
file/d/ -> uc?id= or uc?export=download&id=
/view?usp=sharing -> (remove)
https://drive.google.com/file/d/<file_id>/view?usp=sharing
↓
https://drive.google.com/uc?id=<file_id>
or
https://drive.google.com/uc?export=download&id=<file_id>
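The rewrite above can also be done programmatically. The following is a minimal sketch, assuming the sharing link has the `file/d/<file_id>/view` shape shown above; the function name `to_direct_url` and its `explicit_export` parameter are made up for illustration.

```python
import re


def to_direct_url(share_url: str, explicit_export: bool = True) -> str:
    """Rewrite a Drive sharing link into a direct-download URL.

    Hypothetical helper: it only handles links of the form
    https://drive.google.com/file/d/<file_id>/view?usp=sharing
    """
    # The file ID is the path segment that follows "/file/d/"
    match = re.search(r"/file/d/([^/?#]+)", share_url)
    if match is None:
        raise ValueError(f"not a Drive file sharing link: {share_url}")
    file_id = match.group(1)
    if explicit_export:
        return f"https://drive.google.com/uc?export=download&id={file_id}"
    return f"https://drive.google.com/uc?id={file_id}"
```

Calling `to_direct_url` on a sharing link returns the `uc?export=download&id=` form by default, and the shorter `uc?id=` form when `explicit_export=False`.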
Code to download with urlretrieve, wget, or curl, using the URL converted earlier.
Python
import urllib.request

# Direct-download URL obtained by the conversion above
url = "https://drive.google.com/uc?export=download&id=<file_id>"
# Local path to save the downloaded file to
file_name = "file_name"
urllib.request.urlretrieve(url, file_name)
Shell
wget "https://drive.google.com/uc?export=download&id=<FILE_ID>" -O <FILE_NAME>
or
curl -L "https://drive.google.com/uc?export=download&id=<FILE_ID>" -o <FILE_NAME>
If the file is too large for Google Drive to scan for viruses, a confirmation is required before downloading, and running the code above downloads the HTML of the confirmation page itself instead of the file.
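One way to notice this from Python is to look at the first bytes of whatever was saved. The sketch below is only a heuristic, and the helper names `is_html_bytes` and `looks_like_html` are made up for illustration; a dataset that is itself an HTML file would be flagged too.

```python
def is_html_bytes(head: bytes) -> bool:
    """Heuristic: True if the data starts like an HTML document."""
    start = head.lstrip().lower()
    return start.startswith(b"<!doctype html") or start.startswith(b"<html")


def looks_like_html(path: str) -> bool:
    """Check the first 512 bytes of a downloaded file with is_html_bytes."""
    with open(path, "rb") as f:
        return is_html_bytes(f.read(512))
```

If `looks_like_html(file_name)` returns True right after a download, the saved file is probably a web page rather than the data.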
To avoid downloading the confirmation page, you need to obtain the confirm code and pass it in the URL. You can get it with the following commands.
curl -sc /tmp/cookie "https://drive.google.com/uc?export=download&id=<FILE_ID>" > /dev/null
CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
curl -Lb /tmp/cookie "https://drive.google.com/uc?export=download&confirm=${CODE}&id=<FILE_ID>" -o <FILE_NAME>
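For readers who prefer Python, the awk step can be sketched as follows. This is only an illustration of the same idea, not a tested replacement for the commands above: it assumes the cookie file written by `curl -c` contains a line whose cookie name includes `_warning_`, exactly as the awk pattern does. The function names and the sample cookie line are made up.

```python
from typing import Optional


def confirm_code_from_cookies(cookie_text: str) -> Optional[str]:
    """Mirror `awk '/_warning_/ {print $NF}'`: return the last field of
    the first line containing "_warning_", or None if there is none."""
    for line in cookie_text.splitlines():
        if "_warning_" in line:
            return line.split()[-1]
    return None


def confirmed_url(file_id: str, code: str) -> str:
    """Build the download URL that carries the confirm code."""
    return (
        "https://drive.google.com/uc?export=download"
        f"&confirm={code}&id={file_id}"
    )


# Made-up cookie file content, for illustration only
sample = (
    "# Netscape HTTP Cookie File\n"
    "#HttpOnly_.drive.google.com\tTRUE\t/uc\tTRUE\t0\tdownload_warning_123\tAbCd\n"
)
code = confirm_code_from_cookies(sample)
```

With the made-up sample, `code` is the last field of the matching line, and `confirmed_url("<FILE_ID>", code)` produces the same URL shape as the final curl command.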
On Google Colaboratory, I first ran the code above in a cell by simply adding a ! at the beginning of each line, but then the variable is not kept from one line to the next. Writing %%shell at the beginning of the cell instead executes the whole cell like a shell script.
Add %%shell to the beginning of the code above.
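The reason the variable disappears is that each ! line is handed to its own shell process. The sketch below reproduces the effect outside a notebook with `subprocess`, assuming a POSIX shell is available; the variable name `DEMO_CONFIRM_CODE` and the helper `run` are made up for illustration.

```python
import os
import subprocess

VAR = "DEMO_CONFIRM_CODE"  # hypothetical variable name
# Make sure the variable is not already set in the inherited environment
env = {k: v for k, v in os.environ.items() if k != VAR}


def run(script: str) -> str:
    """Run script in a fresh shell and return stdout without the trailing newline."""
    result = subprocess.run(
        script, shell=True, env=env,
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")


# Two separate shells, like two separate `!` lines:
# the assignment is gone by the time the second shell starts.
run(f"{VAR}=abc")
separate = run(f'echo "[${VAR}]"')

# One shell running both lines, like a %%shell cell:
# the variable is still set when echo runs.
together = run(f'{VAR}=abc\necho "[${VAR}]"')

print(separate)
print(together)
```

The first print shows empty brackets and the second shows the value, which is why the three commands have to run in a single shell.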
To be honest, I don't feel a strong need for this, because a file behind a share link can simply be added to my own Drive. However, large amounts of data cannot be put on GitHub, so when sharing a Google Colaboratory notebook, writing the download cell in advance has the advantage that anyone who clones it only has to run that cell.