0. Contents of this article

This article is a memo on how to read and write files for data analysis.

1. Reference site

Read csv / tsv file with pandas (read_csv, read_table)

2. Read CSV with Jupyter Notebook

import pandas as pd

df = pd.read_csv('train.csv', sep=',', na_values='.', header=None)

#Tips Read function: use read_csv() to read CSV files and read_table() to read TSV (tab-separated) files.
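As a minimal sketch of this tip, the two functions can be compared on small in-memory samples. The sample text and the variable names (csv_text, tsv_text, df_csv, df_tsv) are made up for illustration and are not part of the original memo.

```python
import io

import pandas as pd

# Hypothetical sample data held in memory instead of real files.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
tsv_text = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

df_csv = pd.read_csv(io.StringIO(csv_text))    # comma-separated by default
df_tsv = pd.read_table(io.StringIO(tsv_text))  # tab-separated by default

# Both calls should yield the same table with 2 rows and 3 columns.
print(df_csv.equals(df_tsv))
```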

#Tips Data delimiter: for data separated by something other than commas or tabs, specify the delimiter with the sep argument (or its alias, delimiter).
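For example, semicolon-separated data could be read as in the sketch below. The sample text and variable names are assumptions chosen only to illustrate the sep argument.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample.
text = "a;b;c\n1;2;3\n4;5;6\n"

# sep tells read_csv which character separates the fields.
df_semicolon = pd.read_csv(io.StringIO(text), sep=";")

# delimiter is accepted as an alias for sep.
df_alias = pd.read_csv(io.StringIO(text), delimiter=";")

print(df_semicolon)
```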

#Tips When the data has no header: by default, the first line of the data is treated as the header. If the data has no header, specify header=None.
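The difference is easy to see on a tiny sample. This is only a sketch; the two-line sample and the names df_default and df_no_header are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical data with no header line: every line is a data row.
text = "1,2,3\n4,5,6\n"

# Default behaviour: the first line is consumed as column names.
df_default = pd.read_csv(io.StringIO(text))

# header=None: every line is kept as data, columns are numbered 0, 1, 2.
df_no_header = pd.read_csv(io.StringIO(text), header=None)

print(df_default.shape, df_no_header.shape)
```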

#Tips When the header is not on the first line: specify the header row explicitly, for example header=2. Row numbers start at 0, so header=2 means the third line; the lines before it are not read.
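A small sketch of the header position, assuming a file whose first two lines are notes rather than data. The sample text and the name df_header2 are hypothetical.

```python
import io

import pandas as pd

# Hypothetical file: two note lines, then the real header on the third line.
text = "note line 1\nnote line 2\na,b,c\n1,2,3\n4,5,6\n"

# header=2 uses the third line (index 2) as column names
# and discards the lines before it.
df_header2 = pd.read_csv(io.StringIO(text), header=2)

print(df_header2)
```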

#Tips Data types: there are two ways to specify the data type when reading. The first is dtype=str, which applies to every column. The second is a dictionary such as dtype={'b': str, 'c': str}, which applies only to the listed columns.
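One case where this matters is a code column with leading zeros. The sketch below uses a made-up sample with columns a, b, c to compare the two ways of passing dtype; the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample: column b holds codes with leading zeros.
text = "a,b,c\n1,001,x\n2,002,y\n"

# Without dtype, pandas infers b as integers and the zeros are lost.
df_inferred = pd.read_csv(io.StringIO(text))

# dtype=str reads every column as strings.
df_all_str = pd.read_csv(io.StringIO(text), dtype=str)

# A dict sets the type per column; a is still inferred as integer.
df_some_str = pd.read_csv(io.StringIO(text), dtype={"b": str, "c": str})

print(df_some_str["b"].tolist())
```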

#Tips Handling of missing values: to treat specific strings as missing values when reading, list them in na_values, for example na_values=["-", "."].
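As a rough illustration, the sample below marks missing entries with "-" and ".". The data and the name df_na are assumptions made for this sketch.

```python
import io

import pandas as pd

# Hypothetical sample in which "-" and "." stand for missing entries.
text = "a,b\n1,-\n.,4\n"

# Strings listed in na_values are converted to NaN while reading.
df_na = pd.read_csv(io.StringIO(text), na_values=["-", "."])

# Count the missing values per column.
print(df_na.isna().sum())
```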

3. Read CSV with Google Colaboratory

3.1 Mount Google Drive

1. Click the icon.
2. Select Mount Drive.
3. This part is added automatically (*).

After step 3, another screen asks which account to link with Colaboratory, so select one. An ID is then issued; copy it and paste it into Colaboratory.

(*) If it is not added automatically, run the following code.

from google.colab import drive
drive.mount('/content/drive')

Specify the path in pd.read_csv() as follows.

data_fixed = pd.read_csv("/content/drive/My Drive/Colab Notebooks/XXX.csv")

After that, reading CSV files works the same way as in 2. Read CSV with Jupyter Notebook.

3.2 Upload from local

Use the following code to select a local file and upload it.

from google.colab import files
uploaded = files.upload()

import io
df = pd.read_csv(io.StringIO(uploaded['XXX.csv'].decode('utf-8')))

XXX.csv is the uploaded CSV file.
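The decoding step can be tried outside Colaboratory with a stand-in for the upload result. This sketch assumes that files.upload() returns a dict mapping each file name to its raw bytes; the dict contents here are made up, and the io.BytesIO variant is an alternative that is not in the original memo.

```python
import io

import pandas as pd

# Stand-in for the result of files.upload(): file name -> raw bytes (assumed shape).
uploaded = {"XXX.csv": "a,b\n1,2\n3,4\n".encode("utf-8")}

# Same approach as above: decode the bytes to text, then wrap in StringIO.
df_text = pd.read_csv(io.StringIO(uploaded["XXX.csv"].decode("utf-8")))

# Alternative: hand the raw bytes to pandas through BytesIO.
df_bytes = pd.read_csv(io.BytesIO(uploaded["XXX.csv"]))

print(df_text.equals(df_bytes))
```

If the file is not UTF-8, the name passed to decode() has to match the file's actual encoding.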

4. CSV file output with Google Colaboratory

Output the file as follows.

df.to_csv("/content/drive/My Drive/Colab Notebooks/XXX.csv")

5. Download locally via browser (common to Google Colaboratory and Jupyter Notebook)

df.to_csv('XXX.csv', index=False)
files.download('XXX.csv')

#Tips Omitting the index: if you do not need the index column in the output, specify index=False. I find this useful because I often do not need the index when submitting to Kaggle.
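The effect of index=False shows up in the first line of the output. In this sketch the DataFrame contents are made up, and to_csv() is called without a path so that it returns the CSV text instead of writing a file.

```python
import pandas as pd

# Hypothetical submission-style table.
df = pd.DataFrame({"id": [1, 2], "label": ["x", "y"]})

# With no path given, to_csv() returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# The default output starts with an empty column name for the index.
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```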

#Tips Downloading from Colaboratory: to download from Colaboratory, you need to import the following first.

from google.colab import files

Reading pandas format file