This article is a memo on how to read and write files for data analysis.
Reading CSV / TSV files with pandas (read_csv, read_table)
import pandas as pd
df = pd.read_csv('train.csv', sep=',', na_values='.', header=None)
#Tips File type: use read_csv() to read CSV files and read_table() to read TSV (tab-separated) files.
#Tips Data delimiter: for data whose delimiter is neither a comma nor a tab, specify the delimiter with the sep argument (or its alias, delimiter).
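As a small sketch of the two tips above, the following reads semicolon-separated and tab-separated text. The sample data is made up for illustration and is held in memory with io.StringIO instead of a real file.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data (not from a real file).
raw = "a;b;c\n1;2;3\n4;5;6\n"

# sep tells read_csv which character separates the fields.
df = pd.read_csv(io.StringIO(raw), sep=";")

# read_table uses a tab as its default separator, so it suits TSV data.
tsv = "a\tb\tc\n1\t2\t3\n"
df_tsv = pd.read_table(io.StringIO(tsv))

print(df)
print(df_tsv)
```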
#Tips When the data has no header: by default, the first line of the data is treated as the header. If the data has no header row, specify header=None; the columns are then labeled 0, 1, 2, and so on.
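To see the difference, here is a minimal sketch that reads the same header-less text with and without header=None. The two-row sample is invented for illustration.

```python
import io
import pandas as pd

# Hypothetical data without a header row.
raw = "1,2,3\n4,5,6\n"

# Default: the first line is consumed as column names, leaving one data row.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every line is data, and columns get integer labels.
without_header = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)
print(without_header.shape)
```

With the default setting, the first data row is silently lost to the header, which is easy to miss on large files.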
#Tips When the header is not on the first line: specify the position of the header row explicitly, for example header=2. Row numbers are zero-based, so this is the third line. Lines before the specified position are not read.
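A short sketch of this case, assuming a file that starts with two lines of free-text preamble before the real header. The preamble text and values are made up.

```python
import io
import pandas as pd

# Hypothetical file: two preamble lines, then the header on line index 2.
raw = (
    "exported by some tool\n"
    "created 2020-01-01\n"
    "a,b,c\n"
    "1,2,3\n"
    "4,5,6\n"
)

# header=2 uses the third line as column names and skips what comes before it.
df = pd.read_csv(io.StringIO(raw), header=2)

print(df)
```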
#Tips Data types: there are two ways to specify the data type when reading data. The first is dtype=str, which applies to every column. The second is a dictionary such as dtype={'b': str, 'c': str}, which sets the type column by column.
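The sketch below shows one case where this matters: codes with leading zeros. The column names and values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical data: column b holds codes with leading zeros.
raw = "a,b,c\n1,001,x\n2,002,y\n"

# Without dtype, column b is parsed as integers and the zeros are lost.
inferred = pd.read_csv(io.StringIO(raw))

# dtype=str keeps every column as text.
all_str = pd.read_csv(io.StringIO(raw), dtype=str)

# A dict sets the type per column; column a is still inferred as integer.
per_col = pd.read_csv(io.StringIO(raw), dtype={"b": str, "c": str})

print(inferred["b"].tolist())
print(per_col["b"].tolist())
```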
#Tips Handling of missing values: to treat particular strings as missing values when reading data, list them in na_values, for example na_values=["-", "."].
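As a minimal sketch, assuming "-" and "." are the placeholders used for missing entries in the data (the sample below is made up):

```python
import io
import pandas as pd

# Hypothetical data where "-" and "." stand for missing entries.
raw = "a,b\n1,-\n.,4\n"

# Strings listed in na_values are converted to NaN while reading.
df = pd.read_csv(io.StringIO(raw), na_values=["-", "."])

# Count the missing values per column.
print(df.isna().sum())
```

Because the placeholders become NaN, the remaining values in each column can be parsed as numbers instead of staying as text.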
3 or later. On a separate screen you are asked which account to link with Colaboratory; select it. An ID is then issued; copy it and paste it into Colaboratory.
(*) If it is not added automatically, run the following code:
from google.colab import drive
drive.mount('/content/drive')
Specify the path in pd.read_csv() as follows:
data_fixed = pd.read_csv("/content/drive/My Drive/ColabNotebooks/XXX.csv")
After that, reading the CSV file works the same way as in section 2, Reading CSV with Jupyter Notebook.
Use the following code to select a local file and upload it.
from google.colab import files
uploaded = files.upload()
import io
df = pd.read_csv(io.StringIO(uploaded['XXX.csv'].decode('utf-8')))
XXX.csv is the uploaded CSV file.
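files.upload() returns a dictionary that maps each file name to the file's contents as bytes, which is why the code above decodes before parsing. The sketch below imitates that dictionary by hand so it can run outside Colaboratory; the file contents are invented, and the io.BytesIO variant is an alternative I am adding, not something from the steps above.

```python
import io
import pandas as pd

# Stand-in for the result of files.upload(): file name -> raw bytes.
# (Hypothetical contents; in Colaboratory this comes from the upload dialog.)
uploaded = {"XXX.csv": b"a,b\n1,2\n3,4\n"}

# Same approach as above: decode the bytes to text, then wrap in StringIO.
df_text = pd.read_csv(io.StringIO(uploaded["XXX.csv"].decode("utf-8")))

# Alternative: hand the bytes to pandas directly through BytesIO.
df_bytes = pd.read_csv(io.BytesIO(uploaded["XXX.csv"]))

print(df_text.equals(df_bytes))
```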
To output a file, do the following.
df.to_csv("/content/drive/My Drive/Colab Notebooks/XXX.csv")
df.to_csv('XXX.csv' , index=False)
files.download('XXX.csv')
#Tips Omitting the index: if you do not need the index column in the output, specify index=False. I find this useful because I often do not need the index when submitting to Kaggle.
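The effect is easiest to see by comparing the text that to_csv produces. This sketch uses a made-up two-row frame and calls to_csv without a path, which returns the CSV as a string instead of writing a file.

```python
import pandas as pd

# Hypothetical submission-style frame.
df = pd.DataFrame({"id": [1, 2], "target": [0, 1]})

# Default: the row index is written as an unnamed first column.
with_index = df.to_csv()

# index=False: only the actual columns are written.
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```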
#Tips Downloading from Colaboratory: to download from Colaboratory, you need to import the following.
from google.colab import files