When reading a CSV file with pandas read_csv, the first column becomes the index

Overview

This is an account of how I got tripped up by the first column turning into the index when I tried to process data downloaded from an in-house system with pandas.

Phenomenon

The data in question (of course not the actual data)

name,population,area
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,

When you load this data with read_csv(), which looks fine at first glance, the first column (name) becomes the index.

[Screenshot: the loaded DataFrame, with the name values shown as the index]
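The behavior can be reproduced with a minimal sketch like the one below. It assumes the sample data is held in a string (the variable name `csv_text` is made up for illustration) and read through `io.StringIO` instead of a real file.

```python
import io

import pandas as pd

# The sample data from this article: every record ends with ",",
# but the header line does not.
csv_text = """name,population,area
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,
"""

df = pd.read_csv(io.StringIO(csv_text))

# The city names end up in the index, and every column label is
# shifted one position to the left relative to the data.
print(df.index.tolist())
print(df.columns.tolist())
print(df)
```

Note that the column labels no longer match their contents: the `name` column holds the population strings, and `area` is filled with missing values.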

Cause

The cause is that every record ends with a ",", while the header line does not. Each record therefore has one more field than the header, and pandas treats the first column as the index. If you add a "," to the end of the header line, an extra column appears, but the index is generated automatically.

[Screenshot: the DataFrame after adding "," to the header line, with an extra column and an automatically numbered index]
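As a rough sketch, there are two ways to get an automatically numbered index. The first is the header fix described above. The second, `index_col=False`, is not something this article used; it is an option that the pandas documentation describes for files with a delimiter at the end of each line. The variable names are made up for illustration.

```python
import io

import pandas as pd

csv_text = """name,population,area
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,
"""

# Option 1: add the missing "," to the header line only.
# The header then has four fields, so a fourth, unnamed column appears.
fixed_header = csv_text.replace("name,population,area\n", "name,population,area,\n", 1)
df_header = pd.read_csv(io.StringIO(fixed_header))
print(df_header.columns.tolist())

# Option 2: leave the file alone and tell pandas not to use
# the first column as the index.
df_no_index = pd.read_csv(io.StringIO(csv_text), index_col=False)
print(df_no_index)
```

With option 1 the extra column is entirely empty and can be dropped afterwards. With option 2 the trailing empty field is discarded and the three named columns line up with their data.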

This sample is presented as a CSV file to make it easy to follow, but the file that actually tripped me up at work was a TSV (tab-delimited) file, which is why it took extra time to track down.
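The same thing happens with tab-delimited data. The sketch below assumes a tab-delimited version of the sample (not the actual work file), built as a string with a trailing tab on each record.

```python
import io

import pandas as pd

# Assumed TSV version of the sample: each record ends with a tab,
# the header line does not.
tsv_text = (
    "name\tpopulation\tarea\n"
    "Osaka\t2691k\t223\t\n"
    "Nara\t353k\t276\t\n"
    "Kyoto\t1472k\t827\t\n"
    "Koube\t1542k\t552\t\n"
    "Wakayama\t355k\t208\t\n"
)

# repr() makes the trailing "\t" visible.
print(repr(tsv_text.splitlines()[1]))

# Default behavior: the first column becomes the index, as with the CSV.
df_default = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# With index_col=False the columns line up with the header again.
df_fixed = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=False)
print(df_fixed)
```

Printing a raw line with `repr()` is one simple way to confirm whether a trailing delimiter is present.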

Lessons and impressions

Look at the data itself carefully instead of getting lost in the tool.
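One possible way to check the data before loading it is to count the fields on each line. The sketch below uses a hypothetical helper, `count_fields`, built on the standard `csv` module; it is only an illustration, not part of pandas.

```python
import csv
import io
from collections import Counter


def count_fields(text, delimiter=","):
    """Return a Counter mapping 'number of fields' to 'number of lines'."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip completely empty lines so they do not distort the counts.
    return Counter(len(row) for row in reader if row)


csv_text = """name,population,area
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,
"""

counts = count_fields(csv_text)
print(counts)  # one line with 3 fields (header), five lines with 4 fields
```

If the header and the records report different field counts, as they do here, the file has the mismatch described in this article. For a tab-delimited file, pass `delimiter="\t"`.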

Even so, I feel that I use Excel much less often now that I can edit data casually with pandas. The data this time was a TSV file of about 50 MB, yet it loaded in a few seconds. (Excel hung...)
