When reading a CSV file with pandas read_csv, the first column becomes the index

Overview

This post describes a case where I got stuck because the first column became the index when I tried to process, with pandas, data downloaded from an in-house system.

Phenomenon

The data in question (not the actual data, of course):

name,population,area
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,

When you load this data, which looks fine at first glance, with read_csv(), the values of the first column (name) become the index.
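A minimal sketch that reproduces this, assuming the sample above is passed to read_csv() through io.StringIO instead of being read from a file. It also shows a side effect that is easy to miss: because each record has one field more than the header, the header labels shift onto the remaining fields, so the column labelled name actually holds the population values and area is all NaN.

```python
import io

import pandas as pd

# Sample data from above: every record ends with ",", the header line does not
DATA = """name,population,area
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,
"""

# Each record has 4 fields but the header has only 3, so pandas
# uses the first field of each record as the row index
df = pd.read_csv(io.StringIO(DATA))
print(df)

# The three header labels now sit on fields 2-4 of each record
print(df["name"].tolist())  # population strings, not city names
```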

[Screenshot: DataFrame returned by read_csv() for the data above]

Cause

The cause is that every record ends with a ",", but the header line does not. Each record therefore has one more field than the header, and pandas uses that extra leading field as the index. If you add a "," to the end of the header line, an extra empty column is added, as shown below, but a default index (0, 1, 2, ...) is generated automatically.

[Screenshot: read_csv() output after adding "," to the end of the header line]
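The header fix can be sketched as follows, again assuming io.StringIO in place of a file and the sample with "," appended to the header. pandas names the empty fourth header field "Unnamed: 3"; dropping that all-NaN column is an extra step added here, not something from the original workflow.

```python
import io

import pandas as pd

# Same sample, but the header line now also ends with ","
DATA = """name,population,area,
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,
"""

# Header and records now both have 4 fields, so no index is inferred
# and pandas creates a default RangeIndex (0, 1, 2, ...)
df = pd.read_csv(io.StringIO(DATA))
print(df.columns.tolist())  # includes the empty "Unnamed: 3" column

# Drop the all-NaN column produced by the trailing ","
df = df.drop(columns=["Unnamed: 3"])
print(df)
```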

For clarity, this sample uses a CSV file, but the file I actually got stuck on at work was a TSV (tab-delimited) file, which made the problem take longer to track down.
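When the file cannot easily be edited, the pandas documentation describes index_col=False for exactly this situation (a delimiter at the end of each data line): it disables the index inference and discards the trailing empty field. A sketch for the TSV case, assuming a tab-separated stand-in built from the CSV sample above (not the actual work data):

```python
import io

import pandas as pd

# Tab-delimited version of the sample: each record ends with a tab, the header does not
DATA = (
    "name\tpopulation\tarea\n"
    "Osaka\t2691k\t223\t\n"
    "Nara\t353k\t276\t\n"
    "Kyoto\t1472k\t827\t\n"
    "Koube\t1542k\t552\t\n"
    "Wakayama\t355k\t208\t\n"
)

# sep="\t" reads tab-separated data; index_col=False stops pandas from
# turning the first field into the index and drops the extra trailing field
df = pd.read_csv(io.StringIO(DATA), sep="\t", index_col=False)
print(df)
```

This keeps the file untouched, which is convenient when the export comes from a system you do not control.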

Lessons and impressions

Look at the raw data properly instead of trusting the tool blindly.
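One way to look at the raw data before trusting read_csv() is to compare the number of fields in the header with the number in each record. A minimal sketch using only the standard csv module; the function name count_fields is hypothetical, and it takes the text as a string (for a real file, an opened file object could be passed to csv.reader instead).

```python
import csv
import io
from collections import Counter


def count_fields(text: str, delimiter: str = ",") -> tuple[int, Counter]:
    """Return the header's field count and a Counter of field counts per record."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(rows)
    # csv.reader yields blank lines as empty lists; skip them
    record_counts = Counter(len(row) for row in rows if row)
    return len(header), record_counts


DATA = """name,population,area
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,
"""

header_len, record_counts = count_fields(DATA)
# Records with more fields than the header (4 vs 3 here)
# are what make read_csv() infer an index
print(header_len, record_counts)
```

For a TSV file, pass delimiter="\t".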

Even so, since I became able to do quick data edits with pandas, I find myself using Excel less and less. The data this time was a TSV file of about 50 MB, and pandas read it in a few seconds (Excel just hung...).
