Read and format a CSV file containing both commas and tabs with Python pandas

I had a hard time handling a CSV file that contained both tabs and commas, so this is a note on how I dealt with it. When I opened the CSV file in a text editor, it looked like this:

txt


0.2,	1.01,	0.60,	-0.68
0.4,	1.00,	0.67,	-0.69
0.6,	1.01,	0.61,	-0.72

First, read the file with pandas' read_csv. The file is named sample.csv and is in the same directory as the script.

In


import pandas as pd

df = pd.read_csv("sample.csv", header=None)

# Check the contents of the DataFrame
print(df)
print(df.dtypes)

Here I should have written

df = pd.read_csv("sample.csv", sep="\t", header=None)


but because I did not, each row was read into the first column as a single string, tabs included.


Out


	0
0	0.2\t1.01\t0.60\t-0.68
1	0.4\t1.00\t0.67\t-0.69
2	0.6\t1.01\t0.61\t-0.72


0    object
dtype: object
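
If the file really does contain a comma followed by a tab between fields, as the text-editor view suggests, one possible approach is to give read_csv a regular-expression separator so that both characters are consumed together. The following is only a sketch under that assumption: the data is embedded as a string (the variable names are made up for illustration) rather than read from sample.csv.

```python
import io

import pandas as pd

# Assumed file contents: the three sample rows, with ",<TAB>" between fields.
sample = (
    "0.2,\t1.01,\t0.60,\t-0.68\n"
    "0.4,\t1.00,\t0.67,\t-0.69\n"
    "0.6,\t1.01,\t0.61,\t-0.72\n"
)

# A regular-expression separator needs the Python parsing engine.
# r",\s*" matches a comma plus any whitespace (including tabs) after it.
df_regex = pd.read_csv(
    io.StringIO(sample), sep=r",\s*", header=None, engine="python"
)

print(df_regex)
print(df_regex.dtypes)
```

With a real file, io.StringIO(sample) would be replaced by the file name. If the actual separator differs from this assumption, the pattern has to be adjusted accordingly.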

To split the values into separate columns at the tabs, do the following.

In


df = df[0].apply(lambda x: pd.Series(x.split('\t')))

#Check the contents of DataFrame
print(df)
print(df.dtypes)

Out


	0	1	2	3
0	0.2	1.01	0.60	-0.68
1	0.4	1.00	0.67	-0.69
2	0.6	1.01	0.61	-0.72


0    object
1    object
2    object
3    object
dtype: object
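
As an alternative to apply with a lambda, pandas also offers Series.str.split with expand=True, which splits every string in the column and returns one column per piece. The sketch below rebuilds the single-column DataFrame by hand (the name raw is hypothetical) so that it can be run on its own; the pieces are still strings at this point.

```python
import pandas as pd

# Hypothetical frame reproducing the single-column result from read_csv.
raw = pd.DataFrame(
    {
        0: [
            "0.2\t1.01\t0.60\t-0.68",
            "0.4\t1.00\t0.67\t-0.69",
            "0.6\t1.01\t0.61\t-0.72",
        ]
    }
)

# expand=True returns a DataFrame with one column per split piece.
split_df = raw[0].str.split("\t", expand=True)

print(split_df)
print(split_df.dtypes)
```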

Since I want to do calculations with this data afterwards, I convert every column to float.

Get the number of columns of the DataFrame with df.shape[1], generate the column indices with the range function, and loop over them with a for statement.

In


for i in range(df.shape[1]):
    df[i] = df[i].astype(float)

#Check the contents of DataFrame
print(df.dtypes)

Out


0    float64
1    float64
2    float64
3    float64
dtype: object
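
When every column should become the same type, the loop can also be replaced by a single astype call on the whole DataFrame. A minimal sketch, using a small hand-built frame of strings (str_df is a made-up name) in place of the split result:

```python
import pandas as pd

# Hypothetical frame of numeric strings, like the result of the split step.
str_df = pd.DataFrame(
    [
        ["0.2", "1.01", "0.60", "-0.68"],
        ["0.4", "1.00", "0.67", "-0.69"],
    ]
)

# astype applied to the DataFrame converts all columns at once.
float_df = str_df.astype(float)

print(float_df.dtypes)
```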

That's it.


Supplement 1: To read a file placed in a subfolder, you can do the following.

In


import pandas as pd
import os

# Change the current directory to the folder that contains the file.
os.chdir("./Folder name")

# Get a list of the files in that folder.
file = os.listdir("./")

# Use file[0] as the file name (assuming the folder contains only one file).
df = pd.read_csv(file[0],header=None)
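
Note that os.chdir changes the working directory for everything that runs afterwards. A possible alternative is to build the path with pathlib and leave the working directory alone. The sketch below is an assumption-laden illustration: it creates a temporary folder named sample with one small CSV file so that it can run anywhere; in practice the folder would already exist.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical folder and file, created only for this demonstration.
    folder = Path(tmp) / "sample"
    folder.mkdir()
    (folder / "sample.csv").write_text("0.2,1.01\n0.4,1.00\n")

    # List the CSV files in the folder without changing directory.
    csv_files = sorted(folder.glob("*.csv"))

    # Read the first one (assuming the folder contains only one CSV file).
    folder_df = pd.read_csv(csv_files[0], header=None)

print(folder_df)
```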

Supplement 2: The argument of astype can also be a dictionary, which is convenient when converting columns to different types. (The column names 'a' and 'c' below assume that the columns have been named as in Supplement 3.)

In


df.astype({'a': int, 'c': str}).dtypes
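
To make the dictionary form concrete, here is a small self-contained sketch. The frame typed and its values are invented for illustration; columns not listed in the dictionary keep their original type.

```python
import pandas as pd

# Hypothetical frame with three named columns.
typed = pd.DataFrame({"a": [1.0, 2.0], "b": [0.5, 1.5], "c": [10, 20]})

# Convert column 'a' to int and column 'c' to str; 'b' is left unchanged.
converted = typed.astype({"a": int, "c": str})

print(converted.dtypes)
```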

Supplement 3: To set the column names and to add a column, run the following.

In


from pandas import DataFrame

# Set the column names
df.columns=['a','b','c','d']

# Add column 'e' by listing it among the column names (it is filled with NaN)
df = DataFrame(df, columns=['a','b','c','d','e'])
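
The column names can also be given at read time through the names argument of read_csv, and an empty column can be added with reindex. This is only a sketch of one way to do it: the data is an inline string without tabs, and the variable names are made up.

```python
import io

import pandas as pd

# Assumed comma-only data, used so the example runs without a file.
text = "0.2,1.01,0.60,-0.68\n0.4,1.00,0.67,-0.69\n"

# names= assigns the column names while reading.
named = pd.read_csv(io.StringIO(text), header=None, names=["a", "b", "c", "d"])

# reindex with an extra label adds that column, filled with NaN.
extended = named.reindex(columns=["a", "b", "c", "d", "e"])

print(extended)
```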

Postscript (2017-04-10)

As pointed out in a comment, I tried passing sep="\t" as an argument to read_csv, but the values were then read with the commas still attached.

Out


     0       1       2        3
0    0.2,    1.01,   0.60,   -0.68
1    0.4,    1.00,   0.67,   -0.69
2    0.6,    1.01,   0.61,   -0.72

However, when I removed sep="\t" and ran read_csv again, the values were read as float without any split or astype processing.

I am running this in PyCharm. When I remove the sep argument and try again, does the tool determine the separator automatically? I do not understand why this happens.

In


import pandas as pd
import os  # For reading files

# Read the CSV file under the sample folder
os.chdir("./sample")
file = os.listdir("./")

df = pd.read_csv(file[0],header=None)

os.chdir("../")

#Check the contents of DataFrame
print(df)
print(df.dtypes)

Out


     0      1      2      3
0    0.2    1.01   0.60   -0.68
1    0.4    1.00   0.67   -0.69
2    0.6    1.01   0.61   -0.72

0    float64
1    float64
2    float64
3    float64
dtype: object

So far I have seen four different behaviors, and the cause is unknown.

Reading comma-and-tab mixed data with read_csv in PyCharm:
① Without any options, the data is read correctly as float
② Without any options, the data is read with the tabs still included
③ With sep="\t" specified, the data is read correctly as float
④ With sep="\t" specified, the data is read with the commas still included
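
One way to narrow down behaviors like these might be to check which separator characters a given file actually contains before choosing read_csv arguments, since files that look alike in an editor can differ. The sketch below uses a hypothetical helper, count_delimiters, on a hard-coded line; it is a diagnostic aid, not an explanation of the cause.

```python
def count_delimiters(line: str) -> dict:
    """Count the commas and tabs in one line of text."""
    return {"comma": line.count(","), "tab": line.count("\t")}


# Assumed first line of the file. With a real file it could be obtained with:
#   with open("sample.csv", newline="") as f:
#       first_line = f.readline()
first_line = "0.2,\t1.01,\t0.60,\t-0.68\n"

# repr() makes tabs and line endings visible as \t and \n.
print(repr(first_line))

counts = count_delimiters(first_line)
print(counts)
```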
