I had a hard time handling a CSV file that mixed tabs and commas, so this is a note on how I dealt with it. When I opened the CSV file in a text editor, it looked like this:
0.2, 1.01, 0.60, -0.68
0.4, 1.00, 0.67, -0.69
0.6, 1.01, 0.61, -0.72
First, read the file with pandas' read_csv. The file is named sample.csv and sits in the same directory as the script.
In
import pandas as pd
df = pd.read_csv("sample.csv", header=None)
#Check the contents of DataFrame
print(df)
print(df.dtypes)
I should have specified the separator here, like this:
df = pd.read_csv("sample.csv", sep="\t", header=None)
Because I did not, each tab-separated line was read as a single string in the first column.
Out
0
0 0.2\t1.01\t0.60\t-0.68
1 0.4\t1.00\t0.67\t-0.69
2 0.6\t1.01\t0.61\t-0.72
0 object
dtype: object
To split the values into separate columns at each tab, do the following.
In
df = df[0].apply(lambda x: pd.Series(x.split('\t')))
#Check the contents of DataFrame
print(df)
print(df.dtypes)
Out
0 1 2 3
0 0.2 1.01 0.60 -0.68
1 0.4 1.00 0.67 -0.69
2 0.6 1.01 0.61 -0.72
0 object
1 object
2 object
3 object
dtype: object
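As an aside, the same split can probably be written with the pandas string accessor instead of apply and a lambda. This is only a minimal sketch: the DataFrame below is a hypothetical stand-in for the single-column result shown above, and split_df is a name I made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the single-column DataFrame shown above.
df = pd.DataFrame({0: ["0.2\t1.01\t0.60\t-0.68",
                       "0.4\t1.00\t0.67\t-0.69",
                       "0.6\t1.01\t0.61\t-0.72"]})

# Split every string on tabs; expand=True returns one column per piece.
split_df = df[0].str.split("\t", expand=True)

# The values are still strings at this point, not numbers.
print(split_df)
print(split_df.dtypes)
```

The result has the same four string columns as the apply version, so the float conversion is still needed afterwards.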
I want to do calculations on the data afterwards, so I convert every column to float.
df.shape[1] gives the number of columns, range() generates the column indexes from it, and the for loop converts one column at a time.
In
for i in range(df.shape[1]):
    df[i] = df[i].astype(float)
#Check the contents of DataFrame
print(df.dtypes)
Out
0 float64
1 float64
2 float64
3 float64
dtype: object
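The loop is not strictly necessary, since astype can be called on the whole DataFrame. The following is a small sketch under the assumption that every cell is a clean numeric string; the sample data and the names as_float and numeric are mine, not from the original file.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame of strings produced by the split step.
df = pd.DataFrame([["0.2", "1.01", "0.60", "-0.68"],
                   ["0.4", "1.00", "0.67", "-0.69"]])

# Convert every column in one call instead of looping over column indexes.
as_float = df.astype(float)
print(as_float.dtypes)

# If some cells might not be numeric, pd.to_numeric with errors="coerce"
# turns them into NaN instead of raising an error.
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)
```

Which of the two fits better depends on whether a bad cell should stop the script (astype) or be turned into a missing value (to_numeric).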
That's it.
** Supplement 1: To read a file placed in a subfolder, you can do the following.
In
import pandas as pd
import os
#You need to change the directory to the folder that contains the files.
os.chdir("./Folder name")
#Get a list of files.
file = os.listdir("./")
#Get the file name with file[0] (assuming that only one file exists)
df = pd.read_csv(file[0],header=None)
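Changing the working directory affects everything that runs afterwards, so it may be safer to build the path instead. Here is a minimal sketch using pathlib; read_first_csv is a hypothetical helper name, and it assumes the folder contains at least one file ending in .csv.

```python
from pathlib import Path

import pandas as pd


def read_first_csv(folder):
    """Read the first .csv file (sorted by name) found directly in folder."""
    csv_paths = sorted(Path(folder).glob("*.csv"))
    if not csv_paths:
        raise FileNotFoundError(f"no .csv files in {folder}")
    # The full path is passed to read_csv, so no os.chdir is needed.
    return pd.read_csv(csv_paths[0], header=None)


# Example call (the folder name is a placeholder):
# df = read_first_csv("./Folder name")
```

Filtering on "*.csv" also avoids picking up unrelated files that os.listdir would return.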
** Supplement 2: astype also accepts a dictionary that maps column names to types, which is convenient when each column needs a different type. (This assumes the columns have already been named 'a' to 'd', as in Supplement 3.)
In
df.astype({'a': int, 'c': str}).dtypes
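To make that one-liner easier to try, here is a self-contained sketch. The column names and values are made up for illustration; columns not listed in the dictionary keep their original type.

```python
import pandas as pd

# Made-up data with named columns, all float to begin with.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [0.5, 1.5], "c": [3.0, 4.0]})

# Convert only the listed columns; "b" is not in the dictionary and stays float.
converted = df.astype({"a": int, "c": str})
print(converted.dtypes)
```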
** Supplement 3: To name the columns and then add a new column, run the following.
In
from pandas import DataFrame
#Insert column name
df.columns=['a','b','c','d']
#Add column by specifying column name
df = DataFrame(df, columns=['a','b','c','d','e'])
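Passing an extra name to the DataFrame constructor gives a new column filled with NaN. As far as I know, reindex does the same thing more explicitly, and direct assignment works when the values are already known. A small sketch with made-up data (with_e is a name chosen for illustration):

```python
import pandas as pd

# Made-up one-row DataFrame with the default integer column names.
df = pd.DataFrame([[0.2, 1.01, 0.60, -0.68]])
df.columns = ["a", "b", "c", "d"]

# reindex adds the missing column "e" and fills it with NaN.
with_e = df.reindex(columns=["a", "b", "c", "d", "e"])
print(with_e)

# If the values are known, assigning to a new name adds the column directly.
df["e"] = 0.0
print(df)
```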
** 20170410 postscript
As suggested in a comment, I tried passing sep="\t" to read_csv, but this time the values were read with the commas still attached.
Out
0 1 2 3
0 0.2, 1.01, 0.60, -0.68
1 0.4, 1.00, 0.67, -0.69
2 0.6, 1.01, 0.61, -0.72
However, when I removed sep="\t" and called read_csv again, the data was read as float without any split or astype processing.
I am running this in PyCharm. When I remove the sep argument and try again, does the tool somehow determine the separator automatically? I do not understand why this happens.
In
import pandas as pd
import os ##For reading files
#Read the csv file under the sample folder
os.chdir("./sample")
file = os.listdir("./")
df = pd.read_csv(file[0],header=None)
os.chdir("../")
#Check the contents of DataFrame
print(df)
print(df.dtypes)
Out
0 1 2 3
0 0.2 1.01 0.60 -0.68
1 0.4 1.00 0.67 -0.69
2 0.6 1.01 0.61 -0.72
0 float64
1 float64
2 float64
3 float64
dtype: object
So far I have seen four different behaviors, and I do not know the cause.
Reading mixed comma/tab data with read_csv in PyCharm:
① Read correctly as float without specifying anything
② Read with the tabs left in the values when nothing is specified
③ Read correctly as float when sep="\t" is specified
④ Read with the commas left in the values when sep="\t" is specified
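I have not confirmed the cause, but one way to narrow it down might be to count the tabs and commas in the raw text before parsing, and then read the file with a separator that accepts either character. The following is only a sketch under that assumption: count_delimiters and read_mixed are hypothetical helper names, and the sample string is made up to imitate a line containing both a comma and a tab between values.

```python
import io

import pandas as pd


def count_delimiters(text):
    """Return how many tabs and commas the raw text contains."""
    return {"tab": text.count("\t"), "comma": text.count(",")}


def read_mixed(source):
    """Read data whose fields are separated by a comma, a tab, or both."""
    # A regular-expression separator requires the slower Python parsing engine.
    return pd.read_csv(source, sep=r"\s*[,\t]\s*", header=None, engine="python")


# Made-up sample in which each value is followed by a comma and a tab.
sample = "0.2,\t1.01,\t0.60,\t-0.68\n0.4,\t1.00,\t0.67,\t-0.69\n"

print(count_delimiters(sample))
df = read_mixed(io.StringIO(sample))
print(df)
print(df.dtypes)
```

For a real file, the text could be taken from open("sample.csv").read() and the file name passed to read_mixed. If the counts differ from one run to the next, that would suggest the file itself changed (for example, by being saved again from an editor) rather than pandas behaving differently.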