This is a note on a case where I got stuck because pandas turned the first column into the index when I tried to process data downloaded from an in-house system.
The data in question (not the actual data, of course):
name,population,area
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,
The data above looks fine at first glance, but when you load it with read_csv(), the first column (name) becomes the index, and the remaining values shift one column to the left under the header names.
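To make the symptom concrete, here is a minimal sketch that reproduces it. It assumes the sample data is held in a string (`csv_text` is just an illustrative name) rather than in the actual downloaded file.

```python
import io

import pandas as pd

# Same sample data as above: every record ends with ",", the header does not.
csv_text = """name,population,area
Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,
"""

df = pd.read_csv(io.StringIO(csv_text))

# The city names end up in the index instead of the "name" column.
print(df.index.tolist())

# The "name" column now holds the population strings,
# and "area" holds only the empty trailing field.
print(df)
```

Printing the whole frame next to the index is usually enough to notice that the values sit under the wrong headers.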
The cause is that every record ends with a "," while the header line does not, so each record has one more field than the header, and pandas uses the first column as the index. If you add a "," to the end of the header line, an extra empty column appears, but the index is generated automatically (0, 1, 2, ...) as usual.
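The sketch below shows both ways out, again using an in-memory copy of the sample. The first part adds the "," to the header as described. The second part is an alternative not used in the original troubleshooting: pandas documents `index_col=False` for files with a delimiter at the end of each line. The variable names are illustrative only.

```python
import io

import pandas as pd

records = """Osaka,2691k,223,
Nara,353k,276,
Kyoto,1472k,827,
Koube,1542k,552,
Wakayama,355k,208,
"""

# 1) Header with a trailing "," added: field counts now match,
#    so pandas generates a normal 0..n-1 index and an extra unnamed column.
with_comma = "name,population,area,\n" + records
df_fixed = pd.read_csv(io.StringIO(with_comma))
print(df_fixed.columns.tolist())

# 2) Original header left as is, but tell pandas not to use
#    the first column as the index.
original = "name,population,area\n" + records
df_alt = pd.read_csv(io.StringIO(original), index_col=False)
print(df_alt)
```

With the first approach the leftover empty column can be dropped afterwards. The second approach avoids editing the file, which may matter when the file is regenerated by the system on every download.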
I used a csv file in this sample to keep it easy to follow, but the file I actually got stuck on at work was a tsv (tab-delimited) file, so it took extra time to notice.
The lesson: look at the data itself properly instead of getting lost in the tool.
That said, I feel I use Excel less often now that I can edit data easily with pandas. The data this time was a tsv file of about 50 MB, and pandas read it in a few seconds. (Excel hung ...)
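For the tab-delimited case, a trailing tab is invisible in most editors, so a quick field count can help. This is a rough sketch under assumptions: `tsv_text` is a made-up two-record sample, and `field_counts` is a hypothetical helper, not part of pandas.

```python
import io

import pandas as pd

# Made-up tab-delimited sample: each record ends with an extra tab.
tsv_text = (
    "name\tpopulation\tarea\n"
    "Osaka\t2691k\t223\t\n"
    "Nara\t353k\t276\t\n"
)


def field_counts(text, sep="\t"):
    """Return (number of header fields, number of fields in the first record)."""
    header, first_record = text.splitlines()[:2]
    return len(header.split(sep)), len(first_record.split(sep))


header_fields, record_fields = field_counts(tsv_text)
print(header_fields, record_fields)  # a mismatch hints at a trailing delimiter

# Reading with sep="\t"; index_col=False keeps "name" as a regular column.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=False)
print(df)
```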