I had to read one year of data sampled every 30 seconds at multiple locations and calculate the total across locations for each timestamp. The straightforward method could not process it at all, so here is a note on the small change that made it work.
Each location has a CSV file containing a date and time column and a value column.
Date and time | value |
---|---|
2018-10-01 00:00:00 | 4 |
2018-10-01 00:00:30 | 1 |
2018-10-01 00:01:00 | 2 |
2018-10-01 00:01:30 | 6 |
2018-10-01 00:02:00 | 7 |
2018-10-01 00:02:30 | 7 |
: | : |
2019-09-30 23:59:30 | 7 |
This data is collected from more than 100 locations. The aggregation is not done in real time; it runs as a batch after a certain period of data has accumulated.
By simple calculation, one location has 1,051,200 rows, so 100 locations come to 105,120,000 rows in total.
...Over a hundred million (-_-;)
First attempt: read all the files at once, group by date and time, and get the total value!
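The arithmetic behind these figures can be checked quickly. This is only a rough sketch: the variable names are mine, and it assumes a 365-day year with no missing samples and a round 100 locations.

```python
# Rough row-count estimate, assuming a 365-day year and no missing samples.
SAMPLES_PER_MINUTE = 2        # one sample every 30 seconds
MINUTES_PER_DAY = 24 * 60
DAYS = 365
LOCATIONS = 100               # "more than 100" in practice; 100 is a round number

rows_per_location = SAMPLES_PER_MINUTE * MINUTES_PER_DAY * DAYS
total_rows = rows_per_location * LOCATIONS

print(f"{rows_per_location:,} rows per location")
print(f"{total_rows:,} rows in total")
```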
python
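from glob import glob
import pandas as pd

files = glob("data/*.csv")
df = pd.DataFrame()

# Stack every file into one huge DataFrame first ...
for file in files:
    df = pd.concat([df, pd.read_csv(file)])

# ... then group by timestamp once, after everything has been loaded.
df = df.groupby("Date and time").sum()
df.to_csv("Total value.csv")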
... RAM usage climbs steadily and spills into the swap area, and after half a day the run ends with an error once usage exceeds 80 GB.
I also tried reducing the number of files and calculating the totals a few at a time, but that did not seem to work either.
So I changed it to calculate the total value each time a file is read.
python
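from glob import glob
import pandas as pd

files = glob("data/*.csv")
df = pd.DataFrame()

for file in files:
    # Append the next file to the running totals ...
    df = pd.concat([df, pd.read_csv(file)])
    # ... and immediately collapse back to one row per timestamp,
    # so df never holds more than about one file's worth of rows.
    df = df.groupby("Date and time").sum().reset_index()

df.to_csv("Total value.csv")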
As a result, it finished in only a few minutes without exhausting the RAM.
It may be obvious, and once you understand it there is nothing to it, but since it cost me some time, I wrote it down in the hope that it spares even one person the same trouble.
By the way, my real work starts from here. Data analysis, let's do our best ... (-_-;)
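The reason this helps is that the running table is collapsed to one row per timestamp after every file, so it stays at roughly one year's worth of rows however many files are read. Below is a minimal sketch that checks, on tiny made-up data, that the per-file total matches the all-at-once total. The in-memory CSV strings and the `running_total` helper are my own assumptions for illustration, not part of the original script.

```python
import io

import pandas as pd

# Three tiny made-up "files" (in-memory CSV text) standing in for data/*.csv.
fake_files = [
    "Date and time,value\n2018-10-01 00:00:00,4\n2018-10-01 00:00:30,1\n",
    "Date and time,value\n2018-10-01 00:00:00,2\n2018-10-01 00:00:30,6\n",
    "Date and time,value\n2018-10-01 00:00:00,7\n2018-10-01 00:00:30,7\n",
]


def running_total(sources):
    """Hypothetical helper: fold each file into a per-timestamp running sum."""
    total = None
    for src in sources:
        df = pd.read_csv(io.StringIO(src))
        if total is not None:
            df = pd.concat([total, df])
        # Collapse to one row per timestamp before reading the next file.
        total = df.groupby("Date and time", as_index=False)["value"].sum()
    return total


incremental = running_total(fake_files)

# Reference result: load everything first, then group once.
all_at_once = (
    pd.concat([pd.read_csv(io.StringIO(s)) for s in fake_files])
    .groupby("Date and time", as_index=False)["value"]
    .sum()
)

print(incremental)
print("same result:", incremental.equals(all_at_once))
```

Both approaches give the same totals because addition can be applied in any grouping; only the peak size of the intermediate table differs.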