Data analysis is popular these days, so this article walks through a simple analysis with sample code.

the code

The execution environment is Python 3.

In this article we will do the following:

- Read a CSV file
- Convert a column with a simple transformation
- Aggregate and plot the data from several perspectives

Use Seaborn for drawing.

Seaborn: statistical data visualization

Data to be used

The data to be analyzed is as follows.

`target.csv`


datetime, id, value
20170606121314, 1,2
20170606121315, 1,3
20170606121316, 1,4
20170608121616, 1,4
20170608121617, 1,1
20170608121618, 1,2
20170606121540, 2,10
20170606121541, 2,8
20170606121542, 2,11
20170608121543, 2,4
20170606134002, 3,21
20170606134003, 3,10
20170606134004, 3,4
20170608134005, 3,50

The datetime column is a 14-digit string made up of year, month, day, hour, minute, and second (YYYYMMDDhhmmss). Assume that, for each id, a value is recorded once per second for a few seconds at a time.

Analytical work in Python

Read csv file

`python`


import pandas as pd

#CSV read
df = pd.read_csv("target.csv",sep=",")
df.columns = ["datetime","id","value"]

To check that the file was read correctly, run:

df.head()

The output is as follows.

	datetime	id	value
0	20170606121314	1	2
1	20170606121315	1	3
2	20170606121316	1	4
3	20170608121616	1	4
4	20170608121617	1	1

The head() method displays the first 5 rows of the data and is often used to check its contents.

There is also a tail() method, which displays the last 5 rows. Its output is shown below (here the datetime column has already been converted as described later in this article).

	datetime	id	value
9	2017-06-08 12:15:43	2	4
10	2017-06-06 13:40:02	3	21
11	2017-06-06 13:40:03	3	10
12	2017-06-06 13:40:04	3	4
13	2017-06-08 13:40:05	3	50
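Both methods also accept a row count, which helps when 5 rows is too many or too few. The following is a minimal sketch, assuming the sample data from this article is embedded as a string (the names `CSV_TEXT`, `first_rows`, and `last_rows` are illustrative, not from the original code):

```python
import io

import pandas as pd

# The sample data from this article, embedded so the snippet runs on its own.
CSV_TEXT = """datetime, id, value
20170606121314, 1,2
20170606121315, 1,3
20170606121316, 1,4
20170608121616, 1,4
20170608121617, 1,1
20170608121618, 1,2
20170606121540, 2,10
20170606121541, 2,8
20170606121542, 2,11
20170608121543, 2,4
20170606134002, 3,21
20170606134003, 3,10
20170606134004, 3,4
20170608134005, 3,50
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=",")
df.columns = ["datetime", "id", "value"]

# head(n) and tail(n) return the first and last n rows respectively.
first_rows = df.head(3)
last_rows = df.tail(2)
print(first_rows)
print(last_rows)
```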

The following line, which appears in the code above, sets the column names of the DataFrame.

`python`


df.columns = ["datetime","id","value"]
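One reason this assignment matters: the header row of target.csv has a space after each comma, so the names pandas reads are " id" and " value" rather than "id" and "value". The sketch below shows this on an excerpt of the sample data and, as an alternative that is not in the original code, uses the `skipinitialspace` option of `read_csv` (the variable names are illustrative):

```python
import io

import pandas as pd

# An excerpt of target.csv; note the space after each comma in the header.
CSV_TEXT = """datetime, id, value
20170606121314, 1,2
20170606121315, 1,3
20170606121316, 1,4
"""

# Read as in the article: the header names keep their leading spaces.
raw = pd.read_csv(io.StringIO(CSV_TEXT), sep=",")
print(list(raw.columns))

# Alternative: let pandas skip the whitespace that follows each delimiter.
clean = pd.read_csv(io.StringIO(CSV_TEXT), sep=",", skipinitialspace=True)
print(list(clean.columns))
```

Either approach ends with the same three column names; assigning `df.columns` explicitly is the more direct choice when the names are already known.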

Convert the datetime column from string to datetime

`python`


from datetime import datetime as dt

df.datetime = df.datetime.apply(lambda d: dt.strptime(str(d), "%Y%m%d%H%M%S"))

The purpose of this step is to make the date column easier to work with. df.datetime accesses the datetime column, apply() visits the value in each row, and strptime parses it according to the format string. Because pandas reads the 14-digit values as integers, each one is first turned into a string with str(d). The result is a column of date-and-time values instead of raw numbers.
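The same conversion can also be written without a per-row lambda. The following is a minimal sketch, assuming pandas' own `pd.to_datetime` is acceptable in place of `strptime`; it is an alternative, not the method used in the original code:

```python
import io

import pandas as pd

# An excerpt of target.csv from this article.
CSV_TEXT = """datetime, id, value
20170606121314, 1,2
20170606121315, 1,3
20170608121616, 1,4
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=",")
df.columns = ["datetime", "id", "value"]

# Convert the whole column at once; astype(str) plays the role of str(d).
df["datetime"] = pd.to_datetime(df["datetime"].astype(str), format="%Y%m%d%H%M%S")
print(df["datetime"])
```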

Aggregate by ID and see the number of records

`python`


df_by_id= df.groupby("id")["value"].count().reset_index()
df_by_id

groupby("id") groups the records by the value in the id column, and count() counts the number of records for each id.

The contents of df_by_id are as follows.

	id	value
0	1	6
1	2	4
2	3	4
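Note that count() counts the non-null entries of the selected column, so the result is still labeled value even though it holds a record count. As a sketch of one possible alternative (not the original code), size() counts rows directly and lets the result column be named; the name `record_count` is chosen here only for illustration:

```python
import io

import pandas as pd

# The sample data from this article.
CSV_TEXT = """datetime, id, value
20170606121314, 1,2
20170606121315, 1,3
20170606121316, 1,4
20170608121616, 1,4
20170608121617, 1,1
20170608121618, 1,2
20170606121540, 2,10
20170606121541, 2,8
20170606121542, 2,11
20170608121543, 2,4
20170606134002, 3,21
20170606134003, 3,10
20170606134004, 3,4
20170608134005, 3,50
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=",")
df.columns = ["datetime", "id", "value"]

# size() counts the rows in each group; reset_index(name=...) names the column.
counts = df.groupby("id").size().reset_index(name="record_count")
print(counts)
```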

Draw a histogram with the number of records on the horizontal axis

`python`


import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
id_df = pd.DataFrame(df_by_id)
sns.distplot(id_df.value, kde=False, rug=False, axlabel="record_count",bins=10)

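seaborn is a library that draws attractive statistical plots with little code. Note that distplot is deprecated in seaborn 0.11 and later; histplot is its replacement for plain histograms.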

[Figure: histogram of the number of records per id (Screenshot 2017-06-25 21.31.56.png)]
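The numbers behind this histogram can be checked without plotting. The sketch below, which uses only pandas and illustrative variable names, counts how many ids share each record count; these are the bar heights the histogram is built from:

```python
import io

import pandas as pd

# The sample data from this article.
CSV_TEXT = """datetime, id, value
20170606121314, 1,2
20170606121315, 1,3
20170606121316, 1,4
20170608121616, 1,4
20170608121617, 1,1
20170608121618, 1,2
20170606121540, 2,10
20170606121541, 2,8
20170606121542, 2,11
20170608121543, 2,4
20170606134002, 3,21
20170606134003, 3,10
20170606134004, 3,4
20170608134005, 3,50
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=",")
df.columns = ["datetime", "id", "value"]

# Number of records per id, then the number of ids for each record count.
record_counts = df.groupby("id")["value"].count()
ids_per_count = record_counts.value_counts().sort_index()
print(ids_per_count)
```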

Aggregate by ID and see the total of value columns

`python`


df_value_sum= df.groupby("id")["value"].sum().reset_index()

This is the same as the count() example above, with count() replaced by sum().

The contents of df_value_sum are as follows.

	id	value
0	1	16
1	2	33
2	3	85
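When both the count and the total are needed, they can be computed in one pass. The following is a minimal sketch using `agg` with a list of function names; the variable name `summary` is illustrative and this step is not part of the original code:

```python
import io

import pandas as pd

# The sample data from this article.
CSV_TEXT = """datetime, id, value
20170606121314, 1,2
20170606121315, 1,3
20170606121316, 1,4
20170608121616, 1,4
20170608121617, 1,1
20170608121618, 1,2
20170606121540, 2,10
20170606121541, 2,8
20170606121542, 2,11
20170608121543, 2,4
20170606134002, 3,21
20170606134003, 3,10
20170606134004, 3,4
20170608134005, 3,50
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=",")
df.columns = ["datetime", "id", "value"]

# One groupby, two aggregations: the result has a "count" and a "sum" column.
summary = df.groupby("id")["value"].agg(["count", "sum"]).reset_index()
print(summary)
```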

Aggregate by ID and get the time when the data first occurred

`python`


start_datetime_by_id = df.groupby(["id"])["datetime"].first().reset_index()
df_date = pd.DataFrame(start_datetime_by_id)

The contents of df_date are as follows.

	id	datetime
0	1	2017-06-06 12:13:14
1	2	2017-06-06 12:15:40
2	3	2017-06-06 13:40:02
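first() returns the first row of each group in the current row order, so the result above relies on target.csv being sorted by time within each id. If that ordering cannot be assumed, min() on the converted datetime column gives the earliest time regardless of order. A sketch under that assumption, with the rows deliberately shuffled (the variable names are illustrative):

```python
import io

import pandas as pd

# The sample data from this article.
CSV_TEXT = """datetime, id, value
20170606121314, 1,2
20170606121315, 1,3
20170606121316, 1,4
20170608121616, 1,4
20170608121617, 1,1
20170608121618, 1,2
20170606121540, 2,10
20170606121541, 2,8
20170606121542, 2,11
20170608121543, 2,4
20170606134002, 3,21
20170606134003, 3,10
20170606134004, 3,4
20170608134005, 3,50
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=",")
df.columns = ["datetime", "id", "value"]
df["datetime"] = pd.to_datetime(df["datetime"].astype(str), format="%Y%m%d%H%M%S")

# Shuffle the rows to simulate a file that is not sorted by time.
shuffled = df.sample(frac=1, random_state=0)

# min() does not depend on row order, unlike first().
earliest = shuffled.groupby("id")["datetime"].min().reset_index()
print(earliest)
```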

Plot how many ids first appeared on each day of the month, with the day on the horizontal axis

`python`


# df_date holds the first timestamp per id; plot the day of the month it falls on
sns.distplot(df_date.datetime.dt.day, kde=False, rug=False, axlabel="record_generate_date", hist_kws={"range": [1, 30]}, bins=30)

With the option hist_kws={"range": [1, 30]}, the horizontal axis covers the range 1 to 30. The plot therefore shows on which of the 30 days of June 2017 the data first occurred, and fixing the range makes it easier to read.

[Figure: histogram of the day of the month on which data first occurred (Screenshot 2017-06-25 21.44.55.png)]
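The counts behind this plot can also be tabulated directly. The sketch below assumes that "the time when the data first occurred" means the earliest timestamp per id, and it reindexes over days 1 to 30 so that days with no data show up as zero (the names `first_seen` and `day_counts` are illustrative):

```python
import io

import pandas as pd

# The sample data from this article.
CSV_TEXT = """datetime, id, value
20170606121314, 1,2
20170606121315, 1,3
20170606121316, 1,4
20170608121616, 1,4
20170608121617, 1,1
20170608121618, 1,2
20170606121540, 2,10
20170606121541, 2,8
20170606121542, 2,11
20170608121543, 2,4
20170606134002, 3,21
20170606134003, 3,10
20170606134004, 3,4
20170608134005, 3,50
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=",")
df.columns = ["datetime", "id", "value"]
df["datetime"] = pd.to_datetime(df["datetime"].astype(str), format="%Y%m%d%H%M%S")

# Earliest timestamp per id, reduced to the day of the month.
first_seen = df.groupby("id")["datetime"].min()

# Count ids per day and fill in the days of June that have no data.
day_counts = first_seen.dt.day.value_counts().reindex(range(1, 31), fill_value=0)
print(day_counts[day_counts > 0])
```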
