The latest version of pandas, 0.14.0, was released on 5/30. To upgrade from the previous version 0.13.1, enter the following command.
pip install --upgrade pandas
It was mentioned earlier that pandas is a high-level and powerful analysis library that provides objects called data frames. It's just (http://qiita.com/ynakayama/items/4362c439d9ea814cbe60), but you can use it to combine multiple datasets like RDBMS.
The read_json function of pandas can treat JSON as it is as a data frame. Here, we will use the API of the open source project House API Project. Here, information such as the number of Wifi client connections and temperature is released as open data using sensors. Let's merge this.
import pandas as pd
#Get the latest 100 Wifi connections on the 1st floor
wifi = pd.read_json('http://house-api-project.org/api/shibuhouse/wifi/clients?limit=100house-api-project.org/api/shibuhouse/wifi/1f/clients?limit=100')
#Get the latest 100 temperatures on the 1st floor
temp = pd.read_json('http://house-api-project.org/api/shibuhouse/1f/temperature?limit=100')
print( wifi.head(5) )
#=>
# _id clients time
# 0 {'$oid': '538c3799c3ca7c08531d645e'} 21 2014-06-02 08:36:36 UTC
# 1 {'$oid': '538c3760c3ca7c08531d6454'} 21 2014-06-02 08:35:37 UTC
# 2 {'$oid': '538c371fc3ca7c08531d6445'} 21 2014-06-02 08:34:36 UTC
# 3 {'$oid': '538c36e4c3ca7c08531d643b'} 21 2014-06-02 08:33:37 UTC
# 4 {'$oid': '538c36a7c3ca7c08531d642f'} 21 2014-06-02 08:32:36 UTC
print( temp.head(5) )
#=>
# _id temperature time
# 0 {'$oid': '538c37cac3ca7c08531d6465'} 34.000 2014-06-02 08:37:29 UTC
# 1 {'$oid': '538c378dc3ca7c08531d6459'} 34.000 2014-06-02 08:36:29 UTC
# 2 {'$oid': '538c3752c3ca7c08531d644d'} 34.000 2014-06-02 08:35:29 UTC
# 3 {'$oid': '538c3715c3ca7c08531d6442'} 34.000 2014-06-02 08:34:29 UTC
# 4 {'$oid': '538c36dbc3ca7c08531d6437'} 34.000 2014-06-02 08:33:30 UTC
#Merge two datasets by index
merged = pd.merge(wifi, temp, left_index=True, right_index=True)
The frequently used arguments are as shown in the table below.
argument | Description |
---|---|
left | Merge to the left |
right | Merge to the right |
how | How to join(inner, outer, left, right) |
on | When you want to explicitly specify the column name to use for merging |
left_on/right_on | Alias used as a join key |
left_index/right_index | When you want to merge by index |
The concat function concatenates datasets. It's similar to merging and confusing, but it's a good idea to read the documentation for more details.
Merge, join, and concatenate http://pandas.pydata.org/pandas-docs/stable/merging.html
In the JSON API example above, it looks like this.
concat = pd.concat([wifi, temp], ignore_index=True, axis=1)
concat.head(10)
#=>
# 0 1 2 \
# 0 {'$oid': '538c4081c3ca7c08531d6616'} 20 2014-06-02 09:14:37 UTC
# 1 {'$oid': '538c4045c3ca7c08531d660a'} 20 2014-06-02 09:13:37 UTC
# 2 {'$oid': '538c4005c3ca7c08531d65fb'} 20 2014-06-02 09:12:37 UTC
# 3 {'$oid': '538c3fcac3ca7c08531d65ef'} 20 2014-06-02 09:11:37 UTC
# 4 {'$oid': '538c3f8ec3ca7c08531d65e4'} 20 2014-06-02 09:10:36 UTC
# 5 {'$oid': '538c3f51c3ca7c08531d65d8'} 20 2014-06-02 09:09:36 UTC
# 6 {'$oid': '538c3f17c3ca7c08531d65cf'} 20 2014-06-02 09:08:36 UTC
# 7 {'$oid': '538c3edbc3ca7c08531d65c2'} 20 2014-06-02 09:07:37 UTC
# 8 {'$oid': '538c3ea1c3ca7c08531d65b9'} 20 2014-06-02 09:06:36 UTC
# 9 {'$oid': '538c3e61c3ca7c08531d65a8'} 22 2014-06-02 09:05:37 UTC
# 3 4 5
# 0 {'$oid': '538c4077c3ca7c08531d6611'} 34.125 2014-06-02 09:14:30 UTC
# 1 {'$oid': '538c403ac3ca7c08531d6606'} 34.125 2014-06-02 09:13:30 UTC
# 2 {'$oid': '538c3fffc3ca7c08531d65fa'} 34.125 2014-06-02 09:12:30 UTC
# 3 {'$oid': '538c3fc4c3ca7c08531d65ee'} 34.125 2014-06-02 09:11:31 UTC
# 4 {'$oid': '538c3f86c3ca7c08531d65e2'} 34.125 2014-06-02 09:10:29 UTC
# 5 {'$oid': '538c3f49c3ca7c08531d65d7'} 34.125 2014-06-02 09:09:29 UTC
# 6 {'$oid': '538c3f0ec3ca7c08531d65cb'} 34.125 2014-06-02 09:08:29 UTC
# 7 {'$oid': '538c3ed1c3ca7c08531d65bf'} 34.125 2014-06-02 09:07:29 UTC
# 8 {'$oid': '538c3e96c3ca7c08531d65b3'} 34.125 2014-06-02 09:06:29 UTC
# 9 {'$oid': '538c3e5bc3ca7c08531d65a7'} 34.125 2014-06-02 09:05:30 UTC
The frequently used arguments are as shown in the table below.
argument | Description |
---|---|
objs | Specify the pandas object to be concatenated with a list etc. |
axis | Row or column |
join | inner or outer, the default is outer |
keys | Values associated between concatenated datasets |
ignore_index | Generate a new index separately from the axis to be connected |
We've found that pandas makes it easy to merge and concatenate external datasets. For example, by plotting from here, you can see the possibilities of various analyzes such as drawing a scatter plot and examining the correlation.
Recommended Posts