This article introduces **scikit-mobility**, a Python library for handling human flow data. It is written as an introduction, so that even readers who are wondering "what is human flow data in the first place?" can get interested.
1. Introduction
2. What is human flow data?
3. What can you do with scikit-mobility?
4. Prerequisites
   - Library installation
5. Let's look at the movement history data
   - Data set used this time
   - Read data
   - Visualize the movement history on the map
6. Summary
GitHub: https://github.com/scikit-mobility/scikit-mobility
Have you heard of a Python library called **scikit-mobility**? It was created only last year, so you may not know it yet. It is **a library with features for analyzing human movement data (hereinafter referred to as human flow data)**. In recent years, map applications and social networks have accumulated large amounts of location information, and against that background the library brings together algorithms for processing and analyzing human flow data, including privacy risk assessment.
First, I would like to briefly cover two questions: **what is human flow data?** and **what can scikit-mobility do?**
scikit-mobility mainly handles **two types** of data.
**Movement history data (trajectories)**: Latitude/longitude data that traces a path of movement. Familiar examples are the current-location information that map apps and social networks collect via GPS, and the long-term behavior records gathered in research and surveys.
**Flow data (fluxes)**: Data on the number of people moving between locations. It shows how many people went from a specific place (the origin) to another specific place (the destination), as in an origin-destination (OD) survey.
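To make the idea of flow data concrete, here is a minimal sketch in plain Python (it does not use scikit-mobility). The zone names and trips are hypothetical; the point is only that flow data is a count per origin-destination pair.

```python
from collections import Counter

# Hypothetical trips: (origin zone, destination zone), one tuple per person-trip.
trips = [
    ("A", "B"),
    ("A", "B"),
    ("A", "C"),
    ("B", "A"),
    ("C", "B"),
    ("A", "B"),
]


def count_flows(trips):
    """Count how many trips were made for each (origin, destination) pair."""
    return Counter(trips)


flows = count_flows(trips)
for (origin, destination), n in sorted(flows.items()):
    print(f"{origin} -> {destination}: {n}")
```

Each printed line is one row of an OD table: an origin, a destination, and the number of people who moved between them.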
With scikit-mobility, you can easily perform the following analysis on human flow data.
- Data preprocessing
- Behavior analysis (measuring)
- Data generation (synthesis)
- Flow prediction (predicting)
- Privacy risk assessment (assessing)
I would like to dig into each of these in future articles. This time, as a first step before that, I will focus a little more on the question **"what is human flow data in the first place?"**
First, let's install the library.
$ pip install scikit-mobility
We use the sample data provided on GitHub. (Please note that it will be downloaded automatically. It is a text file of about 2 MB.)
It is taken from Microsoft's [GeoLife GPS Trajectories](https://www.microsoft.com/en-us/download/details.aspx?id=52367&from=https%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2Fb16d359d-d164-469e-9fd4-daa38f2b2e13%2F). The Microsoft Research Asia GeoLife project collected GPS log data from 182 users in Beijing between 2007 and 2012.
The sample data contains data for two of them.
Let's read the downloaded data.
Movement history data is read into the data type TrajDataFrame, which is an extension of the pandas DataFrame.
import skmob

# Read the data, mapping the file's column names to the TrajDataFrame arguments
tdf = skmob.TrajDataFrame.from_file(
    'geolife_sample.txt.gz',
    latitude='lat',
    longitude='lon',
    user_id='user',
    datetime='datetime',
)
# Check the contents
print(tdf.head())
The contents look like this.
uid lat lng datetime
0 1 39.984094 116.319236 2008-10-23 13:53:05
1 1 39.984198 116.319322 2008-10-23 13:53:06
2 1 39.984224 116.319402 2008-10-23 13:53:11
3 1 39.984211 116.319389 2008-10-23 13:53:16
To create a TrajDataFrame, you need to specify the column names that correspond to the following three arguments.
- *latitude*: latitude
- *longitude*: longitude
- *datetime*: date and time
These are the basic elements of a movement history: when and where someone was.
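With only "when" and "where", you can already derive how long and how far someone moved between two records. The following is a rough sketch in plain Python using the four sample rows shown above. The haversine formula and the Earth radius value are my own choices for illustration, not something taken from scikit-mobility.

```python
import math
from datetime import datetime

# The four sample rows shown above: (latitude, longitude, timestamp).
points = [
    (39.984094, 116.319236, "2008-10-23 13:53:05"),
    (39.984198, 116.319322, "2008-10-23 13:53:06"),
    (39.984224, 116.319402, "2008-10-23 13:53:11"),
    (39.984211, 116.319389, "2008-10-23 13:53:16"),
]

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (an approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def steps(points):
    """Return (seconds elapsed, metres moved) for each pair of consecutive records."""
    result = []
    for (lat1, lng1, t1), (lat2, lng2, t2) in zip(points, points[1:]):
        elapsed = (datetime.fromisoformat(t2) - datetime.fromisoformat(t1)).total_seconds()
        result.append((elapsed, haversine_m(lat1, lng1, lat2, lng2)))
    return result


for elapsed, metres in steps(points):
    print(f"{elapsed:.0f} s, {metres:.1f} m")
```

The records here are only a few seconds and a few metres apart, which gives a feel for how dense a raw GPS log is.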
You can also optionally specify the following arguments:
- *user_id*: the user ID. It shows whose movement history a record belongs to. It is not needed when the data covers a single person, but it is required when records from several people are mixed together.
- *trajectory_id*: the trajectory ID (stored in the tid column). It labels a series of movements as one unit. For example, when the means of transportation changes, as in "walk → bus → train", you can assign a separate ID to each leg to tell them apart.
Of course, any other columns in the data are read in as well without any problem.
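If your data has no trajectory ID yet, you have to decide yourself how to cut the log into separate movements. Detecting a change of transport mode needs more information, so the sketch below uses a simpler stand-in: it starts a new ID whenever the gap between two records is long. The function name assign_tids and the 30-minute threshold are hypothetical, and this is not how scikit-mobility itself assigns IDs.

```python
from datetime import datetime, timedelta


def assign_tids(timestamps, max_gap=timedelta(minutes=30)):
    """Give each record a trajectory ID, starting a new ID after a long gap.

    `timestamps` must be sorted in ascending order. The 30-minute default
    is an arbitrary threshold chosen for illustration.
    """
    tids = []
    current = 0
    previous = None
    for ts in timestamps:
        if previous is not None and ts - previous > max_gap:
            current += 1
        tids.append(current)
        previous = ts
    return tids


# Hypothetical log: two records in the morning, two in the evening.
log = [
    datetime(2008, 10, 23, 8, 0, 0),
    datetime(2008, 10, 23, 8, 5, 0),
    datetime(2008, 10, 23, 18, 0, 0),
    datetime(2008, 10, 23, 18, 2, 0),
]
print(assign_tids(log))  # [0, 0, 1, 1]
```

The resulting list could be added as a column and passed as the trajectory ID when building the TrajDataFrame.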
It is also possible to convert an existing pandas DataFrame to a TrajDataFrame.
import pandas as pd
import skmob

# Prepare sample data: [user, latitude, longitude, datetime]
data_list = [[1, 39.984094, 116.319236, '2008-10-23 13:53:05'],
             [1, 39.984198, 116.319322, '2008-10-23 13:53:06'],
             [1, 39.984224, 116.319402, '2008-10-23 13:53:11'],
             [1, 39.984211, 116.319389, '2008-10-23 13:53:16']]
# Create a pandas DataFrame
data_df = pd.DataFrame(data_list, columns=['user', 'lat', 'lon', 'datetime'])
print('Before conversion: ', type(data_df))
# Convert to TrajDataFrame
tdf = skmob.TrajDataFrame(data_df, latitude='lat', longitude='lon', datetime='datetime', user_id='user')
print('After conversion: ', type(tdf))
print(tdf.head())
Before conversion: <class 'pandas.core.frame.DataFrame'>
After conversion: <class 'skmob.core.trajectorydataframe.TrajDataFrame'>
uid lat lng datetime
0 1 39.984094 116.319236 2008-10-23 13:53:05
1 1 39.984198 116.319322 2008-10-23 13:53:06
2 1 39.984224 116.319402 2008-10-23 13:53:11
3 1 39.984211 116.319389 2008-10-23 13:53:16
You cannot tell where a latitude/longitude pair is just by looking at the numbers, so it is important to check the data on a map. A TrajDataFrame can be visualized easily as follows.
tdf.plot_trajectory(zoom=12, weight=3, opacity=0.9, tiles='Stamen Toner')
- *zoom*: how far the map is zoomed in
- *weight*: the thickness of the drawn line
- *opacity*: the opacity of the drawn line
- *tiles*: the type of background map
Each uid is automatically drawn in a different color. Looking at the map, you can see where a user has moved, how active they are, and where they tend to go.
You can see how far the user is moving by zooming out until you can see the entire range of activity. One user has gone quite far.
Markers are also displayed at each user's first log (green) and last log (red). Click a marker to pop up its time and latitude/longitude.
Visualizing it on a map in this way makes it easier to understand the user's movements.
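The first and last markers correspond to each user's earliest and latest record. As a rough illustration of that idea in plain Python (not the library's own implementation), the sketch below picks those two records per user. The rows for user 1 come from the sample output shown earlier; the rows for user 2 are hypothetical.

```python
from collections import defaultdict

# User 1 rows come from the sample output shown earlier; user 2 rows are hypothetical.
records = [
    {"uid": 1, "lat": 39.984094, "lng": 116.319236, "datetime": "2008-10-23 13:53:05"},
    {"uid": 1, "lat": 39.984198, "lng": 116.319322, "datetime": "2008-10-23 13:53:06"},
    {"uid": 1, "lat": 39.984211, "lng": 116.319389, "datetime": "2008-10-23 13:53:16"},
    {"uid": 2, "lat": 39.900000, "lng": 116.400000, "datetime": "2008-10-24 09:00:00"},
    {"uid": 2, "lat": 39.910000, "lng": 116.410000, "datetime": "2008-10-24 08:30:00"},
]


def first_and_last(records):
    """Map each user ID to its (earliest, latest) record.

    Timestamps in 'YYYY-MM-DD HH:MM:SS' form sort chronologically as plain
    strings, so no datetime parsing is needed here.
    """
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["uid"]].append(rec)
    return {
        uid: (
            min(recs, key=lambda r: r["datetime"]),
            max(recs, key=lambda r: r["datetime"]),
        )
        for uid, recs in by_user.items()
    }


endpoints = first_and_last(records)
for uid, (first, last) in sorted(endpoints.items()):
    print(uid, "start:", first["datetime"], "end:", last["datetime"])
```

Note that the records do not have to be sorted beforehand: user 2's rows are deliberately out of order.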
What did you think? This time, I briefly introduced scikit-mobility and the kinds of data it handles. Movement history data is not something you usually get to look at, so this may have been your first time seeing it. I hope this article makes you curious about human flow data analysis. If you use Google Maps, it may be interesting to download and analyze your own location history. (Download Google Maps History (Timeline))
In the next and subsequent articles, I would like to introduce flow data and the specific functions and algorithms of the library. That's all for this time. Thank you for reading!