Hello! This article is part of Kaggle's Advent Calendar. I would like to introduce scatter plots with "Plotly", a data visualization library for Python, because I think it goes well with competitions such as Kaggle.
First, let's look at some concrete examples.
**Specific example 1** Plot word embeddings (distributed representations of words) compressed to 3D
**Specific example 2** Plot rental data for the 23 wards of Tokyo on map data
What do you think? The strength of Plotly is that plots carrying this much information can be drawn in just a few lines. Among the many Python visualization libraries, I think Plotly stands out for the following features.
- Can be written in a few lines
- Can be used interactively
  - Since you can zoom, you can check the details.
- You can check the information of 5 variables at once through size, X, Y, Z, and color.
- You can check the information of an element by pointing the mouse at it.
- Can be shared
Let's take a closer look at these plots, including how to write the code.
Python 3.7.4 plotly 4.1.0
I prepared word embeddings whose dimensions have been reduced to three. The corpus is text8, training uses gensim's word2vec class, and the dimensionality reduction uses t-SNE. The resulting coordinates and the words are stored in a pandas DataFrame.
(For text8, I referred to https://hironsan.hatenablog.com/entry/japanese-text8-corpus.)
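For reference, the preparation step could look roughly like the sketch below. This is not the code used for the plots in this article: the helper names `reduce_to_3d` and `to_plot_frame` are hypothetical, the t-SNE settings are assumptions, and `vectors` is assumed to be an array holding one word2vec vector per word.

```python
import numpy as np
import pandas as pd


def reduce_to_3d(vectors, random_state=0):
    """Compress (n_words, n_dims) word vectors to 3 dimensions with t-SNE."""
    # Imported here so that to_plot_frame() stays usable without scikit-learn.
    from sklearn.manifold import TSNE

    vectors = np.asarray(vectors, dtype=float)
    # Assumed setting: perplexity must be smaller than the number of words.
    perplexity = min(30, len(vectors) - 1)
    tsne = TSNE(n_components=3, perplexity=perplexity, random_state=random_state)
    return tsne.fit_transform(vectors)


def to_plot_frame(words, coords):
    """Build a DataFrame with the columns x, y, z, and word."""
    coords = np.asarray(coords, dtype=float)
    if coords.shape != (len(words), 3):
        raise ValueError("coords must have shape (len(words), 3)")
    df = pd.DataFrame(coords, columns=["x", "y", "z"])
    df["word"] = list(words)
    return df
```

With a list `words` and a matching array `vectors`, `df = to_plot_frame(words, reduce_to_3d(vectors))` would give a DataFrame with the column names used in the plotting code.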
import plotly.express as px
fig = px.scatter_3d(df, x='x', y='y', z='z', text='word')
fig.show()
That is just 3 lines. Pass the DataFrame, then the column names for x, y, z, and text, and you get a plot like the one above.
I used data from the Mynavi x SIGNATE Student Cup 2019. You can read more about the competition on my blog. http://zerebom.hatenablog.com/entry/2019/11/09/121233?_ga=2.241090371.157833494.1575468424-1743001014.1569899454
The task in this competition was to predict the rent of each property from rental listing information for the 23 wards of Tokyo. https://signate.jp/competitions/182
I formatted this data and prepared the following DataFrame.
Each column has the following meaning:
- id: serial number
- y_train: actual rent (the correct value)
- oof: predicted rent
- diff: predicted value minus correct value
- abs: absolute value of (predicted value minus correct value)
- loc_lat / loc_lon: latitude and longitude
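As a minimal sketch, the two error columns can be derived from `y_train` and `oof` as follows. The numbers are made up for illustration and are not taken from the competition data.

```python
import pandas as pd

# Made-up values for illustration; the real data comes from the competition.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "y_train": [80000, 120000, 95000],  # actual rent
    "oof": [85000, 110000, 95000],      # predicted rent
})

# diff: predicted value minus actual value (positive means over-prediction).
df["diff"] = df["oof"] - df["y_train"]
# abs: magnitude of the error, regardless of its direction.
df["abs"] = df["diff"].abs()
```

Keeping the signed and unsigned errors in separate columns makes it possible to map one to color and the other to marker size.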
import plotly.express as px
px.set_mapbox_access_token('YOUR_API_KEY')
fig = px.scatter_mapbox(df, lat="loc_lat", lon="loc_lon", color="diff", size="abs", text='id',
                        color_continuous_scale=px.colors.sequential.Viridis, size_max=30, zoom=10)
fig.show()
I used this code after training a model in the competition, to find out which areas have many properties with large prediction errors.
To plot latitude / longitude data on a map with Plotly, you need to register with a service called Mapbox in advance and obtain an API key (access token). You can easily get one from this site. (https://account.mapbox.com/)
To display the data on the map, specify the arguments as follows. This time I assigned:
- color ... error of the predicted value
- size ... absolute value of the error of the predicted value
- text (character string displayed over the element) ... property id
color="diff", size="abs",text='id'
Specify the color map selection, maximum element size, and map zoom as follows.
color_continuous_scale=px.colors.sequential.Viridis, size_max=30, zoom=10
You can change layout settings by passing them as a dictionary to fig.update_layout. Plotly's official website has many examples in which the code and the resulting plot are shown together, so if a setting interests you, it is worth looking there.
(https://plot.ly/python/text-and-annotations/#text-font-as-an-array--styling-each-text-element)
fig.update_layout(
    font={"family": "Open Sans", "size": 16}
)
I introduced Plotly because I think few people use it considering its potential. It works especially well with 3D data and map data, so please give it a try!
Load and use the learned Japanese model of Word2Vec https://qiita.com/omuram/items/6570973c090c6f0cb060
Make a Japanese version of text8 corpus and learn distributed expressions https://hironsan.hatenablog.com/entry/japanese-text8-corpus
How to paste a Gif animation captured on a Mac into a Qiita article https://qiita.com/ryosukes/items/b5dd0fac1a059caffbf0