This article introduces **Altair**, a data visualization library for Python. Since there is little information about it in Japanese, I am writing this article to help spread the word. If you are comfortable with English, the Official Page is the easiest place to start.
You can install it easily with the pip command. vega_datasets will be used later, so install it at the same time.
pip install altair vega_datasets
On Google Colaboratory, it worked out of the box without any installation.
Load the library and the dataset as follows.
import altair as alt
from vega_datasets import data
iris = data.iris()
Altair works well with pandas, and `iris` is a pandas DataFrame.
The code below assumes the chart is displayed directly in Jupyter or a similar notebook. To output HTML instead, append `.save("filename.html")` to the end. `.interactive()` is optional, but adding it lets you pan and zoom the chart with the mouse. **The charts in this article are static images, so if you want to move them around, please use the interactive version here.**
Specify the x-axis and y-axis values as follows. You can also write them with `alt.X()`, as in the comments; this form is useful for more complex visualizations.
alt.Chart(iris).mark_point().encode(
x="sepalLength", # alt.X("sepalLength"),
y="sepalWidth", # alt.Y("sepalWidth"),
color="species"
).interactive()
The key point is to take the average for each species with `average()`. Many operations other than average are available; you can check the list here.
alt.Chart(iris).mark_bar().encode(
x="average(sepalLength)", # alt.X("sepalLength", aggregate="average"),
y="species", # alt.Y("species"),
).interactive()
That's the basic idea. If you replace the `mark_xxxxx` part with `mark_line`, you can easily draw a line chart. If you get stuck, you can usually solve the problem by finding a similar chart in the Gallery on the official page.
The fields specified in the `tooltip` argument are displayed when you hover the mouse over a point.
alt.Chart(iris).mark_point().encode(
x="sepalLength",
y="sepalWidth",
color="species",
tooltip=["sepalLength", "sepalWidth", "petalLength", "petalWidth", "species"]
).interactive()
By default, the axes for quantitative data include 0. This example uses `zero=False` to specify explicitly that the axes do not need to include 0.
alt.Chart(iris).mark_point().encode(
alt.X("sepalLength", scale=alt.Scale(zero=False)),
alt.Y("sepalWidth", scale=alt.Scale(zero=False)),
color="species"
).interactive()
Nominal variables are often stored as integers, right? In that case, specify explicitly that the field is nominal, as in `species_int:N`. Similarly, ordinal data is marked with `:O` and quantitative data with `:Q`. Details can be found in the official documentation here.
# Convert species to integer values (setosa: 0, versicolor: 1, virginica: 2)
iris["species_int"] = [["setosa", "versicolor", "virginica"].index(x) for x in iris["species"]]
# Correct example
alt.Chart(iris).mark_point().encode(
x="sepalLength",
y="sepalWidth",
color="species_int:N"
).interactive()
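By the way, without `:N`, Altair treats the integer column as quantitative, so the points are colored with a continuous gradient instead of three distinct colors.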
Altair raises an error when the data exceeds 5,000 rows. Referring to the information here, run the following:
alt.data_transformers.disable_max_rows()
Charts can be written concisely, which makes Altair convenient for exploratory analysis. If there is a drawback, it is that HTML output is easy, but PNG output seems to be a little difficult. FYI!