This is a follow-up to my previous post, [I tried the same data analysis with kaggle notebook (python) and Power BI at the same time ①](https://qiita.com/khigashimoto/items/80f0858b59379d461d97).
Last time I tried environment ① below; this time I will try environment ②.
- Environment ①: data analysis using Python + Numpy + Pandas + α
- Environment ②: data analysis using BI tools (Business Intelligence tools)
- PC used: Surface LTE, Core i5-7300U, 8GB memory
- OS used: Microsoft Windows10 Pro 64bit
- Browser used: Microsoft Edge
- Usage environment: Microsoft Power BI Desktop Ver 2.84.802.0 64-bit

You can get Power BI Desktop from the Microsoft Store. Most of its functions, including everything used in this work, are free of charge.
As before, I borrow part of the Udemy data science course below.
[180,000 people in the world] Practical Python data science
This time, as a first step in data analysis, I will use a BI tool to carry out the following analysis of the famous sinking of the Titanic.
・ What kind of people were the passengers of the Titanic? (Gender, age, etc.)
・ How do those characteristics, alone and in combination, relate to the survival rate?
Bring local data into the environment and display a summary
Start Power BI Desktop and import the saved CSV file.
・ On the main screen after startup, select the "Get data" icon ⇒ "Text / CSV".
・ A preview screen appears. Press the "Load" button.
・ Back on the main screen nothing is displayed yet, so press the "Data" view icon.
・ You can now see the loaded data.
Since Power BI Desktop is a GUI-based application (of course ...), unlike a CUI-based environment such as Python, you get a bird's-eye view of the state of the data right on the screen.
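For comparison, here is a minimal sketch of how the same "load and look" step might go on the Python side. The rows are made-up toy data in the Titanic column layout (not the real dataset), and `train.csv` in the comment is only a hypothetical file name.

```python
import io

import pandas as pd

# Made-up toy rows in the Titanic CSV layout (NOT the real dataset).
# The third row has an empty Age to show how a missing value is read.
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,
4,0,2,male,9
"""

# With a real file you would pass its path instead, e.g. pd.read_csv("train.csv")
# (hypothetical file name).
titanic_df = pd.read_csv(io.StringIO(csv_text))

print(titanic_df.head())      # first rows, similar to the Power BI data view
titanic_df.info()             # column types and non-null counts
print(titanic_df.describe())  # summary statistics for the numeric columns
```

In Python you have to ask for each view explicitly, which is exactly the difference from the GUI described above.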
・ Try a clustered bar chart. When you press its icon, an empty chart appears in the upper left of the screen.
・ Drag the item called Sex from the right side of the screen and drop it onto both "Axis" and "Value" in the center.
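Putting Sex on both the axis and the value amounts to counting rows per gender. A rough pandas sketch of the same count, again on made-up toy rows rather than the real data:

```python
import pandas as pd

# Made-up toy rows, not the real Titanic data.
titanic_df = pd.DataFrame({"Sex": ["male", "female", "male", "male", "female"]})

# Number of passengers per value of Sex: the quantity the bar chart displays.
sex_counts = titanic_df["Sex"].value_counts()
print(sex_counts)
```

If matplotlib is installed, `sex_counts.plot(kind="bar")` draws the same thing as a chart.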
It's nice to be able to work graphically with this Excel-like usability. As an aside, the application felt a little sluggish while I was doing this work, so I checked it in Task Manager. Since this is a desktop application, I felt that a Win10 PC with about 16GB of memory would be more comfortable.
4. Check the ratio of men and women for each room class
The item Pclass indicates the class of the room. This part was just as easy to do in Power BI.
・ As before, select the clustered bar chart. Drag and drop the item "Sex" to "Axis" and "Value", and the item "Pclass" to "Legend".

It is also easy to swap the axes.
・ As before, select the clustered bar chart. Drag and drop the item "Sex" to "Legend" and "Value", and the item "Pclass" to "Axis".
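The two charts above are two views of one cross-tabulation. As a sketch on made-up toy rows, pandas can build the table once and swap axis and legend with a transpose:

```python
import pandas as pd

# Made-up toy rows, not the real Titanic data.
titanic_df = pd.DataFrame({
    "Sex": ["male", "female", "male", "male", "female", "female"],
    "Pclass": [3, 1, 3, 1, 2, 3],
})

# Rows = Sex (the "Axis"), columns = Pclass (the "Legend").
by_sex = pd.crosstab(titanic_df["Sex"], titanic_df["Pclass"])

# Swapping "Axis" and "Legend" is just a transpose.
by_class = by_sex.T

print(by_sex)
print(by_class)
```

Combinations that do not occur in the data (here, male passengers in class 2) show up as 0 rather than being dropped.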
5. Create an item (Person) with the values "Men", "Women", "Children (under 16 years old)" from the items "Age" and "Sex"
First, create a Person column. I gave it my best effort, but so far I have not managed to generate the column ...
As far as I can tell, there are two ways to add a column.
・ Query editor: a dedicated function for shaping the data to suit the analysis without changing the original data. I tried to add a custom column from here, but the result was an error.
・ Plain column addition: next I tried to add a column directly on the data screen, but unfortunately this also resulted in an error.

Hmm ... For now, this is where the verification ends.
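For reference, the Person column that gave me trouble in Power BI can be sketched in pandas as follows. The rows are made-up toy data, and the labels `"child"`, `"male"` and `"female"` are my own choice for illustration; this is not a fix for the Power BI errors above.

```python
import numpy as np
import pandas as pd

# Made-up toy rows, not the real Titanic data. The last Age is missing.
titanic_df = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, 9.0, 15.0, np.nan],
})

# Start from Sex and overwrite it with "child" wherever Age is under 16.
# A missing Age compares as False, so that row simply keeps its Sex value.
titanic_df["Person"] = titanic_df["Sex"].mask(titanic_df["Age"] < 16, "child")

print(titanic_df)
```

Note that rows with a missing Age fall back to the gender value here; whether that is acceptable depends on how you want to treat missing values.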
The results of this series of analysis trials in Power BI are attached below. I think it is a strength of Power BI that you can lay the results out like this and view them side by side at a glance.
Here are my impressions of Power BI Desktop from doing the above work.
Benefits
・ The display options are rich (and beautiful).
・ Since it is a GUI-based application, even people who do not write programs can use it easily.
・ Multiple results can be viewed side by side.
Disadvantages
・ For processing such as handling missing values and generating data, it can get confusing once the work becomes complicated.
・ There is not much reference material on Power BI itself, so you cannot simply google your way out when you get stuck.
・ Because it is an application that runs on your own PC, it can feel a little sluggish depending on the PC's specs.
Case A) Python, which has plenty of reference material and lets you build detailed flows, for complicated analysis such as preprocessing and generating additional data.
Case B) Power BI for light analysis, and as a display tool that lets the people doing the actual work use the results of analysis done in Python.
I think it is best to use each where it fits. At least, that is my view at this point.
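One way this division of labor might look in practice: do the preprocessing in Python, write the result to a CSV file, and load that file into Power BI through "Get data" ⇒ "Text / CSV". The sketch below uses made-up toy rows, and the file name `titanic_person.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Made-up toy rows, not the real Titanic data.
titanic_df = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Age": [22.0, 38.0, 9.0],
})

# Preprocessing done on the Python side (Case A): derive the Person column.
titanic_df["Person"] = titanic_df["Sex"].mask(titanic_df["Age"] < 16, "child")

# Write the processed table to a CSV file for the BI tool to import (Case B).
# index=False keeps the pandas row index out of the file.
out_path = os.path.join(tempfile.mkdtemp(), "titanic_person.csv")
titanic_df.to_csv(out_path, index=False)

print(out_path)
```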