First, understand the overall flow of data analysis. Several standard processes have been proposed for carrying out data analysis.
CRISP-DM (CRoss Industry Standard Process for Data Mining), proposed by Shearer et al., defines the process shown in the following figure.
In this process, you (1) clarify the business issues through business understanding and plan the data analysis project; (2) acquire the data in data understanding and check whether it is ready for analysis; (3) format the data in data preparation into the form required for subsequent modeling; (4) analyze the data in modeling; (5) evaluate the analysis results obtained by modeling; and, if the results are sufficient, (6) apply the analysis results to the business. As shown in the figure, these steps are not a one-way street: you go back and forth between neighboring steps as needed.
The preprocessing covered here corresponds to data understanding and data preparation in CRISP-DM.
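As a rough illustration of what these two steps can look like in practice, here is a minimal Python sketch. It is not part of CRISP-DM itself: the sample records, the column names (age, income), and the helper functions count_missing and prepare are all hypothetical, and filling gaps with the column mean is just one possible preparation choice.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"age": "34", "income": "52000"},
    {"age": None, "income": "48000"},
    {"age": "29", "income": None},
    {"age": "41", "income": "61000"},
]


def count_missing(rows):
    """Data understanding: count the missing values in each column."""
    columns = rows[0].keys()
    return {col: sum(1 for row in rows if row[col] is None) for col in columns}


def prepare(rows):
    """Data preparation: cast strings to float and fill gaps with the column mean."""
    columns = list(rows[0].keys())
    # Mean of the values that are present (an assumed imputation strategy).
    col_means = {
        col: mean(float(row[col]) for row in rows if row[col] is not None)
        for col in columns
    }
    return [
        {
            col: float(row[col]) if row[col] is not None else col_means[col]
            for col in columns
        }
        for row in rows
    ]


missing_report = count_missing(raw_rows)   # is the data ready for analysis?
prepared_rows = prepare(raw_rows)          # data in the form a model expects
```

Checking the data first and only then reshaping it mirrors the order of the two steps. If the check reveals problems the preparation cannot handle, you would go back to the earlier step, as the process allows.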
KDD is explained in the next section.
Compared to CRISP-DM, which considers the entire data analysis project within the business, KDD (Knowledge Discovery in Databases), proposed by Fayyad et al., focuses more on the data analysis part itself. The diagram below shows the KDD process.