**"I've been hearing a lot about 'data scientists' and 'statistics' lately, and I'm interested, but honestly I don't know the field well, so where should I start?"** This article is written for people who feel that way.
In other words, it is the article I wish I had had X years ago.
So I have not listed any books with niche content of the kind that would please people who say, "I'm a hardcore data scientist."
**"No PRML...? That's the most basic of basics. Are you even a real data scientist?"** If that is your reaction, you may not get much from this article.
This article is aimed at beginners.
[See also] ・ It's time to think seriously about the definition and skill set of data scientists http://qiita.com/hik0107/items/f9bf14a7575d5c885a16
The list is divided into three parts: "Introduction", "Programming", and "Statistical Model / Machine Learning". Feel free to start from whichever part interests you.
Also, since the list is intended for beginners, I have given each source an "approachability" rating, entirely at my own discretion.
★☆☆☆☆: Hard for beginners to approach
★★★★★: Approachable even for beginners
These ratings are rough and entirely subjective. There is no strict basis for them, and they have nothing to do with the quality of the information.
The only criterion is whether I would have thought "this is a bit hard to approach..." had I seen the source when I was a beginner. Please treat the ratings as a rough guide.
The information in this article will be updated as appropriate. In particular, there are many sources I have not written up yet, so I plan to add them one by one.
O'Reilly: Data Science Lecture http://goo.gl/rZqhE5
★★★☆☆ A fairly well-organized introduction. It covers everything from the current situation surrounding data scientists to an outline of statistical models and practical work. If you have at least minimal knowledge of data analysis, this is a good book to read first.
That mathematics determines the strategy http://goo.gl/Rkd5q
★★★★★ This is light reading rather than a textbook. It gives examples of using data and statistical models in a variety of fields, from wine to crime, marketing, and film. As you would expect from an author who is a university professor, the collection of case studies is impressive.
Sexy Little Numbers http://goo.gl/DMOKrs
★★★★★ It may differ a little from the popular image of a data scientist. But this book teaches that analysis with real business value sometimes needs neither large amounts of data nor difficult statistical models.
O'Reilly: Beautiful Data http://goo.gl/LNvaUW
★★☆☆☆ A collection of examples showing how data is applied in various fields. It also includes stories from Facebook data scientists. People who like this sort of thing will enjoy it.
schoo: Data analysis course that can be used in the field https://schoo.jp/teacher/184
★★★★★ Mr. Yoshinaga, a data scientist at Recruit Communications, will give a lecture on the practice of data analysis.
gacco: Data science course for working people http://gacco.org/stat-japan/
★★★★☆ It covers a wide range of topics at a shallow depth, so it works well as a first entry point. The famous Professor Nishiuchi, who declared that "statistics is the strongest discipline", also appears.
Blog of a data scientist working in Ginza http://tjo.hatenablog.com/
★★★☆☆ This is the blog of "T.J.Ozaki-san", who is famous in the data science industry. There is a considerable amount of information, but since it is a blog, it is not written systematically. People with some knowledge may therefore learn best by skimming it and reading the articles that interest them. However, it covers not only models and programming but also industry trends, and I think those articles are easy for beginners to read.
Introduction to Data Analysis with Python: Data Processing with NumPy and pandas http://goo.gl/YflT0M
★★★☆☆ Covers pandas and NumPy, essential tools for analysis in Python, in detail.
Collective intelligence programming https://www.oreilly.co.jp/books/9784873113647/
★★☆☆☆ You learn by implementing typical machine learning algorithms in Python. It is aimed at people who can already use Python to some extent and have basic knowledge of algorithms, so it may not be suitable as a first book for learning either programming or algorithms.
Udacity: Intro to Data Science http://edmaps.co/udacity/course/ud359.html
★★★☆☆ An introduction to data science in which you also learn data manipulation in Python through coding exercises.
Udacity: Data Analysis with R http://edmaps.co/udacity/course/ud651.html
★★★☆☆ A class on R.
Doshisha Data Science Laboratory http://www1.doshisha.ac.jp/~mjin/R/index.html
★★★☆☆ You can use R to learn a wide range of topics, from the basics of statistics to statistical and machine learning models. If you have the time and want to use R, working through this site thoroughly will give you comprehensive coverage.
gihyo.jp: Let's Get Started with Machine Learning http://gihyo.jp/dev/serial/01/machine-learning
★★☆☆☆ You implement simple machine learning algorithms in Python. It is nice that it explains the theoretical background as it goes.
A rudimentary summary of data manipulation in Python Pandas http://qiita.com/hik0107/items/d991cc44c2d1778bb82e
Data analysis in Python Summary of sources to look at first for beginners http://qiita.com/hik0107/items/0bec82cc09d0e05d5357
Beautiful graph drawing in Python: seaborn makes data analysis and visualization easier http://qiita.com/hik0107/items/3dc541158fceb3156ee0
On the programming side, it is also good to know SQL, Linux, Hadoop, and so on. I learned these in a haphazard way, so I don't know of a good systematic source of information for them. If you have any recommendations, please let me know m(_ _)m
Introduction to Statistical Modeling for Data Analysis http://goo.gl/mrX8vD
As a supplement, it may be easier to follow if you read along with http://hosho.ees.hokudai.ac.jp/~kubo/ce/NiigataiLecture2015.html (the link goes to the page hosting the author's lecture handouts).
★★☆☆☆ You can systematically learn the basics of statistical models and generalized linear models (GLM) from scratch. The book eventually gets to GLMM (mixed models) and MCMC, but I think it is best to work through GLM first. For a book of this kind it is very approachable, because the tone is casual and it does not insist on rigor.
It is nicknamed "Midoribon" (the green book), and reading groups are held for it. Commentary audio and video have also been uploaded, so people who find it hard to study alone may want to use those as well. https://www.youtube.com/watch?v=nD3V4ovqr1A
Gentle statistics by R http://goo.gl/RJDzI
★★★★☆ A book for acquiring basic knowledge of statistics while writing code in R. It is nice to be able to learn coding and statistics at the same time.
Coursera: Machine Learning https://www.coursera.org/learn/machine-learning
★★★☆☆ The extremely popular flagship class of the online course platform Coursera: the machine learning class by Stanford professor Andrew Ng. The explanations are careful, and it is recommended for beginners. The classes are in English, but Japanese subtitles are available, so you can rest easy.
Hierarchical Bayes and MCMC commentary https://www.youtube.com/watch?t=5&v=wO8jd0z5YRQ
★☆☆☆☆ A video in which Professor Kubo, the author of "Introduction to Statistical Modeling for Data Analysis" introduced above, explains hierarchical Bayesian models himself. It may be very effective to study it together with the book itself.
Teradata Marketing Analytics http://goo.gl/t3JoMx
★★★★☆ A site with very detailed coverage of data mining models used in marketing. It is really generous of Teradata to offer this much information completely free.
gihyo.jp: Let's Get Started with Machine Learning http://gihyo.jp/dev/serial/01/machine-learning
★★☆☆☆ You implement things in Python while learning an overview of machine learning. Great for those who want to learn hands-on (and who already know a little Python).
Kaggle Titanic Tutorial http://kagglechallenge.hatenablog.com/entry/2015/02/13/193155
★★★★☆ A tutorial on building a prediction model from scratch, based on the famous "Titanic Passenger Survival Prediction" on the data competition site Kaggle. It is nice that versions are provided for Excel, Python, and R.