As of 2019, there must be plenty of people who badly want to pass for a data scientist. Yet the more you want to look the part, the less obvious it is how to do it. So I set aside the unglamorous, muddy side of data science entirely and thought about how to fake it. The conclusions can be put into practice starting tomorrow. If you want to look like a data scientist, give them a try.
<img src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/204712/e8d86648-b700-a30e-7d49-7f427a5325fe.png" width="170">
When you open your MacBook, VS Code should be what comes up. When someone asks "What do you like about VS Code?", answer like this: "Hmm, first of all it's lightweight, and it has plenty of extensions, but the most attractive thing is remote debugging." An editor is always expected to be light. And your colleagues and friends are bound to be impressed by fashionable-sounding words like "extensions" and "remote debugging".
Visualization is one of a data scientist's showpieces. Once you have the data, visualize it right away, before anything else. Then say to a colleague who draws graphs with Matplotlib: "These days I recommend Plotly for visualization. After all, being able to explore the data interactively is the most convenient."
<img src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/204712/7544d416-64da-7718-2868-dc0a431fc1b1.png" width="200"><img src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/204712/a08b87f1-c054-d830-6f2a-657beddf5217.jpeg" width="200">
If you are not using the cloud, it is not data science. Bring up AWS and GCP, and fire off the keyword "scaling". Better still, drop terms such as S3 and IAM. Show off how broad your range is by covering both on-premises and the cloud.
This article is about "pretending" to be a data scientist, said to be one of the most glamorous jobs as of the end of 2019. What prompted it was the scene around me when I attended a conference held by a very famous IT company. It was fascinating because everyone looked the same. I wrote the part above somewhat playfully, but I meant it to be reasonably accurate. From here I will talk a little more seriously about each item and give some useful links and keywords.
Personally, I think Windows is fine too, but the Mac is excellent in terms of environment setup and compatibility with Linux. Many people recommend the Mac. Of course, there are Apple devotees too.
Related articles: Think about the question of which is better, Windows or Mac for development; What I did before I became a data scientist
Personally, I think this is the only choice. I no longer want to write Python in anything but VS Code, and the same goes for Markdown. I am writing the draft of this article in VS Code too. I don't see much reason to choose another editor now.
Related articles: Somehow VS Code is the strongest for beginners, isn't it? 3 reasons to think so; 24 Recommended Extensions for VS Code (and Some Tips)
If you do data science, I doubt you can leave Python out now. The major machine learning frameworks all offer Python interfaces, and Python works very well with the cloud.
Related articles: Recommended programming languages for 2019
Also, with something like Flask you can easily write a small web application, and it is easy to get all sorts of applications working. In a job like data science, where trial and error is repeated, I think it matters to be able to try things out quickly, and that is why I think Python is excellent.
I think visualization is one of the most important skills for anyone doing data science. I wrote about it playfully above, but Matplotlib goes without saying, and these days I highly recommend Plotly and Dash. Presenting data so that humans can see it is important enough that you could say whoever controls visualization controls the data. (Personal view)
Related articles: Visualization tool Dash tutorial, Part 1: Installation and Drawing; Create a web application that can be easily visualized with Plotly Dash
This area is a bit niche, but by mastering list comprehensions, map, and lambda, you can do what you want with short, clean code. It can also help with speed. Some people say such code is hard to read, but I think you get used to it to a degree.
Related articles: The Hitchhiker's Guide to Python; What I did when I wanted to make Python faster; Utilization and misuse of list comprehension; Introduction to super "practical" Python one-liner starting with list comprehension
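As a small illustration of the list comprehension, map, and lambda idioms mentioned above, here is a minimal sketch. The sample lists and variable names are made up for this example and are not from the linked articles.

```python
# Hypothetical sample data, chosen only for illustration.
numbers = [1, 2, 3, 4, 5, 6]
words = ["plotly", "dash", "matplotlib"]

# List comprehension: filter and transform in a single expression.
even_squares = [n * n for n in numbers if n % 2 == 0]

# The same result written as an explicit loop, for comparison.
even_squares_loop = []
for n in numbers:
    if n % 2 == 0:
        even_squares_loop.append(n * n)

# map + lambda: apply a small anonymous function to every element.
doubled = list(map(lambda n: n * 2, numbers))

# lambda as a sort key: order the words by their length.
by_length = sorted(words, key=lambda w: len(w))

print(even_squares)
print(doubled)
print(by_length)
```

Whether the one-line forms are more readable than the loop is a matter of taste, which is the trade-off described above.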
In the end, if you want to create a new library, work on more advanced things, or go even faster, you need C++. If you want to write something close to the hardware, you may need C. Interpreted languages have their limits, so languages like C++ cannot be dismissed. That hardly needs saying here.
Related articles: Why is python so slow?; Comparison of speeds in Python, Java, C++
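One rough way to get a feel for the interpreter overhead mentioned above, without leaving Python, is to time a loop written in Python against the built-in `sum`, which CPython implements in C. This is only a sketch: the function names and the value of `N` are assumptions for the example, and the timings depend on your machine and Python version.

```python
import timeit


def python_loop_sum(n):
    """Add up 0..n-1 with a loop executed by the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Add up 0..n-1 with the built-in sum (C code in CPython)."""
    return sum(range(n))


N = 100_000  # arbitrary size, picked only for this illustration

# Run each version 20 times and report the total elapsed seconds.
loop_seconds = timeit.timeit(lambda: python_loop_sum(N), number=20)
builtin_seconds = timeit.timeit(lambda: builtin_sum(N), number=20)

print(f"python loop: {loop_seconds:.4f} s")
print(f"built-in sum: {builtin_seconds:.4f} s")
```

Both functions return the same value; only the time taken differs, and the gap hints at why performance-critical libraries are usually written in C or C++.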
The cloud is so mainstream that in this day and age you can hardly say in a job interview that you don't use it, so naturally you need to keep up. Even if you have only just started data science, it is convenient to be able to spin up Elasticsearch, Tableau, or a Jupyter development environment quickly, and to use the many features of SageMaker. You can get started with data science in a day.
Related articles: Introduction to Python Data Science with Amazon SageMaker Part 1; Machine Learning: Data Scientist
I don't think you have to enter Kaggle competitions, but the visualization methods and feature engineering approaches shared in them are full of useful references, so I don't think it hurts to keep an eye on the competitions that interest you.
Recently in particular, Kaggle's kernels have become easier to use, so you can casually play with the data a little.
Related articles: Dive into Kaggle with a powered-up kernel
It's the end of the year, so I wrote a playful article. I would appreciate it if you took it as a bit of fun. That's all.