Think about the analysis environment (Part 1: Overview), as of January 2017

Preface

- This article is only my personal opinion.
- A different setup may suit your environment better.
- There may be smarter approaches that I have not yet learned about.

Flow of analysis environment construction

- The virtual environment is built in the following steps:

No. Contents
1 Overall picture
2 Build a virtual environment with VirtualBox + Vagrant
3 Install pyenv + Anaconda [Python, R] + Jupyter + RStudio
4 Install PostgreSQL + pgAdmin 4

Who is this for?

- I designed the analysis environment around **the kind of work I usually do**.
- Ways of working vary, but here I consider an analysis environment for people who work in the following way.

[Figure: Ways of working]

The resulting design

- This is the analysis environment I arrived at:

[Figure: Overall picture of the environment build]

What I considered

[1] Why virtualization?

- Analysis code will most likely be run by SSHing into a Linux server.
- Building a Linux environment locally reduces rework when code moves to that server.

A familiar scenario (1)

- My laptop runs Windows or Mac.
- A Linux environment is only available by SSHing into the analysis server.
- Although there is an analysis server, I want to develop locally first rather than on the server.
  - Reason (1): I don't want to bother other users of the analysis server by running half-baked code.
  - Reason (2): Transferring results back is a hassle.
  - Reason (3): SSHing in every time I want a quick analysis is a hassle.
- Code developed locally fails when run on the analysis server.
  - Cause (1): A package is not installed.
  - Cause (2): Package versions differ.
  - Cause (3): Encodings and line endings differ.
- I end up doing trial and error on the analysis server.
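Causes (1) and (2) can be caught before anything runs on the server. The sketch below is not part of the setup described in this series. It compares installed package versions against a pinned list using the standard-library `importlib.metadata` module, which requires Python 3.8 or later and so is newer than this January 2017 setup. The function name `check_packages` and the pinned versions in the example are hypothetical.

```python
"""Hypothetical helper: compare installed package versions with a pinned list."""
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List


def check_packages(
    required: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> List[str]:
    """Return one message per package that is missing or has a different version.

    `lookup` defaults to importlib.metadata.version and can be replaced in tests.
    """
    problems: List[str] = []
    for name, wanted in sorted(required.items()):
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            # Cause (1): the package is not installed in this environment.
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        if installed != wanted:
            # Cause (2): the installed version differs from the pinned one.
            problems.append(f"{name}: version {installed} installed, expected {wanted}")
    return problems


if __name__ == "__main__":
    # Hypothetical pins; replace them with the versions used on your analysis server.
    pins = {"numpy": "1.11.3", "pandas": "0.19.2"}
    for message in check_packages(pins):
        print(message)
```

If you run the same check both locally and on the analysis server, both kinds of mismatch show up before any real analysis starts. The exact string comparison is deliberately strict, because even a patch-level difference can change results.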

A familiar scenario (2)

- My laptop runs Windows.
- I want to work in a terminal, but the MS-DOS prompt is hard to use, so I installed Cygwin.
- I get stuck installing libraries.
- With enough effort I can get things running for the time being, but the environment becomes a mess.
- Installing Anaconda inside Cygwin doesn't work.
- I give up, analyze in the Windows environment, and stop using the terminal.
- I end up doing trial and error on the analysis server.
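Cause (3) from the first scenario, different encodings and line endings, typically appears when files written on Windows are run on a Linux server. Such files use CRLF line endings and often one of two encodings: cp932 (the Japanese Windows code page) or UTF-8 with a BOM. The sketch below assumes you want UTF-8 with LF endings on the server. `normalize_text` and `normalize_file` are hypothetical helper names, not part of any tool mentioned in this article.

```python
"""Hypothetical helpers: convert Windows-style text files to UTF-8 with LF endings."""
from pathlib import Path


def normalize_text(data: bytes, source_encoding: str = "utf-8-sig") -> bytes:
    """Decode `data`, convert CRLF and lone CR to LF, and re-encode as UTF-8.

    The default "utf-8-sig" codec also strips a leading UTF-8 BOM if present.
    Pass "cp932" for files saved in the Japanese Windows code page.
    """
    text = data.decode(source_encoding)
    # Replace CRLF first so the CR inside CRLF does not become an extra LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


def normalize_file(path: Path, source_encoding: str = "utf-8-sig") -> bool:
    """Rewrite `path` in place and return True if its bytes changed."""
    original = path.read_bytes()
    converted = normalize_text(original, source_encoding)
    if converted != original:
        path.write_bytes(converted)
        return True
    return False


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python normalize.py FILE [FILE ...]
    for arg in sys.argv[1:]:
        changed = normalize_file(Path(arg))
        print(f"{arg}: {'converted' if changed else 'unchanged'}")
```

Converting everything to one canonical form (UTF-8 with LF) before it reaches the server is simpler than making every script tolerate each variant.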

[2] Analysis client tool

- I personally like analysis environments such as RStudio.
- The reasons are as follows:

  1. Script output is displayed in a separate pane
  2. Code can be run line by line
  3. You can write Markdown
  4. You can present a slide show
  5. It works with a virtual environment
  6. The environment is easy to set up
- I looked for a similar environment for Python and summarized the results below. The columns 1 to 6 refer to the items above.
Software Result 1 2 3 4 5 6 Remarks
Rodeo × × The official site says Markdown and a remote server are supported, but I could not get either to work in version 2.0.13.
Spyder × × × × My investigation may have been insufficient.
PyCharm × × × × My investigation may have been insufficient.
Jupyter × R can also be used in a similar environment. Slide shows are possible with RISE.
JupyterLab × × It seems to do basically everything Jupyter does. The terminal and the script can be shown in a split screen, but the script's results are not reflected in the terminal. I look forward to future releases.

[3] Software license

- Use software under open-source licenses.
- The reasons are as follows:
  - With collaboration with academia in mind, I want to avoid paid environments as much as possible.
  - I don't want to depend on paid software.

[4] Reasons for choosing other software

Software Reason
Windows Quite a few people around me use MS Office. (A similar environment can also be built on a Mac.)
Oracle VirtualBox I wanted to build a virtual environment on existing PCs that works on both Windows and Mac. (Details are described in "Let's summarize the virtual environment".)
Vagrant To distribute the same environment. I also tried Docker, which has become popular recently, but it failed in various ways.
Ubuntu To match the OS of the analysis server. (My impression is that many recent analysis-related articles use Ubuntu.)
Anaconda The necessary libraries come bundled, which makes setup easy.
Tera Term I have been using it all along.
WinSCP I don't know of any alternatives.
