The title is a nod to [Before Introduction to C Programming](https://www.amazon.co.jp/dp/4839920648) by Yukio Murayama. In other words, this article is not about machine learning itself, but about the many surrounding skills that machine learning demands.
First of all, I would like to introduce myself.
I was in an artificial intelligence laboratory through both my undergraduate and graduate years. At first I did research based on things like Boltzmann machines. For some reason, as a graduate student I was hired by a company as a part-time researcher, and I wrote my master's thesis on the results of that work and graduated.
I could write programs reasonably well, but **I couldn't even search for words I didn't know existed**. I wanted so badly to learn the technology, yet I spent days of suffering because I couldn't even formulate the search.
After entering the laboratory I learned that Qiita existed, got to know Python, and was finally able to learn machine learning. It's all thanks to Qiita: starting from not even being able to search, I gradually picked up the vocabulary, and as a result I was able to learn machine learning.
Strictly speaking, that is the whole answer to "how I learned machine learning". The rest is a bonus, but I hope you'll read it anyway. (Like the gum that comes with the toy.)
So this time, as my thanks to Qiita, I will list and briefly explain the terms I picked up along the way. Please bear with me.
Machine learning and Linux are inseparable. A Mac alone doesn't cut it, and getting Python onto Windows is a pain, so for machine learning you are half-forced onto Linux. Here I'll cover how to work with it: useful commands and the knowledge you need.
Windows 10 now has a feature called *Windows Subsystem for Linux*. With it you can use a pseudo-Linux environment on Windows. You can find out how to install it by googling; it's an official Microsoft tool.
SSH
SSH is an abbreviation for Secure Shell. Think of it as a mechanism for logging in to a remote server; in other words, it lets you work on remote machines.
What's nice about SSH is that you can keep using your Mac's interface while leaving the computation itself to a Linux box. And the interface doesn't have to be a Mac; Windows works just as well.
You may want to run SSH on a home or laboratory server that is open to the outside. In that case, as long as you observe the following points in `sshd_config`, you will basically be fine.

- Set `PermitRootLogin` to `no`.
- Set `PasswordAuthentication` to `no`. (Even with `no`, you can still log in with a password directly at the machine's console.)
- Set up public key authentication.

With public key authentication you can securely reach the remote machine without sending a password. Use it; if the server is exposed to the internet, you really have no other choice.
I won't explain the mechanism in detail here. Simply put:

- Running `ssh-keygen` creates a public/private key pair.
- Register the public key in `authorized_keys` on the server and adjust `sshd_config` accordingly.
- On the accessing side, add a line like `IdentityFile ~/.ssh/id_rsa` to `~/.ssh/config`.

Then you can log in safely. Google for the details.
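As a concrete illustration, here is a minimal sketch of a client-side `~/.ssh/config` entry; the host alias, address, and user name are made-up placeholders for your own setup:

```
# ~/.ssh/config (client side) -- all values below are made-up examples
# "gpu-server" becomes an alias, so you can just type: ssh gpu-server
Host gpu-server
    # address of the remote machine
    HostName 192.168.1.22
    # account name on the remote machine
    User your_name
    # private key used for public key authentication
    IdentityFile ~/.ssh/id_rsa
```

With this in place, `ssh gpu-server` logs you in with the key pair instead of a password.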
tmux
When you run computations over SSH, if the network drops partway through a long run, **your computation comes to nothing**.
You'd want the state to survive even if SSH is cut off, right? That is exactly what tmux gives you. tmux has the concept of a session: the pseudo-terminal keeps its processes alive inside the session even after the SSH connection dies.
You probably don't need to install tmux yourself, as it ships with Ubuntu 18.04 LTS.
To start a tmux session:

```bash
tmux new -s session_name
```

Feel free to name `session_name` whatever you like.
Every tmux operation starts by pressing a prefix key. The prefix is `ctrl + b` by default, but I recommend rebinding it to `ctrl + a`; it makes everything go much faster.

If you want to get rid of the session itself, use `logout`.

To leave a session running in the background instead, there is an operation called detach: press `prefix`, then `d`, in that order. To reattach later, type `tmux a` in the terminal.
tmux can also split the screen. Pressing `prefix`, `%` splits the pane vertically, and `prefix`, `"` splits it horizontally. You can even display a clock with `tmux clock-mode`. The full list of key bindings is shown with `prefix`, `?`.

For more, this article is a good reference: https://qiita.com/nmrmsys/items/03f97f5eabec18a3a18b
~/.tmux.conf
You can configure tmux in many ways here. There are countless possible settings; I referred to these two articles: "Learn from the master: basic settings of tmux.conf" and "Show whether the prefix key is pressed in tmux". The settings I always use are as follows.
```
# Change the prefix key to C-a
set -g prefix C-a
# Unbind C-b
unbind C-b
# Reload the config file with prefix, r
bind r source-file ~/.tmux.conf \; display "Reloaded!"
# Pressing C-a twice sends C-a to the program inside tmux
bind C-a send-prefix
# Split the pane vertically with |
bind | split-window -h
# Split the pane horizontally with -
bind - split-window -v
# Use a 256-color terminal
set -g default-terminal "screen-256color"
# Show in the status line whether the prefix key is pressed
set-option -g status-left '#[fg=cyan,bg=#303030]#{?client_prefix,#[reverse],} #H[#S] #[default]'
```
Basically, this is enough.
htop
htop is a tool for viewing system resources. With it you can see how much load is actually on your CPU.
nvtop
nvtop is the GPU version of htop: just as htop shows the CPU, nvtop shows whether your GPUs are being used. On Ubuntu 19.04 you can install it with `apt`, but otherwise you basically have to build it from source.
vi/vim
When you are on Linux, for example over SSH, there is a high chance you will need to edit files directly. That is where vi and vim come in. The difference between them: vi + various extra features = vim; plain vi is painful to use. Open a file with the `vi` or `vim` command.

Basically you are fine if you remember the following. Moving the cursor, undoing, and searching are all done from normal mode. Here is a list of features worth knowing:
- `:q` quit
- `:q!` force quit without saving
- `:w` save (overwrite)
- `:100` jump to line 100
- `/word` search for "word" (`n` jumps to the next match)
- `u` undo (like ctrl+z on Windows)
- `dd` delete the current line (like ctrl+x)
- `yy` copy the current line (like ctrl+c)
- `p` paste (like ctrl+v)
- `hjkl` move the cursor ← ↓ ↑ → (on a Mac in Japanese input mode, typing zh gives ←)
Press the `i` key or `O` to enter insert mode; in insert mode you can type text. Press `ESC` to return to normal mode. (Is it thanks to Vimmers that the ESC key came back on the Mac...?)

For details, a google search will turn up more commands than you could ever need. Do check them out.
find
It literally finds files.

```bash
find [start_dir]
```

is the basic usage. For example,

```bash
find ~/ | grep <file_to_find>
```

lets you locate where a file lives.
tree
Shows files in tree format. I always use this when I want to grasp the overall directory structure. The downside is that it floods your terminal with output (lol).
wc
Counts the number of lines in a file, e.g. `wc -l data.tsv`. Useful when you want to know how many rows a tsv has, or to have it counted for you without opening the file.
df/du
These measure disk usage. df shows usage for each filesystem as a whole, while du shows the size of individual files and directories.
```bash
df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  1.5M  3.2G   1% /run
/dev/sdb3       916G   33G  837G   4% /
tmpfs            16G   88K   16G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
```
The `-h` option displays capacities with human-readable units. The Filesystem column shows each device (dev) by its specific name; `/dev/sdb3` is a concrete piece of hardware such as an SSD. Devices are basically named `sd[x][n]`. Google for the details.
On the other hand, if you want to see individual file sizes, the `du` command is the tool. For example, to list the sizes of everything under your home directory:

```bash
du -hs ~/*
```

This reports the size of each entry, so you can see at a glance which ones are the heaviest.
grep
Use this when you want to extract only the matching lines from a large amount of output.

```bash
find ~/ | grep filename
```

pipes the output of `find` into `grep`, which keeps only the lines containing `filename`. It also supports the regex described later.
cat
Outputs a file's contents directly. Combined with a pipe,

```bash
cat /var/log/auth.log | grep sudo
```

lets you search inside a file.
less/head/tail
When someone sends you a stupidly huge 120 GB tsv, running `vim logfile.tsv` takes forever and grinds you to a halt. (Files at that level should be sent as parquet in the first place!) In such cases, the `less` command reads only part of the file at a time and displays it on screen. `head` shows the first few lines; `tail` shows the last few.
jq
Pretty-prints json files; for example, `cat data.json | jq .` formats the whole file nicely. For more information, see "Introduction to daily use of jq command".
sed
It replaces strings. The syntax `s/a/b/g` converts every `a` to `b`; for example, `echo banana | sed 's/a/o/g'` prints `bonono`. Even among engineers this notation is a shared language: the trailing `g` of `s/…/…/g` is often omitted, and you'll see it thrown around on Slack and the like.
This part is for people who say "I know Python itself, but how should I install it? How do I manage versions on Linux...?"
pyenv
pyenv installs Python versions per user. Details here: [Permanent preservation version] Put pyenv + venv in ubuntu [Don't hesitate anymore]
After installing pyenv, install the Python version you want with `pyenv install <python-version>`, e.g. `pyenv install 3.6.9`. Basically it's fine to grab the simplest option, such as anaconda. Installed this way, Python lives in your personal folder and does not pollute other users' environments.
venv
venv is Python's standard tool for creating isolated environments in which to manage packages. A good pattern is: pick the base Python with pyenv, then create an environment with venv (`python -m venv .venv`, then `source .venv/bin/activate`) and use pip inside it. Details here: [Permanent preservation version] Put pyenv + venv in ubuntu [Don't hesitate anymore]
vscode
vscode has an SSH feature that automatically reads your public key settings, and editing files on a remote server over SSH with your private key feels great. On the other hand, it has a mysteriously hard time picking up the packages inside a venv, so my impression is that it isn't well suited to the coding itself. (If you know how to get code completion with venv + vscode, I'd love to hear about it!)
jupyter_notebook
Jupyter Notebook is an IDE that runs in a web browser. Basically you start it on a remote server; writing notebooks this way is convenient when you want the remote server to do only the computation.
Google Colaboratory
Google Colaboratory is an IDE whose environment Google prepares for you. Its defining feature is that you don't have to do any environment building at all: you can write Python using Google's resources, and thankfully that includes GPUs and TPUs. It's basically the same as Jupyter Notebook, except that the resources are all managed by Google.
tqdm
tqdm puts up a progress bar. The great thing is that you can see at a glance how far along a heavy process like deep learning is. Data science means chewing through stupidly big files, so when the bar says a few minutes remain (sometimes 30 hours), you can go play Switch in the meantime. Essential for machine learning.

See the official docs for usage; it also has features for multiprocessing and for monitoring numeric values.
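As a minimal sketch of the basic usage (the loop body here is just a stand-in for your own computation):

```python
from tqdm import tqdm
import time

# Wrapping any iterable in tqdm() prints a live progress bar with ETA.
for batch in tqdm(range(100), desc="training"):
    time.sleep(0.05)  # stand-in for one heavy training step
```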
pandas
pandas is a tool for handling tsv and parquet files as tables. It is all but indispensable for data science; if you do Kaggle, you will learn it whether you like it or not.
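For instance, loading a tsv into a table looks like this (the file name is a made-up example):

```python
import pandas as pd

# Read a tab-separated file into a DataFrame (file name is hypothetical).
df = pd.read_csv("train.tsv", sep="\t")

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # peek at the first five rows
```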
matplotlib
matplotlib displays graphs. If you do Kaggle, you will run into it whether you like it or not. Alternatives include seaborn and plotly.
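A minimal sketch of plotting a simple curve:

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot sin(x) over 0..10 as a quick sanity-check graph.
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.show()
```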
Pickle
pickle saves any Python object. You often want to save intermediate state: a model built with Keras, or an XGBoost model trained over many hours. In such cases pickle saves the object whole, so you can load it back and immediately use it for prediction.
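A minimal sketch of the save/load round trip (the dict here stands in for whatever trained model object you have):

```python
import pickle

model = {"weights": [0.1, 0.2, 0.3]}  # stand-in for a trained model object

# Serialize the whole object to disk.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (even in another process), load it back ready to use.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```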
regex
Regular expressions. When you want to pull, say, every phone number out of a large pile of text, this is the tool. For a phone number, the pattern

```
\d{3,4}[-]?\d{3,4}[-]?\d{4}
```

does the job. (To anyone who doesn't know regex, it looks like a cryptic string.)
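Here is a minimal sketch of using that pattern from Python (the sample text is made up):

```python
import re

# The phone-number pattern from above.
pattern = r"\d{3,4}[-]?\d{3,4}[-]?\d{4}"

text = "Contact us at 090-1234-5678 during business hours."
print(re.findall(pattern, text))  # -> ['090-1234-5678']
```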
Docker
Docker is not a so-called virtual machine, but it isolates middleware such as MySQL so that it does not pollute your environment. Without Docker you end up installing MySQL directly on the host, and **"ah, it failed"**-type trouble follows.

Docker splits services such as MySQL and nginx into units called containers. You can throw containers away and spin up as many new ones as you like (for example, `docker run -e MYSQL_ROOT_PASSWORD=secret -d mysql` starts a disposable MySQL), which makes environment building very easy. Searching for Docker turns up a huge amount of information, so I suggest referring to that.
If you want a setup where a huge desktop PC with a strong GPU does only the computation, while all the coding and instructions come from your Mac, I recommend port forwarding + Jupyter Notebook. I wrote about port forwarding before, so let me point you there: Summary of access methods to Jupyter Notebook (Lab) on a remote server that any data scientist can follow
answer
Any strong PC is fine. Buying a gaming GPU machine is okay, but in most cases using GCP or AWS works out cheaper or easier.

Whatever you pick, the essential recipe is: **set up sshd on the GPU server, port-forward (e.g. `ssh -L 8888:localhost:8888 user@gpu-server`), and open Jupyter on localhost, and you're done**.
answer
I usually pin it down by MAC address: in your DHCP settings, **make sure this MAC address is always assigned 172.168.1.22**. Once that's set, configure SSH, send over your public key, and SSH to 172.168.1.22.
answer
First, pick up some security knowledge by googling around `sshd_config`. Then, on the router side, configure which inside port is exposed on which outside port. It depends on your environment and provider, but your external IP generally keeps changing, so if you use a technology called **DDNS** you can reach the machine from outside via a fixed domain.
Basically Colab is the one to use: it's free. However, if you use it too heavily you get cut off or throttled hard. When that happens, switch to GCP. It doesn't cost much, so it's fine~
pyenv can be installed without sudo rights, so permission management is easy. It does depend on some apt packages, though; since a build runs at install time, it won't work without them. Still, it's convenient because sudo rights stay easy to manage.
This article, "Did you know that Colaboratory can push to GitHub and show diffs right on the screen?", covers how.
AI Platform for GCP, SageMaker for AWS!
It depends. For research there is usually an original paper, with the understanding that everyone competes on the same dataset, so refer to that. For independent projects, searching for **crawling / scraping** will show you how. And if you really want to build a brand-new dataset, the cloud platforms also offer annotation features; why not use those?
"What should I do about this?": I will keep answering questions like that here as best I can, so if you have any, please don't hesitate to ask. Thank you for staying with me this long.