__ You can't just stick "py" onto everything __
I have to deal with a lot of odd TSV files, and pandas looked useful, so I am trying it out. This is a memo on setting up a Python environment and on basic usage.
I worked through the Python 3 introduction on dotinstall and skimmed pythonizm to use as a reference. dotinstall is easy to follow and nice: each video is about 3 minutes, so you can move along at a good pace.
I also read "Introduction to Python for Computational Science and Technology: Development Basics, Essential Libraries, Acceleration". It was nice that this one was systematically written too.
The book has a support page, and the author covers most of the setup there. I work on a Mac, so I installed pyenv with homebrew and then installed anaconda through pyenv.
brew install pyenv
pyenv install anaconda3-4.2.0
Most of my work is reading a file, reshaping it, and writing it out as a file in a different format, so I follow this flow:
--Check and format the data in jupyter notebook
--If it looks OK, export it as a single script
jupyter notebook is convenient because you can check the data immediately.
jupyter notebook
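Once the steps have been checked in the notebook, the exported script could look something like the sketch below. The column names, the sample data, and the `convert` helper are all hypothetical; an in-memory buffer stands in for real files so the example runs anywhere.

```python
import io

import pandas as pd

# Hypothetical tab-separated input; a real script would pass file paths instead.
raw_tsv = "id\tname\tscore\n1\talice\t10\n2\tbob\t20\n"


def convert(src, dst):
    """Read a TSV, keep and rename some columns, and write the result as CSV."""
    df = pd.read_csv(src, delimiter="\t")
    df = df.loc[:, ["id", "score"]]      # keep only the columns we need
    df.columns = ["user_id", "points"]   # rename them for the output format
    df.to_csv(dst, index=False)          # index=False omits the row index column


out = io.StringIO()
convert(io.StringIO(raw_tsv), out)
result = out.getvalue()
print(result)
```

`read_csv` and `to_csv` accept both paths and file-like objects, so the same function should work unchanged with real file names.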
import pandas as pd
pd.read_csv('FILENAME', delimiter=',', low_memory=False)
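# low_memory=False parses the file in one pass so column dtypes are inferred consistently (helps with heavy files that trigger mixed-type warnings, at the cost of more memory)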
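For TSV files the same function works with a tab delimiter. The sketch below uses made-up data and also passes `dtype`, which is another way to avoid mixed-type guessing: here it keeps a code column as text so leading zeros survive.

```python
import io

import pandas as pd

# Hypothetical TSV content standing in for a real file.
tsv_text = "code\tvalue\n001\t1.5\n002\t2.5\n"

# delimiter="\t" switches read_csv to tab-separated input.
# dtype={"code": str} keeps the column as text instead of parsing 001 as the integer 1.
df = pd.read_csv(io.StringIO(tsv_text), delimiter="\t", dtype={"code": str})

print(df)
print(df.dtypes)
```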
# Select columns A, B, C (.ix was removed in pandas 1.0, so use .loc)
dataframe.loc[:, ['A','B','C']]
JOIN
# Join two data frames; how can be 'left', 'right', and so on
pd.merge(dataframe1, dataframe2, how='left')
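A small left join, with hypothetical `users` and `orders` frames, shows what `how='left'` does: every row of the left frame is kept, and rows without a match get NaN in the columns that come from the right frame.

```python
import pandas as pd

# Hypothetical sample frames sharing a "user_id" key.
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["a", "b", "c"]})
orders = pd.DataFrame({"user_id": [1, 1, 3], "item": ["x", "y", "z"]})

# on="user_id" names the join key explicitly; without it, pandas joins on
# all columns the two frames have in common.
merged = pd.merge(users, orders, how="left", on="user_id")
print(merged)
```

User 1 appears twice because it matches two orders, and user 2 is kept with a missing `item`.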
# Use numpy to add a column of random integers from 0 to 3000 to the data frame (the upper bound 3001 is exclusive)
import numpy as np
length = len(dataframe)
dataframe['dummy']=np.random.randint(0,3001,length)
# Drop rows that contain NaN
dataframe.dropna()
# Replace NaN with a specific value
# Example: replace NaN with 0 (can also be done per column)
dataframe.fillna(0)
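The per-column variants could be written as in this sketch, which uses made-up data. Note that both methods return a new data frame and leave the original untouched unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in both columns.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": ["x", None, "z"]})

# subset= limits the NaN check to the listed columns.
dropped = df.dropna(subset=["A"])

# A dict gives each column its own replacement value.
filled = df.fillna({"A": 0, "B": "unknown"})

print(dropped)
print(filled)
```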
# Check the current types
dataframe.dtypes
# Convert to int (can also be done per column)
dataframe.astype('int')
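Converting a single column can be done by passing a dict to `astype`, as in this sketch with hypothetical data. A column that still contains NaN cannot be cast to a plain integer type, so it probably needs `fillna` first.

```python
import pandas as pd

# Hypothetical frame where numbers were read in as strings.
df = pd.DataFrame({"count": ["1", "2", "3"], "label": ["a", "b", "c"]})

# Only "count" is converted; "label" keeps its original type.
converted = df.astype({"count": "int64"})

print(converted.dtypes)
```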
# Show the row index
dataframe.index
# Show the column names
dataframe.columns
# Rename the columns (to X, Y, Z)
dataframe.columns=['X','Y', 'Z']
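Assigning to `columns` replaces every name at once, so the list must match the number of columns. When only some names need to change, `rename` with a mapping may be easier; the column names below are just for illustration.

```python
import pandas as pd

# Hypothetical frame with three columns.
df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

# Only the keys listed in the mapping are renamed; "B" is left alone.
renamed = df.rename(columns={"A": "X", "C": "Z"})

print(list(renamed.columns))
```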
# Apply a function to each row (axis=1 passes one row, with all its columns, at a time)
dataframe.apply(function, axis=1)
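As a concrete case, the sketch below computes a value from two columns of each row. The frame and the `row_total` function are hypothetical.

```python
import pandas as pd

# Hypothetical order lines.
df = pd.DataFrame({"price": [100, 200], "qty": [2, 3]})


def row_total(row):
    """Receive one row as a Series and return price times quantity."""
    return row["price"] * row["qty"]


# axis=1 calls row_total once per row.
df["total"] = df.apply(row_total, axis=1)
print(df)
```

For simple arithmetic like this, `df["price"] * df["qty"]` gives the same result and is usually faster; `apply` is more useful when the per-row logic is too involved for column operations.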
# Remove duplicates, like SELECT DISTINCT column in MySQL
dataframe['column'].drop_duplicates()
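The call above returns only the distinct values of one column. If the whole row should be kept, `drop_duplicates` on the data frame with `subset` is one option; the data in this sketch is made up.

```python
import pandas as pd

# Hypothetical frame with a repeated city.
df = pd.DataFrame({"city": ["tokyo", "osaka", "tokyo"], "n": [1, 2, 3]})

# Distinct values of a single column.
distinct_values = df["city"].drop_duplicates()

# Whole rows, keeping the first row seen for each city.
first_rows = df.drop_duplicates(subset=["city"])

print(distinct_values.tolist())
print(first_rows)
```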
# groupby: aggregate by a specific key (sum, mean, and others are available)
dataframe.groupby('column').sum()
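A short sketch with hypothetical data: selecting the value column before aggregating avoids summing unrelated columns, and `agg` can compute several statistics in one call.

```python
import pandas as pd

# Hypothetical frame with a grouping key and a numeric value.
df = pd.DataFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})

# One aggregate per key.
totals = df.groupby("key")["value"].sum()

# Several aggregates at once; the result has one column per statistic.
stats = df.groupby("key")["value"].agg(["sum", "mean"])

print(totals)
print(stats)
```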
This setup clashed spectacularly with my existing ansible, so here is the fix: How to specify python to be used in ansible
It seems that int columns attached by a DataFrame merge are forcibly converted to float64. This happens when unmatched rows get NaN, which an int column cannot hold. Be careful when writing output.
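The conversion can be reproduced with two tiny, made-up frames. One possible workaround, assuming 0 is an acceptable stand-in for a missing value, is to fill the NaN and cast back to int before output.

```python
import pandas as pd

# Hypothetical frames; key 2 has no match on the right side.
left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [1, 3], "count": [10, 30]})

merged = pd.merge(left, right, how="left", on="key")
# "count" was int in right, but the unmatched row introduces NaN,
# so the merged column becomes float64.
print(merged["count"].dtype)

# Fill the gap, then cast the column back to an integer type.
fixed = merged.fillna({"count": 0}).astype({"count": "int64"})
print(fixed)
```

If 0 would be misleading, another option may be to leave the column as float and format it only when writing the file.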
That is about it for now. Next, I want to draw some graphs.