I started using Python

It's not as if sticking "py" on everything makes it better.

I have to deal with a lot of oddly formatted TSV files, and pandas looked like it might be useful, so I'm trying it out. This is a memo on setting up a Python environment and on how I use it.

Getting started

I worked through dotinstall's "Introduction to Python 3" course, skimmed pythonizm, and keep pythonizm around as a reference. Dotinstall is easy to follow and nice: each video is about 3 minutes, so you can move along at a good pace.

Reading

I read "Introduction to Python for Computational Science and Technology: Development Basics, Essential Libraries, Acceleration". It was nice that this one is also written systematically.

Environment setup

The book has a support page, and the author describes the setup there. I'm on a Mac, so I installed pyenv with Homebrew and then installed Anaconda through pyenv.

brew install pyenv
pyenv install anaconda3-4.2.0
# then select it, e.g. as the global default
pyenv global anaconda3-4.2.0

Workflow

Most of the work is reading a file, cleaning it up, and writing it out in a different format.

- Inspect and reshape the data in a Jupyter notebook
- Once it looks right, export it as a single standalone script

That's the flow I follow. Jupyter notebook is convenient because you can check the data immediately.
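The read, format, write flow can end up looking roughly like this once it becomes a script. This is a minimal sketch, not my actual script: the TSV content, the column names (`id`, `name`, `score`), the fill-with-0 cleanup and the `convert` helper are all made up for illustration, and `io.StringIO` stands in for real file paths.

```python
import io

import pandas as pd

# Hypothetical TSV input; in practice this would be a path such as 'input.tsv'
raw_tsv = "id\tname\tscore\n1\tAlice\t80\n2\tBob\t\n3\tCarol\t95\n"


def convert(tsv_source, csv_target):
    """Hypothetical helper: read a TSV, tidy it up, write it out as CSV."""
    df = pd.read_csv(tsv_source, sep="\t")
    # Example formatting step: the empty score becomes NaN, so fill it with 0
    # and turn the column back into integers
    df["score"] = df["score"].fillna(0).astype("int64")
    # index=False keeps pandas' row numbers out of the output file
    df.to_csv(csv_target, index=False)
    return df


out = io.StringIO()
convert(io.StringIO(raw_tsv), out)
print(out.getvalue())
```

With real files you would pass paths instead, e.g. `convert('input.tsv', 'output.csv')`.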

Note

How to start Jupyter notebook

jupyter notebook

Reading data with pandas

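import pandas as pd
# delimiter=',' for CSV; for TSV use delimiter='\t'
dataframe = pd.read_csv('FILENAME', delimiter=',', low_memory=False)
# low_memory=False parses the file in one pass instead of in chunks, so each column gets one
# consistent dtype (avoids the "mixed types" DtypeWarning on large files); it uses more memory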
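Since the input here is mostly TSV, a few more `read_csv` options are worth knowing. A minimal sketch with made-up data: `sep='\t'` reads tabs (`pd.read_table` also defaults to tabs), `dtype` keeps a code column as strings so leading zeros survive, and `na_values` treats an extra marker (here a hypothetical `-`) as missing.

```python
import io

import pandas as pd

# Hypothetical TSV content: one empty cell and one "-" placeholder
raw = "code\tqty\n007\t5\n010\t\n020\t-\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",               # tab-separated, same as delimiter="\t"
    dtype={"code": str},    # keep "007" as text instead of parsing it to 7
    na_values=["-"],        # treat "-" as missing, in addition to empty cells
)
print(df)
print(df.dtypes)  # qty is float64 because it contains NaN
```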

Data frame manipulation

Extract only columns A, B, and C.

dataframe.loc[:, ['A', 'B', 'C']]
# .ix was deprecated and removed in pandas 1.0; dataframe[['A', 'B', 'C']] also works
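Since `.ix` is gone from current pandas, here is a small sketch comparing its replacements on a made-up four-column frame: `.loc` selects by label, `.iloc` by position, and plain `[[...]]` indexing is the shorthand for the label case.

```python
import pandas as pd

# Hypothetical frame with one extra column "D" that we want to leave out
dataframe = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

by_label = dataframe.loc[:, ["A", "B", "C"]]  # select by column name
shorthand = dataframe[["A", "B", "C"]]        # same result, shorter
by_position = dataframe.iloc[:, 0:3]          # first three columns by position
print(by_label)
```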

JOIN

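# JOIN: how='left', 'right', 'inner' (the default) or 'outer'
# Without on=, the columns the two frames have in common are used as join keys
pd.merge(dataframe1, dataframe2, how='left')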
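To see what `how` actually changes, here is a quick sketch with two made-up frames joined on a hypothetical `key` column. Only the set of keys that survive differs between the four join types.

```python
import pandas as pd

# Hypothetical tables that share a "key" column
dataframe1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
dataframe2 = pd.DataFrame({"key": ["a", "c", "d"], "right_val": [10, 30, 40]})

results = {}
for how in ["inner", "left", "right", "outer"]:
    # on="key" names the join column explicitly
    results[how] = pd.merge(dataframe1, dataframe2, how=how, on="key")
    print(how, sorted(results[how]["key"]))
```

`inner` keeps only keys present in both frames, `left`/`right` keep every key of one side, and `outer` keeps all of them, filling the gaps with NaN.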

Add a column of arbitrary numbers to the data frame

# Use numpy to add a column of random integers from 0 to 3000
import numpy as np

length = len(dataframe)
# randint's upper bound is exclusive, so 3001 gives values 0..3000
dataframe['dummy'] = np.random.randint(0, 3001, length)
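If the dummy column should come out the same on every run, a seeded generator helps. A sketch assuming NumPy 1.17 or later (where `np.random.default_rng` was added); the frame and the seed value are arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical frame to attach the dummy column to
dataframe = pd.DataFrame({"name": ["x", "y", "z", "w"]})

# A seeded Generator makes the "random" column reproducible between runs
rng = np.random.default_rng(seed=0)
# Generator.integers also treats the upper bound as exclusive by default
dataframe["dummy"] = rng.integers(0, 3001, size=len(dataframe))
print(dataframe)
```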

Handling NaN (cells that were empty in the input file)

# Drop rows that contain NaN (returns a new DataFrame; the original is unchanged)
dataframe.dropna()

# Replace NaN with a specific value
# Example: replace NaN with 0 (can also be per column, e.g. fillna({'A': 0}))
dataframe.fillna(0)
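A small sketch of the variants, on a made-up two-column frame: `dropna(subset=...)` only looks at the listed columns, and passing a dict to `fillna` sets a different fill value per column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has a NaN in A, row 2 has a missing value in B
dataframe = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": ["x", "y", None]})

all_dropped = dataframe.dropna()            # drops rows 1 and 2
a_dropped = dataframe.dropna(subset=["A"])  # only checks column A, drops row 1
filled = dataframe.fillna({"A": 0, "B": "missing"})  # per-column fill values
print(all_dropped)
print(a_dropped)
print(filled)
```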

Data type conversion

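# Check the current dtype of each column
dataframe.dtypes

# Convert to int (can target specific columns, e.g. astype({'A': 'int'}));
# this fails if the data still contains NaN, so fillna first
dataframe.astype('int')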
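A sketch of per-column conversion on a made-up frame. Column `A` holds numeric strings (as often happens after reading a messy file) and `B` has a NaN, which is why `B` needs `fillna` before `astype`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: A is text that looks numeric, B is float with a gap
dataframe = pd.DataFrame({"A": ["1", "2", "3"], "B": [1.0, np.nan, 2.0]})
print(dataframe.dtypes)

# Convert only column A; the dict maps column name -> target dtype
converted = dataframe.astype({"A": "int64"})

# astype to int raises on NaN, so fill the missing value first
converted["B"] = converted["B"].fillna(0).astype("int64")
print(converted.dtypes)
```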

Row and column references and operations

# Show the row labels (index)
dataframe.index

# Show the column labels
dataframe.columns

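# Rename all columns to X, Y, Z (the list needs exactly one name per column)
# To rename only some columns: dataframe.rename(columns={'A': 'X'})
dataframe.columns = ['X', 'Y', 'Z']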
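A small sketch of the index/column attributes and the two ways to rename, on a made-up frame. `rename` only touches the names you list and returns a new frame, while assigning to `columns` replaces all of them in place.

```python
import pandas as pd

# Hypothetical three-column frame
dataframe = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

print(dataframe.index)    # row labels
print(dataframe.columns)  # column labels

# Replace every column name at once (work on a copy to keep the original)
renamed_all = dataframe.copy()
renamed_all.columns = ["X", "Y", "Z"]

# Rename only selected columns; the others keep their names
renamed_some = dataframe.rename(columns={"A": "X"})
```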

# Apply a function to every row (axis=1); the default axis=0 applies it to every column
dataframe.apply(function, axis=1)
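Because `axis` is easy to mix up, here is a sketch on a made-up frame showing what the function receives in each case.

```python
import pandas as pd

# Hypothetical frame
dataframe = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# axis=1: the function gets one row at a time, as a Series indexed by column name
row_totals = dataframe.apply(lambda row: row["A"] + row["B"], axis=1)

# axis=0 (the default): the function gets one column at a time
column_max = dataframe.apply(lambda col: col.max())

print(row_totals)
print(column_max)
```

For simple arithmetic like this, the vectorized `dataframe['A'] + dataframe['B']` gives the same result and is much faster than `apply`.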

# Remove duplicates (like SELECT DISTINCT column in MySQL)
dataframe['column'].drop_duplicates()

# groupby: aggregate by a key column (sum, mean, and many others)
dataframe.groupby('column').sum()
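The SQL comparisons can be made concrete with a sketch on a made-up frame, where `column` is the key and `amount` is a hypothetical value column.

```python
import pandas as pd

# Hypothetical data: "column" is the grouping key, "amount" the value
dataframe = pd.DataFrame({
    "column": ["a", "b", "a", "c", "b"],
    "amount": [1, 2, 3, 4, 5],
})

# SELECT DISTINCT column  (keeps the first occurrence of each value)
distinct = dataframe["column"].drop_duplicates()

# SELECT column, SUM(amount) ... GROUP BY column
totals = dataframe.groupby("column")["amount"].sum()

# Several aggregations at once
summary = dataframe.groupby("column")["amount"].agg(["sum", "mean"])
print(summary)
```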

Other

The new Python environment clashed badly with my existing Ansible setup; this article solved it: "How to specify the Python used by Ansible".

When you merge DataFrames, an int column that ends up with missing values (for example, rows with no match in a left join) is forcibly converted to float64, because NaN is a float. Be careful when writing the output.
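The float64 conversion comes from NaN: a left join leaves gaps for keys with no match, and NumPy's int64 cannot hold NaN. A sketch with made-up frames showing the effect and two ways around it (the nullable `Int64` dtype needs pandas 0.24 or later).

```python
import pandas as pd

# Hypothetical frames: key "b" has no match on the right side
left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["a"], "qty": [10]})

merged = pd.merge(left, right, how="left", on="key")
print(merged.dtypes)  # qty is float64 because row "b" got NaN

# Option 1: fill the gaps and convert back to a plain int
merged["qty_filled"] = merged["qty"].fillna(0).astype("int64")

# Option 2: pandas' nullable integer dtype keeps the gap as <NA>
merged["qty_nullable"] = merged["qty"].astype("Int64")
print(merged)
```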

That's about it for now. Next, I want to try drawing graphs.