python pandas numpy related

IPython Notebook is convenient when you work on a single dataset (N = 1)

If you iterate over N = 10 or more datasets, plain IPython is more convenient.

However, this gets difficult once you use classes and the like to make the code object-oriented, so I am looking for a solution

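import sys
import pandas as pd


class ClassName:
  def __init__(self, filename):
    self.data = pd.read_csv(filename)  # keep the data as a pandas DataFrame
    self.filename = filename

  def method(self):
    data = self.data
    return data


if __name__ == '__main__':
  filename = sys.argv[1]  # assumption: the CSV path is passed on the command line
  instance = ClassName(filename)
  instance.method()
  # Module-level names stay available in IPython after %run
  data = instance.data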

Structuring it like this may make things easier to handle. In data analysis, worrying about namespace pollution can take an unusual amount of time, so it may be better not to worry about it. This layout may also be good because the data can be treated directly as a pandas object.

Modules to import

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy as sp
import scipy.signal as signal
import logging
import sys
import os
import re

In addition, pyenv and Anaconda were very convenient for setting up the environment. As a working environment, either editing in vim and running files with %run in iTerm, or doing all the work in IPython Notebook, looks good.

To prevent rework, I want to save the data of each stage with

data.to_csv('default.csv')  # data is a pd.DataFrame

but storage will become tight, so it is a trade-off
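A minimal sketch of saving each stage, under some assumptions: save_stage is a hypothetical helper name, and the temporary directory and sample values are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def save_stage(df, stage, outdir):
    """Write one intermediate DataFrame to <outdir>/<stage>.csv and return the path.

    save_stage is a hypothetical helper used only for this illustration.
    """
    path = os.path.join(outdir, stage + '.csv')
    df.to_csv(path)
    return path


outdir = tempfile.mkdtemp()  # assumption: a throwaway directory for the example
raw = pd.DataFrame({'value': [1.0, None, 3.0]})
cleaned = raw.dropna()

raw_path = save_stage(raw, 'raw', outdir)
cleaned_path = save_stage(cleaned, 'cleaned', outdir)

# Reloading a saved stage lets you resume without redoing the earlier steps.
reloaded = pd.read_csv(cleaned_path, index_col=0)
```

Each stage costs one file, which is exactly the storage trade-off mentioned above.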

The official pandas tutorial is good

Convert loops and conditionals to slices and boolean arrays

Loops should only be used for reading files

In the end, NumPy code comes down to making effective use of boolean arrays (masks) and shifting values by slicing. In pandas, as a last resort, write it with apply and a lambda function. Use resample for a DatetimeIndex.
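As a small illustration of replacing a loop and an if statement with a boolean array and shifted slices (the sample values are made up):

```python
import numpy as np

x = np.array([3, -1, 4, -1, 5, -9, 2, 6])

# Conditional without a loop: a boolean array selects the matching elements.
positive = x > 0
positive_values = x[positive]

# Shift by slicing: compare each element with its predecessor.
increased = x[1:] > x[:-1]

# Combine both: positive elements that are larger than the previous element.
rising_positive = x[1:][increased & positive[1:]]
```

Here x[1:] and x[:-1] are the same data offset by one position, so the comparison runs over all neighbouring pairs at once.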

Use a logger, on the theory that print statements are not good

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

With this,

logging.debug('here')

is displayed on the console (standard error by default) or in IPython
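One advantage over print is that the output can be switched off by level. A rough sketch, assuming a named logger ('analysis' is an arbitrary name) and an in-memory stream standing in for the console so that the result can be inspected:

```python
import io
import logging

stream = io.StringIO()  # stand-in for the console; use sys.stdout in a real session
logger = logging.getLogger('analysis')  # 'analysis' is a hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the example independent of the root logger

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(message)s'))
logger.addHandler(handler)

logger.debug('here')            # emitted, because the level is DEBUG
logger.setLevel(logging.INFO)   # raise the threshold
logger.debug('not shown')       # suppressed without deleting the call

captured = stream.getvalue()
```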

matplotlib related

Interactive drawing with matplotlib.pyplot on a Mac

%matplotlib osx

may be necessary? plt.ion() and the like still need investigation.

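Drawing a cool figure

pd.options.display.mpl_style = 'default'  # older pandas only; later releases dropped this option in favor of matplotlib style sheets such as plt.style.use('ggplot')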

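Draw points

plt.plot(x, y, marker='o', linestyle='None')
Or
plt.plot(x, y, 'o')

Add these options (x and y are your data). Note that linestyle must be the string 'None'; Python's None falls back to the default line.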
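A short check of the two point-only forms. The Agg backend is an assumption made so the sketch runs without a display; in an interactive session you would leave the backend alone. The data values are made up.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

fig, ax = plt.subplots()
# Keyword form: circle markers with the connecting line disabled via the string 'None'.
(line_kw,) = ax.plot(x, y, marker='o', linestyle='None')
# Format-string form: 'o' alone means circle markers without a connecting line.
(line_fmt,) = ax.plot(x, y, 'o')
plt.close(fig)
```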

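Dynamic image drawing such as simulation

for x in range(y):
  plt.clf()
  plt.draw()  # the figure is not necessarily rendered here
  plt.pause(0.1)  # the figure is reliably rendered here; the argument is in seconds

The figure needs to be redrawn like this on every iteration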

Drawing a graph

plt.show()

To prevent mass drawing of graphs

plt.close()

Call this once a figure is no longer needed, before the next

plt.show()

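simulation example

# Excerpt from a method: self.data, self.title, count, TIME_RANGE and PAUSE_TIME are defined elsewhere
fig, ax = plt.subplots()
for i in np.arange(count):
    logging.debug(i)
    ax.clear()
    ax.set_title(self.title)
    # Plot the i-th window of TIME_RANGE rows
    self.data.iloc[i*TIME_RANGE:(i+1)*TIME_RANGE].plot(ax=ax, legend=False)
    plt.draw()
    plt.pause(PAUSE_TIME)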

Drawing 3D diagrams (I'm impressed because it's interactive)

from mpl_toolkits.mplot3d import Axes3D

Time series related

pandas has an excellent DatetimeIndex

The resample method is convenient
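A minimal resample sketch on a DatetimeIndex; the dates, frequency, and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Six samples at 10-minute intervals.
index = pd.date_range('2015-01-01 00:00', periods=6, freq='10min')
series = pd.Series(np.arange(6, dtype=float), index=index)

# Downsample to 30-minute bins and take the mean of each bin.
half_hourly = series.resample('30min').mean()
```

The first bin averages the values 0, 1, 2 and the second averages 3, 4, 5.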

Other

Get pid

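print(os.getpid())

logger setting

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

reload

reload(filename)  # Python 2; pass the imported module object, not the path './filename.py'
importlib.reload(filename)  # Python 3 equivalent, after import importlib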

How to get elements

Try describe once; it is impressive

Be careful of slicing with integers

[In] np.array([0, 1, 2])[:2]
gives
[Out] array([0, 1])
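One likely reason for this caution (my reading, not stated in the notes) is that positional slices exclude the end point while label-based slices in pandas include it. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=[0, 1, 2, 3])

by_position = s.iloc[:2]  # positional slice, end excluded (same as NumPy)
by_label = s.loc[:2]      # label slice, end label included
```

With an integer index the two look almost identical in code, yet by_label has one more element than by_position.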

If you do not master the index, processing will often be abnormally slow.

It is convenient to have two indexes: a time series and an integer

As an option, pass index=[Series, Series]

set_index
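A rough sketch of a two-level index built with set_index. The column names time, step, and value are hypothetical, and the data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    'time': pd.date_range('2015-01-01', periods=4, freq='D'),
    'step': [0, 1, 2, 3],
    'value': [1.5, 2.5, 3.5, 4.5],
})

# Two-level index: a time-series level and an integer level.
indexed = df.set_index(['time', 'step'])

# Select by the integer level; the matching time stays as the index of the result.
row = indexed.xs(2, level='step')
```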

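Index reference method

[:2, :] etc.

ix, iloc, loc
iget_value (Series only)
irow, icol (DataFrame only)
(ix, iget_value, irow, and icol have since been removed from pandas; loc and iloc cover the same uses.)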

reindex and set_index are convenient

value_counts is convenient
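A short sketch combining value_counts and reindex; the labels are made up.

```python
import pandas as pd

labels = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# Count how often each value occurs.
counts = labels.value_counts()

# reindex aligns the result to a given label order; labels that never
# occurred would become NaN, so fill_value=0 is used here instead.
aligned = counts.reindex(['a', 'b', 'c', 'd'], fill_value=0)
```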

Data merging method

concat is fast

Where plain assignment is enough, it is better to use it

data['name'] = pd.Series or list
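A minimal sketch of both approaches, with made-up frames and hypothetical column names:

```python
import pandas as pd

first = pd.DataFrame({'x': [1, 2]})
second = pd.DataFrame({'x': [3, 4]})

# Collect the pieces in a list and concatenate once.
data = pd.concat([first, second], ignore_index=True)

# Plain assignment adds a column. A list is matched by position,
# while a Series is aligned on the index and leaves NaN where labels are missing.
data['from_list'] = [10, 20, 30, 40]
data['from_series'] = pd.Series([100, 200], index=[0, 3])
```

The index alignment of a Series is worth keeping in mind, since a mismatched index silently produces NaN rather than an error.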

How to handle NA values

+ may be fast

dropna is often used
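A small dropna sketch with made-up data, showing the default behaviour and the subset option:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, np.nan]})

complete_rows = df.dropna()       # drop rows that contain any NaN
a_only = df.dropna(subset=['a'])  # drop rows only where column 'a' is NaN
```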

Convenient method of DatetimeIndex

resample

Tab completion and introspection are essential. They overwhelmingly reduce the number of Google searches.

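Debugger seems to be useful (investigation required)

%debug
u or d  # move up or down the call stack
s  # step into
b 12  # set a breakpoint at line 12
c  # continue
n  # run the next line
!variable_name  # evaluate a variable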

For discrete (digital) data, the diff function gives the first-order difference, which serves as the first derivative
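A short sketch of diff in NumPy and pandas, using made-up samples:

```python
import numpy as np
import pandas as pd

samples = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

# np.diff returns the n-1 differences between neighbouring elements.
d_np = np.diff(samples)

# Series.diff keeps the original length; the first element has no
# predecessor and becomes NaN.
d_pd = pd.Series(samples).diff()
```

Dividing the result by the sampling interval gives an approximate derivative.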

Boolean indexing and the apply method are worth getting used to

seaborn, a matplotlib wrapper, seems to be good

Statistical library

statsmodels seems to be easy to handle
scipy.stats
Orange

Other libraries

Machine learning: scikit-learn
Scraping: Beautiful Soup
Natural language processing: NLTK
Image processing: OpenCV

Super good articles

Few people, myself included, learn both R and Python in earnest (just my impression), so comparisons tend to be unreliable, but these articles are reliable.
https://chezou.wordpress.com/2014/01/18/%E7%A7%91%E5%AD%A6%E8%A8%88%E7%AE%97%E3%81%AB%E3%81%8A%E3%81%91%E3%82%8B%E5%9D%87%E8%B3%AA%E5%8C%96%E3%80%81%E3%81%82%E3%82%8B%E3%81%84%E3%81%AF%E3%81%AA%E3%81%9Cpython%E3%81%8C%E7%9D%80%E5%AE%9F/
http://postd.cc/r-vs-python-head-to-head-data-analysis/

Data analysis using python pandas