Data analysis using python pandas

python pandas numpy related

The IPython notebook is convenient when you work on a single dataset (N = 1).

If you iterate over N = 10 or more datasets, plain ipython is more convenient.

However, interactive work becomes difficult once the code is made object-oriented with classes and the like, so I am looking for a solution.

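import pandas as pd

class ClassName:
  def __init__(self, filename):
    self.data = pd.read_csv(filename)
    self.filename = filename

  def method(self):
    data = self.data
    return data

def method(data):
  # Plain-function version of ClassName.method, easy to call interactively
  return data

if __name__ == '__main__':
  filename = 'default.csv'  # placeholder path
  instance = ClassName(filename)
  data = instance.data  # expose the DataFrame as a global for interactive use
  method(data)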

Writing it like this may make things easier to handle. In data analysis, worrying about namespace pollution can cost a disproportionate amount of time, so it may be better not to worry about it. The upside is that the data can be handled directly as a pandas object.

Modules to import

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy as sp
import scipy.signal as signal
import logging
import sys
import os
import re

In addition, pyenv and Anaconda were very convenient for setting up the environment. As a working environment, either editing in vim and running files with %run in iTerm, or doing all the work in an IPython notebook, seems good.

To prevent rework

I want to save the data at each stage with

data.to_csv('default.csv')  # data is a pandas DataFrame

but managing the storage becomes difficult, so it is a trade-off.
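As a minimal sketch of saving each stage, the snippet below writes two stages to CSV and reloads the later one. The helper name save_stage, the stage names, and the temporary directory are assumptions made for illustration, not part of the original notes.

```python
import os
import tempfile

import pandas as pd


def save_stage(df, stage, out_dir):
    """Write df to <out_dir>/<stage>.csv and return the path (hypothetical helper)."""
    path = os.path.join(out_dir, stage + ".csv")
    df.to_csv(path)
    return path


out_dir = tempfile.mkdtemp()  # stand-in for a real project directory
raw = pd.DataFrame({"x": [1.0, 2.0, None, 4.0]})
cleaned = raw.dropna()

raw_path = save_stage(raw, "01_raw", out_dir)
cleaned_path = save_stage(cleaned, "02_cleaned", out_dir)

# A later session can resume from the cleaned stage without redoing earlier work.
restored = pd.read_csv(cleaned_path, index_col=0)
```

Every saved stage costs disk space and has to be kept track of, which is the trade-off mentioned above.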

The official tutorial of pandas is good

Convert loops and conditionals to slices and boolean arrays

Loops should only be used for reading files

In the end, the NumPy way is to make effective use of boolean arrays (masks) and to shift values by slicing. In pandas, as a last resort, write it with apply and a lambda function. Use resample for a DatetimeIndex.
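To illustrate the idea, here is a small sketch that replaces a loop with a conditional by a boolean array, and compares neighbouring elements by slicing. The sample values are made up.

```python
import numpy as np

values = np.array([1, 4, 2, 8, 5, 7])

# Loop-and-if version: keep the values greater than 3.
kept_loop = []
for v in values:
    if v > 3:
        kept_loop.append(v)

# Boolean-array version: the comparison yields a mask, and indexing applies it.
mask = values > 3
kept_mask = values[mask]

# Shifting by slicing: compare each element with its predecessor
# to find where the sequence increased.
increased = values[1:] > values[:-1]
```

Both versions keep the same elements; the masked one avoids the Python-level loop.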

Use a logger, on the theory that print statements are not good

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

With this setup,

logging.debug('here')

prints the message to the console (standard error by default) or in IPython.
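If the messages should go to standard output specifically, one option is to attach an explicit handler. This is only a sketch; the logger name "analysis" and the format string are arbitrary choices made here.

```python
import logging
import sys

logger = logging.getLogger("analysis")  # hypothetical logger name
logger.setLevel(logging.DEBUG)

# Attach an explicit handler so the destination (stdout here) is unambiguous.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)

logger.debug("here")
```

Raising the level later with logger.setLevel(logging.INFO) silences the debug messages without touching the calls.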

matplotlib related

Interactive drawing of matplotlib.pyplot for mac

%matplotlib osx

may be necessary. plt.ion() and related functions still need investigation.

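Drawing a cool figure (this option belongs to older pandas versions; it was later deprecated and removed in favor of matplotlib styles such as plt.style.use('ggplot'))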

pd.options.display.mpl_style = 'default'

Draw points only (x and y are the data to plot)

plt.plot(x, y, marker='o', linestyle='None')
Or
plt.plot(x, y, 'o')

Add options

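Animated drawing, e.g. for a simulation

for x in range(y):
  plt.clf()       # clear the current figure
  plt.draw()      # does not necessarily render here
  plt.pause(0.1)  # rendering reliably happens here (argument in seconds)

The figure needs to be redrawn like this on every iteration.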

Drawing a graph

plt.show()

To prevent a large number of figures from piling up, call

plt.close()

before the next

plt.show()

simulation example

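# Excerpt from a method: self.data is a DataFrame, self.title a string
fig, ax = plt.subplots()
for i in np.arange(count):
    logging.debug(i)
    ax.clear()
    ax.set_title(self.title)
    # Plot the i-th window of TIME_RANGE rows
    self.data.iloc[i*TIME_RANGE:(i+1)*TIME_RANGE].plot(ax=ax, legend=False)
    plt.draw()
    plt.pause(PAUSE_TIME)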

Drawing 3D diagrams (I'm impressed because it's interactive)

from mpl_toolkits.mplot3d import Axes3D

Time series related

pandas has a great DatetimeIndex

resample method is convenient
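A small sketch of resample on a DatetimeIndex, assuming a reasonably recent pandas in which resample returns an object that is then aggregated. The data are made up.

```python
import numpy as np
import pandas as pd

# Ten one-minute samples with values 0.0 .. 9.0.
index = pd.date_range("2015-01-01 00:00", periods=10, freq="min")
series = pd.Series(np.arange(10, dtype=float), index=index)

# Downsample to 5-minute bins and take the mean of each bin.
five_min_mean = series.resample("5min").mean()
```

The result has one row per 5-minute bin, labelled by the start of the bin.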

Other

Get pid

print(os.getpid())

Logger setting

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

Reload a module

reload(filename)  # pass the imported module object, not a path; in Python 3 use importlib.reload(filename)

How to get elements

describe is worth trying at least once; the summary it gives is impressive

Be careful of slicing with integers

[In] np.array([0, 1, 2])[:2]
gives
[Out] array([0, 1])

If you do not master the index, processing will often be abnormally slow.

It is convenient to have two index levels, a time series and an integer

Pass index=[Series, Series] as an option, or use

set_index
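As a sketch of the two-level index idea, the snippet below uses set_index with a timestamp column and an integer column. The column names time, step, and value are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.date_range("2015-01-01", periods=3, freq="D"),
    "step": [0, 1, 2],
    "value": [1.5, 2.5, 3.5],
})

# Use both the timestamp and the integer counter as index levels.
indexed = df.set_index(["time", "step"])

# Select by the integer level only ...
row = indexed.xs(1, level="step")

# ... or by both levels at once.
value = indexed.loc[(pd.Timestamp("2015-01-02"), 1), "value"]
```

With both levels available, the same row can be reached either by time or by its integer counter.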

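How to reference by index

[:2, :] etc.

ix (deprecated and removed in later pandas versions), iloc, loc
iget_value: Series only
irow, icol: DataFrame only (these older accessors were also removed later; iloc covers them)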
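The difference between positional and label-based slicing is easy to trip over when the labels are integers. A minimal sketch with made-up data:

```python
import pandas as pd

# Integer labels that deliberately differ from the positions.
s = pd.Series([10, 20, 30, 40], index=[2, 3, 4, 5])

by_position = s.iloc[2:4]  # positions 2 and 3; the end is excluded
by_label = s.loc[2:4]      # labels 2, 3 and 4; the end is included
```

Here iloc returns two elements and loc returns three, even though the slice looks the same.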

reindex and set_index are convenient

value_counts is convenient
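For reference, a short sketch of what value_counts and describe return, using made-up data:

```python
import pandas as pd

labels = pd.Series(["a", "b", "a", "c", "a", "b"])
counts = labels.value_counts()   # occurrences per value, most frequent first

numbers = pd.Series([1.0, 2.0, 3.0, 4.0])
summary = numbers.describe()     # count, mean, std, min, quartiles, max
```

Both return a Series, so the results can be indexed or plotted like any other pandas object.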

How to combine data

concat is fast

Where plain assignment is enough, use it

data['name'] = values  # values may be a pd.Series or a list
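A sketch of both approaches, assuming the pieces are collected in a list first and concatenated once. The column names and values are made up.

```python
import pandas as pd

# Build the pieces in a loop (e.g. one per file), then concatenate once.
frames = [pd.DataFrame({"run": [i, i], "value": [i * 1.0, i * 2.0]}) for i in range(3)]
combined = pd.concat(frames, ignore_index=True)

# Adding a column by plain assignment; a list must match the number of rows.
combined["name"] = ["a", "b", "c", "d", "e", "f"]

# A Series is aligned on the index, so rows without a match become NaN.
combined["flag"] = pd.Series([1.0, 1.0], index=[0, 5])
```

Note the difference: a list is assigned by position, while a Series is aligned by index label.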

How to handle NA values

+ may be fast

dropna is often used
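A minimal sketch of common NA handling with made-up data; fillna is shown only as an alternative to dropping rows and is not mentioned in the notes above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

complete_rows = df.dropna()          # keep rows with no missing value
a_present = df.dropna(subset=["a"])  # only require column "a" to be present
filled = df.fillna(0.0)              # alternative: replace instead of drop

# Arithmetic with + propagates NaN rather than raising an error.
total = df["a"] + df["b"]
```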

Convenient method of DatetimeIndex

resample

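Tab completion and introspection are essential; they drastically reduce the number of Google searches

The debugger seems useful (needs investigation)

%debug
u or d  # move up or down the stack
s  # step into
b 12  # set a breakpoint at line 12
c  # continue
n  # next line
!variable_name  # evaluate the variable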

For discrete (sampled) data, the diff function gives the first difference, which stands in for the first derivative

Boolean indexing and the apply method are worth getting used to

The matplotlib wrapper seaborn seems to be good
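A small sketch of diff on sampled data, using y = t**2 on a made-up time grid:

```python
import numpy as np
import pandas as pd

t = np.array([0.0, 0.5, 1.0, 1.5])
y = t ** 2  # samples of a smooth function

# np.diff returns one element fewer than its input.
dy = np.diff(y)
dydt = dy / np.diff(t)  # finite-difference estimate of the first derivative

# Series.diff keeps the length and puts NaN in the first position.
series_diff = pd.Series(y, index=t).diff()
```

The NumPy version shortens the array by one, while the pandas version keeps the index aligned with the original data.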

Statistical library

statsmodels seems easy to handle
scipy.stats
Orange

Other libraries

Machine learning: scikit-learn
Scraping: Beautiful Soup
Natural language processing: NLTK
Image processing: OpenCV

Very good articles

Few people, myself included, seriously learn both R and Python (that is my impression), so comparisons tend to be unreliable, but the following are reliable.
https://chezou.wordpress.com/2014/01/18/%E7%A7%91%E5%AD%A6%E8%A8%88%E7%AE%97%E3%81%AB%E3%81%8A%E3%81%91%E3%82%8B%E5%9D%87%E8%B3%AA%E5%8C%96%E3%80%81%E3%81%82%E3%82%8B%E3%81%84%E3%81%AF%E3%81%AA%E3%81%9Cpython%E3%81%8C%E7%9D%80%E5%AE%9F/
http://postd.cc/r-vs-python-head-to-head-data-analysis/
