Handling 3D data

The main data structures in pandas are the one-dimensional Series and the two-dimensional, tabular DataFrame (http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html). These are the primary objects in pandas and are also covered in detail in Python for Data Analysis.

But there is actually one more major object: the three-dimensional Panel (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Panel.html), which also appears in Intro to Data Structures.

This three-dimensional data structure is useful, for example, when you want to pull arbitrary values out of tables that are produced daily and run a statistical analysis on them as a time series, as with logs.

Create a Panel object

A Panel can be created from a dictionary of DataFrames or from a 3D ndarray. Let's try it concretely.

import numpy as np
import pandas as pd
rng = pd.date_range('1/1/2014',periods=100,freq='D')

#Create data frames of random numbers with columns A, B, C, D
df1 = pd.DataFrame(np.random.randn(100, 4), index = rng, columns = ['A','B','C','D'])
df2 = pd.DataFrame(np.random.randn(100, 4), index = rng, columns = ['A','B','C','D'])
df3 = pd.DataFrame(np.random.randn(100, 4), index = rng, columns = ['A','B','C','D'])

#Create a Panel object by combining these data frames
pf = pd.Panel({'df1':df1,'df2':df2,'df3':df3})

pf
#=>
# <class 'pandas.core.panel.Panel'>
# Dimensions: 3 (items) x 100 (major_axis) x 4 (minor_axis)
# Items axis: df1 to df3
# Major_axis axis: 2014-01-01 00:00:00 to 2014-04-10 00:00:00
# Minor_axis axis: A to D

This creates the Panel object. Its three dimensions are called items, major_axis, and minor_axis.

See the documentation (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Panel.html) to see what methods this object has.
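The text above mentions a 3D ndarray as the other possible input but only shows the dictionary form. As a rough sketch, the same three tables can be stacked into a plain NumPy array with the (items, major_axis, minor_axis) layout. Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so handing this array to pd.Panel only works on older versions; the names frames, cube, items and cols below are my own, not from the original.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2014-01-01', periods=100, freq='D')
cols = ['A', 'B', 'C', 'D']

# Three tables with the same index and columns, as in the example above
frames = {
    name: pd.DataFrame(np.random.randn(100, 4), index=rng, columns=cols)
    for name in ['df1', 'df2', 'df3']
}

# Stack the tables along a new leading axis: (items, major_axis, minor_axis)
cube = np.stack([df.to_numpy() for df in frames.values()])
print(cube.shape)  # (3, 100, 4)

# A bare ndarray carries no labels, so keep them alongside the array
items = list(frames)

# Value of column 'B' on the first date in table 'df2'
value = cube[items.index('df2'), 0, cols.index('B')]
```

Keeping the labels in separate lists is the price of using a raw array; a Panel stored them on its three axes for you.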

Key operations on the Panel object

The most common operation is probably access by index.

pf.ix[0] #Access to df1
pf.ix[1] #Access to df2
pf['df1'] #This is also access to df1

In this way, you can access each table that Panel has.

#Add new column to table
pf['df1']['E'] = pd.DataFrame(np.random.randn(100, 1), index = rng)
pf['df2']['E'] = pd.DataFrame(np.random.randn(100, 1), index = rng)

#Check the data structure
pf.shape
#=> (3, 100, 4)

#Access the last 10 rows of column E in the df1 table
pf.ix['df1',-10:,'E']
#=>
# 2014-04-01   -1.623615
# 2014-04-02    1.878481
# 2014-04-03   -0.890555
# 2014-04-04    0.736037
# 2014-04-05   -1.451665
# 2014-04-06    0.126473
# 2014-04-07    0.997485
# 2014-04-08   -1.252981
# 2014-04-09   -1.136791
# 2014-04-10   -1.873199
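Both Panel and the .ix indexer are gone from current pandas releases, so here is a hedged sketch of how the same kinds of lookups might be written today. It assumes the tables are joined side by side with pd.concat, which gives two-level (item, column) column labels; the names frames, wide, table, last10 and col_a are hypothetical.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2014-01-01', periods=100, freq='D')
frames = {
    name: pd.DataFrame(np.random.randn(100, 4), index=rng, columns=list('ABCD'))
    for name in ['df1', 'df2', 'df3']
}

# Columns become a two-level index: (item, column)
wide = pd.concat(frames, axis=1)

table = wide['df1']                      # the whole df1 table, like pf['df1']
last10 = wide[('df1', 'D')].iloc[-10:]   # last 10 rows of column D in df1
col_a = wide.xs('A', axis=1, level=1)    # column A taken from every table
```

The last line is the kind of cross-table slice that makes a third dimension worthwhile: one column, every table, all dates.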

A Panel can also be converted to a stacked DataFrame with to_frame(). Statistical functions can then be applied to this stacked DataFrame. It can also be converted back to the original Panel with to_panel().

pf.to_frame().to_panel()
#=>
# <class 'pandas.core.panel.Panel'>
# Dimensions: 3 (items) x 100 (major_axis) x 4 (minor_axis)
# Items axis: df1 to df3
# Major_axis axis: 2014-01-01 00:00:00 to 2014-04-10 00:00:00
# Minor_axis axis: A to D
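To give an idea of what "statistical functions on the stacked frame" can look like, here is a minimal sketch that does not depend on Panel. It stacks the tables vertically with pd.concat, which is similar in spirit to to_frame() but not the same layout, and then splits them apart again. The level names item and date, and the variable names, are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2014-01-01', periods=100, freq='D')
frames = {
    name: pd.DataFrame(np.random.randn(100, 4), index=rng, columns=list('ABCD'))
    for name in ['df1', 'df2', 'df3']
}

# Stack the tables vertically; rows are indexed by (item, date)
stacked = pd.concat(frames, names=['item', 'date'])

# Ordinary statistical functions then work per table or per date
mean_per_item = stacked.groupby(level='item').mean()   # 3 rows x 4 columns
mean_per_date = stacked.groupby(level='date').mean()   # 100 rows x 4 columns

# Split back into one DataFrame per item
restored = {
    name: group.droplevel('item')
    for name, group in stacked.groupby(level='item')
}
```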

Use Panel to analyze log data

Suppose your application's log files are written to a directory every day, for example by Fluentd. To analyze these logs across dates, it is very convenient to treat each day's data as one table and collect the tables in a three-dimensional data structure, because the result can then be analyzed as a time series.

As a sample, I will rewrite and reuse the program from an earlier article that listed the files in a directory.

import os
import pandas as pd

def list_files(path):
    """Read every fluent*.log file under path into a Panel, one item per file."""
    dic = {}
    for root, dirs, files in os.walk(path):
        for filename in files:
            # Use the full path so files in subdirectories are read correctly
            fullname = os.path.join(root, filename)
            if filename.startswith("fluent") \
               and filename.endswith(".log"):
                try:
                    print("Reading: %(filename)s" % locals())
                    df = pd.read_table(fullname, header=None)
                    dic[filename] = df
                except pd.parser.CParserError:
                    # Skip files that cannot be parsed as tables
                    print("Skip: %(filename)s" % locals())
    return pd.Panel(dic)

The Panel object returned by this function is a three-dimensional data structure that gathers multiple log files, so you can apply statistical functions to it and analyze the logs as time-series data.
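On pandas versions where Panel is no longer available, the same idea can be approximated by returning a dictionary of DataFrames and stacking it. The following is only a sketch under that assumption: load_logs and combine are hypothetical names, the files are assumed to be tab-separated with no header row, and the exception classes are the ones exposed under pd.errors in recent releases.

```python
import os
import pandas as pd

def load_logs(path):
    """Read fluent*.log files under path into {relative name: DataFrame}."""
    frames = {}
    for root, _dirs, files in os.walk(path):
        for filename in sorted(files):
            if not (filename.startswith("fluent") and filename.endswith(".log")):
                continue
            fullname = os.path.join(root, filename)
            try:
                # Key by path relative to the top directory to avoid name clashes
                frames[os.path.relpath(fullname, path)] = pd.read_csv(
                    fullname, sep="\t", header=None)
            except (pd.errors.ParserError, pd.errors.EmptyDataError):
                print("Skip: %s" % fullname)
    return frames

def combine(frames):
    """Stack the per-file tables; rows are indexed by (file, row number)."""
    return pd.concat(frames, names=["file", "row"])
```

Calling combine(load_logs("/path/to/logs")) would then give one long table that can be grouped by file. Note that pd.concat raises a ValueError when the dictionary is empty, so a directory with no matching files needs to be handled by the caller.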

Summary

You can use a Panel in pandas to work with three-dimensional data. Adding one more dimension to the usual row-and-column structure is useful for analyzing time-series data.
