This is an occasional post about writing code properly. I lost a lot of time being confused about how to select data from a pandas DataFrame, so I'd like to organize what I learned. This summary is written by a pandas beginner, so if you spot a mistake or know a better way to summarize it, please give me some advice.
Click here for the blog that we run: Effort 1mm
The versions used are as follows.
pandas (0.18.1)
numpy (1.11.0)
Python 2.7.10
There are three ways to select elements in a pandas DataFrame: plain df[], label-based df.loc (with df.at for a single value), and position-based df.iloc.
The DataFrame used in the examples was created with the following code, taken from 10 Minutes to pandas.
import numpy as np
import pandas as pd
dates = pd.date_range('20130101', periods=6)  # six daily dates used as the index
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
The resulting DataFrame looks like this (the values are random, so they differ on every run).
A B C D
2013-01-01 -0.682002 1.977886 0.348623 0.405755
2013-01-02 0.085698 2.067378 -0.356269 1.349520
2013-01-03 0.058207 -0.539280 0.023205 1.154293
2013-01-04 -0.319075 1.174168 -1.282305 0.359333
2013-01-05 -2.557677 0.922672 0.202042 0.171645
2013-01-06 1.039422 0.300340 0.701594 -0.229087
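Since np.random.randn produces new values on every run, it can be handy to fix the seed while experimenting. The following is only a sketch of one way to do that; make_frame and its seed parameter are names I made up for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

def make_frame(seed=0):
    # make_frame and seed are hypothetical names used only in this sketch.
    # A seeded RandomState returns the same "random" numbers every time.
    rng = np.random.RandomState(seed)
    dates = pd.date_range('20130101', periods=6)
    return pd.DataFrame(rng.randn(6, 4), index=dates, columns=list('ABCD'))

df = make_frame()
print(df.shape)                 # (6, 4)
print(df.equals(make_frame()))  # True: same seed, same values
```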
With plain df[], what you pass decides whether the selection works in the column direction or the row direction: a column name selects a column, while a slice selects rows.
# Select a single column
df['A']  # Select the column named A
df.A  # Same as above (attribute access)
# Slice in the row direction
df[0:3]  # Rows at positions 0 through 2 (the end position 3 is excluded)
df['20130102':'20130104']  # Rows whose index is from 2013-01-02 through 2013-01-04 (both ends included)
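To make the difference between the two kinds of slice easier to check, here is a small sketch that swaps the random values for np.arange (my substitution, so the results are predictable). It suggests that an integer slice excludes its end position while a label slice includes its end label.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Predictable values instead of random ones: the row at position i holds 4*i .. 4*i+3
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

col = df['A']                          # a column name returns a Series
by_pos = df[0:3]                       # integer slice: rows at positions 0, 1, 2
by_label = df['20130102':'20130104']   # label slice: 2013-01-02 through 2013-01-04

print(list(col))             # [0, 4, 8, 12, 16, 20]
print(by_pos.index[-1])      # 2013-01-03 00:00:00
print(by_label.index[-1])    # 2013-01-04 00:00:00
```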
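df.loc selects by label. The first of the specified arguments (if I may call them that) operates on the index, and the second operates on the columns. Note that the output shown in the comments from here on comes from a different random run, so the values do not match the table above.
# Get the row whose index label matches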
# A 0.469112
# B -0.282863
# C -1.509059
# D -1.135632
# Name: 2013-01-01 00:00:00, dtype: float64
df.loc[dates[0]]
# Specify the index and the columns at the same time (here, all rows and columns A and B)
# A B
# 2013-01-01 0.469112 -0.282863
# 2013-01-02 1.212112 -0.173215
# 2013-01-03 -0.861849 -2.104569
# 2013-01-04 0.721555 -0.706771
# 2013-01-05 -0.424972 0.567020
# 2013-01-06 -0.673690 0.113648
df.loc[:, ['A', 'B']]
# Select a range of rows by index label
# A B
# 2013-01-02 1.212112 -0.173215
# 2013-01-03 -0.861849 -2.104569
# 2013-01-04 0.721555 -0.706771
df.loc['20130102':'20130104', ['A', 'B']]  # When specifying multiple columns, pass them as a list
# If you want a single value, at is faster than loc
#0.46911229990718628
df.at[dates[0],'A']
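The loc and at calls above can be tried on a DataFrame with predictable values. This is a minimal sketch that assumes np.arange in place of the random data (my substitution), so the results can be verified by eye.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Predictable values instead of random ones: the row at position i holds 4*i .. 4*i+3
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

row = df.loc[dates[0]]                            # one row by label, returned as a Series
sub = df.loc['20130102':'20130104', ['A', 'B']]   # label slice (inclusive) and a list of columns
value = df.at[dates[0], 'A']                      # a single value by row label and column label

print(list(row))             # [0, 1, 2, 3]
print(sub.values.tolist())   # [[4, 5], [8, 9], [12, 13]]
print(value)                 # 0
```

df.loc[dates[0], 'A'] returns the same value as df.at[dates[0], 'A']; at only accepts a single row label and a single column label.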
df.iloc selects by integer position, treating the DataFrame like a matrix. Of course, you can select more than one element.
# Select by row position (here position 3, which is the fourth row because counting starts at 0)
# A 0.721555
# B -0.706771
# C -1.039575
# D 0.271860
# Name: 2013-01-04 00:00:00, dtype: float64
df.iloc[3]
# Specify rows and columns at the same time (here, rows at positions 3 to 4 and columns at positions 0 to 1; the end of each slice is excluded)
# A B
#2013-01-04 0.721555 -0.706771
#2013-01-05 -0.424972 0.567020
df.iloc[3:5,0:2]
# Select non-contiguous rows and columns by passing lists of positions
# A C
# 2013-01-02 1.212112 0.119209
# 2013-01-03 -0.861849 -0.494929
# 2013-01-05 -0.424972 0.276232
df.iloc[[1,2,4],[0,2]]
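As a check on the iloc examples above, the sketch below again assumes np.arange in place of the random data (my substitution) and compares a positional selection with the equivalent label-based one.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Predictable values instead of random ones: the row at position i holds 4*i .. 4*i+3
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

row = df.iloc[3]                       # row at position 3 (2013-01-04)
block = df.iloc[3:5, 0:2]              # rows 3-4 and columns 0-1 (slice ends excluded)
picked = df.iloc[[1, 2, 4], [0, 2]]    # non-contiguous rows and columns by position

# The same block selected by label; label slices include the end label
same_block = df.loc['20130104':'20130105', ['A', 'B']]

print(list(row))                  # [12, 13, 14, 15]
print(block.values.tolist())      # [[12, 13], [16, 17]]
print(picked.values.tolist())     # [[4, 6], [8, 10], [16, 18]]
print(block.equals(same_block))   # True
```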
I have no choice but to get used to it!