Some ways to add new columns and rows to an created table.
There are other functions and methods that can be used, but for the time being, only the basics.
Base table creation
row0 = [1, 2, 3]
row1 = [10, 20, 30]
row2 = [100, 200, 300]
df = pd.DataFrame([row0,row1,row2], columns=['col0','col1','col2'])
df.index = ['row0', 'row1', 'row2']
df
・ Df ['A'] -Assign method ・ Join method ・ Concat function
■df['A']
df['A'] = B
└ "df": Original table
└ "A": Name of the column to add
└ "B": What to add
The content to be added can be specified relatively freely. Numerical values, values, table data, formulas, lists, etc. can be used.
** ▼ Case Study ** ① Add column (list) ② Addition of columns (numerical value) ③ Add column (table) ④ Add column (formula)
Add column (list)
df['col3'] = [4, 40, 400]
df
Add column (number)
df['col3'] = 4
df
Add column (table)
df['col3'] = df['col2']
df
Add column (formula)
df['col3'] = df['col2'] * 100
df
・ Not overwritten ・ Column names do not require quotation marks -If the element is specified by list, it must match the number of rows in the base table.
・ No column name TypeError: assign() takes 1 positional argument but 2 were given
・ The number of elements in list does not match ValueError: Length of values does not match length of index
1 row added
df.assign(A=[1,2,3])
1 column added (table data / formula)
df.assign(A=df['col0']*100)
2 columns added
df.assign(A=[1,2,3], B=100)
If you add "column name = contents", the number of columns will increase.
join(dfA, rsuffix='_a')
└ "dfA": Table to join
└ "rsuffix ='_a'": If the column name is duplicated, add "_a" (optional) to the added column name.
(1) Add a table with the same row name (no duplicate column name) (2) Add a table with row names that do not exist ③ When the column name is duplicated ④ When some column names are duplicated
The row name matches the row name of the base table and the column name does not duplicate an existing one.
Table with the same row name
dfA = pd.DataFrame([100,200,300])
dfA.index = ['row0', 'row1', 'row2']
dfA
** ▼ Add **
add to
df.join(dfA)
A new column has been added.
If there is a row name that does not exist in the base table └ ** No new rows are added to the base table **. └ ** Data whose row names do not match will be NaN ** (missing value).
Table with row names that do not exist
dfB = pd.DataFrame([100,200,300])
dfB.index = ['row0', 'XXX', 'YYY']
dfB
** ▼ Add **
add to
df.join(dfB)
If the column name is duplicated, specify the last character to be added to the duplicate column name with "rsuffix ='_a'".
** ▼ Options ** -"Rsuffix ='_a'": ** Add the specified character to the duplicate column name ** to be added (abbreviation of right suffix)
-"Lsuffix ='_a'": ** Add the specified character to the existing duplicate column name ** (abbreviation of left suffix)
Duplicate column name
dfC = pd.DataFrame([100,200,300], columns=['col0'])
dfC.index = ['row0', 'row1', 'XXX']
dfC
** ▼ Add (for rsuffix) **
rsuffix
df.join(dfC, rsuffix='_@')
rsuffix
df.join(dfC, lsuffix='_@')
-Duplicate columns have the suffix specified in the column name. -Column names that do not overlap are added as they are.
Partially duplicate column names
list1 = [100,200,300]
list2 = ['A','B','C']
list3 = ['AAA','BBB','CCC']
list4 = ['10A','20B','30C']
dfD = pd.DataFrame([list1,list2, list3, list4], columns=['col0', 111,'col1'])
dfD.index = ['row0', 'row1', 'XXX', 'YYY']
dfD
rsuffix
df.join(dfD, rsuffix='_@')
--The specified character string is added to the duplicate column name --Rows that do not exist in the base table are not added --In the column to be added, NaN will be entered in the row that did not exist
pd.concat([df, dfA], axis=1)
└ "df": Base table
└ "dfA": Table to add
└ "axis = 1": Instruction to add column
-Add as it is even if the column name is duplicated -If the line name is different, add a new line (default: join ='outer') -With join ='inner' option, leave only those with matching line names
・ Meaning of "concat" concatenate: concatenate. connect.
-Columns "col0" and "col1" overlap with the base table. -The row "xxx" is not in the base table
Table to join
list1 = [1,100,'AAA']
list2 = [2,200,'BBB']
list3 = [3,300,'CCC']
dfE = pd.DataFrame([list1,list2, list3], columns=['col0', 'col1', 'aaa'])
dfE.index = ['row0', 'row1', 'XXX']
dfE
** "axis = 1" is required when adding columns **. If it is not attached, it will be added in the row direction and the result will be significantly different.
「axis=1 ”Yes
pd.concat([df, dfA], axis=1)
・ Add columns -Add as it is even if the column name is duplicated (different from join method) -The value of the row that is not in the table to be added is NaN. -Added new rows that are not in the base table.
「axis=1 "none
pd.concat([df, dfA])
-If the column names do not match, a new column is added. -Even if the line name is duplicated, all are newly added.
** * Columns are added, but the rows are not integrated, so the content is different from what you want to execute. ** **
sort=True
pd.concat([df, dfA], axis=1)
The line "XXX" comes up with automatic sorting.
sort=False
pd.concat([df, dfA], axis=1, sort=False)
If you write "sort = False", the added line will be added at the end.
sort=False
pd.concat([dfA, df], axis=1, sort=False)
A line that does not exist in the base "dfA" is added to the end.
join='outer'
└ Default setting.
└ Leave a line that does not exist.
join='inner'
└ Leave only duplicate lines.
join='outer'
pd.concat([dfA, df], axis=1)
The line "XXX" that does not exist before and after the merge is added.
join='outer'
pd.concat([dfA, df], axis=1, join='inner')
Deleted rows "row2" and "XXX" that exist only in either one.
・ Loc method ・ Concat function
df.loc['A']=B
└ "df": Original table
└ "A": Name of the line to add
└ "B": What to add
The content to be added can be specified relatively freely. Numerical values, values, table data, formulas, lists, etc. can be used.
① Add line (list) ② Add line (numerical value) ③ Add row (table) ④ Add line (formula)
Add line (list)
df.loc['AAA'] = [4, 40, 400]
df
Add line (number)
df.loc['AAA'] = 4
df
Add row (table)
df.loc['AAA'] = df.loc['row2']
df
Add line (formula)
df.loc['AAA'] = df.loc['row2'] * 100
df
pd.concat([df, dfA])
└ "df": Base table
└ "dfA": Table to add
-Add as it is even if the line name is duplicated -Add a new column if the column name is different (default: join ='outer') -With join ='inner' option, leave only those with matching column names
・ Meaning of "concat" concatenate: concatenate. connect.
Table to join
list1 = [1,100,'AAA']
list2 = [2,200,'BBB']
dfA = pd.DataFrame([list1,list2], columns=['col0', 'col1', 'aaa'])
dfA.index = ['row0', 'XXX']
dfA
Default
pd.concat([df, dfA])
-Even if the line name is duplicated, all are newly added. -A new column is added for columns that do not match. ・ Not applicable cells are filled with NaN (missing value)
sort=True
pd.concat([df, dfA])
The "aaa" column added later comes to the top by automatic sorting.
sort=False
pd.concat([df, dfA], sort=False)
The "aaa" column added later is joined at the end.
join='outer'
└ Default setting.
└ Leave a column that does not exist.
join='inner'
└ Leave only duplicate columns.
join='outer'(Default)
pd.concat([df, dfA])
The column "aaa" that does not exist in the table before and after the join remains.
join='inner'
pd.concat([df, dfA], join='inner')
Only the columns that exist in both the tables before and after the join remain.
Recommended Posts