In short: when you concatenate two data frames vertically with pd.concat and their columns differ, or are in a different order, pass sort=True or sort=False explicitly. Otherwise, pandas issues the following warning.
pd.concat([df_1, df_2])
=============================================
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
= pd.concat([df_1, df_2])
So what actually changes? At first I couldn't picture it, so here is a simple concrete example. Prepare two data frames, df_1 and df_2.
import pandas as pd

df_1 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "a": ["NY", "CA", "Seattle"]
                     })
df_2 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"]
                     })
df_1:
| | b | a |
---|---|---|
0 | kiwi | NY |
1 | avocado | CA |
2 | durian | Seattle |
df_2:
| | a | b |
---|---|---|
0 | Tokyo | apple |
1 | Osaka | banana |
2 | Sapporo | orange |
df_1 has its columns in the order b, a, while df_2 has them in the order a, b.
First, let's concatenate them with sort=False.
concat_false = pd.concat([df_1, df_2], sort=False)
concat_false:
| | b | a |
---|---|---|
0 | kiwi | NY |
1 | avocado | CA |
2 | durian | Seattle |
0 | apple | Tokyo |
1 | banana | Osaka |
2 | orange | Sapporo |
The column order follows df_1: b, then a.
With sort=True, the result is as follows.
concated_true = pd.concat([df_1, df_2], sort=True)
concated_true:
| | a | b |
---|---|---|
0 | NY | kiwi |
1 | CA | avocado |
2 | Seattle | durian |
0 | Tokyo | apple |
1 | Osaka | banana |
2 | Sapporo | orange |
In this case the columns are sorted alphabetically: a, then b. If you don't pass the sort argument, pandas (for now) behaves as if sort=True had been given, but it also issues the warning.
concated = pd.concat([df_1, df_2])
# Use equals() to check whether concated and concated_true are identical
print(concated.equals(concated_true))
# True
=============================================
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
"""Entry point for launching an IPython kernel.
The equals method confirms that the two data frames, concated_true (sort=True) and concated (no sort argument), are identical.
The behavior is much the same when the two data frames have different sets of columns.
df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})
df_1:
| | a | b | c | e |
---|---|---|---|---|
0 | Tokyo | apple | 3 | 2 |
1 | Osaka | banana | 2 | 4 |
2 | Sapporo | orange | 1 | 8 |
df_2:
| | b | c | a | d |
---|---|---|---|---|
0 | kiwi | 1 | NY | 2 |
1 | avocado | 3 | CA | 20 |
2 | durian | 5 | Seattle | 1 |
Columns a, b, and c are common to both data frames. Column e exists only in df_1, and column d exists only in df_2.
concat_false = pd.concat([df_1, df_2], sort=False)
| | a | b | c | e | d |
---|---|---|---|---|---|
0 | Tokyo | apple | 3 | 2.0 | NaN |
1 | Osaka | banana | 2 | 4.0 | NaN |
2 | Sapporo | orange | 1 | 8.0 | NaN |
0 | NY | kiwi | 1 | NaN | 2.0 |
1 | CA | avocado | 3 | NaN | 20.0 |
2 | Seattle | durian | 5 | NaN | 1.0 |
The columns are not in alphabetical order: a, b, c, e, d. The result keeps columns a, b, c, e of df_1 in their original order, and column d of df_2 is appended on the right.
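The ordering rule seen in this example can be sketched in plain Python. This is only a simplified model of the outer-join case shown in this article, not how pandas implements it; the helper name concat_column_order is hypothetical and exists only for illustration.

```python
def concat_column_order(column_lists, sort):
    """Model the column order of a vertical, outer-join concat.

    Simplified illustration only: columns are collected in order of
    first appearance, then sorted if sort is True.
    """
    seen = []
    for columns in column_lists:
        for name in columns:
            if name not in seen:
                seen.append(name)
    return sorted(seen) if sort else seen


# Column orders of df_1 and df_2 from the example above
df_1_columns = ["a", "b", "c", "e"]
df_2_columns = ["b", "c", "a", "d"]

order_unsorted = concat_column_order([df_1_columns, df_2_columns], sort=False)
order_sorted = concat_column_order([df_1_columns, df_2_columns], sort=True)

print(order_unsorted)  # ['a', 'b', 'c', 'e', 'd']
print(order_sorted)    # ['a', 'b', 'c', 'd', 'e']
```

With sort=False the first data frame decides the leading columns, so changing the order of the list passed to pd.concat also changes the resulting column order.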
If you concatenate these two data frames without passing the sort argument, you get:
concat = pd.concat([df_1, df_2])
=============================================
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
"""Entry point for launching an IPython kernel.
concat:
| | a | b | c | d | e |
---|---|---|---|---|---|
0 | Tokyo | apple | 3 | NaN | 2.0 |
1 | Osaka | banana | 2 | NaN | 4.0 |
2 | Sapporo | orange | 1 | NaN | 8.0 |
0 | NY | kiwi | 1 | 2.0 | NaN |
1 | CA | avocado | 3 | 20.0 | NaN |
2 | Seattle | durian | 5 | 1.0 | NaN |
The columns are now in alphabetical order: a, b, c, d, e. The data itself is unchanged.
This is the same result as passing sort=True.
concat_true = pd.concat([df_1, df_2], sort=True)
# Check whether concat_true and concat are identical
concat_true.equals(concat)
# True
# Check whether concat_false and concat_true are identical
concat_false.equals(concat_true)
# False
concat_true:
| | a | b | c | d | e |
---|---|---|---|---|---|
0 | Tokyo | apple | 3 | NaN | 2.0 |
1 | Osaka | banana | 2 | NaN | 4.0 |
2 | Sapporo | orange | 1 | NaN | 8.0 |
0 | NY | kiwi | 1 | 2.0 | NaN |
1 | CA | avocado | 3 | 20.0 | NaN |
2 | Seattle | durian | 5 | 1.0 | NaN |
This pandas concat warning does no harm if you leave it alone, but it is a nagging annoyance. The sort argument doesn't affect the data itself, only the order of the columns, so sort=True is probably a fine choice for the sake of readability.
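To back up the point that only the column order differs, here is a small check. It is a sketch that rebuilds the second pair of data frames from this article and assumes pandas is installed; the names realigned and same_after_realign are my own and are used only for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

concat_false = pd.concat([df_1, df_2], sort=False)
concat_true = pd.concat([df_1, df_2], sort=True)

# Reorder the columns of concat_false to match concat_true,
# then compare the two results again.
realigned = concat_false[concat_true.columns]
same_after_realign = realigned.equals(concat_true)

print(list(concat_false.columns))  # ['a', 'b', 'c', 'e', 'd']
print(list(concat_true.columns))   # ['a', 'b', 'c', 'd', 'e']
print(same_after_realign)          # True
```

If a specific column order matters downstream, selecting the columns explicitly like this is one way to avoid depending on the sort setting at all.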
The Stack Overflow question I referred to is below.
https://stackoverflow.com/questions/50501787/python-pandas-user-warning-sorting-because-non-concatenation-axis-is-not-aligne