Sort warning in the pd.concat function

In short: when you concatenate vertically, pass the sort argument. For now, sorting the columns (sort = True) is fine.

In conclusion, if you vertically concatenate two DataFrames with pd.concat and they have different columns, or the same columns in a different order, you must pass sort = True or sort = False explicitly. Otherwise, the following warning is issued.
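To check that passing sort explicitly is what silences the warning, here is a minimal sketch that records the warnings raised by pd.concat using the standard warnings module. The helper name sort_warnings is hypothetical, and the message filter assumes the warning text shown below. As far as I know, pandas 1.0 and later changed the default to sort=False and no longer emit this warning, so the no-argument case may report 0 on a recent version.

```python
import warnings

import pandas as pd

df_1 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "a": ["NY", "CA", "Seattle"]})
df_2 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"]})


def sort_warnings(**kwargs):
    """Hypothetical helper: run pd.concat and return only the sort warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        pd.concat([df_1, df_2], **kwargs)
    # Keep only the "non-concatenation axis is not aligned" warning, ignore others
    return [w for w in caught
            if "non-concatenation axis is not aligned" in str(w.message)]


for kwargs in ({}, {"sort": True}, {"sort": False}):
    print(kwargs, "->", len(sort_warnings(**kwargs)), "sort warning(s)")
```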

pd.concat([df_1, df_2])

=============================================
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  = pd.concat([df_1, df_2])

Practical example: When only the column order is different

So what exactly is the problem? At first I couldn't picture what this meant, so here is a simple concrete example. Prepare two data frames, df_1 and df_2.

import pandas as pd

df_1 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "a": ["NY", "CA", "Seattle"]
                    })
df_2 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"]
                    })

df_1:

	b	a
0	kiwi	NY
1	avocado	CA
2	durian	Seattle

df_2:

	a	b
0	Tokyo	apple
1	Osaka	banana
2	Sapporo	orange

df_1 has its columns in the order b, a, while df_2 has them in the order a, b.
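When stacking vertically, the columns are the "non-concatenation axis". A simplified way to see why these two frames count as "not aligned" is to compare their column Index objects: Index.equals is order-sensitive, so the check fails even though the labels are the same. This is only an illustration, not pandas' internal check.

```python
import pandas as pd

df_1 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "a": ["NY", "CA", "Seattle"]})
df_2 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"]})

# Index.equals compares both the labels and their order
same_order = df_1.columns.equals(df_2.columns)
# Comparing as sets ignores the order
same_labels = set(df_1.columns) == set(df_2.columns)

print(list(df_1.columns), list(df_2.columns))  # ['b', 'a'] ['a', 'b']
print(same_order, same_labels)                 # False True
```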

Concat with sort = False as an argument

First, let's pass sort = False.

concat_false = pd.concat([df_1, df_2], sort=False)

concat_false:

	b	a
0	kiwi	NY
1	avocado	CA
2	durian	Seattle
0	apple	Tokyo
1	banana	Osaka
2	orange	Sapporo

The columns keep df_1's order: b, then a.
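A short sketch of what sort=False does here: the column order is taken from df_1 (the first frame), and df_2's rows are placed by column name, not by position.

```python
import pandas as pd

df_1 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "a": ["NY", "CA", "Seattle"]})
df_2 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"]})

concat_false = pd.concat([df_1, df_2], sort=False)
print(list(concat_false.columns))     # ['b', 'a']

# Row 3 is df_2's first row; "apple" lands under b even though b is df_2's 2nd column
print(concat_false.iloc[3].tolist())  # ['apple', 'Tokyo']
```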

Concat with sort = True (same as concat without sort argument)

If you pass sort = True instead, the result is as follows.

concated_true = pd.concat([df_1, df_2], sort=True)

concated_true:

	a	b
0	NY	kiwi
1	CA	avocado
2	Seattle	durian
0	Tokyo	apple
1	Osaka	banana
2	Sapporo	orange

This time the columns are in alphabetical order: a, then b. If you don't pass the sort argument, pandas (for now) behaves as if sort = True, but it also issues the warning.
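One way to read sort=True, shown here as a sketch for this example only: it gives the same frame as concatenating with sort=False and then sorting the column labels with sort_index(axis=1).

```python
import pandas as pd

df_1 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "a": ["NY", "CA", "Seattle"]})
df_2 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"]})

concated_true = pd.concat([df_1, df_2], sort=True)
# Concatenate without sorting, then sort the column labels afterwards
sorted_afterwards = pd.concat([df_1, df_2], sort=False).sort_index(axis=1)

print(list(concated_true.columns))               # ['a', 'b']
print(concated_true.equals(sorted_afterwards))   # True
```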


concated = pd.concat([df_1, df_2])
# Check with the equals function whether concated and concated_true are the same
print(concated.equals(concated_true))
# True
=============================================
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  """Entry point for launching an IPython kernel.

The equals function confirms that the two DataFrames, concated_true (sort = True) and concated (no sort argument), are identical.

Practical example: When columns are different

The behavior is almost the same when the two frames have different columns.

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

df_1:

	a	b	c	e
0	Tokyo	apple	3	2
1	Osaka	banana	2	4
2	Sapporo	orange	1	8

df_2:

	b	c	a	d
0	kiwi	1	NY	2
1	avocado	3	CA	20
2	durian	5	Seattle	1

Columns a, b, and c are common to both frames. Column e exists only in df_1, and column d only in df_2.
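The common and differing columns can also be listed with pandas Index set operations; a small sketch:

```python
import pandas as pd

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

common = df_1.columns.intersection(df_2.columns)     # labels in both frames
only_in_df_1 = df_1.columns.difference(df_2.columns)  # labels only in df_1
only_in_df_2 = df_2.columns.difference(df_1.columns)  # labels only in df_2

print(sorted(common), list(only_in_df_1), list(only_in_df_2))
# ['a', 'b', 'c'] ['e'] ['d']
```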

Concat with sort = False as an argument

concat_false = pd.concat([df_1, df_2], sort=False)

concat_false:

	a	b	c	e	d
0	Tokyo	apple	3	2.0	NaN
1	Osaka	banana	2	4.0	NaN
2	Sapporo	orange	1	8.0	NaN
0	NY	kiwi	1	NaN	2.0
1	CA	avocado	3	NaN	20.0
2	Seattle	durian	5	NaN	1.0

The columns are not in alphabetical order: a, b, c, e, d. They are df_1's columns a, b, c, e, with df_2's extra column d appended on the right.
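Because each frame lacks one of the columns, the missing cells are filled with NaN, and an integer column that receives NaN is upcast to float. That is why e and d show 2.0, 20.0 and so on, while the shared column c stays an integer. A sketch that checks the column order and dtypes:

```python
import pandas as pd

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

concat_false = pd.concat([df_1, df_2], sort=False)
print(list(concat_false.columns))  # ['a', 'b', 'c', 'e', 'd']

# c exists in both frames (no NaN); d and e each get three NaN values
print(concat_false["c"].dtype, concat_false["d"].dtype, concat_false["e"].dtype)
# int64 float64 float64
```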

Concat without sort argument (same as concat with sort = True)

If you concatenate these two DataFrames without the sort argument, you get:

concat = pd.concat([df_1, df_2])

=============================================
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  """Entry point for launching an IPython kernel.

concat:

	a	b	c	d	e
0	Tokyo	apple	3	NaN	2.0
1	Osaka	banana	2	NaN	4.0
2	Sapporo	orange	1	NaN	8.0
0	NY	kiwi	1	2.0	NaN
1	CA	avocado	3	20.0	NaN
2	Seattle	durian	5	1.0	NaN

The columns are now in alphabetical order: a, b, c, d, e. The data itself has not changed.

This is the same result as passing sort = True.

concat_true = pd.concat([df_1, df_2], sort=True)
# Check whether concat_true and concat are the same
print(concat_true.equals(concat))
# True

# Check whether concat_true and concat_false are the same
print(concat_false.equals(concat_true))
# False

concat_true:

	a	b	c	d	e
0	Tokyo	apple	3	NaN	2.0
1	Osaka	banana	2	NaN	4.0
2	Sapporo	orange	1	NaN	8.0
0	NY	kiwi	1	2.0	NaN
1	CA	avocado	3	20.0	NaN
2	Seattle	durian	5	1.0	NaN
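Since the False above comes only from the column order, here is a quick sketch to confirm it: select concat_false's columns in concat_true's order and compare again. It also shows that sort = True reorders the columns only; the rows keep the order in which the frames were passed.

```python
import pandas as pd

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

concat_true = pd.concat([df_1, df_2], sort=True)
concat_false = pd.concat([df_1, df_2], sort=False)

print(concat_false.equals(concat_true))                       # False: column order differs
# Reorder concat_false's columns to match concat_true, then compare again
print(concat_false[concat_true.columns].equals(concat_true))  # True

# Rows are not sorted: df_1's rows come first, then df_2's
print(concat_true.index.tolist())  # [0, 1, 2, 0, 1, 2]
print(concat_true["a"].tolist())   # ['Tokyo', 'Osaka', 'Sapporo', 'NY', 'CA', 'Seattle']
```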

Finally: if the warning is annoying, why not just pass sort = True for now?

Leaving this pandas concat warning alone does no harm, but it is bothersome. It does not affect the data itself, only the column order, so passing sort = True may be fine for the sake of readability.
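One more point in favor of sort = True, as a sketch: with sort=False the column order of the result depends on which frame you pass first, while with sort=True it is alphabetical either way.

```python
import pandas as pd

df_1 = pd.DataFrame({"a": ["Tokyo", "Osaka", "Sapporo"],
                     "b": ["apple", "banana", "orange"],
                     "c": [3, 2, 1],
                     "e": [2, 4, 8]})
df_2 = pd.DataFrame({"b": ["kiwi", "avocado", "durian"],
                     "c": [1, 3, 5],
                     "a": ["NY", "CA", "Seattle"],
                     "d": [2, 20, 1]})

# sort=False: the first frame's columns come first, new columns are appended
cols_12_false = list(pd.concat([df_1, df_2], sort=False).columns)
cols_21_false = list(pd.concat([df_2, df_1], sort=False).columns)
# sort=True: the column labels are sorted regardless of input order
cols_12_true = list(pd.concat([df_1, df_2], sort=True).columns)
cols_21_true = list(pd.concat([df_2, df_1], sort=True).columns)

print(cols_12_false)  # ['a', 'b', 'c', 'e', 'd']
print(cols_21_false)  # ['b', 'c', 'a', 'd', 'e']
print(cols_12_true)   # ['a', 'b', 'c', 'd', 'e']
print(cols_21_true)   # ['a', 'b', 'c', 'd', 'e']
```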

The Stack Overflow question I referred to is:

https://stackoverflow.com/questions/50501787/python-pandas-user-warning-sorting-because-non-concatenation-axis-is-not-aligne