I have the following DataFrame and want to one-hot encode only the Country column.
Country | Age
--------------------------
Germany | 23
Spain | 25
Germany | 24
Italy | 30
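In older versions of scikit-learn, you only had to pass the index of the column to one-hot encode via the categorical_features parameter of OneHotEncoder. That parameter was deprecated in version 0.20 and removed in 0.22, so it only works on old installations.
In other words, it looked like this.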
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X)
In current versions of scikit-learn, use ColumnTransformer (available since version 0.20) for cases like this, where each column is processed differently.
Don't forget to specify remainder="passthrough" to keep the columns that are not listed in transformers; by default they are dropped.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
column_trans = ColumnTransformer(
    transformers=[('categorical', OneHotEncoder(), [0])],
    remainder="passthrough",
)
X = column_trans.fit_transform(X)
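For reference, here is a minimal end-to-end sketch of the ColumnTransformer approach on the sample table above. The DataFrame construction and the variable names df, X_all and X_only_country are assumptions added for illustration; sparse_threshold=0 is also an addition, used here only so that the result is always a dense NumPy array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Sample data from the table above (column names taken from its header).
df = pd.DataFrame({
    "Country": ["Germany", "Spain", "Germany", "Italy"],
    "Age": [23, 25, 24, 30],
})

# One-hot encode column 0 (Country) and pass Age through unchanged.
# sparse_threshold=0 forces a dense array (an illustrative choice).
column_trans = ColumnTransformer(
    transformers=[("categorical", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_all = column_trans.fit_transform(df)

# Categories learned by the encoder, in the order of the output columns.
categories = column_trans.named_transformers_["categorical"].categories_[0].tolist()

# Without remainder="passthrough", the Age column is dropped (the default).
X_only_country = ColumnTransformer(
    transformers=[("categorical", OneHotEncoder(), [0])],
    sparse_threshold=0,
).fit_transform(df)

print(categories)
print(X_all)
print(X_only_country.shape)
```

The one-hot columns come first, ordered by the sorted category values, followed by the passed-through Age column. Comparing the shapes of X_all and X_only_country shows what remainder="passthrough" preserves.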