For example, suppose you have the following function that processes a DataFrame.
import pandas as pd
def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df
The DataFrame passed to this function is expected to contain the columns first_name and last_name, and you may want to check this at the beginning of the function.
This check is easy to write using operations on the set type [^ set].
import pandas as pd
def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    required_columns = {"first_name", "last_name"}
    if not required_columns <= set(df.columns):
        raise ValueError(f"missing columns: {required_columns - set(df.columns)}")
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df
Written this way, the function raises a ValueError whenever a required column is missing.
df = pd.DataFrame([{"first_name": "John", "age": 30}])  # DataFrame missing the 'last_name' column
preprocess(df) #=> ValueError: missing columns: {'last_name'}
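If several functions need the same check, the set difference can be pulled into a small helper. The following is only a sketch: the name require_columns is hypothetical (it is not part of pandas), and sorting the missing names and copying the DataFrame before modifying it are design choices added here, not part of the original example.

```python
from typing import Iterable

import pandas as pd


def require_columns(df: pd.DataFrame, required: Iterable[str]) -> None:
    """Hypothetical helper: raise ValueError if any required column is absent."""
    # Set difference leaves only the names that df does not have.
    missing = set(required) - set(df.columns)
    if missing:
        # Sorting keeps the message stable, since set ordering is not guaranteed.
        raise ValueError(f"missing columns: {sorted(missing)}")


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    require_columns(df, {"first_name", "last_name"})
    # Assumption: work on a copy so the caller's DataFrame is left unchanged.
    df = df.copy()
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df
```

Computing the difference once also avoids building set(df.columns) twice, and an empty set is falsy, so `if missing:` plays the same role as the `<=` subset test.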