Code reduction: Pipeline and FunctionTransformer

Motivation

The 1st-place code in the Mercari competition used make_pipeline together with FunctionTransformer, and I didn't really understand either of them, so I looked into both.

Summary

make_pipeline → Combines a sequence of steps such as [preprocessing + training + prediction] into a single estimator, which reduces the amount of code.

FunctionTransformer → Converts an arbitrary function into a transformer. This is needed because every step of a Pipeline except the last must be a transformer, that is, an object that provides at least fit and transform.
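To make the second point concrete, here is a minimal sketch of wrapping a plain function with FunctionTransformer. The choice of np.log1p and the toy array are assumptions made purely for illustration; they do not come from the competition code.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Wrap a plain function so that it exposes fit() and transform()
log_transformer = FunctionTransformer(np.log1p)

# Toy input, chosen only for illustration
X = np.array([[0.0, 1.0], [2.0, 3.0]])

# fit() learns nothing for a stateless function; transform() applies np.log1p
X_log = log_transformer.fit(X).transform(X)
```

Because the wrapped object now has fit and transform, it can be placed anywhere in a Pipeline as a preprocessing step.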

Usage examples

Usage example ①

In the example below, SVC runs after PCA() has been applied, so preprocessing and classification are executed as one series of operations.

Reference site for the example below

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn import datasets

# Prepare sample data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create the pipeline: PCA for dimensionality reduction, then SVC as the classifier
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(steps=estimators)

# Train (fits PCA, transforms X, then fits SVC)
pipe.fit(X, y)

# Predict (applies the fitted PCA, then SVC.predict)
pipe.predict(X)
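The same pipeline can also be written with make_pipeline, which is the form used in the Mercari code. The following is a sketch under the assumption that the default PCA and SVC settings are acceptable; the variable names step_names and predictions are introduced here only for illustration.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Prepare sample data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# make_pipeline names each step automatically from its lowercased class name,
# so no ('reduce_dim', ...) / ('clf', ...) tuples are needed
pipe = make_pipeline(PCA(), SVC())
step_names = [name for name, _ in pipe.steps]

# Train and predict exactly as with Pipeline
pipe.fit(X, y)
predictions = pipe.predict(X)
```

The only practical difference from Pipeline(steps=...) is that the step names are generated automatically, which shortens the code further.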

Usage example ② Mercari competition code

Partial excerpt from the 1st-place code in the Mercari competition

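from operator import itemgetter

from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer as Tfidf
from sklearn.pipeline import make_pipeline, make_union, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Build a pipeline that first selects the field(s) f, then applies the given transformers
def on_field(f: str, *vec) -> Pipeline:
    return make_pipeline(FunctionTransformer(itemgetter(f), validate=False), *vec)

# Note: to_records is defined elsewhere in the original code and is not part of this excerpt
vectorizer = make_union(
        on_field('name', Tfidf(max_features=100000, token_pattern=r'\w+')),
        on_field('text', Tfidf(max_features=100000, token_pattern=r'\w+', ngram_range=(1, 2))),
        on_field(['shipping', 'item_condition_id'],
                 FunctionTransformer(to_records, validate=False), DictVectorizer()),
        n_jobs=4)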

Here, make_pipeline chains an itemgetter instance and a Tfidf instance into one pipeline. Wrapping itemgetter with FunctionTransformer turns it into a transformer, which effectively creates a custom transformer. As a result, selecting a field with itemgetter and weighting the important strings in it with Tfidf happen in one series of steps. For details on itemgetter, see the Python documentation for operator.itemgetter.
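The field-selection idea can be tried on a small scale. The sketch below assumes a tiny hypothetical DataFrame (the column names and values are made up for illustration and are not competition data) and uses only the 'name' branch of the excerpt.

```python
from operator import itemgetter

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data, not taken from the competition
df = pd.DataFrame({
    'name': ['red shoes', 'blue shoes', 'red hat'],
    'price': [10, 20, 5],
})

# itemgetter('name')(df) is equivalent to df['name'];
# FunctionTransformer gives that selection a fit/transform interface
select_name = FunctionTransformer(itemgetter('name'), validate=False)

# Select the column, then vectorize its text with TF-IDF
name_pipeline = make_pipeline(select_name, TfidfVectorizer(token_pattern=r'\w+'))
name_features = name_pipeline.fit_transform(df)

# Vocabulary learned by the final TF-IDF step
vocabulary = sorted(name_pipeline.steps[-1][1].vocabulary_)
```

Passing the whole DataFrame to fit_transform is enough: the first step picks out the column, so the caller never has to slice the data by hand.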
