Parameter tuning with luigi (2)

I usually write only about niche topics, so I was happy that the previous post got such a good response, and I decided to take it a little further. That is what this post is about.

Let's create a general-purpose task

The code for this post is at the link below. What it does is the same as last time. https://github.com/keisuke-yanagisawa/study/blob/20151205/luigi/param_tuning.py

The theme this time is "I want to create a general-purpose task". I write all sorts of tasks, but one that can be reused anywhere is especially welcome. So I wrote a general-purpose task that does the following:

- Aggregates the results of parameter tuning
- Outputs only the parameters with the best value, together, in csv format

import re

import luigi
import pandas as pd

class param_tuning(luigi.Task):
    tasks        = luigi.Parameter()              # One-dimensional array of luigi.Task
    text_format  = luigi.Parameter()              # Regular expression with Python named groups, (?P<name>...)
    reduce_pivot = luigi.Parameter()              # Which variable to use for aggregation
    reduce_rule  = luigi.Parameter(default="min") # Function used to aggregate: "min" or "max"
    out_file     = luigi.Parameter()              # Output file name

    def requires(self):
        return self.tasks
    def output(self):
        return luigi.LocalTarget(self.out_file)

    def run(self):

        # making pandas dataframe
        results = []
        for task in self.requires():
            with task.output().open() as taskfile:
                string = taskfile.read()
                groupdict = re.search(self.text_format, string).groupdict()
                results.append(groupdict)
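        df = pd.DataFrame(results)
        # convert2num is a helper that is not shown in this excerpt (see the linked script);
        # the regex groups are strings, so the pivot column must be converted to numbers
        df[self.reduce_pivot] = convert2num(df[self.reduce_pivot])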
        values = df[self.reduce_pivot]

        # Aggregation of parameter tuning results
        if self.reduce_rule == "min":
            best_val = min(values)
        elif self.reduce_rule == "max":
            best_val = max(values)
        else:
            raise ValueError("reduce_rule must be min or max. your input is %s" % self.reduce_rule)

        # Rearrangement of column order
        # (a list comprehension is used because filter() returns an iterator on Python 3)
        column_order = [key for key in df.columns if key != self.reduce_pivot] + [self.reduce_pivot]
        df = df[column_order]

        # Outputting results as csv formatted data
        df[df[self.reduce_pivot] == best_val].to_csv(self.output().path, index=False)

Writing the aggregation by hand was tedious, so I left it to pandas. What the task does is very simple:

  1. Run the calculation for each parameter set via requires()
  2. Collect all results into a pandas DataFrame
  3. Find the best value in the aggregation step
  4. (Reorder the columns so that the value used for aggregation comes last in the csv)
  5. Output only the rows with the best value (there may be more than one)

That is the whole mechanism.
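To make steps 2 to 5 easier to follow outside of luigi, here is a minimal sketch of the same selection logic in plain Python. It is a stand-in for the pandas code above, not the code from the repository: the function name select_best and the sample rows are made up for illustration, and the standard csv module is used in place of DataFrame.to_csv.

```python
import csv
import io


def select_best(rows, pivot, rule="min"):
    """Return (column_order, best_rows) for a list of dicts of strings.

    Hypothetical helper mirroring steps 2-5; not part of the original script.
    """
    if rule not in ("min", "max"):
        raise ValueError("rule must be min or max. your input is %s" % rule)

    # Step 2: collect the results, converting the pivot column to numbers
    converted = [dict(row, **{pivot: float(row[pivot])}) for row in rows]

    # Step 3: find the best value
    pick = min if rule == "min" else max
    best_val = pick(row[pivot] for row in converted)

    # Step 4: move the pivot column to the end
    columns = [key for key in converted[0] if key != pivot] + [pivot]

    # Step 5: keep every row that reaches the best value (possibly several)
    best_rows = [row for row in converted if row[pivot] == best_val]
    return columns, best_rows


# Made-up results, as they would come out of re.search(...).groupdict()
rows = [
    {"cost": "1", "error": "0.30", "gamma": "0.1"},
    {"cost": "10", "error": "0.12", "gamma": "0.1"},
    {"cost": "10", "error": "0.12", "gamma": "0.01"},
]
columns, best_rows = select_best(rows, pivot="error", rule="min")

# Write the selected rows as csv text, with the pivot as the last column
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
writer.writeheader()
writer.writerows(best_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Two parameter sets tie for the smallest error here, so both rows are written, which corresponds to the "there may be more than one" case in step 5.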

The inputs are a little confusing. tasks, the reduce_* parameters, and out_file should be mostly self-explanatory, but to keep the task general-purpose I ended up with a somewhat awkward interface that takes a regular expression.

How to use

To run the code posted on GitHub as is, something like

python param_tuning.py main_task --local-scheduler

should work.

To use this general-purpose task, you prepare two more tasks separately: a task that runs the calculation for a given set of parameters, and a main task.

The regular expression you write in the main task is probably the hardest part (I have not used regular expressions much myself), so let me explain it. The calculation task used here, task_param_eval, outputs a one-line csv file of the form cost,gamma,error, so the format is specified as follows.

s = r"[-+]?\d*\.\d+|\d+"  # float or int expression (raw string, so the backslashes reach the regex engine as-is)
text_format  = "(?P<cost>"+s+"),(?P<gamma>"+s+"),(?P<error>"+s+")"

You can name a group by writing (?P<name>...). The names are used as the header of the output csv and as the pivot name (reduce_pivot), so choose them carefully.
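As a quick check of what such a pattern extracts, the sketch below applies it to a made-up one-line result. The sample string is an assumption about what task_param_eval writes, based only on the cost,gamma,error description above.

```python
import re

# Same number pattern as above, written as a raw string
NUM = r"[-+]?\d*\.\d+|\d+"
TEXT_FORMAT = "(?P<cost>" + NUM + "),(?P<gamma>" + NUM + "),(?P<error>" + NUM + ")"

# Hypothetical content of one result file: cost,gamma,error
sample = "10,0.01,0.125\n"

match = re.search(TEXT_FORMAT, sample)
fields = match.groupdict()  # keys are the group names, values are still strings
print(fields)
```

The values come back as strings, which is why the pivot column has to be converted to numbers before min or max is taken. Also note that the integer branch of the pattern (\d+) does not accept a sign, so a negative integer would be matched without its minus sign.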

In closing

I am hopelessly bad at modularizing my code, but I think it is a very good thing that a framework like this forces you to split things up whether you like it or not. The general-purpose task in this post is one of the benefits of that modularization. ... If my coding is simply not good in the first place, I would appreciate your guidance and encouragement.

Reference material

The regular expression for matching numbers is borrowed from the Stack Overflow question below. http://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string-in-python
