Metaflow | https://metaflow.org/

Pipeline implementation

`train.py`



import json

from metaflow import FlowSpec, step, Parameter

class TrainingPipeline(FlowSpec):
    param_config_str = Parameter('config',
                             help='Training config json str.',
                             default='{}')

    @step
    def start(self):
        self.config = json.loads(self.param_config_str)
        self.a = 0
        self.next(self.step1)

    @step
    def step1(self):
        self.a = 1
        self.next(self.step2)

    @step
    def step2(self):
        self.a = 2
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    TrainingPipeline()

Run

python train.py run
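The `config` parameter in `train.py` expects a JSON string, so it can be less error-prone to generate the command line from a Python dict than to hand-write the JSON. The following is a minimal sketch, assuming the `run` subcommand and the `--config` option that Metaflow derives from `Parameter('config', ...)`. The helper name `build_run_command` and the config keys are hypothetical and only for illustration.

```python
import json
import shlex


def build_run_command(config, script="train.py"):
    """Return a shell command that runs the flow with `config` passed as JSON.

    `build_run_command` is a hypothetical helper, not part of Metaflow.
    """
    config_json = json.dumps(config, sort_keys=True)
    # shlex.quote protects the quotes and braces in the JSON from the shell.
    return "python {} run --config {}".format(
        shlex.quote(script), shlex.quote(config_json)
    )


if __name__ == "__main__":
    # Hypothetical config keys, for illustration only.
    print(build_run_command({"epochs": 3, "lr": 0.01}))
```

Inside the flow, `json.loads(self.param_config_str)` in the `start` step then turns the string back into a dict stored as `self.config`.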

Debug

Running the flow creates a `.metaflow` directory in the directory where you ran it. Place the following script in that same directory, alongside `.metaflow`.

`debug.py`


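from metaflow import namespace, Step

# Disable namespace filtering so that runs from every user are visible.
namespace(None)
# Replace [RUN_ID] with the run ID printed when the flow was executed.
data_start = Step('TrainingPipeline/[RUN_ID]/start').task.data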
print('Step start : a -> {}'.format(data_start.a))

data1 = Step('TrainingPipeline/[RUN_ID]/step1').task.data
print('Step step1 : a -> {}'.format(data1.a))

data2 = Step('TrainingPipeline/[RUN_ID]/step2').task.data
print('Step step2 : a -> {}'.format(data2.a))

Run

python debug.py

Result

Step start : a -> 0
Step step1 : a -> 1
Step step2 : a -> 2
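Rather than pasting the run ID into each path by hand, the same artifacts can be read from the most recent run through Metaflow's client API. The following is a minimal sketch, assuming it is called from the directory that contains `.metaflow` and that the latest run is the one you want to inspect. The helper names `artifact_by_step`, `latest_run_of`, and `main` are hypothetical.

```python
def artifact_by_step(run, artifact, step_names):
    """Map each step name to the value of `artifact` recorded by that step.

    `run` is expected to behave like a metaflow.Run: indexing it by step name
    yields a Step whose `.task.data` exposes artifacts as attributes.
    """
    return {name: getattr(run[name].task.data, artifact) for name in step_names}


def latest_run_of(flow_name):
    """Return the most recent run of `flow_name`, ignoring namespaces."""
    # Imported lazily so the helper above can be used without Metaflow installed.
    from metaflow import Flow, namespace

    namespace(None)
    return Flow(flow_name).latest_run


def main():
    run = latest_run_of("TrainingPipeline")
    print("Run ID: {}".format(run.id))
    values = artifact_by_step(run, "a", ["start", "step1", "step2"])
    for step_name, value in values.items():
        print("Step {} : a -> {}".format(step_name, value))
```

Calling `main()` after a run of `train.py` should print the run ID followed by the same three lines shown in the result above.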

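Incidentally, artifacts are not limited to simple values: objects such as pandas DataFrames can be assigned to `self` in a step and read back in the same way.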
