Luigi is a handy library for data conversion pipelines. For example, if conversion process A depends on conversion process B, and B depends on conversion process C, Luigi checks the dependencies and runs the processes in order, starting from C. It is useful when an error occurs partway through a series of processes, because on re-execution it skips the parts that have already completed. Parts that do not depend on each other are processed in parallel.
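To illustrate the idea only (this is not Luigi's implementation or API), here is a minimal stdlib sketch of dependency-ordered execution. It assumes Python 3.9+ for `graphlib`, and the task names and the `run_pipeline` helper are hypothetical. Tasks already in `completed` are skipped, roughly in the spirit of Luigi treating a task as done when its output exists, and each returned batch contains tasks with no dependencies on each other, which could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: A depends on B, B depends on C, D depends on nothing.
# Each key maps a task to the set of tasks it depends on (its predecessors).
DEPENDENCIES = {
    "A": {"B"},
    "B": {"C"},
    "C": set(),
    "D": set(),
}


def run_pipeline(dependencies, completed):
    """Run tasks dependency-first, skipping tasks already in `completed`.

    `completed` is updated in place. Returns the batches actually executed;
    tasks within one batch do not depend on each other.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        executed = [task for task in ready if task not in completed]
        for task in executed:
            # A real pipeline would perform the conversion work here.
            completed.add(task)
        if executed:
            batches.append(executed)
        # Mark every ready task as finished so its dependents become ready.
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    print(run_pipeline(DEPENDENCIES, set()))       # [['C', 'D'], ['B'], ['A']]
    # Simulate a re-run after B failed: C and D finished earlier.
    print(run_pipeline(DEPENDENCIES, {"C", "D"}))  # [['B'], ['A']]
```

The second call shows the "skip what already ran" behavior: only B and A execute again.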
For more information, see Building a data pipeline with Python and Luigi.
When the number of parallel workers is 2 or more
```
PicklingError: Can't pickle <function update_tracking_url at 0x0000000001E100B8>: it's not found as luigi.worker.update_tracking_url
```
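You get an error like this. It seems that multiprocessing and pickle don't get along, and only on Windows, though I'm not sure of the exact cause. A likely reason: Windows has no fork, so multiprocessing must pickle whatever it sends to each worker process, and pickle can only serialize a function it can find by its module-level name, which is exactly what the message says failed for luigi.worker.update_tracking_url.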
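A minimal sketch in plain Python (not luigi code) of the pickle limitation the message points at: pickle stores a function by reference to its importable module-level name, so a function defined inside another function cannot be pickled. The function names below are hypothetical; luigi's own `update_tracking_url` is not reproduced here.

```python
import pickle


def top_level_callback(url):
    """Module-level function: pickle stores it by its qualified name."""
    return url


def make_nested_callback():
    # A function defined inside another function has no importable
    # module-level name, so pickle cannot reference it.
    def nested_callback(url):
        return url
    return nested_callback


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type differs between Python versions.
        return False


if __name__ == "__main__":
    print(is_picklable(top_level_callback))      # True
    print(is_picklable(make_nested_callback()))  # False
```

On Windows, multiprocessing always starts workers with the "spawn" method, which relies on this kind of pickling. On Linux or macOS you can force the same behavior with `multiprocessing.set_start_method("spawn")`, which may help when reproducing Windows-only errors like this one.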
The workaround is to downgrade luigi to version 1.2.1. If you don't specify a version with pip, you get the latest version, 2.3.0, and that is the trap.
Postscript (2016/8/27): The version provided by conda has also been updated to 2.3.0, and it seems to be the only version available for Windows, so there seems to be no choice but to install with pip.
~~For Anaconda: conda install luigi~~
For pip:

```shell
pip install luigi==1.2.1
```

This installs version 1.2.1.
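To confirm which version actually ended up installed (for example, that nothing pulled in 2.3.0 again), a small check like the following may help. It is a sketch assuming Python 3.8+ for `importlib.metadata`; the `installed_version` helper is hypothetical, and `pip show luigi` gives the same information from the command line.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="luigi"):
    """Return the installed version string of `distribution`, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "luigi is not installed")
```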
This solved it in my environment. However, in the question "Pickle crashing when trying to pickle "update_tracking_url" in luigi.worker?", someone reported that upgrading to version 2.0.1 fixed it, so you may need to try different versions to see which works for you.
If you're willing to patch the installed package yourself, you can use versions up to 2.1.1. See "Pickle crashing when trying to pickle "update_tracking_url" in luigi.worker?" for the required edits.
Version 2 seems more polished overall. In particular:

- More parameter data types can be specified explicitly.
- The UI of the central scheduler has been cleaned up.
- The dependency graph view in the central scheduler's UI no longer fails to display.
- As far as I can tell, version 1 didn't show whether a task had parameters.

And so on.
Mario went to the Olympics, but I'm sorry, luigi.