Hi!
What would be “the right way” in Prefect to do what Luigi refers to as an ExternalTask (e.g. a file in S3 that gets created by some external process, and the rest of the pipeline cannot proceed without it)?
Kevin Kho
05/18/2021, 6:54 PM
Hey @Marko Mušnjak! Just a couple of questions. Are you expecting this file to start the Flow run? What do you want to happen if the file doesn’t appear? Do you think the writing of that file can be a Prefect task?
Marko Mušnjak
05/18/2021, 6:58 PM
The setup with Luigi is that the pipeline is started and, if the file is not available, retried some time later. There could be some time between the file appearing and the pipeline continuing (it’s not time-critical).
Marko Mušnjak
05/18/2021, 6:59 PM
For some cases we could consider migrating the entire flow to Prefect, but in other cases it’s more like “customer needs to upload file for hour X before we can continue”
Kevin Kho
05/18/2021, 6:59 PM
Gotcha, so the Prefect approach would just be to check for the file's existence and, if it is not available, raise the SKIP signal. This will skip all downstream tasks and treat them as successes.
Kevin Kho
05/18/2021, 7:00 PM
You can also use the StartFlowRun task to schedule another flow run at a specified time if that file does not exist, before you raise the SKIP.
Marko Mušnjak
05/18/2021, 7:02 PM
Let me check if I understood that correctly:
The run is scheduled at some time.
We check for the existence of the file.
If it’s not there, we start another flow run at some later time, and set the status as SKIP.
If it’s there, we continue with processing.
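
The steps above can be sketched as plain control flow. This is only a library-free illustration, not Prefect code: `Skip`, `wait_for_external_file`, `run_pipeline`, and the injected `file_exists` / `schedule_retry` callables are hypothetical names introduced here. In an actual flow, the SKIP signal and StartFlowRun mentioned in the thread would presumably take the place of `Skip` and `schedule_retry`, and the one-hour retry delay is an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import Callable, List


class Skip(Exception):
    """Hypothetical stand-in for the orchestrator's SKIP signal."""


def wait_for_external_file(
    file_exists: Callable[[], bool],
    schedule_retry: Callable[[datetime], None],
    now: datetime,
    retry_delay: timedelta = timedelta(hours=1),  # assumed delay, not from the thread
) -> None:
    """Return normally if the file is present; otherwise schedule a later run and skip."""
    if file_exists():
        return
    # File is missing: ask for another run later, then skip the rest of this run.
    schedule_retry(now + retry_delay)
    raise Skip("external file not available yet")


def run_pipeline(
    file_exists: Callable[[], bool],
    schedule_retry: Callable[[datetime], None],
    now: datetime,
) -> str:
    """Run the check and report whether downstream processing happened."""
    try:
        wait_for_external_file(file_exists, schedule_retry, now)
    except Skip:
        return "skipped"
    # Downstream processing would go here.
    return "processed"


# Usage: record requested retry times in a list instead of calling a scheduler.
scheduled: List[datetime] = []
start = datetime(2021, 5, 18, 19, 0)
missing_result = run_pipeline(lambda: False, scheduled.append, start)
present_result = run_pipeline(lambda: True, scheduled.append, start)
```

Passing the existence check and the scheduler in as callables keeps the sketch independent of S3 and of any particular orchestrator, so the same shape should carry over once the real check and the real scheduling call are substituted.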