w
Hi everyone, I was hoping to get some help with a problem I have with Prefect. We have a custom-built job management and scheduling system which we are replacing with Prefect. I have set up a work pool with ECS and am able to create and run deployments with the worker on Fargate. These jobs fetch data from an external API and drop it into S3. The deployment takes an API key and a start time and end time for requesting data within a time period from the API. Currently I am manually entering this start time and end time, but I would like to use some scheduling so that each day a flow run for the deployment would run with the start and end time parameters set to current time - 1 day and current time respectively. I have looked into deployment scheduling but I can't see where you could pass in parameters with a schedule. What would you recommend we do to achieve a system that runs these jobs every day, and if a job fails, lets me rerun it with the same parameters it started with? Thanks
b
Hey William, based on what you've described, I think I have a proposal. You could have an orchestrator flow (or parent flow, if you will), which is scheduled to run on a regular basis (once a day, if you'd like). Whenever this parent flow executes, it can dynamically generate the start time and end time for requesting data from the API (maybe using something like pendulum). Once it generates the start and end time, the orchestrator flow can then trigger an instance of a child flow run using run_deployment. This function can take a deployment name, parameters, and a scheduled run time as input. After the parent calls run_deployment(), the result would be a child flow run, set to run at the defined scheduled time, with the parameters that were passed in by the parent. The child flow run can then request data from the API and drop the data into S3.
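For concreteness, here's a rough sketch of what the parent could look like. The deployment name fetch-api-data/ecs-child and the parameter names are placeholders, so swap in your own:

```python
import pendulum
from prefect import flow
from prefect.deployments import run_deployment


@flow
def daily_orchestrator():
    # Generate the one-day window dynamically each time the parent runs.
    end_time = pendulum.now("UTC")
    start_time = end_time.subtract(days=1)

    # Trigger the child flow run with concrete parameter values. timeout=0
    # returns immediately instead of blocking until the child finishes.
    run_deployment(
        name="fetch-api-data/ecs-child",  # placeholder "<flow-name>/<deployment-name>"
        parameters={
            "start_time": start_time.to_iso8601_string(),
            "end_time": end_time.to_iso8601_string(),
        },
        timeout=0,
    )
```

Since Prefect stores the parameters on each flow run, a failed child run can be retried from the UI with the exact same window it was originally given, which should cover your rerun requirement.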
👍 3
To achieve this, I think you'd need two flow scripts: 1) the parent, which generates the start and end time and schedules the child flow run, and 2) the child flow, which is responsible for executing the API request and loading the data to S3. You'll need a deployment for each as well. The deployment for the parent flow will have a schedule, so that it runs on a defined interval. The deployment for the child flow would not need a schedule, since its execution is governed by the parent. A sketch of the child is below.
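Here's what the child could look like, assuming a made-up API endpoint, a hypothetical bucket name, and httpx/boto3 for the request and the S3 write; replace those with your real client and storage logic:

```python
import json

import boto3
import httpx
from prefect import flow, task


@task
def fetch_records(api_key: str, start_time: str, end_time: str) -> list:
    # Placeholder endpoint and query parameters; use your real API call here.
    response = httpx.get(
        "https://api.example.com/records",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"start": start_time, "end": end_time},
    )
    response.raise_for_status()
    return response.json()


@task
def write_to_s3(records: list, start_time: str) -> None:
    # Placeholder bucket and key scheme.
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="my-data-bucket",
        Key=f"exports/{start_time}.json",
        Body=json.dumps(records),
    )


@flow
def ecs_child(api_key: str, start_time: str, end_time: str):
    records = fetch_records(api_key, start_time, end_time)
    write_to_s3(records, start_time)
```

The parent's deployment would then carry the daily schedule (a one-day interval or a cron like 0 6 * * *), while the child's deployment has no schedule at all.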
👍 1
n
hey @William Fitzmaurice - here's an example that may be helpful!
blob attention gif 1
w
Wow, thanks for the help!
catjam 1