# ask-community
a
Hi all. I'm using run_deployment on existing deployments (Prefect 2.7.1), but I notice a 5 to 15 second delay in the start of the flow run. I don't use scheduled runs, so it should start right away. Should I look for parameters or config to remove the delay, or is it something environment-related?
forgot to mention that the flow runs appear in the UI as 'scheduled' before they are executed.
j
Hi Anco! The default is for the agent to poll the work queue every ten seconds, but that can be overridden if you want a shorter polling interval. See the docs section on settings. Here's the one you want:
PREFECT_AGENT_QUERY_INTERVAL='10.0'
If you are spinning up infrastructure it could take some time. If you are communicating with anything external, there could be some latency from your network. If you want to schedule your flow runs you can now prefetch to get things started ahead of time.
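For example (just a sketch; the value and queue name below are placeholders, not from your setup), you could lower the interval when starting the agent:
PREFECT_AGENT_QUERY_INTERVAL='2.0' prefect agent start -q default
or persist it with prefect config set PREFECT_AGENT_QUERY_INTERVAL=2.0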
a
Thanks Jeff. I did play around with
PREFECT_AGENT_QUERY_INTERVAL='10.0'
a bit and set it to '0.1', but the delay is still there. I'm using the DaskTaskRunner for execution, but it's connecting to an already running cluster. The logging output of distributed.core starts, at the earliest, 7 seconds after I trigger run_deployment. Any ideas how to shorten this time?
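For reference, this is roughly how the flow is pointed at the existing cluster (a minimal sketch, not my actual code; the scheduler address and flow name are placeholders):

from prefect import flow
from prefect_dask import DaskTaskRunner

@flow(task_runner=DaskTaskRunner(address="tcp://dask-scheduler:8786"))
def my_flow():
    ...  # tasks submitted from here run on the already-running cluster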
j
I’m not sure. If you can prefetch with a schedule, that might get things moving sooner, but I get that might not work for your use case.
a
You're right about the prefetch. It also makes no difference. I did see an improvement if I invoke run_deployment once or even 3 times in a loop. In that case, the first flow run started with a 3.5-second delay.
When I increase the number to 10, the delay also increases.
@Jeff Hale I have attached the agent logging. This is the output of a loop of 3 run_deployment calls. The first 9 lines appeared instantly, without the delay. After that I see some warnings. Could these warnings be the cause of the delay?
j
Hi Anco. Maybe. That looks like a new warning with 2.7.1. There’s now an open issue where you can add any information and follow along.
a
I downgraded to Prefect 2.6.9 and the warning was gone. The delay was not 😟
j
Alright. Nice process of elimination! EDIT - sorry, I was confusing this with another issue.
a
My setup was all local, no Kubernetes or Docker involved.
r
Hi Anco! It sounds like this might just be a result of the time needed to prepare everything to run a flow. When you call run_deployment, quite a few things happen:
• A call gets sent to the Prefect API to create a new flow run.
• The flow run gets picked up by an agent.
• The agent opens a new subprocess and runs the flow's command (usually python -m prefect.engine).
• The new process has to start Python, which loads, parses, and initializes prefect and all of its dependencies.
• If using remote storage, Prefect downloads all the flow code from storage. Timing can vary depending on where the code is stored and how many files get downloaded. If the flow code is on the local filesystem, it gets copied (along with any other code in its directory or subdirectories) into a temporary subdirectory.
• The flow starts and has to connect to the already-running Dask cluster; that's usually quick, but can add a bit of delay.
5 to 15 seconds is within the normal range of what I'd expect in this scenario. If you want the flows to run in-process immediately, calling the flow functions directly instead of using run_deployment will accomplish that (see the sketch below), though it won't help when you need to run separate deployments with different settings or task runners.
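Here's a rough sketch of the difference I mean (the flow and deployment names are just placeholders):

from prefect import flow
from prefect.deployments import run_deployment

@flow
def my_flow(x: int = 1):
    ...

# goes through the API and an agent, so it pays the polling/startup cost described above
run_deployment(name="my-flow/my-deployment", parameters={"x": 42})

# runs immediately in the current process, but with this process's settings and task runner
my_flow(x=42)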
🙌 1
a
Thanks Ryan. We already have about 3 years of experience with Prefect and are currently running Prefect 1 with dask.distributed (and we are quite happy). On our production environment we run 200K flow executions daily across 15 flows. The reason I'm interested in deployments and run_deployment() is that it allows me to efficiently 're-run' flows just by changing the parameter values (roughly as sketched below). I had great hope that this would minimize data transfers and reduce the ever-growing data storage of the Dask workers (we restart all workers every 4 hours because of this). The concurrency setting of the work queue also seems a very interesting feature for controlling large peak loads.
I will try a setup on some bigger machines, because it seems there is no deliberate delay or waiting built in, so perhaps there is another second to gain. But if I'm not able to reduce the start time of the flow run to less than 1 second, it seems we are stuck with the old setup for a bit longer.
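To illustrate the re-run pattern I mean (a sketch only; the deployment name and parameters are placeholders, and I'm assuming run_deployment's timeout argument can be set to 0 so the call returns without waiting for the run to finish):

from prefect.deployments import run_deployment

for customer_id in ("a", "b", "c"):
    run_deployment(
        name="my-flow/my-deployment",
        parameters={"customer_id": customer_id},
        timeout=0,  # fire and forget; don't block until the flow run completes
    )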
r
I'll check with a few of my colleagues to see if we can suggest some deployment/run patterns that will help you meet all your objectives. 🙂
🙌 1
a
That would be great. Thanks!
a
Hey! Any news on this? Is it possible to have deployments without the 10-15 second delay (from downloading the code)? @Ryan Peden