Hi community,
Within one of my flow, data is sent over and stored in a postgres database. I noticed than when run within a prefect flow the task is 20 times slower than it usually is. I use docker storage and docker agents. When I run the comparison without prefect I used the exact same docker image.
I was looking at the documentation but I couldn't figure out if throttling is used at some point by prefect to limit the amount of data sent or stored maybe?
k
Kevin Kho
08/16/2021, 2:00 PM
Hey @Marie,when you use the image, is that with Prefect also like
flow.run()
? Are you setting limits on the container for cpu and mem? Is the Flow memory intensive?
m
Marie
08/16/2021, 2:26 PM
Hey @Kevin Kho! When I use the image, I run the comparison without
flow_run()
. I'm not setting any limit on cpu or memory for the container (I was curious to see if there was any prefect default limit but I didn't find any?). No the flow is not really memory intensive
k
Kevin Kho
08/16/2021, 2:32 PM
So if the flow was very fast before, then 20x longer might be possible. Prefect will have some overhead, but this is constant rather than being a factor. Prefect tracks the state of each of your task and each state change is a request to our API. That’s how the observability is provided. For one task, it takes 3 API requests so that is where the overhead can come from.
Kevin Kho
08/16/2021, 2:33 PM
The Docker limits would be on your Docker. Prefect doesn’t set limits by default.
m
Marie
08/16/2021, 2:55 PM
It was not that fast before, around ~1min before, now ~20min. This is for one task within a flow
Marie
08/16/2021, 2:56 PM
Yes, that makes sense but I'm guessing prefect overhead is in the order of magnitude of ~seconds?
Marie
08/16/2021, 2:58 PM
I'll need to deepdive into memory usage to see if there isn't anything weird going on here
Marie
08/16/2021, 2:59 PM
But I can't really make sense of why it's faster when run as a standalone versus packaged in a prefect task
k
Kevin Kho
08/16/2021, 3:00 PM
Yeah 1 min to 20 mins sounds drastic. Not seeing anything immediate. Could you try just using
task.run()
and timing that? Cuz that would just be running the Python function underneath. Is this one task without mapping?
Bring your towel and join one of the fastest growing data communities. Welcome to our second-generation open source orchestration platform, a completely rethought approach to dataflow automation.