Hi Team we noticed a problem with our production flow lately Prefect Community #ask-community

Charles

11/12/2024, 8:30 PM

Hi Team, we noticed a problem with our production flow lately (Prefect 2.18.3 server). We have a machine (A) spun up that hosts the UI (server) and the associated process workers. We also have another machine (B) that hosts process workers. These process workers are backups for the ones on machine A. We did this for scalability and redundancy, so that when jobs are submitted to the work pool, workers on both machine A and machine B can poll from it. When we run a deployment with 5 tasks and it encounters an error on task 4, sometimes it retries from task 4, but sometimes it retries from task 1. Upon closer inspection, this seems to happen when the retry worker is on a different machine than the worker that ran the first attempt. Is this expected? Is there a way to resolve this? Or is there a change to the architecture that would give us the scalable result we want? We need the flow to restart from the point of failure whenever possible.

Kevin Grismore

11/12/2024, 8:34 PM

This is expected if you aren't storing task results in a remote location, like an S3 bucket. By default, task results are stored locally where the flow is running - in this case, the machine the process worker is on. Setting up remote result storage should resolve this! https://docs.prefect.io/3.0/develop/results#result-storage
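To illustrate why the retry can restart from task 1: the sketch below is a plain-Python simulation, not Prefect code. It assumes a hypothetical `run_flow` helper that skips any task whose result file already exists in the given storage directory. Temporary directories stand in for each machine's local disk and for a shared location.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def run_flow(storage: Path, fail_at: Optional[int] = None) -> List[int]:
    """Run tasks 1..5, reusing any result already persisted in `storage`.

    Returns the task numbers that were actually attempted in this run.
    (Hypothetical helper for illustration only.)
    """
    attempted = []
    for n in range(1, 6):
        result_file = storage / f"task-{n}.json"
        if result_file.exists():
            continue  # a persisted result is visible here, so skip the task
        attempted.append(n)
        if n == fail_at:
            break  # simulated failure: nothing is persisted for this task
        result_file.write_text(json.dumps({"task": n}))
    return attempted


# Case 1: each machine has its own local storage.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    run_flow(Path(a), fail_at=4)                 # first attempt on machine A
    retry_on_other_machine = run_flow(Path(b))   # retry lands on machine B

# Case 2: both machines read and write the same storage location.
with tempfile.TemporaryDirectory() as shared:
    run_flow(Path(shared), fail_at=4)            # first attempt
    retry_on_shared_storage = run_flow(Path(shared))  # retry, any machine

print(retry_on_other_machine)
print(retry_on_shared_storage)
```

In the first case the retry cannot see the results written on the other machine, so every task is attempted again. In the second case only the failed task and the ones after it run.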

Charles

11/12/2024, 8:38 PM

Ah I see, thanks Kevin. Does it have to be a remote result storage location? I currently have a Postgres db on machine A. Is it possible to use this Postgres db for results with this architecture?

Kevin Grismore

11/12/2024, 8:41 PM

it isn't, task results are stored as files, so they need some kind of filesystem for storage

Charles

11/12/2024, 8:45 PM

That makes sense! In that case, would it also work to point the storage path at a file system location that both machines can access?

prefect config set PREFECT_LOCAL_STORAGE_PATH=\\common\network\path

Kevin Grismore

11/12/2024, 8:46 PM

yep!

Charles

11/12/2024, 8:46 PM

Beautiful, thank you for the quick response! I'll give that a shot.

Charles

11/13/2024, 4:40 AM

Hey @Kevin Grismore @Prefect I tried doing this by setting

PREFECT_LOCAL_STORAGE_PATH = \\network\path

and it's still not working as expected. I created a dummy job to investigate this issue and the logs are returning as follows: not sure if I'm missing something? It looks like both workers are persisting to the same file location.

Charles

11/14/2024, 8:23 PM

Hey @Prefect was wondering if you guys can take a look at this. From the docs, this seems like the right way to do this, but clearly it's not recognizing the states.

Charles

11/20/2024, 4:02 PM

Hey @Prefect wanted to see if there’s an update to this again, I filed a github issue https://github.com/PrefectHQ/prefect/issues/16059, but wanted to bring it here to see if there was any additional help. Thanks again

Charles

11/21/2024, 9:57 PM

Hey @Prefect wanted to see if there’s an update to this again, I filed a GitHub issue https://github.com/PrefectHQ/prefect/issues/16059, we've been struggling with finding a solution for a while, and pretty much exhausted every avenue we could think of... could anyone see if this is reproducible?

Nate

11/21/2024, 9:59 PM

hi @Charles - the default cache policy considers the current run id, ie cache invalidates between runs https://github.com/PrefectHQ/prefect/discussions/15560

Nate

11/21/2024, 9:59 PM

perhaps you want a cache policy like

cache_policy=INPUTS + TASK_SOURCE
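A rough illustration of the difference, in plain Python rather than Prefect's actual implementation: the hypothetical `cache_key` helper below hashes a task's inputs and source, and optionally a run id. When the run id is part of the key, two runs with identical inputs get different keys, so nothing is reused across runs. When it is left out, the keys match.

```python
import hashlib
import json


def cache_key(inputs: dict, source: str, run_id: str = "") -> str:
    """Hash the pieces that make up a cache key.

    Hypothetical helper for illustration; Prefect computes its keys differently.
    """
    payload = json.dumps(
        {"inputs": inputs, "source": source, "run_id": run_id},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Stand-in for the task's source code.
SOURCE = "def double(x): return 2 * x"

# Key that includes the run id: differs between runs.
key_run_1 = cache_key({"x": 1}, SOURCE, run_id="run-1")
key_run_2 = cache_key({"x": 1}, SOURCE, run_id="run-2")

# Key built from inputs and source only: stable across runs.
key_first = cache_key({"x": 1}, SOURCE)
key_second = cache_key({"x": 1}, SOURCE)

print(key_run_1 == key_run_2)
print(key_first == key_second)
```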

Charles

11/21/2024, 10:04 PM

Thanks @Nate - is this a Prefect 3.x thing? I'm on 2.18.3

Nate

11/21/2024, 10:04 PM

ahh sorry, yeah it is. i missed your version above

Charles

11/21/2024, 10:13 PM

Is there an equivalent on 2.x i can try out?
