Hi Team, we noticed a problem with our production ...
# ask-community
c
Hi Team, we noticed a problem with our production flow lately (Prefect 2.18.3 server). We have a machine (A) spun up that hosts the UI (server) and the associated process workers. We also have another machine (B) that hosts process workers. These process workers are backups for the server that was spun up on machine A. We did this for scalability and redundancy, so that when jobs are submitted to the work pool, workers from machines A and B can both poll from it. When we run a deployment with 5 tasks and it encounters an error on task 4, sometimes it retries from task 4, but sometimes it retries from task 1. Upon closer inspection, this seems to happen when the retry worker is located on a different machine than the original worker that completed the first attempt. Is this expected? Is there a way to resolve this? Or is there a change to the architecture that would give us the scalable result we desire? We need the flow to restart from the point of failure whenever possible.
k
This is expected if you aren't storing task results in a remote location, like an S3 bucket. By default, task results are stored locally where the flow is running - in this case, the machine the process worker is on. Setting up remote result storage should resolve this! https://docs.prefect.io/3.0/develop/results#result-storage
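To make the explanation above concrete, here is a minimal sketch in plain Python. It does not use Prefect's API and is not how Prefect is implemented; `attempt`, `simulate`, and the `.result` file naming are hypothetical names invented for illustration. It only models the idea that a retry can skip a task when it can find that task's stored result, so a retry on a machine with its own local storage starts over from task 1.

```python
import tempfile
from pathlib import Path

TASKS = ["task_1", "task_2", "task_3", "task_4", "task_5"]


def attempt(storage_dir, fail_on=None):
    """Run tasks in order; return the names of tasks that actually executed.

    A task is skipped when its result file already exists in storage_dir.
    Reaching `fail_on` simulates a failure and ends the attempt early.
    """
    executed = []
    for name in TASKS:
        result_file = storage_dir / f"{name}.result"
        if result_file.exists():
            continue  # result found in storage, no need to re-run
        if name == fail_on:
            break  # simulated task failure
        result_file.write_text("done")
        executed.append(name)
    return executed


def simulate(shared):
    """First attempt runs on machine A and fails at task 4; the retry runs on machine B."""
    with tempfile.TemporaryDirectory() as root:
        storage_a = Path(root) / "machine_a"
        # With shared storage, B reads and writes the same directory as A.
        storage_b = storage_a if shared else Path(root) / "machine_b"
        storage_a.mkdir()
        storage_b.mkdir(exist_ok=True)
        attempt(storage_a, fail_on="task_4")
        return attempt(storage_b)


print(simulate(shared=False))  # retry re-runs every task
print(simulate(shared=True))   # retry resumes at task 4
```

In this toy model, separate storage makes the retry execute all five tasks, while shared storage lets it resume at `task_4`.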
c
Ah I see, thanks Kevin. Does it have to be a remote result storage location? I currently have a Postgres db on machine A. Is it possible to use this Postgres db for results, given how this is architected?
k
it isn't, task results are stored as files, so they need some kind of filesystem for storage
c
That makes sense! In that case, would it alternatively work to point both machines at a file system location they can both access?
prefect config set PREFECT_LOCAL_STORAGE_PATH=\\common\network\path
k
yep!
c
Beautiful, thank you for the quick response! I'll give that a shot.
Hey @Kevin Grismore @Prefect, I tried doing this by setting PREFECT_LOCAL_STORAGE_PATH = \\network\path and it's still not working as expected. I created a dummy job to investigate this issue and the logs are returning as follows: not sure if I'm missing something? It looks like they are both persisting to the same file location.
Hey @Prefect, I was wondering if you guys can take a look at this. From the docs, this seems like the right way to do this, but clearly it's not recognizing the states.
Hey @Prefect wanted to see if there’s an update to this again, I filed a github issue https://github.com/PrefectHQ/prefect/issues/16059, but wanted to bring it here to see if there was any additional help. Thanks again
Hey @Prefect, wanted to see if there’s an update to this again. I filed a GitHub issue https://github.com/PrefectHQ/prefect/issues/16059. We've been struggling to find a solution for a while and have pretty much exhausted every avenue we could think of... could anyone see if this is reproducible?
n
hi @Charles - the default cache policy considers the current run id, i.e. the cache invalidates between runs https://github.com/PrefectHQ/prefect/discussions/15560
perhaps you want a cache policy like cache_policy=INPUTS + TASK_SOURCE?
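As a rough illustration of why a key that includes the run id invalidates between runs, here is a sketch in plain Python. It is not Prefect's cache policy implementation; `cache_key` and its parameters are hypothetical, and hashing a JSON payload is just an assumed stand-in for however keys are really computed.

```python
import hashlib
import json
from typing import Optional


def cache_key(task_source: str, inputs: dict, run_id: Optional[str] = None) -> str:
    """Hash the task's source and inputs, optionally mixing in the run id."""
    payload = {"source": task_source, "inputs": inputs}
    if run_id is not None:
        payload["run_id"] = run_id  # assumption: the run id is one more key component
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


SOURCE = "def add(x, y): return x + y"
INPUTS = {"x": 1, "y": 2}

# Key that includes the run id: differs from run to run, so no reuse across runs.
per_run_first = cache_key(SOURCE, INPUTS, run_id="run-1")
per_run_second = cache_key(SOURCE, INPUTS, run_id="run-2")

# Key from inputs and source only: identical across runs, so a stored result can be found again.
stable_first = cache_key(SOURCE, INPUTS)
stable_second = cache_key(SOURCE, INPUTS)

print(per_run_first == per_run_second)
print(stable_first == stable_second)
```

Under these assumptions, only the key built from inputs and task source stays the same when the same task is executed again in a different run.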
c
Thanks @Nate - is this a Prefect 3.x thing? I'm on 2.18.3
n
ahh sorry, yeah it is. i missed your version above
c
Is there an equivalent on 2.x I can try out?