Charles
11/12/2024, 8:30 PM
We have a machine (A) spun up that hosts the UI (server) and its associated process workers.
We also have another machine (B) that hosts process workers. These process workers are backups for the server running on machine A. We did this for scalability and redundancy, so that when jobs are submitted to the work pool, workers on both machine A and machine B can poll it.
When we run a deployment with 5 tasks and it encounters an error on task 4, sometimes it retries from task 4, but sometimes it retries from task 1. On closer inspection, this seems to happen when the retry worker is on a different machine than the worker that ran the first attempt. Is this expected? Is there a way to resolve it? Or is there an architectural change that would give us the scalability we want? We need the flow to restart from the point of failure whenever possible.
Kevin Grismore
11/12/2024, 8:34 PM
Charles
11/12/2024, 8:38 PM
Kevin Grismore
11/12/2024, 8:41 PM
Charles
11/12/2024, 8:45 PM
prefect config set PREFECT_LOCAL_STORAGE_PATH=\\common\network\path
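Pointing PREFECT_LOCAL_STORAGE_PATH at one network path on both machines is meant to let a retry that lands on machine B read the task results that machine A already persisted. The stdlib-only sketch below is a simplified model of that idea, not Prefect's implementation: `run_attempt`, `cache_key`, the directory names, and the JSON result files are all hypothetical, and each task's key is just its name plus its input. With a separate directory per machine, the second attempt re-runs tasks 1 to 3; with one shared directory, it resumes at task 4.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical five-task flow mirroring the "5 tasks, error on task 4" example.
TASKS = ["task1", "task2", "task3", "task4", "task5"]


def cache_key(name, inputs):
    # Hypothetical key: task name plus its input, hashed (no machine-specific data).
    payload = json.dumps([name, inputs]).encode()
    return hashlib.sha256(payload).hexdigest()


def run_attempt(storage_dir, fail_at=None):
    """Run the tasks once against storage_dir.

    Returns (tasks started on this attempt, whether the attempt succeeded).
    A result already present in storage_dir is loaded instead of recomputed.
    """
    storage = Path(storage_dir)
    storage.mkdir(parents=True, exist_ok=True)
    started = []
    value = 0
    for name in TASKS:
        result_file = storage / cache_key(name, value)
        if result_file.exists():
            # Persisted by an earlier attempt on any machine that can see this directory.
            value = json.loads(result_file.read_text())
            continue
        started.append(name)
        if name == fail_at:
            return started, False  # simulated error: nothing is persisted for this task
        value += 1  # placeholder for the task's real work
        result_file.write_text(json.dumps(value))
    return started, True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Each machine writes to its own local directory: the retry on B cannot see A's results.
    run_attempt(root / "machine_a", fail_at="task4")
    separate, _ = run_attempt(root / "machine_b")
    # Both machines point at one shared directory: the retry on B resumes at task 4.
    run_attempt(root / "shared", fail_at="task4")
    shared, _ = run_attempt(root / "shared")

print(separate)  # ['task1', 'task2', 'task3', 'task4', 'task5']
print(shared)  # ['task4', 'task5']
```

Because the model's key contains nothing machine-specific, whether a retry can skip finished work depends only on whether it can read the directory where those results were written.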
Kevin Grismore
11/12/2024, 8:46 PM
Charles
11/12/2024, 8:46 PM
Charles
11/13/2024, 4:40 AM
PREFECT_LOCAL_STORAGE_PATH = \\network\path
and it's still not working as expected. I created a dummy job to investigate the issue, and the logs are as follows:
Not sure if I'm missing something; it looks like they are both persisting to the same file location.
Charles
11/14/2024, 8:23 PM
Charles
11/20/2024, 4:02 PM
Charles
11/21/2024, 9:57 PM
Nate
11/21/2024, 9:59 PM
Nate
11/21/2024, 9:59 PM
cache_policy=INPUTS + TASK_SOURCE?
Charles
11/21/2024, 10:04 PM
Nate
11/21/2024, 10:04 PM
Charles
11/21/2024, 10:13 PM
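Nate's suggestion, `cache_policy=INPUTS + TASK_SOURCE`, combines two Prefect cache policies (in Prefect 3 both are importable from `prefect.cache_policies`) so that a task's cache key changes when either its arguments or its code change. The stdlib sketch below imitates that composition with hypothetical helpers `inputs_key`, `source_key`, and `compound_key`; it is not Prefect's implementation, and it hashes the function's bytecode, constants, and referenced names as a stand-in for hashing its source text.

```python
import hashlib
import json


def inputs_key(args):
    # Hypothetical "inputs" component: hash of the call's arguments.
    return hashlib.sha256(json.dumps(args, sort_keys=True).encode()).hexdigest()


def source_key(fn):
    # Hypothetical "task source" component: hashes the function's bytecode,
    # constants, and referenced names as a stand-in for its source text.
    code = fn.__code__
    blob = code.co_code + repr(code.co_consts).encode() + repr(code.co_names).encode()
    return hashlib.sha256(blob).hexdigest()


def compound_key(fn, args):
    # Combine both components: the key changes if either the inputs or the code change.
    # Nothing machine-specific is included, so any worker computes the same key.
    return hashlib.sha256((inputs_key(args) + source_key(fn)).encode()).hexdigest()


def add_one(x):
    return x + 1


def add_two(x):
    return x + 2


same_call = compound_key(add_one, {"x": 3}) == compound_key(add_one, {"x": 3})
new_input = compound_key(add_one, {"x": 3}) != compound_key(add_one, {"x": 4})
new_source = compound_key(add_one, {"x": 3}) != compound_key(add_two, {"x": 3})
print(same_call, new_input, new_source)  # True True True
```

In this model, a worker on machine B calling the same task with the same arguments computes the same key as machine A did, so it can find A's result as long as both read from the same storage location.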