Hi all,
I've been using Prefect for a while now and am a big fan of it. I have the following use case and am not sure whether Prefect supports it.
I have a flow that first downloads quite a large file and then processes it using a GPU.
I would like to run this on a parallel cluster of some sort. However, I wonder how I can ensure that the file is downloaded and processed on the same node.
To my understanding, a task can be executed on any machine as soon as its upstream tasks have finished. This means the file might be downloaded on one machine while the GPU processing task starts on another machine, where it obviously cannot find the file.
One workaround would be to make this one big task and request a GPU for it, but then, if the download takes quite long, the GPU sits reserved during the download when it could have been utilised elsewhere.
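To make the two shapes concrete, here is a rough sketch in plain Python. The function names (`download_file`, `process_on_gpu`, `download_and_process`) and the example URL are made up for illustration, the bodies are stand-ins for the real download and GPU work, and I have left out the Prefect task and flow wiring so the snippet runs on its own.

```python
import os
import tempfile


def download_file(url: str, dest_dir: str) -> str:
    """Stand-in for the slow download: writes a placeholder file, returns its local path."""
    path = os.path.join(dest_dir, "large_file.bin")
    with open(path, "wb") as f:
        f.write(url.encode("utf-8"))  # placeholder content instead of a real download
    return path


def process_on_gpu(path: str) -> int:
    """Stand-in for the GPU step: it needs the file to exist on the local disk."""
    with open(path, "rb") as f:
        return len(f.read())


def download_and_process(url: str, dest_dir: str) -> int:
    """The workaround: one combined task, so both steps necessarily share a node.

    The downside is that the GPU would be reserved for the whole download as well.
    """
    return process_on_gpu(download_file(url, dest_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Shape 1: two separate tasks. The second only works if it runs on
        # the same node that holds `path`.
        path = download_file("https://example.com/large_file", tmp)
        print(process_on_gpu(path))

        # Shape 2: a single combined task.
        print(download_and_process("https://example.com/large_file", tmp))
```

What I am after is shape 1, but with a guarantee that both functions land on the same node.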
Are there any other options? Basically, I want to enforce that two tasks are executed on the same machine, whichever machine that is.
Thank you