Leo Meyerovich (Graphistry)

04/11/2020, 4:56 AM
We are close to getting our initial orchestration pipeline ported 🙂 but we're a bit confused about how to get long jobs running; tips appreciated. Setup:
-- server 'ui': running the UI container
-- server 'gpu': running a Prefect agent; registers with 'ui' so it can pick up GPU jobs
-- server 'nb': Jupyter notebooks we use to submit jobs; has a local Prefect agent installed that points to 'ui' so we can submit jobs. Notebooks often die, but we can do quick one-offs fine. Hurray!
Tricky case 1: long historic job. We want to run a ~3-day job that processes 200 files, one at a time, sequentially in sorted order. The problem is that the notebook server running the job will periodically stop, so we really want to submit a job like
seq([task_1(file_1), task_2(file_2), ..., task_n(file_n)])
. As soon as the meta-task is submitted, the notebook (and its local agent) can stop. However, for the next 3 days, we want those tasks to run one at a time, with status visible in the UI (including fails/retries). If we ever want to, we can rerun the flow to add/swap tasks.
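The `seq` meta-task above is just a chain: each task may start only after its predecessor finishes, and a hard failure blocks everything downstream (in Prefect 0.x this is expressed by setting upstream dependencies between tasks inside a `Flow`, then registering the flow so an agent runs it independently of the notebook). A minimal stdlib sketch of that execution pattern, with made-up file names and retry counts, might look like:

```python
import time

def run_chain(files, process, max_retries=3, retry_delay=0.0):
    """Run one task per file, strictly in sorted submission order.

    A task that exhausts its retries marks itself 'failed' and stops
    the chain, mimicking an upstream failure blocking downstream tasks.
    Returns a status dict, the rough analog of per-task UI state.
    """
    statuses = {}
    for path in files:
        for attempt in range(1, max_retries + 1):
            try:
                process(path)          # the real work for one file
                statuses[path] = "success"
                break
            except Exception:
                if attempt == max_retries:
                    statuses[path] = "failed"
                    return statuses    # downstream tasks never start
                time.sleep(retry_delay)  # back off before retrying
    return statuses

# Usage: a small stand-in for the 200 sorted files.
files = sorted(f"file_{i:03d}" for i in range(3))
result = run_chain(files, process=lambda p: None)
```

In an orchestrator the loop body becomes a retriable task and the "previous must finish first" rule becomes an upstream dependency, so the scheduler (not the notebook) owns the 3-day run.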

Braun Reyes

04/11/2020, 2:54 PM
Is it possible to run these as 200 ephemeral notebook servers? Do they have to be sequential?

Leo Meyerovich (Graphistry)

04/11/2020, 4:07 PM
No, we need ~one notebook to kick off one chain of jobs