# ask-community
l
hi guys. I have a question about managing Prefect flow schedules via the Prefect API. We decided to use Prefect to handle uploading user data into our db. We support multiple ways of uploading the data (each of them is a separate Prefect Flow). In our system, we allow a user to schedule a periodic job that downloads user data from an external source and uploads it into our database. The user should be able to specify a different run interval for each dataset (and modify that interval later). Since schedules are stored inside a JSONB column attached to the flow group without a specific schedule_id, it is very hard to update a specific schedule (unless you use one Flow parameter as a unique identifier, which feels a bit hacky). Removing a schedule from the flow group is also slightly uncomfortable (the Prefect UI simply sends the whole schedule array, minus the deleted schedule, back to GraphQL). At the moment, we are exploring the possibility of registering a Prefect flow per dataset (using the dataset ID in the flow name) so we can reliably control the schedule. This should give us better control over the schedule, but we would also get a more granular view of the flows (which we don't really need or want). Is there a better way of approaching this particular problem? Any suggestions are welcome.
k
Hi @Lukáš Polák! I don’t think I’m understanding 100%. Is the general case that you have one flow responsible for multiple datasets and you want them run on different schedules? I am thinking a combination of Clocks and Parameters can get this done, since you can attach Parameters to a Clock. You can set the clocks differently for the datasets (after typing this, it seems like that’s what you said feels hacky).
l
hi @Kevin Kho. You understood it correctly, yep. I would like to have one Flow (e.g. "Process URL upload") and use it to upload data into multiple datasets (among other Parameters, there are dataset_id and url_link). For each combination of dataset_id & url_link, I want to be able to set a different schedule (Clock).
You can set the clocks differently for the datasets (after typing this, it seems like that’s what you said feels hacky)
yep. If the Clock had some kind of ID, then updating it would be easy. Otherwise, I have to load all schedules for the flow and look for the correct combination of parameters (that's what you were suggesting, am I right?). Moreover, there is still the possibility of failing to remove a Clock if you always remove it by sending the newly updated array without that particular Clock: when 2 requests try to remove different Clocks attached to the same Flow, there is a race condition and the outcome may be neither deterministic nor the desired one. Or am I missing something?
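To make the two concerns above concrete, here is a minimal sketch in plain Python. It does not call Prefect at all: the clock dictionaries, the `make_clock`, `clock_key`, `upsert_clock` and `remove_clock` helpers, and the example URLs are all hypothetical, and the `stored` list only stands in for the schedule array kept on the flow group. It shows how a clock could be identified by its (dataset_id, url_link) parameters, and how two removals that each send back a full array can undo one another.

```python
from copy import deepcopy


def make_clock(dataset_id, url_link, cron):
    """Build a hypothetical clock record (a plain dict, not a Prefect object)."""
    return {
        "cron": cron,
        "parameter_defaults": {"dataset_id": dataset_id, "url_link": url_link},
    }


def clock_key(clock):
    """Identify a clock by its (dataset_id, url_link) parameter defaults."""
    params = clock["parameter_defaults"]
    return (params["dataset_id"], params["url_link"])


def upsert_clock(clocks, new_clock):
    """Return a new list where the clock with the same key is replaced.

    The new clock is appended at the end, whether or not a match existed.
    """
    key = clock_key(new_clock)
    kept = [c for c in clocks if clock_key(c) != key]
    return kept + [new_clock]


def remove_clock(clocks, dataset_id, url_link):
    """Return a new list without the clock matching the given parameters."""
    return [c for c in clocks if clock_key(c) != (dataset_id, url_link)]


# Stand-in for the schedule array stored on the flow group.
stored = [
    make_clock("a", "https://example.com/a.csv", "0 * * * *"),
    make_clock("b", "https://example.com/b.csv", "0 0 * * *"),
    make_clock("c", "https://example.com/c.csv", "0 6 * * 1"),
]

# Two requests read the same state before either of them writes.
snapshot_1 = deepcopy(stored)
snapshot_2 = deepcopy(stored)

# Request 1 removes dataset "a" and writes the whole array back.
stored = remove_clock(snapshot_1, "a", "https://example.com/a.csv")

# Request 2 removes dataset "b" from its stale snapshot and overwrites
# the array, which silently restores the clock for dataset "a".
stored = remove_clock(snapshot_2, "b", "https://example.com/b.csv")

remaining = sorted(clock_key(c)[0] for c in stored)
print(remaining)
```

In this sketch the clock for dataset "a" is still present at the end even though request 1 removed it. Avoiding that would need some form of serialization or version check around the read-modify-write, which is outside what the sketch covers.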
k
Oh wow I see. I was thinking of just one registration, not updating the clock. How often do you need to update schedules?
l
Not very often. It should be something that users can update via our UI. At the moment, we don't expect scripts to be updating it.
k
Yeah this is tricky, I think re-registration is the best bet honestly, but I think you really have explored all possible options. I don’t think I have any ideas to add.
l
not a problem. I ended up using a project per dataset since that is easier to set up on our side. Thanks for the help anyway.