Hi there! I'm wondering what is the best way to use Dask primitives, such as semaphore to orchestrate things as the API rate limits. I'm having a restriction on 5 API calls being executed at a single moment. Hence, I'm using a semaphore shared among tasks that acquire it before an API call. This semaphore is created in the starting task and then is passed to the downstream tasks for use. My Dask cluster has 8 workers / 8 threads if that's relevant.
Is it good practice to use these https://docs.dask.org/en/latest/futures.html within my Prefect tasks?
k
Kevin Kho
07/01/2021, 2:28 PM
Hi @Dmitry Lyfar, for stuff like limiting database calls, we have task concurrency limiting in Prefect cloud that work across Flows. I assume you’re not on Cloud though. I am not 100% sure on using the semaphore, but there are two ways to go about this. One is that you can just use the Resource Manager to write Dask specific code and run it on the cluster. Also this might be relevant. The second one is by using the Prefect map. I expect the problem here will be that the sempahore might not be serializable, and task inputs and outputs should be. If it is though, it might work with the Prefect map.
You can use dask worker resources to limit the number of tasks running with a given tag. Check the DaskExecutor docs
d
Dmitry Lyfar
07/01/2021, 10:16 PM
@Kevin Kho thank you! Will have a look and give it a go. You are correct I'm not on the cloud.