Issues when using DaskTaskRunner at scale - running out of available connections when connecting to ...
j

Jean

almost 3 years ago
Hey guys, I’m running a flow with a DaskTaskRunner that spawns a task that takes around 20seconds and apparently it’s running sequentially?! My machine has 16 threads and I see in the UI each task only being run after another one finishes. Any inputs?
23:04:54.057 | INFO    | prefect.engine - Created flow run 'optimal-snail' for flow 'test1'
23:04:54.057 | INFO    | prefect.task_runner.dask - Creating a new Dask cluster with `distributed.deploy.local.LocalCluster`
23:04:55.980 | INFO    | prefect.task_runner.dask - The Dask dashboard is available at <http://127.0.0.1:8787/status>
23:05:00.371 | INFO    | Flow run 'optimal-snail' - Created task run 'Execute values of the query-8165e3c8-0' for task 'Execute values of the query'
23:05:00.372 | INFO    | Flow run 'optimal-snail' - Executing 'Execute values of the query-8165e3c8-0' immediately...
23:05:24.612 | INFO    | Task run 'Execute values of the query-8165e3c8-0' - Finished in state Completed()
23:05:26.370 | INFO    | Flow run 'optimal-snail' - Created task run 'Execute values of the query-8165e3c8-1' for task 'Execute values of the query'
23:05:26.370 | INFO    | Flow run 'optimal-snail' - Executing 'Execute values of the query-8165e3c8-1' immediately...
23:05:42.533 | INFO    | Task run 'Execute values of the query-8165e3c8-1' - Finished in state Completed()
23:05:44.295 | INFO    | Flow run 'optimal-snail' - Created task run 'Execute values of the query-8165e3c8-2' for task 'Execute values of the query'
23:05:44.296 | INFO    | Flow run 'optimal-snail' - Executing 'Execute values of the query-8165e3c8-2' immediately...
23:06:09.538 | INFO    | Task run 'Execute values of the query-8165e3c8-2' - Finished in state Completed()
As you see in these logs it’s not really spawning more tasks. I was under the impression that the call to a function with the
@task
decorator would be non-blocking if made within a flow
@flow
which uses DaskTaskRunner
1
Hi all, after successfully utilizing an ecs:push workpool with my own provisioned AWS infrastructure...
m

mira

over 1 year ago
Hi all, after successfully utilizing an ecs:push workpool with my own provisioned AWS infrastructure to deploy my flow to and run it on the ecs cluster, I was curious about the new feature to let prefect provision the infrastructure 😃 Unfortunately, it doesn't work as expected. I created the ecs: push work pool as suggested (https://github.com/PrefectHQ/prefect/pull/11267) and after that I deployed my flow to it as usually, but without job-variables (since I don't have to ingest the AWS infrastructure and role arns now). I want to push the flow image to my ecr repo, so I give this with the name of the image to my Deployment like this:
flow.deploy(
    ...
    image=DeploymentImage(
        name=os.getenv("ECR_REPO_URL", ""),
        tag=os.getenv("IMAGE_TAG"),
        dockerfile=cfd / "Dockerfile",
    ),
    ...
)
But then I get the error: Flow run could not be submitted to infrastructure: An error occurred (ClientException) when calling the RegisterTaskDefinition operation: Fargate requires task definition to have execution role ARN to support ECR images. Why do I still have to provision the deployment with an execution role, shouldn't it (or the work pool) create one? Or is it because it is a my own ECR Repo? Where do you usually push / save the flow image to run it on the ecs cluster (especially in the frame of the ecs:push work pool with infra provisioning)? Thank you and best regards!
🤔 1
1