Nelson
09/10/2020, 11:00 AM
“LocalExecutor is not capable of parallelism”, so I presume that is the issue (https://docs.prefect.io/api/latest/engine/executors.html#executor), but the other executors require a separate Dask environment. Someone mentioned a “Fargate Executor” in this Slack, but I assume that’s an error and it doesn’t exist, from what I can find.
I see others using the Fargate Agent and the FargateTaskEnvironment, so I wonder if that’s related. The Fargate Agent “deploy[s] flows as Tasks using AWS Fargate”, and “FargateTaskEnvironment is an environment which deploys your flow as a Fargate task”, so I don’t understand the use of it together with the Fargate Agent. Also in your docs: “we recommend all users deploy their Flow using the LocalEnvironment configured with the appropriate choice of executor”.
How do we get tasks to run in parallel inside a Flow run in Fargate without Dask?
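For context on what a serial executor leaves on the table, here is a minimal stdlib sketch (this is not Prefect's API, and the task names are invented) contrasting a LocalExecutor-style serial run of independent tasks with a thread-pool run of the same tasks:

```python
# Illustration only, not Prefect's API: serial vs. parallel execution
# of independent tasks. Task names are made up for the example.
import time
from concurrent.futures import ThreadPoolExecutor

def task(name):
    time.sleep(0.2)  # stand-in for real work (I/O, API calls, ...)
    return name

names = ["extract_a", "extract_b", "extract_c"]

# Serial, like LocalExecutor: total time is roughly the sum of task times.
start = time.perf_counter()
serial_results = [task(n) for n in names]
serial_elapsed = time.perf_counter() - start

# Parallel: independent tasks overlap, total time is roughly the max task time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel_results = list(pool.map(task, names))
parallel_elapsed = time.perf_counter() - start

print(f"serial: {serial_elapsed:.2f}s, parallel: {parallel_elapsed:.2f}s")
```

With three 0.2 s tasks, the serial run takes about 0.6 s and the pooled run about 0.2 s; Prefect's executors abstract exactly this submit-and-wait choice.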
Also, @Marvin 🙂
emre
09/10/2020, 11:46 AM
FargateTaskEnvironment and DaskExecutor are compatible. Prefect will spin up a Fargate task with your flow in it, and will create a Dask cluster fully within your Fargate task.
From the LocalEnvironment docs:
from prefect import Flow
from prefect.environments import LocalEnvironment
from prefect.engine.executors import DaskExecutor

flow = Flow(
    "Dask Executor Example",
    environment=LocalEnvironment(executor=DaskExecutor())
)
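The same pattern, sketched for FargateTaskEnvironment. This assumes FargateTaskEnvironment accepts an executor argument the way LocalEnvironment does, which the thread only asserts as a belief; the flow name is invented:

```python
# Sketch only: assumes FargateTaskEnvironment forwards `executor`
# the same way LocalEnvironment does (unverified in this thread).
from prefect import Flow
from prefect.environments import FargateTaskEnvironment
from prefect.engine.executors import DaskExecutor

flow = Flow(
    "Dask on Fargate Example",  # invented name
    environment=FargateTaskEnvironment(executor=DaskExecutor()),
)
```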
I believe you can safely specify an executor for FargateTaskEnvironment as well.
Nelson
09/10/2020, 11:54 AM
emre
09/10/2020, 12:00 PM
flow.register(), or where you are setting up the FargateTaskEnvironment?
Because simply changing the environment’s executor parameter to DaskExecutor() should parallelize the tasks running inside your Fargate instance.
Nelson
09/10/2020, 12:13 PM
FargateTaskEnvironment, as I’m not sure what it brings [in addition to using the Fargate Agent]; that was one question. But the main goal was to run more than one task at a time, and if this is only possible via Dask (unless we implement our own executors, I guess?), that answers it!
The flow snippet is just:
with Flow("Test") as flow:
    ...

flow.storage = S3(bucket="...")
flow.environment = LocalEnvironment(metadata={"image": "..."})

register_params = {}
register_params["project_name"] = environment
register_params["labels"] = [environment]
flow.register(**register_params)
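Applied to this snippet, the advice that follows in the thread amounts to one extra argument. A sketch only: the import paths assume the Prefect 0.x module layout the thread is using, and the S3 bucket and image values stay elided as in the original:

```python
# Sketch: Nelson's snippet with a parallel executor added.
# Import paths follow the Prefect 0.x layout (an assumption);
# bucket and image values are elided as in the original.
from prefect import Flow
from prefect.engine.executors import DaskExecutor
from prefect.environments import LocalEnvironment
from prefect.environments.storage import S3

with Flow("Test") as flow:
    ...

flow.storage = S3(bucket="...")
flow.environment = LocalEnvironment(
    executor=DaskExecutor(),  # the only change: swap in a parallel executor
    metadata={"image": "..."},
)
```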
emre
09/10/2020, 1:48 PM
Adding executor=DaskExecutor() to your LocalEnvironment should let you seamlessly parallelize your flow, all within the Fargate instance.
You are right, you will need your own executor if you wish to avoid Dask. I believe that is a pretty giant thing to attempt, though.
Dylan
09/10/2020, 1:52 PM
Nelson
09/10/2020, 1:54 PM