# prefect-community

Matt Alhonte

05/03/2022, 9:35 PM
Hrm, looks like the Client is dying shortly after launching the Cluster? It runs a few tasks, then right before mapping a big-ish one, the Client dies (the CloudWatch logs just say `Killed`) and the mapping stays in a `mapped` state

Anna Geller

05/03/2022, 9:40 PM
Can you share a bit more information about your use case?
1. Are you on Prefect Cloud or Server?
2. Can you share the output of `prefect diagnostics`?
3. I assume you mean a dask-cloudprovider Fargate cluster, given you mentioned CloudWatch?
4. Why are you using the Client directly? Do you use it to make some API calls to the Prefect backend before doing the mapping? Can you share some flow code?

Matt Alhonte

05/03/2022, 9:44 PM
By client I just mean the container that launches the Dask cluster, i.e. what gets launched by `ECSRun`. I'm on Cloud.
`prefect diagnostics` output:
```json
{
  "config_overrides": {},
  "env_vars": [],
  "system_information": {
    "platform": "Linux-4.14.193-149.317.amzn2.x86_64-x86_64-with-glibc2.10",
    "prefect_backend": "cloud",
    "prefect_version": "1.2.0",
    "python_version": "3.8.8"
  }
}
```
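For context, a minimal sketch of the setup being described, assuming a Prefect 1.x flow registered with `ECSRun` whose flow-runner container ("the client") launches a dask-cloudprovider Fargate cluster via `DaskExecutor`. The image, worker count, and task here are illustrative assumptions, not Matt's actual config:
```python
from prefect import Flow, task
from prefect.run_configs import ECSRun
from prefect.executors import DaskExecutor

@task
def transform(x):
    return x * 2

with Flow("fargate-dask-flow") as flow:  # hypothetical flow
    transform.map(list(range(100)))

# The ECSRun container is the "client" that spins up the Dask cluster.
flow.run_config = ECSRun(image="my-registry/my-image:latest")  # hypothetical image
flow.executor = DaskExecutor(
    cluster_class="dask_cloudprovider.aws.FargateCluster",
    cluster_kwargs={"n_workers": 4},  # assumed worker count
)
```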

Anna Geller

05/03/2022, 9:49 PM
Can you share the flow code that gives you the behavior that doesn't match your expectations? When an ECS task ends with a red message saying something about an exit and all containers being finished, it may look like an error message, but it's ECS's way of saying that all the work finished.

Matt Alhonte

05/03/2022, 10:16 PM
@Anna Geller It doesn't fail, it just stays in a `mapped` or `pending` state. And I see in the logs for the Scheduler that it says `distributed.scheduler - INFO - Close client connection: Client-67d0a242-cb2b-11ec-8013-06a62f0d58e7`

Anna Geller

05/03/2022, 10:18 PM
This looks like a Dask client log. Which logs do you see in the Prefect Cloud UI? Could you share the flow run ID?

Kevin Kho

05/03/2022, 10:19 PM
How big is big-ish? It looks like the scheduler is dying? Do you have access to the Dask dashboard?

Matt Alhonte

05/03/2022, 10:20 PM
`ecd087e5-821c-4f0c-a5b7-b6ea708acfbb`
Not sure how big! Just canceled my latest try, but I'll look at the Dask Dashboard when I try my next one (I think I should have access)

Anna Geller

05/03/2022, 10:39 PM
My understanding is that the Dask cluster couldn't be created due to some misconfiguration (perhaps missing IAM roles?), and therefore task runs couldn't be submitted to the Dask executor. Given that the flow run had no submitted or running task runs due to this cluster-creation issue, Lazarus rescheduled the flow run and resubmitted everything for execution.

Matt Alhonte

05/03/2022, 10:40 PM
Interesting!
It worked okay on a single node with `LocalDaskExecutor`?

Anna Geller

05/03/2022, 10:41 PM
Is it strictly necessary for you to use dask-cloudprovider? Perhaps you could first try running your flow with `LocalDaskExecutor`? Otherwise, can you cross-check the Dask cluster config and share it with us if you can't find the issue in the cluster configuration on your own?
yup exactly

Kevin Kho

05/03/2022, 10:52 PM
`LocalDaskExecutor` is dask alone, though, and `DaskExecutor` specifically uses `distributed`, so it's not a good test
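A minimal sketch of the distinction Kevin is drawing, in Prefect 1.x terms (the flow, task, and scheduler address are illustrative assumptions):
```python
from prefect import Flow, task
from prefect.executors import LocalDaskExecutor, DaskExecutor

@task
def inc(x):
    return x + 1

with Flow("executor-comparison") as flow:  # hypothetical flow
    inc.map([1, 2, 3])

# LocalDaskExecutor runs on plain dask (threads or processes in a single
# container): no scheduler process, no network serialization of arguments.
flow.executor = LocalDaskExecutor(scheduler="threads", num_workers=8)

# DaskExecutor runs on dask.distributed: a scheduler plus workers, so each
# task's inputs and outputs are pickled and shipped over the network --
# which is why a payload that is harmless locally can hurt a real cluster.
# flow.executor = DaskExecutor(address="tcp://scheduler:8786")  # assumed address
```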

Matt Alhonte

05/03/2022, 11:17 PM
Huzzah! So, there was a 20 MB NumPy array that I thought would be fine to pass to the tasks, but I just spilled it to an `.npz` file and now it all works fine!
🚀 1
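A sketch of the kind of change Matt describes, assuming a Prefect 1.x flow (the array, task, and file path are hypothetical):
```python
import numpy as np
from prefect import Flow, task, unmapped

arr = np.random.rand(2_500_000)  # ~20 MB of float64, a stand-in for the real data

# Spill the array to disk once and pass only the path to the mapped tasks,
# so the 20 MB payload never travels through the serialized task graph.
# /tmp works on a single machine; on a Fargate cluster this would need
# shared storage such as S3 (hypothetical path either way).
np.savez("/tmp/big_array.npz", arr=arr)

@task
def process(i, path):
    big_array = np.load(path)["arr"]  # each task loads the array locally
    return float(big_array[i] * 2)

with Flow("npz-spill") as flow:
    process.map(list(range(10)), path=unmapped("/tmp/big_array.npz"))
```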

Kevin Kho

05/04/2022, 12:11 AM
20 MB using a scatter or a future?

Matt Alhonte

05/04/2022, 12:13 AM
passed as part of a dictionary with some other stuff
๐Ÿ‘ 1