j

Julien Allard

08/20/2020, 1:51 PM
Hello! I'm having some trouble running a flow on a single-node Dask cluster on Kubernetes. When I run the flow locally, it works as expected, but when I run it on Kubernetes, I get the following error: Unexpected error: TypeError("Cannot map over unsubscriptable object of type <class 'NoneType'>: None..."). The problem seems to come from a mapped task that outputs a pandas DataFrame. Does anyone have any ideas on how to fix this problem or how to debug it further?
d

Dylan

08/20/2020, 2:05 PM
Hi @Julien Allard, welcome! I don't think we explicitly support mapping over pandas DataFrames directly. If it works locally, that's great, but I think you'll need to convert your DataFrame to a List. Let me double-check with the team, though!
j

Julien Allard

08/20/2020, 3:04 PM
Oops, sorry, my message might have been confusing. I'm actually mapping over a list of DataFrames, not over a single DataFrame directly.
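
The TypeError quoted in the first message says that the value handed to the mapped task was None rather than a list. One way to narrow this down is to check the upstream value before mapping over it. The following is a minimal sketch in plain Python, not part of Prefect's API: `check_mappable` is a hypothetical helper, and the dictionaries are stand-ins for the DataFrames.

```python
from collections.abc import Sequence


def check_mappable(value, name="upstream result"):
    """Return `value` unchanged, or raise a descriptive error if it cannot be mapped over."""
    if value is None:
        # Same situation as the error in the thread: nothing arrived from upstream.
        raise TypeError(f"{name} is None; the upstream output was not returned or was lost")
    if isinstance(value, (str, bytes)) or not isinstance(value, Sequence):
        raise TypeError(f"{name} has type {type(value).__name__}, expected a list or tuple")
    # Individual missing elements would fail later, inside the mapped task.
    missing = [i for i, item in enumerate(value) if item is None]
    if missing:
        raise ValueError(f"{name} contains None at positions {missing}")
    return value


# Hypothetical stand-ins for a list of DataFrames.
frames = check_mappable([{"a": 1}, {"a": 2}], name="frames")
```

Calling a check like this at the top of the task that consumes the list would show whether the list itself is missing or only some of its elements.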
d

Dylan

08/20/2020, 3:25 PM
Ahh understood
@Julien Allard Do you have a result configured for your flow? https://docs.prefect.io/core/concepts/results.html#results
j

Julien Allard

08/20/2020, 3:26 PM
No I don't have any results configured
d

Dylan

08/20/2020, 3:28 PM
I'd suggest starting there. Even though you're running on a single node, it's best not to assume any reliable infrastructure in the cloud. Having a GCS or S3 result (depending on your cloud provider of choice) should guarantee that upstream results are passed and read correctly and that your flow can retry from failure.
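
To illustrate the idea behind a configured result, here is a minimal sketch in plain Python. It is not Prefect's implementation or API: `write_result` and `read_result` are hypothetical helpers, and a local temporary directory stands in for a GCS or S3 bucket. The point is only that an upstream output written to shared storage can be read back by whichever process runs the downstream task, including after a re-run.

```python
import pickle
import tempfile
from pathlib import Path


def write_result(store: Path, key: str, value) -> Path:
    """Pickle `value` under `key` in the store and return the file path."""
    store.mkdir(parents=True, exist_ok=True)
    path = store / f"{key}.pkl"
    with path.open("wb") as fh:
        pickle.dump(value, fh)
    return path


def read_result(store: Path, key: str):
    """Load the value previously written under `key`."""
    with (store / f"{key}.pkl").open("rb") as fh:
        return pickle.load(fh)


# A temporary directory stands in for the bucket (assumption for this sketch).
store = Path(tempfile.mkdtemp()) / "results"

# "Upstream task": persist its output instead of relying on worker memory.
write_result(store, "make_frames", [{"a": 1}, {"a": 2}])

# "Downstream mapped task": read the output back, possibly from another process.
restored = read_result(store, "make_frames")
```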
j

Julien Allard

08/20/2020, 3:31 PM
Sounds like a good idea. I have also noticed in the logs that the flow seems to start multiple times. I see the log Beginning Flow run for 'Flow Example' multiple times. Could that be related?
d

Dylan

08/20/2020, 3:32 PM
What environment do you have configured?
j

Julien Allard

08/20/2020, 3:33 PM
I have a DaskKubernetesEnvironment
d

Dylan

08/20/2020, 3:33 PM
Hmm
I believe there’s a way to increase the logging for that environment
Dask likes to re-run tasks if it thinks it’s necessary
Try with the result and see if that helps
j

Julien Allard

08/20/2020, 3:35 PM
Would that be the scheduler_logs flag?
d

Dylan

08/20/2020, 3:35 PM
If it doesn’t solve your issue ping me here and we can take some further debugging steps
Yes, that’s one of them 👍
j

Julien Allard

08/20/2020, 3:36 PM
Alright, I will try the result solution. Thanks for your help!
d

Dylan

08/20/2020, 3:38 PM
Anytime!