# ask-community
s
Looking for some help debugging a failed Flow in Prefect Cloud. I have a Flow that maps over ~1500 items, then combines the results to dump into a DB. Each of those items has a retry set and a result handler pushing the result object to S3. The location where the Flow was executed had a network issue, where it couldn't write to S3 for a brief period that appears to coincide with when the mapped Task was in the middle of executing. The main landing page for the Flow highlights the most recent error messages, which we can click through to see more details. In this case I was investigating the `inquisitive-wildcat` Flow Run, and the downstream Tasks all indicate they had a failure. However, when I navigate to that Flow Run and look at the mapped Task, I cannot locate the exact mapped instance where these failures happened. It seems the only way to jump to that specific error message is to navigate from the main Flow page?

1. Shouldn't the failed mapped Task show up as "FAILED" in the UI?
2. Is the Task considered "FAILED" if there was a problem in the Result Handler? On the immediate next Task that consumes the output of the mapped Task, it seems Prefect sent a `None` object, which then caused an exception and finally failed the Flow Run.
3. Why is Prefect sending a `None` to a downstream Task when a mapped Task's output had a failure?
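For reference, a minimal sketch of the flow shape being described, assuming the Prefect 0.x-era API (result handlers, `task.map`). The task names, the bucket, and the per-item work are placeholders, not the original code:

```python
from datetime import timedelta

from prefect import task, Flow
from prefect.engine.result_handlers import S3ResultHandler

# Hypothetical bucket; the result handler writes each task's result
# object to S3 as the task finishes.
s3_handler = S3ResultHandler(bucket="my-results-bucket")

@task(max_retries=3, retry_delay=timedelta(seconds=30),
      result_handler=s3_handler)
def process_item(item):
    # Stand-in for the real per-item work on each of the ~1500 items.
    return item * 2

@task
def combine_and_load(results):
    # Consumes the full list of mapped results; per the report above,
    # a child whose result could not be handled may arrive here as None.
    print(f"loading {len(results)} rows into the DB")

with Flow("mapped-pipeline") as flow:
    items = list(range(1500))
    results = process_item.map(items)
    combine_and_load(results)
```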
j
Hey Scott, I had a similar issue with a high-volume mapping task. Which execution approach are you using?
s
This particular instance was using `KubernetesJobEnvironment` with a `LocalDaskExecutor`.
j
Are you running Kubernetes in AWS, Azure, or GCP?
s
The K8s cluster this particular job was running on was an on-prem instance in a laboratory (where the network is notorious for being unstable), so the fact that it had a network problem is not surprising. I'm more interested in how we can easily identify when that happens in Prefect Cloud, and how to catch failures in the Result Handler.
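One way to surface these failures more loudly might be a task state handler, sketched here against the Prefect 0.x API. Whether a Result Handler write failure actually transitions the task to `Failed` depends on the version, so treat that as an assumption to verify:

```python
from prefect import task
from prefect.engine.state import Failed

def alert_on_failure(task_obj, old_state, new_state):
    # Fires on every state transition; here we only react to failures.
    if isinstance(new_state, Failed):
        # Replace the print with a real notification (Slack, PagerDuty, ...);
        # new_state.message usually carries the underlying exception text.
        print(f"{task_obj.name} failed: {new_state.message}")
    return new_state

@task(state_handlers=[alert_on_failure])
def process_item(item):
    return item * 2
```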
j
What is most likely happening, based on my experience, is that the network is preventing metadata from being sent to Prefect Cloud, which is why you have no data. The way I fixed the problem was to refactor my code base to reduce the number of items to map over, which reduces the amount of network traffic. Also, I have not used the `KubernetesJobEnvironment`, but I have used the `DaskKubernetesEnvironment`, found it unstable for intense data processing, and switched to a `RemoteEnvironment` within Kubernetes.
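A sketch of the suggested switch, assuming the Prefect 0.x environments API, where `RemoteEnvironment` runs the flow in-process on the agent and takes the executor as an import-path string (the `flow` object is the one from the sketch above; the executor kwargs are an assumption):

```python
from prefect.environments import RemoteEnvironment

# Run the flow wherever the agent lives, keeping the LocalDaskExecutor.
flow.environment = RemoteEnvironment(
    executor="prefect.engine.executors.LocalDaskExecutor",
    executor_kwargs={"scheduler": "threads"},  # assumed scheduler choice
)
```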
s
I'll give that a shot, thanks!
Seems I need to rethink this... `RemoteEnvironment` doesn't let you specify a K8s `job_spec.yaml` file. It'd be wonderful if `KubernetesJobEnvironment` inherited from `RemoteEnvironment` rather than `Environment`. Guess I'll hack around with that later today.
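For contrast, a sketch of the 0.x-era `KubernetesJobEnvironment`, which does accept a custom Job manifest via `job_spec_file` (the file path here is hypothetical), which is the capability `RemoteEnvironment` lacks:

```python
from prefect.environments import KubernetesJobEnvironment

# Launches each flow run as a K8s Job built from the given manifest.
flow.environment = KubernetesJobEnvironment(
    job_spec_file="job_spec.yaml",  # hypothetical custom Job spec
)
```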
l
@Marvin archive “Observations regarding high volume mapped pipelines returning None on failure”
l
^ Sorry for the necro post, but I'm archiving this convo and opening an issue about this, as I heard about it elsewhere too. Interested if you have any more findings!