# ask-community
Hi, quick question about task slugs. Context: I'm executing my flow and it fails. I have the slugs of the tasks that failed (from flow.slugs[&lt;task&gt;]), and based on those I'd like to retrieve each task so that I can investigate its dependencies using flow.upstream_tasks(). However, when I look at flow.tasks, the slug is null for every task, and flow.get_tasks(slug=X) doesn't work. I could do this by reverse-mapping flow.slugs, which does seem to be populated, but it feels like a hack, so I wanted to make sure I'm not missing a simpler way first. Is that expected behaviour, or am I missing something?
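The reverse-mapping workaround mentioned above could be sketched like this. It assumes (Prefect 1.x style) that flow.slugs is a dict mapping each Task object to its slug string; plain strings stand in for Task objects here so the sketch runs without Prefect installed, and get_task_by_slug is a hypothetical helper name:

```python
# Stand-in for flow.slugs: {task: slug}. In real Prefect 1.x code the
# keys would be Task objects, not strings.
flow_slugs = {"extract_task": "extract-1", "transform_task": "transform-1"}

# Invert the mapping once: {slug: task}
slug_to_task = {slug: task for task, slug in flow_slugs.items()}

def get_task_by_slug(slug):
    """Look up a task by its slug, raising a clear error if unknown."""
    try:
        return slug_to_task[slug]
    except KeyError:
        raise KeyError(f"No task with slug {slug!r}")

# The returned task could then be passed to flow.upstream_tasks(task)
failed_task = get_task_by_slug("transform-1")
```

The one-time inversion keeps each lookup O(1), which matters if you are resolving many failed slugs after a run.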
Hey @Krzysztof Nawara, I don’t know this immediately. I’ll look into it for you.
Are you trying to do this in a state handler? Or trying to do this in another script without the Flow definition?
I'm running Prefect in "local" mode, in my JupyterLab
So I have full access to the flow definition, which I later run to get the state
I've got some strange parts in my setup, but I'll try to create a minimal example and see if it also behaves like that
Oh so you don’t register and run it against Cloud or Server? Asking because we have a mechanism that lets you retrieve this information by querying the GraphQL API.
No, I'm doing this locally. It's a quasi-ML pipeline, so I want to have full access to task results in the notebook for analysis. Looks like it suffers from the same issue when running a basic pipeline (that fails): https://pastebin.com/bb7cNHe7
Ok, will ask the team about it
Small correction: in the previous example, for some reason the state ended up being NoneType; now, after re-running, it's Failed
So the task slugs are populated when the flow is serialized (for registration to the backend). When running locally, this won't be populated. The suggestion is to use the flow.slugs dictionary instead of the task.slug attribute.
Gotcha, thanks! It also means that Flow.get_tasks(slug=X) is going to be broken on local, which you may or may not consider to be a bug (depending on your point of view). But it was definitely unexpected (for me at least)
This is on the radar for sure (and there are some other differences between running with a backend). No promises on when it will change, though; we don't have a lot of users running purely locally. Would love to learn more about your use case. Also, just wanna be sure you're aware Cloud is free to get started with.
That's my goal: to make sure you're aware 🙂 As for the use case: for me, easy access to task results when using flow.run() is a huge benefit of local execution. It's big enough that I'm actually running with a backend, saving task results, copying them back to a local folder, and then running the same pipeline in local mode, where it just loads the saved task results. This lets me quickly retrieve results by tag, by slug, or by task name, which makes analysis that much easier. Almost like in Metaflow: https://docs.metaflow.org/metaflow/tagging If there is a better way to do it, I'd be happy to hear about it
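The retrieve-by-tag/slug/name pattern described above could be sketched like this. It assumes (Prefect 1.x style) that a local flow run yields a mapping from tasks to their states, with each task carrying name, slug, and tags; minimal stand-in classes and the results_by helper are hypothetical so the sketch runs without Prefect:

```python
class Task:
    """Stand-in for a Prefect Task: just name, slug, and tags."""
    def __init__(self, name, slug, tags=()):
        self.name, self.slug, self.tags = name, slug, set(tags)

class State:
    """Stand-in for a task State holding the loaded result."""
    def __init__(self, result):
        self.result = result

# Stand-in for the {task: state} mapping a local flow.run() produces
run_results = {
    Task("load", "load-1", tags={"io"}): State(result=[1, 2, 3]),
    Task("train", "train-1", tags={"ml"}): State(result="model"),
}

def results_by(attr, value, results):
    """Filter results by a task attribute: 'name', 'slug', or 'tag'."""
    if attr == "tag":
        return {t: s.result for t, s in results.items() if value in t.tags}
    return {t: s.result for t, s in results.items()
            if getattr(t, attr) == value}

ml_results = results_by("tag", "ml", run_results)
```

Keeping the lookups attribute-driven like this gives roughly the Metaflow-style tagging workflow mentioned above, without needing a backend for analysis.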
👍 1