# prefect-community
j
Hey team, I have two pipelines that I'm running using the Cloud for orchestration. More often than I'd like I wake up the next morning to find that one of them has failed with
Unexpected error while running flow: KeyError('Task slug foo-3 not found in the current Flow; this is usually caused by changing the Flow without reregistering it with the Prefect API.')
It's always the same task for both pipelines - I've split out a common task that loads data to BigQuery into a separate file, and this is the task that fails. It will run fine for a few days and then randomly decide that it can't find this task, even though the exact same task is called two other times during each pipeline, as foo-1 and foo-2. I'm wondering if I'm the first person to have their install randomly lose tasks.
k
Hey @John Grubb, this is certainly strange behavior that is hard to pin down - my only suggestion right now would be to upgrade your version of Prefect (if you're not on the latest), re-register all of these flows with the Cloud API using that version, and update your Agent's version as well. If a version mismatch isn't the culprit, then we may have to dig deeper into the code - perhaps it has to do with our storage method here.
j
thanks Kyle. The registering of the flow doesn't depend on the agent process remaining alive indefinitely, does it? I was thinking maybe the agent process fell over and was restarted at some point, because that's possible, I suppose?
k
The Agent is completely independent from flow registration. In general, it's good practice to keep the Agent as up to date as possible for compatibility with previous Prefect Flow versions, so that was more a general recommendation than anything directly related to your issue.
Just trying to cross things off the list for potential weirdness.
j
Typically I resolve it by re-registering and then running the flow manually from the Cloud UI, and most of the time that clears it. On at least one occasion I restarted a failed flow run and it worked, which was the most mysterious of all. I'll double-check the versions I'm running - and yes, thank you.
k
No problem, feel free to update here with your new findings. If this persists, this could be a high-bounty bug for my collection. 👍
j
I need to deepen my understanding of the entire world here, but I just discovered that I had two other agents running in two other development environments, and that by disabling those agents it's now behaving as intended. I'm trying to establish a development workflow using isolated environments for different git branches, but I suppose I need to better understand how to have agents only listen for flows that are intended for their particular environment.
a
I have faced this as well with my local test setup for Prefect Server (not Cloud):
1. Started Prefect Server
2. Started a Docker container whose entrypoint is a shell script that creates the project, registers the flows, and then starts a local Prefect agent
3. Accessed Prefect Server via the web UI and manually started a flow that does such an import, and tada - the same error John Grubb observed, also inconsistently, because it works for some flows that also do imports.
In one case I manually merged the imported code back in, restarted, and it ran ok. Then I redid the separation for the imports, restarted, and it also ran ok despite being the same code that failed just minutes earlier.
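For reference, that entrypoint boils down to roughly the following - sketched here in Python rather than shell, and the project name and flow module are placeholders, not my actual code:

```python
# entrypoint.py - rough Python equivalent of the container entrypoint
from prefect import Client
from prefect.agent.local import LocalAgent

from my_flows.etl import flow as etl_flow  # hypothetical flow module

client = Client()

# 1. create the project the flows will be registered under
client.create_project(project_name="dev")

# 2. register the flow(s) against the running Prefect Server
etl_flow.register(project_name="dev")

# 3. start a local agent so scheduled/manual runs actually get picked up
LocalAgent().start()
```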
I suppose I need to better understand how to have agents only listen for flows that are intended for their particular environment.
Ah, I believe that is done through labels on the agents and on the RunConfig objects (previously the environments). You can refer to: https://docs.prefect.io/orchestration/flow_config/run_configs.html#labels
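Roughly, the idea is to give each environment's agent and each environment's flows a matching label, so runs are only claimed by the intended agent. A minimal sketch, assuming Prefect 1.x-style run configs - the label and project names here are just examples:

```python
from prefect import Flow, task
from prefect.run_configs import LocalRun

@task
def say_hi():
    print("hi")

with Flow("labelled-example") as flow:
    say_hi()

# tag the flow's runs so only agents carrying this label will claim them
flow.run_config = LocalRun(labels=["dev-branch-a"])

flow.register(project_name="dev")

# the matching agent is started with the same label, e.g.:
#   prefect agent local start --label dev-branch-a
# an agent with no labels will only pick up runs that also have no labels
```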
j
I was just looking at labels, thank you. My new working theory is that I scheduled a run in the UI but one of the other environments' agents actually picked it up and ran it, and that's why it was complaining about not being able to find tasks - presumably those were older environments that didn't have the newer tasks in the flow.
🤔 1
99% sure this was it, because through this I discovered what labels are for. I was previously running my agent(s) with no labels and registering my flows with no labels, so the two other development environments I was running were (probably) picking up those jobs when the schedule came around. This explains all the weird behavior, so I'll confirm once I'm sure it doesn't derail anymore.
b
Hey team - I’m also having some of these issues; what is the best way to go about debugging them? I’m going to try pulling down the images I’m building to check the flow code that’s being built, but how can I best replicate what the server is doing when it sees
Task slug foo-3 not found in the current Flow
? My setup:
• Prefect Server running on Kubernetes via the Helm chart (agent included)
• Docker storage (I have a base image containing my code, and I build a “flows” tag via Docker storage from that)
• KubernetesRun run config
• Deployed via GitLab CI
I’m pushing changes to code/flows and using the new flow.register(idempotency_key=flow.serialized_hash()), which kicks off the following CI process:
• Build a new base image for the repo, run tests
• Run the register step, building the Docker storage (with a “flows” tag) from the base image built in the previous step
This is a reasonably complicated setup, but I can’t see any reason why it would be throwing the
Task slug foo not found in the current Flow
error (it’s a totally fresh build every time). Open to any ideas on how to debug.
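For context, the register step in CI is roughly the following - the registry/image/project names are placeholders here, not the real ones:

```python
# register.py - roughly what the CI register step does
from prefect.run_configs import KubernetesRun
from prefect.storage import Docker

from flows.my_flow import flow  # hypothetical import of the flow being registered

# build the "flows" image on top of the freshly built base image
flow.storage = Docker(
    registry_url="registry.example.com/team",            # placeholder
    base_image="registry.example.com/team/base:latest",  # placeholder, built earlier in CI
    image_name="flows",
    image_tag="latest",
)

flow.run_config = KubernetesRun()

# only bumps the flow version when the serialized flow actually changes
flow.register(
    project_name="my-project",  # placeholder
    idempotency_key=flow.serialized_hash(),
)
```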
a
@Brad do you happen to have multiple projects such that different flows have the same name but in different projects? There's a bug with local storage concerning this situation, so I'm wondering if it applies to docker storage too.
b
Nope - I’ve started from scratch and am just running a single project and a single flow.