# ask-community
j
Hi everyone, I've run into what looks like a bug in Prefect Cloud. No tasks are running, yet this State Message is being reported:
Queued due to concurrency limits. The local process will attempt to run the task for the next 10 minutes, after which time it will be made available to other agents.
That string does not appear in the open-source part of Prefect, so it must be part of Prefect Cloud. The concurrency limit on that task is 10, and things were working until I changed some of Dask's configuration parameters to try to resolve an issue with it. The most likely cause of the above message is that some error happened and it didn't get handled correctly. https://cloud.prefect.io/stockwell/flow-run/4460703d-3c91-4573-b85c-a4b001048999
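(For context, the tag-level limits themselves can be listed through the Cloud GraphQL API. A minimal sketch, assuming the task_tag_limit field that Prefect Cloud exposes for task concurrency limits:)
query {
  task_tag_limit {
    tag
    limit
  }
}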
k
Hey @Jeremy Phelps, could you try querying for the relevant tags and seeing if there are task runs in a running state? Something like:
query {
  task(where: {tags: {_eq: []}}) {
    flow {
      id
      name
    }
    id
    name
    tags
    task_runs(where: {state: {_eq: "Running"}}) {
      id
      name
      state
    }
  }
}
j
I filled in that query as:
query {
  task(where: {tags: {_eq: ["staging"]}}) {
    flow {
      id
      name
    }
    id
    name
    tags
    task_runs(where: {state: {_eq: "Running"}}) {
      id
      name
      state
    }
  }
}
...and it returned no tasks.
I also tried the tag for the production cluster and found nothing there (as it should be).
The agent's logs don't have any useful information either:
[2021-07-16 13:12:55-0500] INFO - prefect.CloudFlowRunner | Beginning Flow run for 'demand-forecasting-delivery-scheduler'
[2021-07-16 13:12:55-0500] INFO - prefect.DaskExecutor | Connecting to an existing Dask cluster at tcp://dask-scheduler:8786
Logs from the Dask scheduler show that a client connected right when I started the flow run. But no tasks were forwarded to the workers.
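(One way to see why those task runs sit in a Queued state is to pull the flow run's task runs directly. A sketch that reuses the flow run id from the Cloud URL above and assumes the flow_run and task_runs fields follow the same Hasura-style schema as the other queries in this thread:)
query {
  flow_run(where: {id: {_eq: "4460703d-3c91-4573-b85c-a4b001048999"}}) {
    name
    state
    task_runs {
      id
      state
      state_message  # assumed field; should echo the "Queued due to concurrency limits" message
    }
  }
}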
n
Hi @Jeremy Phelps - can you run this query instead?
query {
  task(where: {tags: {_eq: ["staging"]}}) {
    id
    name
    tags
    task_runs(where: {state: {_in: ["Running", "Submitted", "Queued", "Cancelling", "Retrying", "Resume", "Paused"]}}) {
      id
      name
      state
    }
  }
}
j
That also returns nothing.
n
Interesting... let me dig around and see what I can find
j
Taking off all the parameters after the task token returns something, but Slack won't let me send it.
Pastebin it is, I guess: https://pastebin.com/SMpbVxz7
n
And staging is the tag you're having issues with, yeah? (Prefect employees can't see your UI links, just fyi)
j
Yes.
Are Prefect employees also blind to the contents of the database that these GQL queries operate on?
n
@Jeremy Phelps could you clarify what you mean?
j
When I run the GQL query you suggested, it performs a lookup in a database that Prefect owns. Can Prefect employees see what's in that database?
n
Prefect does have access, you're correct. Here's what I found: the only concurrency limit that's set is 10, on a tag called mysql-write, and you already have 10 tasks in a running state with that tag; the tasks in the flow run you provided also have the mysql-write tag and are queued correctly as a result. There are no tasks with a staging tag and no concurrency limits with that tag either. One thing I did notice is that the tasks in running states don't all come from the same flow run, which could be causing the confusion.
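(Adapting the earlier query to the mysql-write tag should surface the task runs that are actually holding those concurrency slots. A sketch, assuming the tag is stored exactly as ["mysql-write"]:)
query {
  task(where: {tags: {_eq: ["mysql-write"]}}) {
    id
    name
    task_runs(where: {state: {_eq: "Running"}}) {
      id
      name
      state
      flow_run_id  # assumed to be exposed on task_run, to see which flow runs the slots belong to
    }
  }
}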
j
These tasks are not actually running. How do I find and get rid of them?
n
Let me see if I can grab some flow run IDs and names for you, and you can manually mark them as finished or cancelled.
j
That doesn't solve the problem going forward. Things will reach this state again.
I found a bunch of Kubernetes pods that appear to be stale (the Dask schedulers they are talking to have been taken down). I deleted them, so maybe that will help.
I confused tags with labels. Do tasks with the same "tag" but different "labels" share the same concurrency pool?
n
Two things you can do for the future: you can set up flow SLA automations for that flow that will fail the flow if it exceeds some time threshold, and you can manually mark the flow runs that are holding onto concurrency slots but whose jobs are stale as failed/completed (along with their associated task runs). Basically you'll need to kill the stale jobs in some way, whether through Prefect or your cluster, so they're not holding onto those slots.
Tasks only have tags and so share the same concurrency pool; flows have labels and share a different concurrency pool, though this is something we'd like to clarify in the future.
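(To illustrate the manual clean-up described above: a stale flow run can be marked Failed through the GraphQL API with something like the sketch below, assuming a set_flow_run_states mutation analogous to the set_task_run_states one shown further down; the flow run id is a placeholder.)
mutation {
  set_flow_run_states(input: {states: [{flow_run_id: "<<stale flow run id>>", state: "{\"type\": \"Failed\", \"message\": \"Stale Dask job; failed manually\"}"}]}) {
    states {
      id
      status
    }
  }
}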
j
It seems that the only way to have a truly separate staging environment is to have a separate Prefect account for it.
n
I think what you're describing is entirely doable with a single tenant, using separate tags and labels on your flows to direct execution to different environments, but I can put you in touch with one of our account managers to discuss multi-tenancy, which would give you database-level sharding of environments.
j
Does multi-tenancy cost additional money?
n
It does, it's an enterprise-grade feature
j
Management will never agree to it.
Is there any documentation for the set_task_run_states mutation?
n
The GraphQL API has docs attached to the schema (you can view these in the interactive API), which denote all input and output types. You can run that mutation through the interactive API like this:
mutation {
  set_task_run_states(input: {states: [{task_run_id: "<<task run id>>", state: "{\"type\": \"Failed\", \"message\": \"<<your message>>\"}"}]}) {
    states {
      id
      status
    }
  }
}
Note the escaping of the state field, which is a JSON payload.
j
The problem I'm running into is that I don't know which fields are expected in the state.
Oh, I see.
Ty, that worked.