# ask-community
a
Has anyone run into a bug where, when a couple of tasks map over a list with 10-20 elements (or more) in the Cloud, the execution just randomly stops?
`Task 'Some important work[19]': Calling task.run() method...`
and it just "runs" this task forever, but there's no `changed state from ... to Running` afterwards. I'm using a `LocalExecutor`. I thought that this might have something to do with me using `raise SKIP` to "filter" some elements, but rewriting it with a `FilterTask` didn't fix the problem.
n
Hi @Arsenii! I haven't run into that before but I'll look into it and get back to you 🙂
@Arsenii is it possible your agent is running in a resource-constrained environment and is maxing out CPU/memory while running your mapped tasks?
a
That's a possibility -- but the mapped tasks are really lightweight... Usually they basically just insert a couple of rows into a DB through psycopg2. Thanks for the hint @nicholas, I'll double-check when running next time.
n
Of course! let us know what you find!
a
Hey! @nicholas Sorry for the delay. I have checked the usage statistics -- they seem normal, CPU and memory usage are 15-25%. But I found the following interesting logs from the Agent:
[2020-04-30 21:19:43,515] ERROR - agent | [{'message': 'request to <http://graphql:443/graphql/alpha/> failed, reason: connect ECONNREFUSED 10.30.42.241:443', 'locations': [{'line': 2, 'column': 5}], 'path': ['get_runs_in_queue'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'errors': [{'message': 'request to <http://graphql:443/graphql/alpha/> failed, reason: connect ECONNREFUSED 10.30.42.241:443', 'locations': [], 'path': ['get_runs_in_queue']}]}}}]
[2020-04-29 14:59:40,317] ERROR - agent | 530 Server Error:  for url: <https://api.prefect.io/graphql/alpha>
Any idea why that might be happening? I'm running the flow on Scheduler, if that helps
n
Oh that's very interesting, I think the API team is asleep right now but let me see if I can reproduce
@Arsenii can you confirm when that log is from? I'm seeing a 21:19:43 timestamp from April 30th, but do you know what timezone that's in?
a
Looks like it's GMT
I'm trying to understand now if these error messages occur at the same time as flows getting stuck.
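One way to check this is to pull the timestamps out of the agent's ERROR lines and compare them against the times at which flow runs were seen to hang. A minimal sketch, assuming the log format shown above; the log messages are abbreviated, and the stall time and five-minute window are made-up placeholders rather than values from this thread:

```python
import re
from datetime import datetime, timedelta

# Matches the leading "[YYYY-MM-DD HH:MM:SS,mmm] ERROR" of an agent log line.
LOG_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] ERROR")


def error_times(log_lines):
    """Return the timestamps of all ERROR lines in the agent log."""
    times = []
    for line in log_lines:
        match = LOG_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def errors_near(stall_time, errors, window=timedelta(minutes=5)):
    """Return the error timestamps that fall within `window` of a stall."""
    return [t for t in errors if abs(t - stall_time) <= window]


# Abbreviated versions of the two agent log lines quoted above.
agent_log = [
    "[2020-04-30 21:19:43,515] ERROR - agent | request to graphql failed",
    "[2020-04-29 14:59:40,317] ERROR - agent | 530 Server Error",
]

# Hypothetical time at which a flow run was observed to hang.
stall = datetime(2020, 4, 30, 21, 21, 0)

nearby = errors_near(stall, error_times(agent_log))
print(nearby)
```

If `nearby` is empty for every stall, the agent errors and the stuck mapped tasks are probably unrelated.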
n
Oh, my apologies, I thought that was communicating with Prefect Cloud. It looks like that's communication over your Prefect Server Docker network.
a
I'm using a `LocalExecutor`, so it should not be related to Docker..
n
Is your agent running in the same environment as your Prefect Server infrastructure?
a
Wait, I might be out-of-date with the latest Prefect updates, not sure what Prefect Server is
Oh, is that the new thing where you can run Prefect Cloud UI on-premises? I'm not using that
n
No no that's my mistake, I misread your earlier message, this is coming from Prefect Cloud
I think I'll need to escalate this to our API team to look at in the morning, would that be alright @Arsenii?
a
Yeah absolutely, no worries, thanks for the fast reply!
n
Of course, will get back to you as soon as possible!
a
For future reference and discussion: I've been noticing flows also getting "stuck" during the `Starting to upload result to xxxx` step of a mapped task. If the flow is cancelled and re-run (with fewer elements to be mapped over), it works fine.
n
That's really helpful, thank you - I may follow up tomorrow to get some more info, once we can dig a little further on our end
m
Hi! I am having an almost identical problem on the Core version using the local executor:
graphql_1    | GraphQL request:2:3
graphql_1    | 1 | mutation ($_v0_input: get_runs_in_queue_input!) {
graphql_1    | 2 |   get_runs_in_queue(input: $_v0_input) {
graphql_1    |   |   ^
graphql_1    | 3 |     flow_run_ids
apollo_1     | 2020-05-02T06:01:20.110Z {"message":"An unknown error occurred.","locations":[{"line":2,"column":5}],"path":["get_runs_in_queue"],"extensions":{"code":"INTERNAL_SERVER_ERROR","exception":{"errors":[{"message":"An unknown error occurred.","locations":[],"path":["get_runs_in_queue"]}]}}}
and from my agent:
[2020-05-02 06:04:30,420] ERROR - agent | [{'message': 'An unknown error occurred.', 'locations': [{'line': 2, 'column': 5}], 'path': ['get_runs_in_queue'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'errors': [{'message': 'An unknown error occurred.', 'locations': [], 'path': ['get_runs_in_queue']}]}}}]
I am brand new to Prefect, so any docs or direction are helpful.
n
Hi @Arsenii and @Matt! Sorry for the delay here. The team has looked into the errors you've provided, identified the issue, and taken steps to resolve it. However, that shouldn't be leading to hanging mapped task runs. @Arsenii have you confirmed that those errors show up when your flow runs are stalling, or are they happening at different times?
a
Last time I checked, I thought that they didn't correlate with each other too much, yeah..
Turns out I was running an outdated version, `prefect==10.1`. I believe the issue is resolved now thanks to this PR: https://github.com/PrefectHQ/prefect/issues/2270 . So far I haven't run into the same issue on the new version, will update if that's just luck 🙂
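For anyone else landing here, a quick way to see which version is actually installed in the environment the agent runs in. This is just a sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if prefect is not in this environment.
print("prefect:", installed_version("prefect"))
```

Run it with the same interpreter the agent uses, since a different virtualenv may have a different version installed.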
n
Well that's good news, I've been racking my brain on this one! If you run into this again, post another comment in the community channel so we can triage in a new thread.
a
Thanks @nicholas and sorry for stressing you out haha
n
Not at all, really happy you figured it out!