
Arsenii

04/25/2020, 5:18 AM
Has anyone run into a bug where, with a couple of tasks mapping over a list of 10-20 elements (or more) in Cloud, the execution just randomly stops? The log shows
Task 'Some important work[19]': Calling task.run() method...
and it just "runs" this task forever, but there's no
changed state from ... to Running
afterwards. This happens when using a LocalExecutor. I thought that this might have something to do with me using
raise SKIP
to "filter" some elements, but rewriting using a
FilterTask
didn't fix the problem

nicholas

04/25/2020, 3:05 PM
Hi @Arsenii! I haven't run into that before but I'll look into it and get back to you 🙂
@Arsenii is it possible your agent is running in a resource-constrained environment and is maxing out CPU/memory while running your mapped tasks?

Arsenii

04/26/2020, 1:43 AM
That's a possibility -- but the mapped tasks are really lightweight... Usually they just insert a couple of rows into a DB through psycopg2. Thanks for the hint @nicholas, I'll double-check on the next run.

nicholas

04/26/2020, 1:44 AM
Of course! Let us know what you find!

Arsenii

05/01/2020, 5:38 AM
Hey! @nicholas Sorry for the delay. I have checked the usage statistics -- they seem normal, CPU and memory usage are 15-25%. But I found the following interesting logs from the Agent:
[2020-04-30 21:19:43,515] ERROR - agent | [{'message': 'request to <http://graphql:443/graphql/alpha/> failed, reason: connect ECONNREFUSED 10.30.42.241:443', 'locations': [{'line': 2, 'column': 5}], 'path': ['get_runs_in_queue'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'errors': [{'message': 'request to <http://graphql:443/graphql/alpha/> failed, reason: connect ECONNREFUSED 10.30.42.241:443', 'locations': [], 'path': ['get_runs_in_queue']}]}}}]
[2020-04-29 14:59:40,317] ERROR - agent | 530 Server Error:  for url: <https://api.prefect.io/graphql/alpha>
Any idea why that might be happening? I'm running the flow on Scheduler, if that helps
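The agent lines quoted above come in two shapes: a Python-style list of GraphQL error dicts, and a plain-text HTTP error. The following is a minimal sketch for pulling the timestamp, failing path, and error code out of such lines. The line layout is inferred only from the samples in this thread, and `summarize_agent_error` is a hypothetical helper, not part of Prefect.

```python
import ast
import re
from datetime import datetime

# Assumed layout, inferred from the agent output quoted in this thread:
# [YYYY-MM-DD HH:MM:SS,mmm] LEVEL - agent | <payload>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>\w+) - agent \| (?P<payload>.*)$"
)


def summarize_agent_error(line):
    """Return (timestamp, summaries) for one agent log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
    payload = match["payload"]
    try:
        # GraphQL errors are logged as the repr of a list of dicts.
        errors = ast.literal_eval(payload)
    except (ValueError, SyntaxError):
        errors = None
    if not isinstance(errors, list):
        # Plain-text payload, e.g. an HTTP error message.
        return timestamp, [{"path": None, "code": None, "message": payload}]
    summaries = [
        {
            "path": ".".join(str(part) for part in err.get("path", [])),
            "code": err.get("extensions", {}).get("code"),
            "message": err.get("message"),
        }
        for err in errors
        if isinstance(err, dict)
    ]
    return timestamp, summaries


# Usage with the second line quoted above (plain-text payload).
line = "[2020-04-29 14:59:40,317] ERROR - agent | 530 Server Error:  for url: <https://api.prefect.io/graphql/alpha>"
print(summarize_agent_error(line))
```

Grouping the summaries by `path` makes it easy to see that the failures in this thread all sit on `get_runs_in_queue`, the agent's polling call, rather than on anything inside a task run.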

nicholas

05/01/2020, 5:40 AM
Oh that's very interesting, I think the API team is asleep right now but let me see if I can reproduce
@Arsenii can you confirm when that log is from? I'm seeing a 21:19:43 timestamp from April 30th, but do you know what timezone that's in?

Arsenii

05/01/2020, 5:46 AM
Looks like it's GMT
I'm now trying to work out whether these error messages occur at the same time as the flows getting stuck.
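The check described here, whether agent errors line up with stalled runs, can be sketched as a simple time-window comparison. This is only an illustration: `errors_near_stalls` is a hypothetical helper, the five-minute window is an arbitrary assumption, and the stall time in the usage lines is made up for the example (the thread does not give one). Both series need to be in the same timezone, GMT here.

```python
from datetime import datetime, timedelta


def errors_near_stalls(error_times, stall_times, window=timedelta(minutes=5)):
    """Map each stall time to the agent errors logged within `window` of it."""
    return {
        stall: [err for err in error_times if abs(err - stall) <= window]
        for stall in stall_times
    }


# Agent error timestamps taken from the log lines quoted in this thread.
error_times = [
    datetime(2020, 4, 30, 21, 19, 43),
    datetime(2020, 4, 29, 14, 59, 40),
]
# Hypothetical stall time, for illustration only.
stall_times = [datetime(2020, 4, 30, 21, 21, 0)]

for stall, nearby in errors_near_stalls(error_times, stall_times).items():
    print(stall, "->", nearby)
```

An empty list for every stall would suggest the agent errors and the hanging mapped tasks are separate problems.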

nicholas

05/01/2020, 5:47 AM
Oh, my apologies. I thought that was communicating with Prefect Cloud, but it looks like that's communication over your Prefect Server Docker network.

Arsenii

05/01/2020, 5:47 AM
I'm using a
LocalExecutor
, so it should not be related to Docker.

nicholas

05/01/2020, 5:47 AM
Is your agent running in the same environment as your Prefect Server infrastructure?

Arsenii

05/01/2020, 5:48 AM
Wait, I might be out of date with the latest Prefect updates; I'm not sure what Prefect Server is.
Oh, is that the new thing where you can run Prefect Cloud UI on-premises? I'm not using that

nicholas

05/01/2020, 5:49 AM
No, no, that's my mistake. I misread your earlier message; this is coming from Prefect Cloud.
I think I'll need to escalate this to our API team to look at in the morning. Would that be alright, @Arsenii?

Arsenii

05/01/2020, 5:51 AM
Yeah absolutely, no worries, thanks for the fast reply!

nicholas

05/01/2020, 5:52 AM
Of course, will get back to you as soon as possible!

Arsenii

05/01/2020, 5:53 AM
For future reference and discussion: I've been noticing flows also getting "stuck" during the
Starting to upload result to xxxx
step of a mapped task. If the flow is cancelled and re-run with fewer elements to map over, it works fine.

nicholas

05/01/2020, 5:54 AM
That's really helpful, thank you - I may follow up tomorrow to get some more info, once we can dig a little further on our end

Matt

05/02/2020, 6:06 AM
Hi! I am having an almost identical problem on the Core version using the local executor:
graphql_1    | GraphQL request:2:3
graphql_1    | 1 | mutation ($_v0_input: get_runs_in_queue_input!) {
graphql_1    | 2 |   get_runs_in_queue(input: $_v0_input) {
graphql_1    |   |   ^
graphql_1    | 3 |     flow_run_ids
apollo_1     | 2020-05-02T06:01:20.110Z {"message":"An unknown error occurred.","locations":[{"line":2,"column":5}],"path":["get_runs_in_queue"],"extensions":{"code":"INTERNAL_SERVER_ERROR","exception":{"errors":[{"message":"An unknown error occurred.","locations":[],"path":["get_runs_in_queue"]}]}}}
and from my agent:
[2020-05-02 06:04:30,420] ERROR - agent | [{'message': 'An unknown error occurred.', 'locations': [{'line': 2, 'column': 5}], 'path': ['get_runs_in_queue'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'errors': [{'message': 'An unknown error occurred.', 'locations': [], 'path': ['get_runs_in_queue']}]}}}]
I am brand new to Prefect, so any docs or direction are helpful.

nicholas

05/04/2020, 9:57 PM
Hi @Arsenii and @Matt! Sorry for the delay here. The team has looked into the errors you provided, identified the issue, and taken steps to resolve it. However, that shouldn't be leading to hanging mapped task runs. @Arsenii, have you confirmed that those errors show up when your flow runs are stalling, or are they happening at different times?

Arsenii

05/05/2020, 5:40 AM
Last time I checked, I thought that they didn't correlate with each other too much, yeah..
Turns out I was running an outdated version of
prefect==10.1
, and I believe the issue is resolved now thanks to this PR: https://github.com/PrefectHQ/prefect/issues/2270. So far I haven't run into the same issue on the new version; will update if that's just luck 🙂
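Since the fix here turned out to be an upgrade, a quick version check before debugging further can save time. The sketch below uses only the standard library. `is_older_than` and `installed_prefect_version` are hypothetical helpers, only plain dotted numeric versions are handled, and the minimum version is something the reader has to supply, because the thread does not say which release contains the fix.

```python
from importlib import metadata  # standard library, Python 3.8+


def parse_version(text):
    """Turn a plain dotted version like '1.2.3' into (1, 2, 3)."""
    return tuple(int(part) for part in text.split("."))


def is_older_than(installed, minimum):
    """True if `installed` is strictly older than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.2' and '1.2.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def installed_prefect_version():
    """Return the installed prefect version string, or None if it is not installed."""
    try:
        return metadata.version("prefect")
    except metadata.PackageNotFoundError:
        return None


print("installed prefect version:", installed_prefect_version())
# The version numbers below are arbitrary and only show the comparison.
print(is_older_than("1.2.3", "1.10.0"))
```

Comparing integer tuples rather than strings matters here: as strings, "1.10.0" would sort before "1.2.3".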

nicholas

05/06/2020, 2:24 AM
Well, that's good news, I've been racking my brain on this one! If you run into this again, post another comment in the community channel so we can triage in a new thread.

Arsenii

05/06/2020, 2:25 AM
Thanks @nicholas and sorry for stressing you out haha

nicholas

05/06/2020, 2:28 AM
Not at all, really happy you figured it out!