# ask-community
d
Has anyone had the problem where a scheduled flow just stops getting scheduled?
```python
from prefect import Flow
from prefect.schedules import CronSchedule

# DEFAULT_START_DATE is defined elsewhere in our code
weekly = CronSchedule("15 1 * * *", start_date=DEFAULT_START_DATE)
with Flow("Daily Extract", schedule=weekly) as flow:
    ...
```
This flow was happily running at its scheduled time of 1:15 a.m. every day, but now it has just stopped getting scheduled. Curiously, today is the first day of the next month.
There should be upcoming runs, and under Activity today's run is missing; the last one is from yesterday.
The documentation says:
The scheduler periodically queries for flows with active schedules and creates flow runs corresponding to the next 10 scheduled start times of the flow.
Any ideas what could cause it to stop doing that?
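For reference, this is roughly how I've been checking whether the scheduler has actually created any upcoming runs, via the Python client against the Server's GraphQL API (just a sketch - I'm assuming the usual Hasura-generated flow_run fields, and you'd substitute your own flow ID):
```python
from prefect.client import Client

client = Client()  # pointed at our Prefect Server's apollo endpoint

# Ask the Server for the upcoming scheduled runs of the flow.
# (Sketch only - assumes the standard Hasura-generated schema with a
#  flow_run table exposing state and scheduled_start_time.)
result = client.graphql(
    """
    query {
      flow_run(
        where: {
          flow_id: { _eq: "<your-flow-id>" }
          state: { _eq: "Scheduled" }
        }
        order_by: { scheduled_start_time: asc }
      ) {
        id
        scheduled_start_time
        state
      }
    }
    """
)
print(result)  # a healthy scheduler keeps roughly the next 10 runs here
```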
n
Hi @Daniel Caldeweyher - thanks for the report! Could you provide your flow or flow group ID for that flow? You can find these in the details of the flow page (top left tile, details tab).
d
Flow ID 8c23a270-9546-4b71-add8-a686fb89334b, but I doubt this is going to help you as we are using Prefect community/server.
n
Ah ok, you're correct - I won't be able to look up that flow ID. In that case you may need to do some digging: check that your containers have the CPU/memory they need, and I'd also inspect the flow and flow group objects to see that the schedules are active (you can do this through the Interactive API in the UI).
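Something like this in the Interactive API (or from Python with the client) should show whether the schedule is still marked active on the flow and the flow group - treat the exact fields as a sketch against the standard Server schema:
```python
from prefect.client import Client

client = Client()

# Check that the schedule is still marked active on the flow, and look at
# any schedule set on the flow group. (Sketch - assumes the standard
# Prefect Server schema; swap in your own flow ID.)
result = client.graphql(
    """
    query {
      flow(where: { id: { _eq: "<your-flow-id>" } }) {
        id
        name
        version
        is_schedule_active
        flow_group {
          id
          schedule
        }
      }
    }
    """
)
print(result)
```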
d
Something I just came across (I'd only briefly looked into it before): would the re-scheduling be done by Lazarus?
n
Re-scheduling would happen from Lazarus if the flow run heartbeat was lost for some reason; the initial scheduling happens in the scheduler service.
d
What about:
```json
{
  "severity": "ERROR",
  "name": "prefect-server.Scheduler",
  "message": "Unexpected error: ConnectError(gaierror(-2, 'Name or service not known'))"
}
```
This is from my towel service logs.
n
Yeah, that could be the issue - I'm not sure how you've deployed your server, but could there be some networking issues?
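One quick thing to check from inside the towel container is whether the other services resolve by name at all, since that gaierror is a DNS failure - something along these lines (the service names and ports are whatever your compose file uses; these are just the usual defaults):
```python
import socket

# gaierror(-2, 'Name or service not known') means DNS resolution failed,
# so check whether the towel container can resolve its peers by name.
# (Sketch - hostnames/ports depend on your docker-compose file.)
for host, port in [("hasura", 3000), ("graphql", 4201), ("apollo", 4200)]:
    try:
        socket.getaddrinfo(host, port)
        print(f"{host}:{port} resolves")
    except socket.gaierror as exc:
        print(f"{host}:{port} does NOT resolve: {exc}")
```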
d
Deployed on ECS via docker-compose, based on https://github.com/PrefectHQ/prefect/blob/master/src/prefect/cli/docker-compose.yml. I can debug it if I know what is trying to connect to where.
n
Got it - unfortunately it's really difficult for me to give much guidance on a custom docker-compose file. It looks like there's an issue with your containers not being able to communicate; most likely the scheduler service is unable to find the graphql/apollo containers.
d
Looking at docker-compose.yml... the hasura client defaults to config.hasura.graphql_url, which defaults to
graphql_url = "http://${server.hasura.host}:${server.hasura.port}/v1alpha1/graphql"
according to https://github.com/PrefectHQ/prefect/blob/master/src/prefect/config.toml.
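(As a quick sanity check of what that actually resolves to, I ran something like this inside the towel container - assuming the server.hasura keys from that config.toml are exposed on prefect.config:)
```python
import prefect

# Print the Hasura connection settings the config resolves to.
# (Sketch - assumes the server.hasura keys from prefect's config.toml;
#  the server services may read an equivalent key under a different root.)
print(prefect.config.server.hasura.host)
print(prefect.config.server.hasura.port)
print(prefect.config.server.hasura.graphql_url)
```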
My gut feeling tells me that my docker-compose towel service needs:
```yaml
links:
  - hasura
```
The above change seems to have fixed it (unless simply having restarted as part of the redeployment did the trick)... but another issue I was having also seems to be resolved now. Regardless, I am not sure that the provided docker-compose in the prefect repo is correct.
n
Hm, glad you got it fixed, but since all Prefect Server deployments run off that docker-compose file I'd be surprised if it were incorrect - feel free to open an issue though if you're able to reproduce it with a vanilla deployment (using `prefect server start`).
d
Yes, I know - Docker should find the service by name without using the link. I will try undoing it and see if it breaks again. It might also only be an issue when deployed on ECS.
n
That's entirely possible - it's also odd that it was working previously and then stopped; that lends credence to your theory that restarting is what really fixed it.
d
Still... the official docker-compose only links it to `graphql`, whereas the apollo service links to both, which is also reflected in the env variables.
Well... it schedules the first 10 by default, so I won't really know until my flows hit the 11th scheduled run.
Thanks for your help @nicholas
n
Sure thing - let me know if you see this again and we can dig further. A note on those `depends_on` lines: they dictate when a container can start, rather than creating actual links between the containers. In those cases the various container definitions depend on another container being started and healthy - so for apollo, the graphql container needs to be started in order to correctly build the apollo service.
But those health checks have also raised issues in the past (including some open ones), so I wouldn't rule them out as potential sources.