
James Watt

08/19/2019, 6:03 AM
@Chris White It seems that once Prefect enters the Retrying state, there is no API that can be called to periodically check its context during the retry delay period. Am I missing something here?

Chris White

08/19/2019, 6:06 AM
So, using just the open source engine, the python process running the Flow simply sleeps until the next retry time; for an external API that you could query (which I think is what you’re asking about?), you’d need an external orchestration service which is what our Prefect Cloud offering provides — let me know if I’m misunderstanding your question though
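(For reference, a minimal sketch of the situation being described, using the open source engine's 0.x-era task API from around the time of this thread; the task name is illustrative. When the task fails, `flow.run()` simply blocks and sleeps in-process until the next retry time, as described above.)

```python
from datetime import timedelta
from prefect import task, Flow

@task(max_retries=3, retry_delay=timedelta(minutes=10))
def fetch_data():
    # hypothetical task; raising an exception here triggers the retry/sleep cycle
    ...

with Flow("retry-example") as flow:
    fetch_data()

flow.run()  # the calling process sleeps through each retry delay
```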

James Watt

08/19/2019, 6:20 AM
Do you mean the "external orchestration service" is needed when multiple flow runner instances have to be coordinated by a higher-level service? That is not my use case here. I am wondering what Prefect can do during the retry delay period. Is it possible for it to report its "delaying" progress information all by itself?
A single flow runner instance is good enough for my application.

Chris White

08/19/2019, 6:23 AM
Gotcha; Yea the external orchestration service is necessary for multiple instances. What sort of information do you need to access, other than the scheduled start time for the next retry attempt? The next scheduled retry attempt is something you could query for from Prefect Cloud but isn’t readily available in the open source engine.

James Watt

08/19/2019, 6:32 AM
Such a delay period between retry attempts is like a black hole. Some sort of heartbeat signalling is needed so that the flow runner's caller can be confident that the underlying flow runner is still alive. This is particularly true when the delay is quite long compared to the individual task. In fact, I made a countdown timer just for tracking this delay as it elapses. But it bothers me that there is no way to periodically check Prefect's state during its retry delay period.

Chris White

08/19/2019, 6:34 AM
oh ok I see what you’re worried about; in Core (the open source engine) you are correct - there is no way to truly verify that everything is still alive. In Cloud, however, there are a number of services monitoring the state of your Flow Run so you can transparently see the next scheduled start time for the task, etc. and be confident that the task will retry again at the correct time
In Cloud, there is no “sleeping” process as there is in Core - we have a full scheduler service that is ensuring things run at the appropriate times

James Watt

08/19/2019, 6:39 AM
I see. I hope the Core engine could feature something like a heartbeat callback one day.

Chris White

08/19/2019, 6:43 AM
That makes sense - would something like an occasional log every 30 minutes be sufficient for your needs? If so, feel free to open an issue and we could consider implementing something lightweight such as that
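(A rough sketch of the kind of lightweight, caller-side workaround this implies; this is not a Prefect Core feature, and the interval and names are illustrative. A background thread emits a liveness log while `flow.run()` blocks through its retry sleeps.)

```python
import logging
import threading

logger = logging.getLogger("heartbeat")

def start_heartbeat(interval_seconds=1800):
    """Log a liveness message every `interval_seconds` until stopped."""
    stop_event = threading.Event()

    def beat():
        # Event.wait returns False on timeout, so this logs once per interval
        # until stop_event.set() is called.
        while not stop_event.wait(interval_seconds):
            logger.info("flow runner process is still alive")

    threading.Thread(target=beat, daemon=True).start()
    return stop_event

# Usage:
# stop = start_heartbeat()
# flow.run()   # blocks, including retry sleeps
# stop.set()   # stop the heartbeat once the run finishes
```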

James Watt

08/19/2019, 6:55 AM
That sounds great! I don't know how complicated it might become. My thought is to make a subclass of the Retrying state that can automatically toggle its state on and off at a settable period. That way, we can take advantage of the on_state_change function to generate the signalling to the external world however we want.
👍 1
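(A sketch of roughly what this idea looks like with Prefect Core's 0.x-era state handlers; the handler and task names are illustrative. Note that a plain handler fires once when the task enters Retrying rather than periodically during the sleep, so a periodic signal would still need something like the heartbeat thread above.)

```python
import logging
from datetime import timedelta

from prefect import task
from prefect.engine.state import Retrying

logger = logging.getLogger("retry-monitor")

def notify_on_retry(task_obj, old_state, new_state):
    # State handlers receive (task, old_state, new_state) and return the
    # (possibly modified) new state.
    if isinstance(new_state, Retrying):
        logger.info(
            "Task %r entered Retrying; next attempt scheduled for %s",
            task_obj.name, new_state.start_time,
        )
    return new_state

@task(
    max_retries=5,
    retry_delay=timedelta(minutes=30),
    state_handlers=[notify_on_retry],
)
def flaky_task():
    ...
```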

Chris White

08/19/2019, 7:04 AM
Yea that’s the only caveat - walking down the path of needing external visibility / Process monitoring / APIs opens up a large can of worms + edge cases (and is why we built Cloud), but for something simple like a “heartbeat log” it should be easy enough to include