Thread
#prefect-community
    James Watt

    3 years ago
    @Chris White It seems that once Prefect enters the retrying state, there is no API I could call to periodically check its context during the retry delay period. Am I missing something here?
    Chris White

    3 years ago
    So, using just the open source engine, the Python process running the Flow simply sleeps until the next retry time. For an external API that you could query (which I think is what you’re asking about?), you’d need an external orchestration service, which is what our Prefect Cloud offering provides. Let me know if I’m misunderstanding your question, though.
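To make the "simply sleeps" behavior concrete, here is a simplified, hypothetical retry loop (an illustration, not Prefect's actual engine code). Between attempts the process is blocked in `time.sleep`, so nothing outside it can observe progress. The name `run_with_retries` and its parameters are assumptions chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(fn: Callable[[], T], max_retries: int, retry_delay: float) -> T:
    """Call fn, retrying on any exception up to max_retries times."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                # Out of retries: surface the last error to the caller.
                raise
            attempt += 1
            # The "black hole": the process just blocks here, with no logs,
            # no heartbeat, and no API anyone could query.
            time.sleep(retry_delay)
```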
    James Watt

    3 years ago
    Do you mean the "external orchestration service" is needed when multiple flow runner instances have to be coordinated by a higher-level service? That is not my use case here. I am wondering what Prefect could do during the retry delay period. Is it possible for it to report its own "delaying" progress?
    A single flow runner instance is good enough for my application.
    Chris White

    3 years ago
    Gotcha; yeah, the external orchestration service is necessary for multiple instances. What sort of information do you need to access, other than the scheduled start time of the next retry attempt? The next scheduled retry attempt is something you could query from Prefect Cloud, but it isn’t readily available in the open source engine.
    James Watt

    3 years ago
    The delay period between retry attempts is like a black hole. Some sort of heartbeat signal is needed so that the caller can be confident that the underlying flow runner is still alive. This is especially true when the delay is long compared to the individual task. In fact, I made a countdown timer just to track how much of this delay has elapsed. But it bothers me that there is no way to periodically check the Prefect state during the retry delay period.
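A minimal sketch of the countdown/heartbeat idea, assuming you control the wait yourself: instead of one long sleep, wait in short slices and invoke a callback with the remaining time on each slice. `wait_with_heartbeat` and its parameters are hypothetical names, not part of Prefect.

```python
import threading
import time
from typing import Callable, Optional


def wait_with_heartbeat(
    delay: float,
    interval: float,
    on_heartbeat: Callable[[float], None],
    stop: Optional[threading.Event] = None,
) -> bool:
    """Wait `delay` seconds, calling on_heartbeat(remaining) every `interval` seconds.

    Returns True if the full delay elapsed, or False if `stop` was set early.
    """
    if stop is None:
        stop = threading.Event()
    deadline = time.monotonic() + delay
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return True
        # Signal the outside world that we are alive and how long is left.
        on_heartbeat(remaining)
        # Event.wait returns True only if the event was set before the timeout.
        if stop.wait(min(interval, remaining)):
            return False
```

Passing in a `threading.Event` lets another thread cancel the wait early, which a plain `time.sleep` cannot do.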
    Chris White

    3 years ago
    Oh OK, I see what you’re worried about. In Core (the open source engine) you are correct: there is no way to truly verify that everything is still alive. In Cloud, however, a number of services monitor the state of your Flow Run, so you can transparently see the next scheduled start time for the task, etc., and be confident that the task will retry at the correct time.
    In Cloud, there is no “sleeping” process as there is in Core - we have a full scheduler service that is ensuring things run at the appropriate times
    James Watt

    3 years ago
    I see. I hope the Core engine gets something like a heartbeat callback one day.
    Chris White

    3 years ago
    That makes sense. Would something like an occasional log every 30 minutes be sufficient for your needs? If so, feel free to open an issue and we could consider implementing something lightweight like that.
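An "occasional log every 30 minutes" could be approximated, as a sketch, with a daemon thread that logs on a fixed interval while the main thread stays blocked in its retry sleep. This uses only the standard library (`threading`, `logging`, `contextlib`); the `heartbeat_log` helper and the `"heartbeat"` logger name are assumptions, not an existing Prefect feature.

```python
import logging
import threading
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("heartbeat")


@contextmanager
def heartbeat_log(interval: float, message: str = "still waiting for next retry") -> Iterator[None]:
    """Log `message` every `interval` seconds from a daemon thread while the body runs."""
    stop = threading.Event()

    def beat() -> None:
        # Event.wait returns False on timeout (time to log) and True once stop is set.
        while not stop.wait(interval):
            logger.info(message)

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        yield
    finally:
        # Stop the heartbeat as soon as the wrapped wait finishes or raises.
        stop.set()
        thread.join()
```

Usage would look like `with heartbeat_log(30 * 60): time.sleep(retry_delay)`. Because the heartbeat runs beside the existing sleep, the sleeping code needs no changes; the trade-off is that a log line only shows the process is alive, not that the retry will actually fire on time.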
    James Watt

    3 years ago
    That sounds great! I don't know how complicated it might become. My thought is to make a subclass of the retrying state that automatically toggles its state on and off at a configurable period. That way, we can take advantage of the on_state_change function to send whatever signals we want to the outside world.
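The subclass idea can be modeled with a toy state machine: a runner notifies every registered handler on each state change, and the retry wait re-enters a "Retrying" state once per period so the handlers fire like a heartbeat. `State`, `Runner`, and `retry_delay_with_heartbeats` are hypothetical stand-ins; this sketch does not use Prefect's real state classes or handler API.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class State:
    name: str
    remaining: float = 0.0  # seconds left in the retry delay, if any


# Hypothetical handler signature: called with (old_state, new_state).
StateHandler = Callable[[State, State], None]


class Runner:
    """Toy stand-in for a task runner that notifies handlers on every state change."""

    def __init__(self, handlers: List[StateHandler]) -> None:
        self.state = State("Running")
        self.handlers = handlers

    def set_state(self, new_state: State) -> None:
        old_state, self.state = self.state, new_state
        for handler in self.handlers:
            handler(old_state, new_state)


def retry_delay_with_heartbeats(
    runner: Runner, delay: float, period: float, sleep: Callable[[float], None] = time.sleep
) -> None:
    """Re-enter 'Retrying' every `period` seconds until `delay` has elapsed, then resume."""
    remaining = delay
    while remaining > 0:
        # Each re-entry is a state change, so every handler sees a "heartbeat".
        runner.set_state(State("Retrying", remaining))
        step = min(period, remaining)
        sleep(step)
        remaining -= step
    runner.set_state(State("Running"))
```

Injecting `sleep` keeps the example testable without actually waiting; a real implementation would also have to decide how repeated same-named states interact with anything else that reacts to state changes.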
    Chris White

    3 years ago
    Yeah, that’s the only caveat: walking down the path of needing external visibility / process monitoring / APIs opens up a large can of worms and edge cases (which is why we built Cloud), but for something simple like a “heartbeat log” it should be easy enough to include.