# ask-community
m
Heya! So, I'm pulling from a database that goes down for ~75 minutes at random times. I set my tasks to have `@task(max_retries=3, retry_delay=timedelta(minutes=30))`, but apparently Zombie Killer doesn't like that? Looking through the logs, I see
`No heartbeat detected from the remote task; marking the run as failed.`, then `Flow run is no longer in a running state; the current state is: <Failed: "Some reference tasks failed.">`, then `Heartbeat process died with exit code -9`, then
```
Failed to set task state with error: ClientError([{'message': 'State update failed for task run ID 43f52f19-fffb-4d16-8223-da4ffc5668b2: provided a running state but associated flow run 8c8fc810-eb3d-447c-ab70-76dd1dc2acaa is not in a running state.', 'locations': [{'line': 2, 'column': 5}], 'path': ['set_task_run_states'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'State update failed for task run ID 43f52f19-fffb-4d16-8223-da4ffc5668b2: provided a running state but associated flow run 8c8fc810-eb3d-447c-ab70-76dd1dc2acaa is not in a running state.'}}}],)
```
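For reference, the task is basically this shape (the function name and body here are just placeholders, not the real pull logic):
```python
from datetime import timedelta

from prefect import task


@task(max_retries=3, retry_delay=timedelta(minutes=30))
def pull_table(table_name: str):
    # placeholder: the real task copies one table out of the source database
    ...
```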
z
Hi @matta! I think this kind of looks like a bug; we'll have to take a look at the desired behavior for long retries like this. Could you check on your database as the first task of your flow and then, if it is not ready, use the `StartFlowRun` task with a `scheduled_start_time` to kick off a new run in the future?
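Something roughly in this shape, just as a sketch; `db_is_up`, `pull_tables`, the flow/project names, and the 90-minute delay are all placeholders, and you'd want to confirm that `case` and the `scheduled_start_time` argument behave this way on your Prefect version:
```python
from datetime import timedelta

import pendulum
from prefect import Flow, task
from prefect.tasks.control_flow import case
from prefect.tasks.prefect import StartFlowRun


@task
def db_is_up() -> bool:
    # placeholder: replace with a real connectivity check against the source DB
    return True


@task
def restart_time():
    # schedule the retry run for after the usual ~75 minute outage window
    return pendulum.now("UTC") + timedelta(minutes=90)


@task
def pull_tables():
    # placeholder for the actual replication work
    ...


reschedule = StartFlowRun(flow_name="db-replication", project_name="my-project")

with Flow("db-replication") as flow:
    up = db_is_up()
    with case(up, False):
        # DB unreachable: kick off a fresh run of this same flow later and stop here
        reschedule(scheduled_start_time=restart_time())
    with case(up, True):
        pull_tables()
```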
m
Sometimes it goes down in the middle of the pull (the whole thing takes about 3 hours, I'm replicating a whole db). Buuut I guess I could do that within a flow maybe? Make a trigger like "more than 10% failed" downstream from that step, and then have it do `StartFlowRun`?
Okay, this is coming together in my head. Thanks!
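Something like this is what I'm picturing, just sketching it out; `pull_table`, the table names, and the flow/project names are made up, and I'd still need to check whether `some_failed` and the `scheduled_start_time` argument exist on 0.13.19:
```python
from datetime import timedelta

import pendulum
from prefect import Flow, task
from prefect.tasks.prefect import StartFlowRun
from prefect.triggers import some_failed


@task
def pull_table(name: str):
    # placeholder for replicating one table
    ...


# only runs when at least 10% of the upstream pulls failed
@task(trigger=some_failed(at_least=0.1))
def restart_time():
    return pendulum.now("UTC") + timedelta(minutes=90)


retry_run = StartFlowRun(flow_name="db-replication", project_name="my-project")

with Flow("db-replication") as flow:
    pulls = [pull_table(name) for name in ["table_a", "table_b", "table_c"]]
    restart_at = restart_time(upstream_tasks=pulls)
    retry_run(scheduled_start_time=restart_at)
    # note: reference tasks / triggers would need tuning so the
    # "nothing failed, no restart needed" path doesn't mark the run as failed
```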
a
I guess there's an option to disable the heartbeat check. Not the best option, but maybe something worth considering in this case?
z
We do think this is a possible bug on our end, though; I'll post back in this thread if we identify an issue.
m
We're still using 0.13.19 btw
Not sure if there might have been a fix since then.