I am trying to run some 24+ hour workloads using an ECS push pool. These workloads fail at some point, but on our Prefect Cloud dashboard they appear stuck in the Running state, with the last log being days ago. On our ECS cluster I can see that they are no longer running. Does anyone have suggestions for troubleshooting this disconnect between the Prefect dashboard and the ECS task status?
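One way to investigate from the ECS side is to inspect the stopped task itself. The ECS DescribeTasks response (for example from boto3's `ecs.describe_tasks(cluster=..., tasks=[...])`) includes `lastStatus`, `stoppedReason`, and a per-container `exitCode` and `reason`. The sketch below uses a hypothetical helper, `diagnose_run`, to compare one task entry against the state Prefect shows. It takes the task dict as input, so it runs without AWS credentials. ECS keeps stopped tasks queryable only for a limited time, so a task that is no longer returned is also treated as a mismatch.

```python
from typing import Any, Dict, Optional


def diagnose_run(prefect_state: str, ecs_task: Optional[Dict[str, Any]]) -> Optional[str]:
    """Hypothetical helper: explain a Prefect/ECS status mismatch, or return None.

    `ecs_task` is one entry from the "tasks" list of an ECS DescribeTasks
    response, or None if ECS no longer returns the task at all.
    """
    # Only runs that Prefect still believes are in progress are of interest here.
    if prefect_state.upper() != "RUNNING":
        return None
    if ecs_task is None:
        return "Prefect says RUNNING, but ECS no longer returns the task"
    if ecs_task.get("lastStatus") != "STOPPED":
        return None  # both sides agree the work is still running
    # ECS records why the task stopped, plus each container's exit code and reason.
    parts = [f"task stopped: {ecs_task.get('stoppedReason', 'unknown reason')}"]
    for container in ecs_task.get("containers", []):
        parts.append(
            f"{container.get('name')}: exitCode={container.get('exitCode')} "
            f"reason={container.get('reason', '')}"
        )
    return "; ".join(parts)
```

A non-zero exit code or a stop reason such as an essential container exiting points to a failure inside the container. If the container exited cleanly while Prefect still shows Running, the final state update to Prefect Cloud may not have gone through.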
Joe D
03/22/2024, 4:39 PM
I think the issue was that the prefect_api_key passed to the Fargate containers was expiring after ~36 hours. Explicitly passing an unexpired key as an environment variable in the push pool solved the issue, but there is probably a better way...
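A minimal sketch of the workaround, assuming the push pool's base job template exposes an `env` job variable (check this in the pool's template). The hypothetical helper `build_job_variables` reads the key from the local environment instead of hard-coding it. The resulting dict could then be supplied as a deployment's job variables, for example via the `job_variables` argument of `flow.deploy` in recent Prefect releases; verify that against your Prefect version.

```python
import os
from typing import Any, Dict, Optional


def build_job_variables(extra_env: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: build job variables that inject a fresh API key."""
    # Read the key from the environment rather than committing it to source control.
    api_key = os.environ.get("PREFECT_API_KEY")
    if not api_key:
        raise RuntimeError("PREFECT_API_KEY is not set; create an unexpired key first")
    env = {"PREFECT_API_KEY": api_key}
    env.update(extra_env or {})
    # Assumption: the push pool's base job template maps `env` onto the container environment.
    return {"env": env}
```

The key passed this way still expires on its own schedule, so a run that outlives it could hit the same problem.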