Guillermo Galan

11/21/2022, 4:02 PM
Hi all! Thanks for a great tool and community. Summary: I'm trying to get a deployment up and running in Azure Container Instances, but Prefect Cloud keeps stopping my container after a few minutes. Do you have any idea what could be happening? I'm running a flow using deployments in Azure Container Instances. I can trigger the job successfully, and the container starts running and processes the tasks correctly. After a while, for no apparent reason, the agent receives the instruction from Prefect (running on Cloud) to delete the container:
16:44:27.874 | INFO    | prefect.infrastructure.container-instance-job - AzureContainerInstanceJob 'sftp-to-blob-ingestion': Preparing to run command '/opt/prefect/entrypoint.sh python -m prefect.engine' in container 'a53fd966-c635-4605-a8ab-4a3f487a7463' (prefecthq/prefect:2.6.7-python3.9)...
16:44:27.875 | INFO    | prefect.infrastructure.container-instance-job - AzureContainerInstanceJob 'sftp-to-blob-ingestion': Waiting for container creation...
16:45:37.024 | INFO    | prefect.infrastructure.container-instance-job - AzureContainerInstanceJob 'sftp-to-blob-ingestion': Running command...
16:45:37.025 | INFO    | prefect.agent - Completed submission of flow run '6bb0fc44-197e-4bfd-846e-877f5dbbf0b8'
16:48:43.454 | INFO    | prefect.infrastructure.container-instance-job - AzureContainerInstanceJob 'sftp-to-blob-ingestion': Completed command run.
16:48:43.455 | INFO    | prefect.infrastructure.container-instance-job - AzureContainerInstanceJob 'sftp-to-blob-ingestion': Deleting container...
In the meantime, the Orion UI stays frozen in the 'Running' state, and there are no log lines indicating any error or crash.

Sam Cook

11/21/2022, 4:27 PM
I have no help to offer, but I'm having a similar issue in Kubernetes: my jobs are getting OOMKilled (because of limits I have in place in the deployment), but no errors are reported. Prefect tries to re-run the job, but the re-run completes immediately because the flow run is already in a RUNNING state, and the run stays stuck in the UI as running indefinitely.

Ryan Peden

11/21/2022, 5:41 PM
Hi Guillermo! The AzureContainerInstanceJob checks the status of your container every 5 seconds, and based on the output, it looks like the container is getting deleted because it has stopped running. Prefect isn't stopping the container; it actually doesn't have a way to stop the container mid-run (though it will soon). So something else is causing the flow run to stop prematurely. Enabling log streaming on your AzureContainerInstanceJob block (if it's not enabled already) might reveal more information about what is going wrong.
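To make the polling behaviour described above concrete, here is a minimal, self-contained sketch of a watch loop. It is not the prefect-azure implementation: `watch_container`, `get_state`, `delete_container`, and the state names are hypothetical, chosen only for illustration. The point it tries to show is that a watcher like this only sees that the container is no longer running. It cannot tell a clean exit from a crash, so the "Deleting container..." log line would appear in both cases.

```python
import time
from typing import Callable

# Hypothetical state names for this sketch; not taken from the Azure SDK.
RUNNING = "Running"
TERMINATED = "Terminated"


def watch_container(
    get_state: Callable[[], str],
    delete_container: Callable[[], None],
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the container state until it is no longer running, then delete it.

    Returns the number of status checks that were performed.
    """
    checks = 0
    while True:
        checks += 1
        if get_state() != RUNNING:
            break
        sleep(poll_interval)
    # The watcher only knows that the container stopped, not why it stopped.
    delete_container()
    return checks


if __name__ == "__main__":
    # Simulated container: running for two checks, then stopped for any reason.
    states = iter([RUNNING, RUNNING, TERMINATED])
    deleted = []
    n = watch_container(
        get_state=lambda: next(states),
        delete_container=lambda: deleted.append(True),
        sleep=lambda seconds: None,  # skip real waiting in the demo
    )
    print(f"checks={n}, deleted={bool(deleted)}")
```

Under this reading, the cause of the early stop has to come from the container's own output, which is why streaming the container logs is the suggested next step.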

Guillermo Galan

11/22/2022, 10:13 AM
Thanks a lot! I'll look into the container logs and try to see what might be happening. I'll keep you posted on my results. On the other hand, it seems a bit odd that the flow stays in a "Running" state in this case. What's the mechanism for detecting whether it has crashed, is still running, etc.?