Maikel Penz 04/13/2023, 11:35 PM
Agent started! Looking for work from queue(s): infra-dev-plexflow-2...
23:25:42.455 | INFO | prefect.agent - Submitting flow run '69af3c9b-f22b-42db-a0a5-21595ec5408a'
23:25:44.631 | INFO | prefect.infrastructure.kubernetes-job - Job 'flow-infra-data-engineering-infra-dev-96hss': Pod has status 'Pending'.
23:25:44.705 | INFO | prefect.agent - Completed submission of flow run '69af3c9b-f22b-42db-a0a5-21595ec5408a'
23:26:44.628 | ERROR | prefect.infrastructure.kubernetes-job - Job 'flow-infra-data-engineering-infra-dev-96hss': Pod never started.
23:26:44.792 | INFO | prefect.agent - Reported flow run '69af3c9b-f22b-42db-a0a5-21595ec5408a' as crashed: Flow run infrastructure exited with non-zero status code -1.
.. and these are the logs from the job pod..
kubectl logs flow-infra-data-engineering-infra-dev-jhqhm-74k5l --follow
/usr/local/lib/python3.8/runpy.py:127: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
23:31:09.701 | INFO | Flow run 'economic-jerboa' - Downloading flow code from storage at ''
23:36:17.233 | WARNING | aiobotocore.credentials - Refreshing temporary credentials failed during mandatory refresh period.
Traceback (most recent call last):
...
..
..
raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://sts.eu-west-1.amazonaws.com/"
.. it gets stuck trying to download the code. The role I have assigned to the cluster has S3 full access. My S3 Block has this configuration:
Bucket Path: <bucket-name>
AWS Access Key Id: None
AWS Secret Access Key: None
There seem to be multiple problems but..
1. Why does the flow fail right after it starts with "Pod never started."?
2. And why can it not pull the code from S3, while the log shows a storage path which is empty: "Downloading flow code from storage at ''"?
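On question 1, the agent timestamps give a hint. The sketch below (the helper name `seconds_between` is made up for illustration) computes the gap between the "Pod has status 'Pending'" line and the "Pod never started" line, using the timestamps copied from the agent log above.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Return the seconds elapsed between two HH:MM:SS.mmm log timestamps (same day)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Timestamps copied from the agent log for job 'flow-infra-data-engineering-infra-dev-96hss'
pending_at = "23:25:44.631"
never_started_at = "23:26:44.628"

gap = seconds_between(pending_at, never_started_at)
print(f"{gap:.3f} s between 'Pending' and 'Pod never started'")
```

The gap is almost exactly 60 seconds, which would be consistent with the agent giving up on a pod that was still Pending after a fixed wait, not with the flow code itself failing. Whether that wait is the `pod_watch_timeout_seconds` setting on the KubernetesJob block in this Prefect version is an assumption worth verifying. Note also that the job pod logs come from a different job (`...-jhqhm`), so the two excerpts describe two separate runs.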
Nate 04/14/2023, 3:08 PM
23:36:17.233 | WARNING | aiobotocore.credentials - Refreshing temporary credentials failed during mandatory refresh period.
...
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://sts.eu-west-1.amazonaws.com/"
this error definitely seems permissions-related somehow, a couple ideas to check:
1. can your worker nodes actually assume your full s3 access role? (docs)
2. could this be a networking issue? the timeout to STS is kind of odd to me
if you're just getting started, I'd recommend checking out prefect projects / workers / work pools for managing your deployment and its execution environment, since that is our recommendation going forward
happy to continue debugging with you if you're still blocked
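One way to test the networking idea is to check, from a shell inside the job pod, whether a plain TCP connection to the STS endpoint can be opened at all. This is a minimal sketch using only the standard library; the function name `can_connect` and the 5-second timeout are illustrative choices, not part of Prefect or botocore.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False
```

Calling `can_connect("sts.eu-west-1.amazonaws.com")` in the pod and getting `False` would point at egress rules, DNS, or a missing NAT/VPC endpoint rather than at IAM permissions. A `True` result only shows the endpoint is reachable; it says nothing about whether the role can actually be assumed.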
Maikel Penz 04/16/2023, 11:50 PM
Nate 04/16/2023, 11:51 PM