Mark NS

12/21/2022, 9:48 AM
Hi, I'm trying to understand how remote storage should be used to distribute flows to a heterogeneous execution environment. I've drawn a couple of diagrams to clarify my understanding. The first diagram shows the creation of a deployment on the Orion server via the CLI: config goes to Orion, code goes to the S3 remote storage bucket, and the agent subscribes to the work queue. So far so good. The second diagram shows the Prefect agent receiving a run event from the queue and spawning a myapp Docker container for execution.
What I would now expect is that the myapp container downloads the flows from the S3 storage bucket and the entrypoint is executed. Instead, the container fails with an s3fs Access Denied error:
File "/app/.venv/lib/python3.11/site-packages/s3fs/core.py", line 774, in _find
    out = await self._lsdir(path, delimiter="", prefix=prefix)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/s3fs/core.py", line 725, in _lsdir
    raise translate_boto_error(e)
PermissionError: Access Denied
I would have thought distributing these authentication credentials would be handled somehow by the agent and the S3 block.
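For context, the kind of deployment build described above looks roughly like this (a sketch against the Prefect 2.x CLI; the flow path, deployment name, and infrastructure block are hypothetical, and only the -sb s3/default part appears verbatim later in this thread):

prefect deployment build flows/myapp_flow.py:my_flow \
    -n myapp \
    -sb s3/default \
    -ib docker-container/myapp \
    -q default \
    --apply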

Kelvin DeCosta

12/21/2022, 9:55 AM
I think the Docker container must be explicitly supplied with the required credentials.
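For example, the credentials can be passed through the infrastructure block's env field (a sketch assuming the Prefect 2.x DockerContainer infrastructure block; the image and block names are hypothetical):

from prefect.infrastructure import DockerContainer

docker_block = DockerContainer(
    image="myapp:latest",  # hypothetical custom image
    env={
        # forwarded into the flow-run container so s3fs can authenticate
        "AWS_ACCESS_KEY_ID": "<access key>",
        "AWS_SECRET_ACCESS_KEY": "<secret key>",
    },
)
docker_block.save("myapp")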

Mark NS

12/21/2022, 10:02 AM
Hmm, I don't see anything about supplying the S3 credentials in the documentation here: https://docs.prefect.io/tutorials/docker/

Kelvin DeCosta

12/21/2022, 10:11 AM
Could be 2 cases:
• S3Bucket and S3 are two separate block types, which confused me in the past. For a storage block, you need the S3 class in prefect.filesystems.
• The S3 block takes AWS credentials as optional fields, so maybe they need to be specified in this case.
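To make the distinction concrete (a sketch; import paths as of Prefect 2.x and the prefect-aws collection):

from prefect.filesystems import S3     # core storage block, usable as -sb s3/<name>
from prefect_aws.s3 import S3Bucket    # separate block type from the prefect-aws collection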

Mark NS

12/21/2022, 10:14 AM
Thanks, but yep, already doing that...
import os

from prefect.filesystems import S3

# DEFAULT_BLOCK is a fallback constant defined elsewhere in the original script
s3 = S3(
    bucket_path=os.environ.get("AWS_S3_BUCKET_NAME", DEFAULT_BLOCK),
    aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", DEFAULT_BLOCK),
    aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", DEFAULT_BLOCK),
)
s3.save("default")
I'm going through the tutorial I linked again. Maybe my custom image is not working the same way as the default image.
Also, I know the S3 block is correctly configured, as the upload on prefect deployment build -sb s3/default ... works fine.

Kelvin DeCosta

12/21/2022, 10:21 AM
hmm that is weird indeed

Mark NS

12/21/2022, 10:28 AM
Ok, so using the default image prefecthq/prefect:2.7.3-python3.10, the flows are downloaded correctly from remote storage. Trying to figure out why my custom image using python:3-slim with
prefect = "^2.7.3"
prefect-dbt = "^0.2.5"
s3fs = "^2022.11.0"
installed is not behaving the same way...
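For reference, the custom image looks roughly like this (a hypothetical reconstruction: the thread only confirms the base image and the three pinned dependencies; the /app/.venv path in the traceback suggests the original install uses Poetry, shown here as plain pip for brevity):

FROM python:3-slim
WORKDIR /app
# same pinned dependencies as listed above (Poetry caret ranges shown as pip pins)
RUN pip install "prefect==2.7.3" "prefect-dbt==0.2.5" "s3fs==2022.11.0"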

davzucky

12/22/2022, 1:09 AM
We will be using the pattern of building a Docker image with our flow baked in, rather than using the S3 storage. Did you consider this pattern?
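A minimal sketch of that pattern (all names hypothetical), with the flow code baked into the image so no remote storage block or S3 download is needed at run time:

FROM prefecthq/prefect:2.7.3-python3.10
# copy the flow code into the image instead of pulling it from S3 at run time
COPY flows/ /opt/prefect/flows/

The deployment is then built without -sb, and the flow run executes from the path baked into the image.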

Mark NS

01/06/2023, 1:10 PM
Hi @davzucky, I was initially creating a Docker image for the flow runs and cloning the application code at runtime. However, I realised this approach wouldn't work in an environment where you are trying to orchestrate diverse runtimes.
Sorry for the late reply too!