# prefect-aws
t
Hi everyone. First off @Anna Geller (and the rest of the team), thanks for all the work you’ve put into the templating at anna-geller/dataflow-ops 🙌 I am trying to run the flow and I’ve been able to solve most small issues so far. However, at the Prefect Blocks & S3 Upload step I am getting an error that I can’t seem to solve. I think it might be an IAM role issue, but the AWS access is set generously to full S3. The run fails at `Upload to S3 - maintenance flow` with a permission error.
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
I checked the permissions of `dataflowops_ecs_task_role` and it does have full access to S3 (`"s3:*"`).
In `workflows/ecs_prefect_agent.yml`, the role set for running the `Create Blocks & Upload to S3` step is `dataflowops_ecs_execution_role`, which does not have read/write rights to S3. I tried adding full S3 access to this role, but it still does not work.
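To narrow down which identity is actually being denied, a minimal check along these lines might help (a sketch assuming boto3 is available where the step runs; the bucket name below is hypothetical):

```python
# Sketch, assuming boto3 is installed in the environment running the step.
# "my-dataflowops-bucket" is a hypothetical name; use the bucket from the template.
import boto3

# Which identity do the credentials in this environment resolve to?
print(boto3.client("sts").get_caller_identity()["Arn"])

# Can that identity write to the bucket the workflow uploads to?
boto3.client("s3").put_object(
    Bucket="my-dataflowops-bucket",
    Key="permission-check.txt",
    Body=b"ok",
)
```

If the printed ARN is not the role you expect (e.g. the execution role rather than the task role), that points at the same mismatch described above.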
a
Great to hear! The task role needs S3 permissions, but out of curiosity, did you adjust the bucket name? Perhaps it's just an issue with the name. I could try to replicate it soon to see if something changed that would break this template (it shouldn't be the case)
also, at which point do you get this error? Do you know if it comes from GitHub Actions? If so, it might be that the IAM user you use in your GHA secrets doesn't have permission to upload to S3
t
I did not adjust the bucket name. It fails at the step called “Upload to S3 - maintenance flow”. If I run the step manually from my terminal I get the same error. In Prefect Cloud the bucket path is `prefect-orion/prod`, which is the default for the workflow. However, this might be a problem because the bucket name needs to be globally unique? Will delete all resources, try again with a new bucket name, and get back 👍
Changed the bucket name and now I get a new error: `FileNotFoundError: The specified bucket does not exist`
Also, I can't see the bucket in S3? I think the bucket is supposed to be generated in the step called “Create Blocks & Upload to S3”, which executes without error. However, there is no resulting bucket in S3?
Solved. Solution: I manually made an S3 bucket and changed the name in the flow to correspond to that bucket. Is having a bucket a prerequisite for this? If not, I am not sure the bucket is being created correctly in the flow.
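For anyone hitting the same thing, the manual fix looks roughly like this sketch (assuming Prefect 2.x and boto3; the bucket name is hypothetical and must be globally unique):

```python
# Sketch, assuming Prefect 2.x and boto3; the bucket name is hypothetical.
import boto3
from prefect.filesystems import S3

bucket = "my-unique-dataflowops-bucket"  # must be globally unique across all of S3

# Create the bucket up front (outside us-east-1 you also need a
# CreateBucketConfiguration with a LocationConstraint).
boto3.client("s3").create_bucket(Bucket=bucket)

# Point the Prefect S3 block at the existing bucket.
S3(bucket_path=f"{bucket}/prod").save("prod", overwrite=True)
```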
a
nice work!
b
I think this is a bigger issue; I am getting this error today just randomly. Sometimes it works, other times it doesn't. This happens when running `prefect deployment apply --upload`. I can't reproduce it because it happens randomly.
a
we fixed a --skip-upload flag bug in 2.6.7 - could you check if it's working now, and if not, could you open an issue for it?
b
Ah yep, I'll try today and let you know
looks like that did the trick @Anna Geller 🙂
Hey Anna - I can report that this error is still occurring - we are using Prefect `2.6.7`. As you know, I am going through a migration at the moment from 1.0 -> 2.0, and strangely this only seems to occur the first couple of times when I deploy a new repo through our CI. I then run the CI step locally and it seems to fix it somehow. Here are some logs if they are of any use. The CI step 100% has the correct credentials it needs etc. This definitely feels like some strange transient issue that keeps happening. Could it be my aiobotocore/s3fs dependencies?
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 111, in _error_wrapper
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/aiobotocore/client.py", line 358, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the PutObject operation: No AWSAccessKey was presented.
 
The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prefect/cli/_utilities.py", line 41, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/prefect/utilities/asyncutils.py", line 201, in coroutine_wrapper
    return run_async_in_new_loop(async_fn, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/prefect/utilities/asyncutils.py", line 152, in run_async_in_new_loop
    return anyio.run(partial(__fn, *args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/usr/local/lib/python3.9/site-packages/prefect/cli/deployment.py", line 519, in apply
    file_count = await deployment.upload_to_storage()
  File "/usr/local/lib/python3.9/site-packages/prefect/deployments.py", line 576, in upload_to_storage
    file_count = await self.storage.put_directory(
  File "/usr/local/lib/python3.9/site-packages/prefect/filesystems.py", line 481, in put_directory
    return await self.filesystem.put_directory(
  File "/usr/local/lib/python3.9/site-packages/prefect/filesystems.py", line 358, in put_directory
    self.filesystem.put_file(f, fpath, overwrite=True)
  File "/usr/local/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
    raise return_result
  File "/usr/local/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 1056, in _put_file
    await self._call_s3(
  File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 338, in _call_s3
    return await _error_wrapper(
  File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 138, in _error_wrapper
    raise err
PermissionError: No AWSAccessKey was presented.
An exception occurred.
ERROR: 1
Failed to deploy prefect flows!
a
it looks like your S3 block doesn't have explicitly defined access keys
b
Yeah, that's on purpose. I never do that. It reads them from the environment and whatever role it has
a
it seems to be the issue - the S3 filesystem block is based on s3fs, which builds on fsspec and relies on access keys - this issue might help https://github.com/s3fs-fuse/s3fs-fuse/issues/742
if possible, I'd recommend giving access keys a try to see if this helps, at least to check whether it fixes the issue temporarily (it should)
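For reference, setting explicit keys on the block would look roughly like this (a sketch assuming Prefect 2.x; the bucket path is hypothetical, and pulling the keys from environment variables populated by CI secrets is just one option):

```python
# Sketch, assuming Prefect 2.x; the bucket path is hypothetical and the keys are
# read from environment variables that CI would populate from its secrets.
import os
from prefect.filesystems import S3

S3(
    bucket_path="my-bucket/prod",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
).save("prod", overwrite=True)
```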
b
Thanks, but that is 100% not the issue. I use other CI pipelines with no access keys and they work fine. As mentioned, this issue seems to happen when the infrastructure and blocks have been recently created, and then for some reason the issue disappears with no changes. The error that is being propagated is misleading.
oh it is an s3fs issue, hmm ok might be a version thing
hmm I fixed it by restarting the CI agent - I think it is an s3fs cache issue
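That would be consistent with fsspec's instance caching: a long-lived CI worker can keep reusing an S3FileSystem that was constructed before the bucket or credentials existed. A hedged sketch of clearing that cache, as an alternative to restarting the agent:

```python
# Sketch: fsspec caches filesystem instances keyed by their constructor arguments,
# so a long-running process may keep reusing a stale S3FileSystem.
# clear_instance_cache() is inherited from fsspec's AbstractFileSystem.
import s3fs

s3fs.S3FileSystem.clear_instance_cache()
```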
a
Makes sense, thanks so much for the update
b
some more info for you - it definitely appears to be something to do with either the cache or the permissions of s3fs? When it is time for a flow to be deployed for the first time, it fails, and then when I deploy it locally and the path is created in the bucket from the `--path` arg in `prefect deployment build`, it seems to work 🤷
a
How do you use --path? Technically it's not required; the bucket path argument has everything that's needed
b
I use it with --path $flowName
Isn't the bucket_path arg just the bucket name?
If I want each flow to be in its own dir, do I then add the --flow arg to the deploy command?
a
It's an entire path, i.e. the bucket name plus the path to where you want to copy your entire project directory
You can certainly keep using the path argument this way, but it's easier to just let Prefect upload the whole project directory once and add the skip-upload flag to the other flows; you can check the CI/CD setup in the dataflow-ops repo to see how I approached that
The path is not needed because the entrypoint points to the flow location already
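A rough illustration of that "upload once, skip upload for the rest" pattern via the Python API (a sketch assuming Prefect 2.x; the flow names and the "prod" block slug are made up):

```python
# Sketch, assuming Prefect 2.x; flow names and the "prod" block slug are hypothetical.
from prefect import flow
from prefect.deployments import Deployment
from prefect.filesystems import S3


@flow
def maintenance_flow():
    ...


@flow
def healthcheck_flow():
    ...


storage = S3.load("prod")  # bucket_path already holds "bucket-name/path"

# The first deployment uploads the whole project directory to the bucket_path.
Deployment.build_from_flow(
    flow=maintenance_flow,
    name="prod",
    storage=storage,
    apply=True,
)

# Later deployments reuse the uploaded files; no --path needed because the
# entrypoint already points at the flow's file within the project.
Deployment.build_from_flow(
    flow=healthcheck_flow,
    name="prod",
    storage=storage,
    skip_upload=True,
    apply=True,
)
```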
a
For future searchers: there was apparently a breaking change between 2.7.0 and 2.7.2 that caused this `Bucket Not Found` error when you specify `project/file.py:function_name` rather than `project:file.py:FlowName`, as well.
a
ahh gotcha - we just shipped a fix with 2.7.3