# ask-community
j
Hi Folks, my team and I are moving over some pipelines that make use of S3, and I cannot figure out how to get this new AWS creds block or the new S3 block working. Has anyone gotten this working? I've tried bringing over all sorts of configuration options, but I keep running into "profile not found for profile _____" and access denied errors. I can connect just fine with boto3 directly, but would like to use these blocks to shorten my code, but at this point, I feel like these blocks are more likely to shorten my lifespan 😆
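[Note: a minimal sketch of how these two blocks are typically created and saved so flows can load them by name. Assumes prefect-aws is installed; the block and bucket names here are placeholders.]
Copy code
from prefect_aws import AwsCredentials
from prefect_aws.s3 import S3Bucket

# Save a credentials block once; any flow can then load it by name
AwsCredentials(
    aws_access_key_id="<key id>",
    aws_secret_access_key="<secret key>",
).save("my-aws-creds")

# An S3Bucket block that reuses those saved credentials
S3Bucket(
    bucket_name="<bucket>",
    aws_credentials=AwsCredentials.load("my-aws-creds"),
).save("my-s3-bucket")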
s
Hi, can you post your actual traceback in the thread?
j
I haven't seen any error messages like that, no
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
Are there differences in required permissions when using this block versus the vanilla way of using boto3?
I haven't gotten any variations to work. I guess what I could really use is more documentation around actually setting up this AWS S3 block: what permissions the AWS role needs, and what goes where
The docs around this block just don't tell me much
On a different note, that Snowflake connector block is working, at least. I had to switch from Spyder to PyCharm to get any of the blocks to work at all.
s
Hi Jacob, I actually meant the “profile not found for profile __” error, not the access denied error - the latter could stem from the former
j
Oh, I removed the profile argument, because I think I confused that option with the session name, which can be arbitrary
That 'profile' argument would make Prefect look at AWS for a predefined object, though, right?
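[Note: under the hood that option maps to boto3's profile_name, which botocore resolves against local AWS config files, not against anything stored in AWS itself. A sketch of the equivalent boto3 lookup:]
Copy code
import boto3

# A named profile must exist in ~/.aws/config or ~/.aws/credentials,
# e.g. a [data-eng-bot] section; otherwise this raises ProfileNotFound
session = boto3.Session(profile_name="data-eng-bot")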
s
As far as more documentation goes, here are a few resources, but note that these use the prefect-aws integration (S3Bucket block): https://prefecthq.github.io/prefect-aws/#using-prefect-with-aws-s3 https://github.com/PrefectHQ/prefect-recipes/blob/main/flows-advanced/etl/export-airbyte-config-and-write-to-s3-bucket-using-blocks.py I’m not sure what you mean by “removed the profile argument.” Guessing the profile error is a botocore error? Hard to say without knowing more.
j
I was getting an error message:
raise ProfileNotFound(profile=profile_name)
botocore.exceptions.ProfileNotFound: The config profile (data-eng-bot) could not be found
I removed the "profile" argument from the block itself in the web UI
Then I started getting these other errors
How do I just test this block? I think that's pretty key
I just get error after error, and I don't know if it's to do with how I set it up in the UI, or if I need to update the AWS role, or what
c
These errors do look similar - botocore is looking for a profile named data-eng-bot here. Where are you trying to execute this code, and is this a deployment operation (meaning pushing and retrieving your code from S3), or in the flow itself (meaning at execution time)? Is this a local run, process run, Docker, ECS, or something else?
Basically, botocore is looking for a profile that doesn’t exist. Why that is remains to be seen, but based on the name, you have that profile name set statically somewhere
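[Note: one quick way to see which profiles botocore can actually find on the machine:]
Copy code
import boto3

# Prints every profile name botocore discovers in the local AWS config;
# "data-eng-bot" would have to appear here for that profile to resolve
print(boto3.Session().available_profiles)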
j
local run for now - need to figure out how to use these things before I try to push to a work queue
But I don't need a profile, right?
c
Botocore thinks you have one
j
I removed the profile argument from the block though
I didn't have it when I used S3 with boto3 before
I was just trying to figure out how to create the block
c
Can you share your code, minus the actual credentials?
j
Copy code
from prefect_aws import AwsCredentials
from prefect_aws.s3 import S3Bucket

aws_credentials_block = AwsCredentials.load("<my stored aws bot creds>")
s3_bucket = S3Bucket(
    bucket_name="<top-level bucket>",
    aws_credentials=aws_credentials_block
)

# Read a single object
test_test = s3_bucket.read_path(path="<a folder one level below my default bucket>/<subfolder>/test_json_file.json")

# List objects in the bucket
s3_bucket.list_objects()

# Download a whole folder
s3_bucket.download_folder_to_path('email_scan_bot', 'attachments')

with open("test_response.json", "wb") as f:
    s3_bucket.download_object_to_file_object("<a folder one level below my default bucket>/<subfolder>/test_json_file.json", f)

with open("test_response.json", "wb") as f:
    s3_bucket.download_object_to_file_object("<subfolder>/test_json_file.json", f)
No luck with anything so far
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
That's the one I'm getting with
Copy code
with open("test_response.json", "wb") as f:
  s3_bucket.download_object_to_file_object("<subfolder>/test_json_file.json", f)
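[Note: a way to isolate whether the 403 comes from the block or from IAM is to run the same call through plain boto3 with the same keys. A sketch with placeholder values; if this also fails, the problem is credentials or permissions, not the integration:]
Copy code
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="<key id>",
    aws_secret_access_key="<secret key>",
)
# Same HeadObject call the download performs under the hood;
# a 403 here points at IAM rather than the Prefect block
s3.head_object(Bucket="<top-level bucket>", Key="<subfolder>/test_json_file.json")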
c
one moment
Copy code
c3cb88d5-f75a-40ed-b01c-… │ S3                        │ tests3      │ s3/tests3
OK, it looks like there are two separate S3 blocks: there's an S3 base block, and the S3Bucket block used with prefect-aws
j
I think maybe only Prefect folks can see the one you've got there
I sure don't see that in my web UI
c
yea, that’s my block
Copy code
from prefect import task, flow
from prefect import get_run_logger
from prefect_aws import AwsCredentials
from prefect_aws.s3 import S3Bucket

def this_is_not_a_task(logger):
    logger.info("I am not a task context")

@task
def log_platform_info():
    logger = get_run_logger()
    logger.info("hello world")
    this_is_not_a_task(logger)


@flow(log_prints=True)
def hello_world():
    logger = get_run_logger()
    #log_platform_info()

    aws_creds = AwsCredentials(
        aws_access_key_id="<removed>",
        aws_secret_access_key="<removed>"
    )

    s3_bucket_block = S3Bucket(
        bucket_name="<bucket name>",
        aws_credentials=aws_creds,
        basepath=""
    )

    s3_bucket_block.list_objects("storage")
    s3_bucket_block.upload_from_path("deployment.py", "storage/deployment.py")

if __name__ == "__main__":
    hello_world()
Copy code
20:36:49.060 | INFO    | Flow run 'magnificent-chimpanzee' - Listing objects in bucket storage.
20:36:49.292 | WARNING | prefect._internal.concurrency.timeouts - Overriding existing alarm handler <function _alarm_based_timeout.<locals>.sigalarm_to_error at 0x10b804af0>
20:36:49.624 | INFO    | Flow run 'magnificent-chimpanzee' - Uploaded from '/Users/christopherboyd/all_the_things/s3_community/deployment.py' to the bucket '<bucket name>' path 'storage/deployment.py'.
20:36:49.750 | INFO    | Flow run 'magnificent-chimpanzee' - Finished in state Completed()
20:36:49.932 | INFO    | prefect._internal.concurrency.threads - Exiting worker thread 'APILogWorkerThread'
Copy code
(prefect2) (base)  christopherboyd@Christophers-MacBook-Pro  ~/all_the_things/s3_issue_community  
 $ aws sts get-caller-identity

An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid.
(prefect2) (base)  ✘ christopherboyd@Christophers-MacBook-Pro  ~/all_the_things/s3_issue_community  
 $
I have no profile enabled, I'm not passing any credentials, and in your case, I would entirely remove / move your ~/.aws/ folder to rule out any possible conflicts. I used the exact code you saw there in a fresh venv - I used my actual access key + secret (just redacted for sharing). For the bucket, I have the name of the bucket and a single folder at the root named “storage” (so storage/deployment.py is where this file went)
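[Note: a quick way to check whether any ambient credentials are being picked up from the environment or config chain, which is what moving ~/.aws/ aside rules out:]
Copy code
import boto3

# Shows whatever credentials botocore resolves on its own (env vars,
# ~/.aws/credentials, instance metadata, ...); None means nothing
# ambient can conflict with the keys set explicitly on the block
creds = boto3.Session().get_credentials()
print(creds.access_key if creds else "no ambient credentials found")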
j
I'll try just the S3 bucket then, I guess
Maybe I can do it without pulling the aws credentials down, but I was kinda hoping to centralize those creds using the block
What's the best way to just test if this works?
Please keep in mind I'm used to using boto3 directly, and this is basically a layer on top of that, so I'm not sure what methods are being used in all cases
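[Note: the blocks are indeed a thin layer over boto3, and the credentials block can hand back an ordinary boto3 session, so familiar boto3 calls still work alongside them. A sketch reusing the saved block name from earlier:]
Copy code
from prefect_aws import AwsCredentials

creds = AwsCredentials.load("<my stored aws bot creds>")
# get_boto3_session() returns a plain boto3.Session built from the block,
# so any usual boto3 client call can be used for testing
s3 = creds.get_boto3_session().client("s3")
print(s3.list_objects_v2(Bucket="<top-level bucket>", MaxKeys=5))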
c
You can try the code I just shared, replacing the placeholders with your values
Add your credentials, specify your bucket and a path, then a file to upload as a test
j
AccessDenied message again
I know the creds are good because I use them in my v1 flows, though
c
And you can literally test that user using the AWS CLI
Via aws s3
Using a profile with those credentials
If there is a policy attached to that user / role, it might allow access from wherever you run it in v1 but not from where you are trying to run here
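[Note: one way to confirm exactly which IAM principal a set of keys authenticates as, which matters when a policy restricts access by user, role, or source:]
Copy code
import boto3

sts = boto3.client(
    "sts",
    aws_access_key_id="<key id>",
    aws_secret_access_key="<secret key>",
)
# Prints the ARN of the user/role these keys belong to
print(sts.get_caller_identity()["Arn"])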
j
Yeah, that's the thing I'm having trouble seeing
I think it's permissions
I made the role and the policy, and I can't see what could be blocking me
Shoot. Well, I think this one is just taking up too much time, and I'm kind of lost here, with so little help material out there for these AWS blocks. I'm just going to store secrets and give up on using the AWS blocks
Thanks for trying to help me out, though. I appreciate it
c
I would just try accessing S3 using that profile via the CLI to make sure it’s not actually a permission issue for that user
If it works, then it’s something else, but I’m not sure it’s the integration, as many users use those blocks
e.g.:
Copy code
$ aws s3 ls <bucket> --profile <the one for my credentials>
                           PRE storage/
2023-04-12 08:40:42        223 Y2lzY29zcGFyazovL3VzL01FU1NBR0UvM2E1NjJkOTAtZDkyZi0xMWVkLTg2MjEtZjkyYmY3YzMyMjVj
2022-11-07 14:54:41        223 Y2lzY29zcGFyazovL3VzL01FU1NBR0UvMDIyYTZhZTAtNWVkNi0xMWVkLWI0ZTAtYTU5MWY4YjhmZTY4
2023-04-10 13:35:56        223 Y2lzY29zcGFyazovL3VzL01FU1NBR0UvMjM1NDMxMTAtZDdjNi0xMWVkLWFlMmMtMTFiNTI1NWMxZjJj
2022-11-18 12:22:04        223 Y2lzY29zcGFyazovL3VzL01FU1NBR0UvODI0ZGFkODAtNjc2NS0xMWVkLWJiMTktMWRiNjE1ODBmMDU2
2022-11-18 12:22:05        223 Y2lzY29zcGFyazovL3VzL01FU1NBR0UvODM0YzU4ZDAtNjc2NS0xMWVkLWE3MWMtMjkyYzQzYjZhNzdi
2022-11-14 09:50:28        223 Y2lzY29zcGFyazovL3VzL01FU1NBR0UvYWI0NjVkMDAtNjQyYi0xMWVkLTk0YmQtNDM2ZTdiYzNiYjZl
That profile is the same set of credentials I’m using for the flow
j
Do I have to have the CLI set up for this?
I've never had to do that for any reason before
The AWS CLI, that is
c
you don’t - I’m trying to suggest validating that the user profile / credentials you are using work in another scenario, to be certain it’s not the credentials / permissions
j
Do I have to make a profile?
c
here is the absolute minimum code that I’m using:
Copy code
from prefect import flow, get_run_logger
from prefect_aws import AwsCredentials
from prefect_aws.s3 import S3Bucket


@flow(log_prints=True)
def check_s3():
    logger = get_run_logger()

    aws_creds = AwsCredentials(
        aws_access_key_id="",
        aws_secret_access_key=""
    )

    s3_bucket_block = S3Bucket(
        bucket_name="my-bucket",
        aws_credentials=aws_creds,
        basepath=""
    )

    s3_bucket_block.list_objects("storage")
    s3_bucket_block.upload_from_path("deployment.py", "storage/deployment.py")
    print(s3_bucket_block.list_objects("storage"))

if __name__ == "__main__":
    check_s3()
you can run
aws configure
to set up a profile - it will prompt for your access key ID, secret access key, region, and output format
you can then run
aws s3 ls <your bucket> --profile <the profile name you just created>
if that fails with your credentials, I’d investigate the user
if that succeeds with the user, try the code I just pasted, which is the bare minimum, only requiring pip install prefect-aws
j
oh man, it was permissions
Huge thank you to both of you
I got my objects all listed out
There was some more IAM work to do, looks like
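[Note: for reference, the operations in this thread need both bucket-level and object-level grants; a common gotcha is that s3:ListBucket attaches to the bucket ARN itself, while GetObject/PutObject attach to the objects under it. A sketch of a minimal policy with a placeholder bucket name:]
Copy code
import json

# Hypothetical minimal policy for list/download/upload via the S3Bucket block:
# ListObjectsV2 needs s3:ListBucket on the bucket, while HeadObject/GetObject
# and uploads need Get/PutObject on the objects beneath it
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::<bucket>",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::<bucket>/*",
        },
    ],
}
print(json.dumps(policy, indent=2))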
c
glad you got it sorted, and hopefully that helps you further with the blocks 🙂
j
It does! Thanks again