Scott Pedersen
07/29/2024, 10:06 PM
We are currently using the `prefect.filesystems.S3` package and `S3.load('BLOCK')`. We would like to move to `prefect_aws.S3`. What method should I be using to replace `load`?
Marvin
07/29/2024, 10:08 PM
It seems there isn't a dedicated `load` method documented in the `prefect_aws.S3Bucket` class, but you can achieve similar functionality using the `S3Bucket.load()` method from Prefect. Here's a brief guide on how to transition:
1. Migrating from `prefect.filesystems.S3` to `prefect_aws.S3Bucket`:
   - Old: `prefect.filesystems.S3.load('BLOCK')`
   - New: `prefect_aws.S3Bucket.load('BLOCK')`
2. Example Usage:
```python
from prefect_aws.s3 import S3Bucket

# Load the S3 bucket block
s3_bucket = S3Bucket.load("BLOCK")

# You can then use methods like `s3_bucket.write_path` and `s3_bucket.read_path`
# to interact with the S3 bucket
```
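For reference, a minimal sketch of the `write_path` / `read_path` helpers mentioned in the comment above, assuming the same "BLOCK" block name and a hypothetical object key:
```python
from prefect_aws.s3 import S3Bucket

s3_bucket = S3Bucket.load("BLOCK")

# write a small object under the block's bucket/folder, then read it back;
# "example/hello.txt" is a hypothetical key used only for illustration
s3_bucket.write_path("example/hello.txt", b"hello from prefect")
print(s3_bucket.read_path("example/hello.txt"))
```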
Here are some relevant links for more detailed information:
- prefect_aws S3Bucket Documentation
If you encounter any issues, please refer to the documentation or open a new issue on GitHub.
Nate
07/29/2024, 10:15 PM
These blocks have the same `load` method, ie you shouldn't need to change anything. Are you seeing something unexpected?
Scott Pedersen
07/29/2024, 10:18 PM
In the `prefect_aws.S3Bucket` documentation I couldn't see a `load` method in the table of contents; however, further down in the examples I can see `S3Bucket.load()` being used.
Nate
07/29/2024, 10:19 PM
The docs only list the methods defined on the `S3Bucket` subclass there, as opposed to all the methods that come for free by virtue of it being a `Block`.
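A minimal sketch, assuming a Prefect 2.x environment with prefect-aws installed, of the point above: `load` comes from the `Block` base class rather than being defined on `S3Bucket`, which is why it doesn't show up in the S3Bucket-specific reference:
```python
from prefect.blocks.core import Block
from prefect_aws.s3 import S3Bucket

print(issubclass(S3Bucket, Block))  # True: S3Bucket is a Block subclass
print(hasattr(Block, "load"))       # True: `load` is inherited from Block,
                                    # so it isn't listed on the subclass page
```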
Scott Pedersen
07/29/2024, 10:21 PM
So we can still use `load` the way we are currently using it? We pass the `s3_bucket` to `Flow.from_source`.
Nate
07/29/2024, 10:21 PM
Scott Pedersen
07/30/2024, 10:01 PM
I am using the `from prefect_aws.S3 import S3Bucket` package and `S3Bucket.load("name-of-block")` to load the contents. I have the `flow.from_source(s3_block, {path}/flow.py:my_flow)`. I am receiving the following error:
`prefect.exceptions.ScriptError: Script at '/tmp/tmpiekdqty5/s3-bucket-name-of-block/flow.py' encountered an exception: FileNotFoundError(2, 'No such file or directory')`
The path looks correct, except I would have thought that `/tmp/tmpiekdqty5/s3-bucket-name-of-block` would be the initial path to the flow folder. Can you offer any advice on how to fix this?
Marvin
07/30/2024, 10:01 PM
2. Permissions: Ensure the role running the flow has the required S3 permissions, such as s3:ListBucket and s3:GetObject.
3. Intermediate Directory Structure: Sometimes interim paths might not be laid out correctly. Try printing the available directories and checking the exact structure of the temporary directory (see the sketch after this list).
4. Example Usage: Here's a refined example that ensures the correct loading and access to the S3 bucket contents:
```python
from prefect_aws.s3 import S3Bucket
from prefect import flow

@flow
def load_flow():
    s3_bucket = S3Bucket.load("name-of-block")
    # download the flow file from the bucket to a local path
    local_path = s3_bucket.download_object_to_path("folder-name/flow.py", "flow.py")
    print(f"Flow path: {local_path}")

load_flow()
```
5. Temporary Directory Handling: Ensure that the flow execution environment has the necessary permissions to write to the temporary directory /tmp/.
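As a rough illustration of step 3 above, a debugging sketch that walks the temporary directory from the error message (the tmp path is illustrative; substitute the one from your own run):
```python
import os

tmp_root = "/tmp/tmpiekdqty5"  # taken from the error message above; adjust as needed
for dirpath, _dirnames, filenames in os.walk(tmp_root):
    for name in filenames:
        print(os.path.join(dirpath, name))
```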
If the issue persists, consider opening a new issue on the Prefect GitHub repository. Provide details about your configuration, the exact error message, and any relevant logs.
For more detailed information, refer to the Prefect AWS S3 Documentation.
Nate
07/30/2024, 10:03 PM
> I am using the from prefect_aws.S3 import S3Bucket package and S3Bucket.load("name-of-block") to load the contents. I have the flow.from_source(s3_block, {path}/flow.py:my_flow)
Scott Pedersen
07/30/2024, 10:04 PM
Scott Pedersen
07/30/2024, 10:06 PM
Nate
07/30/2024, 10:16 PM
Can you print the `s3_storage` before you pass it into `from_source`?
It's very odd to me that we're seeing the block name in this path: `/tmp/tmpiekdqty5/s3-bucket-name-of-block/flow.py`
Your `prefect version` output would also be helpful.
Scott Pedersen
07/30/2024, 10:16 PM
Scott Pedersen
07/30/2024, 10:24 PM
Scott Pedersen
07/30/2024, 11:01 PM
I have printed the `S3Bucket.load` and `S3.load` objects that are returned, and they both look similar except for the additions in `S3Bucket`:
```python
S3Bucket(
    bucket_name='bucket-name',
    credentials=AwsCredentials(
        aws_access_key_id=None,
        aws_secret_access_key=None,
        aws_session_token=None,
        profile_name=None,
        region_name=None,
        aws_client_parameters=AwsClientParameters(
            api_version=None,
            use_ssl=True,
            verify=True,
            verify_cert_path=None,
            endpoint_url=None,
            config=None)
    ),
    bucket_folder='bucket-name/flows'
)

S3(
    bucket_path='bucket-name/flows',
    aws_access_key_id=None,
    aws_secret_access_key=None
)
```
Scott Pedersen
07/31/2024, 1:20 AM
Marvin
07/31/2024, 1:20 AM
The `bucket_folder` in your `S3Bucket` block configuration should be just the folder name within the bucket. So if your bucket is called `my-bucket` and the folder you want to use is `my-folder`, you should set `bucket_folder` to `my-folder`.
Here's the correct configuration syntax:
bucket_name: my-bucket
bucket_folder: my-folder
For more details, you can refer to the relevant Prefect AWS S3 Documentation.
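Given the block output printed earlier (`bucket_folder='bucket-name/flows'`), a minimal sketch of correcting and re-saving the block, assuming `flows` is the intended folder and reusing the block name from this thread:
```python
from prefect_aws.s3 import S3Bucket

s3_bucket = S3Bucket.load("name-of-block")
print(s3_bucket.bucket_folder)          # currently 'bucket-name/flows'

s3_bucket.bucket_folder = "flows"       # folder only, without the bucket name
s3_bucket.save("name-of-block", overwrite=True)
```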
Scott Pedersen
08/01/2024, 2:32 AM
When I call `Flow.from_source` I use what is returned from calling `S3Bucket.load`, so my call looks like `my_flow.from_source(s3_storage, "path/to/flow.py:my_flow")`. Does the `path/to/flow.py:my_flow` need to include the `bucket-name` and `bucket-folder`?
Marvin
08/01/2024, 2:33 AM
I couldn't find explicit documentation on using `Flow.from_source` with `S3Bucket`. However, based on common practice and the structure typically used in similar operations, here's how you should structure the path:
1. Without `bucket-name` and `bucket-folder`:
   - When you call `S3Bucket.load`, it should already be aware of the `bucket_name` and `bucket_folder`. Therefore, the path provided to `Flow.from_source` should be relative to the `bucket_folder`.
Example:
```python
from prefect_aws.s3 import S3Bucket
from prefect import flow

@flow
def my_flow():
    pass

# Load the S3 bucket block
s3_storage = S3Bucket.load("name-of-block")

# Call from_source with a path relative to the bucket folder
my_flow.from_source(s3_storage, "path/to/flow.py:my_flow")
```
2. Alternative Scenario:
- If you face issues, another approach to ensure correctness would involve giving the full S3 path.
Example:
```python
my_flow.from_source(s3_storage, "my-bucket/my-folder/path/to/flow.py:my_flow")
```
However, the preferred practice should be aligning with the block's inherent configuration. Here are the important points from the relevant Prefect AWS S3 Documentation:
- The `bucket_folder` should only contain the folder inside the bucket, e.g., `my-folder`.
- The `Flow.from_source` path should then be relative to this `bucket_folder`, e.g., `path/to/flow.py:my_flow`.
If these suggestions don't resolve the issue, I recommend reaching out to the Prefect community or consulting the Prefect documentation further to ensure clarity.
Scott Pedersen
08/01/2024, 2:36 AM
So the `/path/to/flow.py:my_flow` would be relative to the `bucket-name/bucket-folder`.
Marvin
08/01/2024, 2:36 AM
Yes, the path you provide to `Flow.from_source` should be relative to the `bucket_folder` defined in the `S3Bucket` block. Here's how you can structure it:
1. Define the `bucket_folder` properly:
   - Ensure your `S3Bucket` block has the `bucket_folder` set correctly (just the folder name within the bucket).
Example:
```python
from prefect_aws.s3 import S3Bucket

s3_bucket = S3Bucket(
    bucket_name="my-bucket",
    bucket_folder="my-folder",
    # credentials=<your AwsCredentials block>,  # optional; omit to use the ambient role
)
s3_bucket.save("name-of-block")
```
2. Load the flow from the S3 bucket:
   - When specifying the path in `Flow.from_source`, it should be relative to the `bucket_folder`.
Example:
```python
from prefect_aws.s3 import S3Bucket
from prefect import flow

@flow
def my_flow():
    pass

# Load the S3 bucket block
s3_storage = S3Bucket.load("name-of-block")

# Call from_source with a path relative to the bucket folder
my_flow.from_source(s3_storage, "path/to/flow.py:my_flow")
```
### Detailed Steps:
1. Define your S3Bucket block:
   Make sure the `bucket_name` and `bucket_folder` are correctly set when you create and save the block.
2. Load the S3Bucket:
   Use `S3Bucket.load("name-of-block")` to retrieve the block.
3. Set the path relative to `bucket_folder`:
   Provide the path in `Flow.from_source` relative to the `bucket_folder`.
Here’s a concrete example:
- S3 Bucket: my-bucket
- Folder inside the bucket (`bucket_folder`): my-folder
- Path to your flow file inside the bucket folder: path/to/flow.py
```python
# Structure the path as relative to the bucket_folder
from prefect_aws.s3 import S3Bucket
from prefect import flow

@flow
def example_flow():
    pass

s3_bucket = S3Bucket.load("name-of-block")

# Assuming the actual S3 structure is: my-bucket/my-folder/path/to/flow.py
example_flow.from_source(s3_bucket, "path/to/flow.py:example_flow")
```
In case the issue persists, consider verifying the contents of your S3 bucket directly to ensure the paths are correct, and that the role assumed has all the necessary permissions to access the files. If any issues remain, check the structure to confirm it aligns with the expected layout.
Scott Pedersen
08/01/2024, 2:38 AM
Because we run under an assumed role, the `credentials=<Your AWSCredentials Block>` wouldn't be required. Would it be OK to leave this out?
Marvin
08/01/2024, 2:39 AM
Yes, it should be fine to leave out the `credentials` parameter when setting up your `S3Bucket` block. Prefect should use the role that the process is running under to access the S3 bucket.
Here's how you can set up and use the `S3Bucket` block without specifying credentials:
1. Define the `S3Bucket` block without credentials:
```python
from prefect_aws.s3 import S3Bucket

# Define the S3 bucket block
s3_bucket = S3Bucket(
    bucket_name="my-bucket",
    bucket_folder="my-folder",
)

# Save the block
s3_bucket.save("name-of-block")
```
2. Load the S3Bucket block and use `Flow.from_source`:
```python
from prefect_aws.s3 import S3Bucket
from prefect import flow

@flow
def example_flow():
    pass

# Load the S3 bucket block
s3_bucket = S3Bucket.load("name-of-block")

# Specify the path relative to the bucket folder
example_flow.from_source(s3_bucket, "path/to/flow.py:example_flow")
```
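A quick sanity check, assuming the role-based setup above and prefect-aws's `list_objects` helper, to confirm the ambient role can actually read the bucket through the block:
```python
from prefect_aws.s3 import S3Bucket

s3_bucket = S3Bucket.load("name-of-block")

# lists object metadata under bucket_folder; an empty result or an access error
# points back at the path and permission checks below
print(s3_bucket.list_objects()[:5])
```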
### Additional Tips:
- Ensure Permissions: Make sure that the assumed role has sufficient permissions (including s3:GetObject, s3:ListBucket, etc.) for the S3 bucket.
- Validate Path: Double-check the S3 bucket structure and ensure the path specified (path/to/flow.py:example_flow) correctly maps to the location within my-folder.
This approach adheres to your setup and leverages the role's inherent permissions to access the S3 bucket without needing additional credentials. If everything is configured properly, this should resolve the file access issues you were encountering.
Scott Pedersen
08/01/2024, 4:42 AM
When I call `Flow.deploy()` from my code, what is the workflow that is followed within Prefect?
Marvin
08/01/2024, 4:43 AM
When you call `Flow.deploy()` within Prefect, the workflow generally follows these steps:
1. Serialization of Flow:
- The flow is serialized into a format that can be stored and later executed. This serialization includes the flow's code, configuration, dependencies, and metadata.
2. Storage of Flow:
- The serialized flow is stored in the specified storage backend (e.g., S3, Docker, Kubernetes, etc.). The storage backend defined in the flow is responsible for housing the flow's code and any associated files.
3. Registration as a Deployment:
- The flow is registered as a deployment within the Prefect API (either Prefect Cloud or a self-hosted Prefect server). This includes creating a deployment entry that tracks the flow's configuration, schedules, parameters, and storage details.
4. Creation of Work Pool or Queue:
- The deployment may be associated with a work pool or queue. This is where it waits for a worker to pick it up and execute a flow run. Different infrastructure backends (like Docker, Kubernetes, serverless cloud options) determine how and where these flows run.
5. Polling and Execution by Workers:
- Workers, which could be running in different environments (e.g., local processes, Docker containers, Kubernetes jobs), poll the work pool or queue for new flow runs. Once a new flow run is identified, the worker retrieves the flow from storage, initializes the environment, and executes the flow.
Here is a high-level visualization of this process:
```plaintext
Flow Code -> Serialization -> Storage Backend (e.g., S3, Docker) -> Registration as Deployment -> Work Pool/Queue -> Worker Polling -> Flow Run Execution
```
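To make the steps above concrete, a minimal sketch of a deploy call, assuming the S3 block and entrypoint used earlier in this thread and a hypothetical work pool named `my-work-pool`:
```python
from prefect import flow
from prefect_aws.s3 import S3Bucket

# pull the flow code from the S3 block and register it as a deployment;
# a worker polling "my-work-pool" will pick up the resulting flow runs
flow.from_source(
    source=S3Bucket.load("name-of-block"),
    entrypoint="path/to/flow.py:my_flow",
).deploy(
    name="my-deployment",
    work_pool_name="my-work-pool",
)
```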
For more detailed guidance, including how to set up and manage deployments, refer to these documentation pages:
- Prefect Deployment Guide
- Prefect Deployments Tutorial
These docs provide comprehensive insights into the deployment process, configurations, and best practices for setting up your flows in Prefect.