# ask-community
l
question about s3 interaction within tasks: I have around 100 tasks that run in parallel in a deployment and all interact with s3 (specifically, they check if a file exists and if so they do some stuff). all the implementations I see eventually create some client interface, which is ok, but on the first task it takes 0.2 seconds to init the s3 client, and by the time we reach the 100th task it takes around 10 seconds, which is way too much for me. is there a best practice for sharing objects / s3 clients between tasks and flows that run on the same infra? stuff that I looked at:
https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
https://pypi.org/project/aioboto3/
https://prefecthq.github.io/prefect-aws/s3/
https://docs.prefect.io/latest/concepts/filesystems/#s3
n
where are your tasks running? depending on your infra, seems like you could either use a global s3_client var, or pass the same client to each of the tasks?
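something like this, as a rough sketch (untested; assumes boto3 and Prefect 2.x, and the bucket/key names are made up):
```python
import boto3
from botocore.exceptions import ClientError
from prefect import flow, task

# module-level client: created once per process and reused by every task,
# instead of paying the client-init cost 100 times
s3_client = boto3.client("s3")

@task
def process_if_exists(bucket: str, key: str) -> bool:
    try:
        # cheap existence check, no download
        s3_client.head_object(Bucket=bucket, Key=key)
    except ClientError:
        return False
    # ... do some stuff with the file ...
    return True

@flow
def main():
    futures = [process_if_exists.submit("my-bucket", f"data/{i}.csv") for i in range(100)]
    return [f.result() for f in futures]
```
this only helps if all the tasks run in the same process/pod, of course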
l
running in a locally hosted agent (a pod on kubernetes over aws). when passing an s3 client as a parameter I got a pydantic validation error, so I assumed you're only supposed to pass basic pydantic types that can be validated
n
does setting the s3 client type as the type hint in the task func help, or does it still fail? (disclaimer: I don't work at prefect, just use it a fair amount and happened to be on slack this morning and saw this)
l
will give that a shot tomorrow and see
so here is my solution: it looks like flows don't accept arbitrary types as parameters, but tasks do, so I just initialise the s3 client at the flow level instead of the task level and pass it into the tasks. did some creative rearranging to make it work
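for reference, roughly this shape (simplified sketch with placeholder names, not the real code):
```python
import boto3
from botocore.client import BaseClient
from botocore.exceptions import ClientError
from prefect import flow, task

@task
def check_and_process(s3: BaseClient, bucket: str, key: str) -> bool:
    # the client arrives as a plain task argument, so no pydantic
    # validation kicks in the way it does for flow parameters
    try:
        s3.head_object(Bucket=bucket, Key=key)
    except ClientError:
        return False
    # ... do some stuff ...
    return True

@flow
def main(bucket: str = "my-bucket"):
    s3 = boto3.client("s3")  # initialised once, at flow level
    futures = [check_and_process.submit(s3, bucket, f"data/{i}.csv") for i in range(100)]
    return [f.result() for f in futures]
```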