
Tibs

01/05/2023, 5:29 AM
Hi, I am trying to optimize deployments to Prefect Cloud. From a GitHub Action I am creating 6 deployments and 8 blocks (AWS credentials, S3 bucket, 6 ECS task blocks) using the Python API. Currently this takes around 12 mins to deploy, which is too slow for us, as we will be adding a lot more. The storage block is an S3 bucket, and the infrastructure block for every flow is an ECSTask. Does anyone have an idea why it's slow and how we can optimize it?

redsquare

01/05/2023, 7:15 AM
We use GH Actions and it takes nowhere near this long. Are they sequential or do you run them in parallel via a matrix? Can you share any of the action yml?

Ben Muller

01/05/2023, 8:01 AM
+1 to this @Tibs - I run all my `prefect deployment build --apply` commands concurrently with the shell `&` operator, but for a repo with 20-ish deployments it can still take about 10 minutes per environment (prod | staging) to deploy. It has been a massive time suck. Would be great to get a batch deploy or something of the kind...
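A minimal sketch of that concurrent pattern in Python (the deploy scripts in this thread are Python) - the flow entrypoints and block names below are placeholders, not the thread's actual ones:

# Launch each `prefect deployment build --apply` in its own process,
# mirroring the shell `&` trick, then wait for all of them to finish.
import subprocess

FLOWS = [
    ("flows/flow_a.py:flow_a", "flow-a"),  # placeholder entrypoints
    ("flows/flow_b.py:flow_b", "flow-b"),
]

procs = [
    subprocess.Popen(
        [
            "prefect", "deployment", "build", entrypoint,
            "-n", name,
            "-ib", "ecs-task/default",  # placeholder infra block
            "-sb", "s3/default",        # placeholder storage block
            "--apply",
        ]
    )
    for entrypoint, name in FLOWS
]

# Fail loudly if any build exits non-zero.
for proc in procs:
    if proc.wait() != 0:
        raise RuntimeError(f"deployment build failed: {proc.args}")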

Tibs

01/05/2023, 8:07 AM
it is a sequential Python script in our case. I can look into changing to parallel execution; however, per @Ben Muller's point, it would still be slow once we get to more deployments.

redsquare

01/05/2023, 8:12 AM
if you use a matrix they can all run in parallel (depending on runner availability) https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs
we have a deployment per flow and can choose to deploy only the flows that have changed, or run a separate workflow for everything

Tibs

01/05/2023, 8:20 AM
how do you detect which flows have changed?

redsquare

that gets me a list of flow folders that have changed, which becomes the matrix for the parallel deploys
I too was worried about what happens when we have dozens of flows, so tackled this early
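A minimal sketch of one way to build that changed-folders list (assuming one folder per flow under flows/ and a diff against the default branch - the layout and branch name are assumptions, not the thread's actual setup):

# Collect the flow folders touched since origin/main; the resulting
# names can feed a GitHub Actions matrix for parallel deploys.
import subprocess

changed_files = subprocess.run(
    ["git", "diff", "--name-only", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

changed_flow_dirs = sorted(
    {
        path.split("/")[1]
        for path in changed_files
        if path.startswith("flows/") and "/" in path[len("flows/"):]
    }
)
print(changed_flow_dirs)  # e.g. ["my_flow_a", "my_flow_c"]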

Ben Muller

01/05/2023, 8:35 AM
This would assume there are no other modules of shared code in the repo and all the logic lives within the flows folder.

redsquare

01/05/2023, 8:37 AM
on build we copy down a shared module folder that gets deployed with the flow; for development we symlink it
👍 1
we found it preferable to packaging the modules up

Anna Geller

01/05/2023, 1:27 PM
you are most likely uploading all code to S3 during every build? there is no reason for a deployment build to take that long, parallel or not, unless you keep re-uploading everything

Tibs

01/05/2023, 1:36 PM
@Anna Geller yes, I think this is the case

Anna Geller

01/05/2023, 1:37 PM
you could follow the same approach I did - only upload once with the maintenance flow, and pass --skip-upload in the matrix build command
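In Python-API terms (which is how Tibs is deploying), the counterpart of --skip-upload is the skip_upload flag on Deployment.build_from_flow. A rough sketch, assuming Prefect 2.x; the flow imports and block name are placeholders:

# Upload the project to the S3 block once via the maintenance
# deployment, then skip the upload for every other build.
from prefect.deployments import Deployment
from prefect.filesystems import S3

from flows.maintenance import maintenance_flow  # placeholder import
from flows.etl import etl_flow                  # placeholder import

s3_block = S3.load("default")  # placeholder block name

# This build uploads the whole project directory to S3.
Deployment.build_from_flow(
    flow=maintenance_flow,
    name="maintenance",
    storage=s3_block,
    apply=True,
)

# Subsequent builds reuse what is already in S3.
Deployment.build_from_flow(
    flow=etl_flow,
    name="etl",
    storage=s3_block,
    skip_upload=True,
    apply=True,
)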

Tibs

01/05/2023, 1:40 PM
@Anna Geller But when there are changes in the flows, shouldn't the files be re-uploaded to S3 to reflect the changes, so that the agent picks up the updated version? I might be misunderstanding something here

Anna Geller

01/05/2023, 1:43 PM
btw in the coming weeks I'll try to make that more modular in a separate branch, might be easier to use then

Tibs

01/05/2023, 1:48 PM
@Anna Geller cool, seems like I have a lot to modify on my side, thanks a lot for sharing!
🙌 1

Ben Muller

01/05/2023, 3:45 PM
So @Anna Geller what you're saying is that if I have 10 flows I can skip upload for 9 of them and everything will work fine?

Anna Geller

01/05/2023, 4:29 PM
I only do upload for a maintenance flow that is not in the flows/ directory - the normal build for all flows/ iterated over in a matrix doesn't upload to S3
and yes, you could even upload the flows manually to S3 via: aws s3 sync yourproject/ s3://yourbucket/yourproject/ and this would also work
but the approach I shared seemed the easiest to me - we always upload to make sure the project is up to date, and deployments are only created for new/modified flows so that, e.g., if you add new parameters to your flow function, the parameters get properly updated on the deployment

Ben Muller

01/05/2023, 6:19 PM
I actually bake all my code into a custom image anyway. Is there a way to tell prefect that it's there?

Anna Geller

01/05/2023, 7:35 PM
yes, 100%! you would set `--path`, e.g. `--path /opt/prefect`
having flows baked into the Docker image is fully supported and totally encouraged - LMK if something doesn't work, I can investigate
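For the Python API the same idea appears to be the path field on the deployment. A sketch under the assumption that the Docker image already contains the project at /opt/prefect and no storage block is attached (block and import names are placeholders):

# With no storage block, the agent looks for the code at `path`
# inside the running container instead of pulling it from S3.
from prefect.deployments import Deployment
from prefect_aws.ecs import ECSTask

from flows.etl import etl_flow  # placeholder import

Deployment.build_from_flow(
    flow=etl_flow,
    name="etl",
    infrastructure=ECSTask.load("default"),  # placeholder block name
    path="/opt/prefect",   # where the Dockerfile COPYs the project
    skip_upload=True,      # nothing to upload - the image has the code
    apply=True,
)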

Ben Muller

01/05/2023, 7:35 PM
The path on my ECSTask block? And then I don't need to upload anything in my deployments?

Anna Geller

01/05/2023, 7:35 PM
spot on!
the path would be set on the build command

Ben Muller

01/05/2023, 8:16 PM
Interesting. I'll give it a go today and report back. That would be enormous for us.
🙌 1
Hey @Anna Geller - just looking at this now - you say it would be set on the build command. Do you mean the `save` command? I cannot see a build command on the `ECSTask` block. Or do you mean the path on the `prefect deployment build` command?

Anna Geller

01/05/2023, 9:17 PM
on build
prefect deployment build myflow.py:myflow -ib ecs-task/default --path /opt/flows -n default -a

Ben Muller

01/05/2023, 9:20 PM
gotcha
thanks
I also now have to remove the `--storage-block` argument or it will look in S3 by default, yeah?
After following the instructions I had no luck. Kept getting errors when running the flow:
Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 262, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
  File "/usr/local/lib/python3.10/site-packages/prefect/client/utilities.py", line 47, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect/deployments.py", line 166, in load_flow_from_flow_run
    await storage_block.get_directory(from_path=deployment.path, local_path=".")
  File "/usr/local/lib/python3.10/site-packages/prefect/filesystems.py", line 143, in get_directory
    copytree(from_path, local_path, dirs_exist_ok=True)
  File "/usr/local/lib/python3.10/shutil.py", line 556, in copytree
    with os.scandir(src) as itr:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/flows'
For the record it saved only ~525 seconds in deploy time per environment (I am deploying 12 flows). The costly part seems to be the generation of the yaml file in the `build` command.
Would I not need to prepend the flow file location to each of the `--path` options? i.e. `--path {work_dir}/flows/my_flow`?

Tibs

01/10/2023, 10:18 AM
So, I was able to make the process faster by running the maintenance step and passing skip_upload for the other flows, although I did this in Python scripts instead of the GitHub Actions template. Caching the Python dependencies in the GitHub workflow also helped.