
Robert Banick

05/19/2023, 3:50 PM
Does anyone have experience with DevOps operations within the context of a flow?

My team uses GitHub branches of our main ETL repo to develop new features or fixes for our ETLs. In our previous orchestration stack we could swap these branches into the container server we spun up for a given ETL run. This helped us test the feature/fix in a production environment before merging to the ETL repo `main`.

Now, setting up Prefect, I’m having trouble designing a similar system. I’m using Prefect 2 on AWS Elastic Container Service (Fargate) tasks to implement ETL runs. We install our ETL repo as a library onto a docker image that the `ECSTask` block Task Definition uses. I’ve tried replicating our previous system by running `pip install --upgrade git@<repo>@<branch>` to upgrade the package in question at the very beginning of my flow. This works, and I’m even able to see that the function I’m modifying is indeed updated in `/usr/local/lib/python3.10/dist-packages/<package>`. Nevertheless, when my flow run reaches the crucial step I’m testing, it very clearly uses (and fails on) the “old” function currently in the ETL repo `main`. Forcing a reload of the package in question via `importlib.reload(<package>)` does not appear to resolve the problem.

My questions therefore are:
1. Is it possible to change a library mid-flow like this, or is it a hard limitation of Python / Prefect?
2. Is Python code used by flows somehow installed somewhere different from `/usr/local/lib/python3.10/dist-packages/` on the container, such that `pip` would install in the wrong place?
3. If it’s not possible to change a library mid-flow, is it possible to have the Prefect Agent/Worker run the `pip` installs prior to spinning up the flow? Could the Agent even read the desired branch names from the flow parameters?
4. Any other ideas? The nuclear option here is manually changing the branches on docker images, but that’s very clunky and will make iterative testing extremely time consuming. So we’d really like to avoid that path.

All help and suggestions most appreciated,
Robert

Zanie

05/19/2023, 3:59 PM
You can add your library as an `EXTRA_PIP_PACKAGES` option and it’ll get installed before your flow is loaded.
You’d probably need to set the branch name as an environment variable and template it in that way instead.
You can probably upgrade the package from within your run if you defer all imports until after it is upgraded.
👍 1

Robert Banick

05/19/2023, 4:03 PM
Hmmm OK this could work
How do I defer imports until after the upgrade?
Like run the imports within a flow task instead of in the flow header?
👍 1

Austin Weisgrau

05/19/2023, 4:08 PM
You could also look carefully at your code and determine if your package is being imported before or after the pip reinstall. It's possible to reload a module after updating source code with `importlib`, but probably easier to delay the import until after the correct source code is in place.
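One likely reason `importlib.reload(<package>)` can appear to do nothing: it re-executes only the module object passed to it, so names bound earlier with `from package import func` keep pointing at the old function, and submodules are not reloaded either. A minimal sketch of that behaviour, using a throwaway single-file module named `demo_pkg_reload` (a made-up name standing in for the real package):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Avoid stale .pyc files when the source is rewritten within the same second.
sys.dont_write_bytecode = True

# Hypothetical stand-in for the installed package: a single-file module.
pkg_dir = Path(tempfile.mkdtemp())
module_path = pkg_dir / "demo_pkg_reload.py"
module_path.write_text("def transform():\n    return 'old'\n")
sys.path.insert(0, str(pkg_dir))

import demo_pkg_reload
from demo_pkg_reload import transform  # binds the *current* function object

# Simulate `pip install --upgrade` replacing the source on disk.
module_path.write_text("def transform():\n    return 'new version'\n")
importlib.reload(demo_pkg_reload)

stale = transform()                  # name that was bound before the reload
fresh = demo_pkg_reload.transform()  # attribute looked up after the reload
print(stale, "|", fresh)
```

Here `stale` still comes from the old function while `fresh` comes from the reloaded one, which is why delaying the import tends to be simpler than chasing every stale reference.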

Robert Banick

05/19/2023, 4:12 PM
@Zanie would the idea be to pass the `EXTRA_PIP_PACKAGES` variable within the `Infrastructure Overrides` of a deployment? Or better to modify `Environment Variables` of the ECSTask Block?
We could potentially set up one or several dedicated development ECSTask blocks and use those. It’s less elegant than a parameter within a deployment or a flow run but far better than manually rebuilding docker

Austin Weisgrau

05/19/2023, 4:13 PM
run the imports in a task instead of in the header - that's right.
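from prefect import flow, task

@task
def my_task():
    # Import inside the task so it runs only after the reinstall has finished
    import mypackage
    mypackage.foobar()

@flow
def myflow():
    # reinstall_package() is assumed to be defined elsewhere (e.g. it runs pip)
    reinstall_package()
    my_task()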

Zanie

05/19/2023, 4:21 PM
Or you can like
subprocess.run("pip" ...)

import ...
You can pass the variable either way. On ECS, though, our entrypoint is not respected by default and you’ll need to update the container command to call the entrypoint then call the engine. You might as well just run `pip install` yourself in the command at that point.
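The "install first, import afterwards" idea discussed above can be sketched as below. This is only an illustration under stated assumptions: `pip_install` and `install_then_import` are hypothetical helper names, not Prefect APIs, and the installer is injectable so the pattern can be exercised without network access.

```python
import importlib
import subprocess
import sys


def pip_install(requirement: str) -> None:
    """Install `requirement` into the interpreter that is running this code."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--upgrade", requirement],
        check=True,
    )


def install_then_import(requirement: str, module_name: str, installer=pip_install):
    """Run the installer first, and only then import the module by name."""
    if module_name in sys.modules:
        # Already imported: the old code is loaded and a reinstall alone
        # will not replace the module object held in memory.
        raise RuntimeError(f"{module_name} was imported before the upgrade")
    installer(requirement)
    importlib.invalidate_caches()  # let the import system notice new files
    return importlib.import_module(module_name)
```

Inside a flow this might be called as `install_then_import("git+<repo>@<branch>", "<package>")` before any task touches the package. The guard makes the failure mode from the original question explicit: if anything at module level has already imported the package, the upgrade comes too late.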

Robert Banick

05/19/2023, 4:25 PM
Trialing…
Passing an `Infrastructure Override` to the deployment does not work
We’re pretty sure the `subprocess` method won’t work. Python seems to line up a snapshot of all the libraries it’s going to import at runtime and changes afterwards don’t really register
I am pretty sure this is a hard limitation of Python. Will noodle but I’m not optimistic
The `EXTRA_PIP_PACKAGES` route is more promising but we can’t get it working with `Infra Overrides` or `Environment Variables`. Possibly we are mis-specifying, so it could be user error.
@Zanie could you explain in a bit more detail what you meant with regard to entrypoints not being respected? Where would I modify the container command: on the agent? Sorry if this question is naive, I’m quite new to both Prefect and AWS land.

Zanie

05/19/2023, 4:40 PM
ECS bypasses image entrypoints
The `ECSTask` lets you configure the `command` run; it defaults to `python -m prefect.engine` to enter our engine.
You could modify it to `pip install … && python -m prefect.engine` or `/opt/prefect/entrypoint.sh python -m prefect.engine` (the first might require `bash -c "…"` or something).

Robert Banick

05/19/2023, 4:44 PM
Where would I change this on the ECS task? In the console and/or as Python code?
Ah sorry, I see `Command` now
A quick question, how does one adjust the timeout period for the Console / Agent? Implementing this command
command=["/opt/prefect/entrypoint.sh","python","-m","prefect.engine"]
in the ECS Task code I’m now getting the below error
Submission failed. RuntimeError: Timed out after 120.72852396965027s while watching task for status RUNNING
I think this is a good thing and means it’s timing out while the `pip install` runs
ah nvm task timeout seconds are under the block definition
@Zanie unfortunately none of the previewed command options are working. The error messages are not very helpful, any ideas what could be the issue?
command=["pip","install","git+https://github.com/Arbol-Project/gridded-etl-tools@popen_IO","&&","python","-m","prefect.engine"], : Flow run infrastructure exited with non-zero status code 2.

command=["/opt/prefect/entrypoint.sh","python","-m","prefect.engine"]  : `Submission failed. prefect_aws.ecs.TaskFailedToStart: CannotStartContainerError: ResourceInitializationError: failed to create new container runtime task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/opt/prefect/entrypoint.sh": stat /opt/prefect/entrypoint.sh: no such file or directory: unknown`

command=["bash","-c","\"","pip","install","git+https://github.com/Arbol-Project/gridded-etl-tools@popen_IO","&&","python","-m","prefect.engine","\""] : Flow run infrastructure exited with non-zero status code 2.

Zanie

05/19/2023, 5:28 PM
The first fails because && requires bash.
Not sure why the second fails, are you using our official image?
The third fails because you’re quoting too much:
["bash", "-c", "pip install git+https://github.com/Arbol-Project/gridded-etl-tools@popen_IO && python -m prefect.engine"]
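To see why the list form behaves this way: when a command is given as a list, each element reaches the program as one literal argument, so `&&` is handed to `pip` as text rather than interpreted as an operator. With `bash -c`, only the single element right after `-c` is treated as the script, and later elements become positional parameters. A small sketch that makes the first point visible, using a child Python process as a stand-in for `pip` (the package name `some-package` is a placeholder):

```python
import subprocess
import sys

# Tiny child program that just prints the arguments it received.
SHOW_ARGS = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# Exec form: every list element is passed through as a literal argument,
# so "&&" arrives as plain text instead of chaining two commands.
result = subprocess.run(
    SHOW_ARGS + ["install", "some-package", "&&", "python", "-m", "prefect.engine"],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.strip()
print(received)

# Shell form: the whole pipeline is ONE string, the single argument after "-c".
shell_form = ["bash", "-c", "pip install some-package && python -m prefect.engine"]
print(len(shell_form))
```

The printed argument list includes `'&&'` as an ordinary item, which matches the first failure above; the shell form has exactly three elements, matching the corrected command.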

Robert Banick

05/19/2023, 5:29 PM
Got it
Yes OK I’ve been used to passing each string separately for `subprocess` commands, but here that’s not the case
On the second: we are providing our own docker image via a Task Definition, as per the original post above, “We install our ETL repo as a library onto a docker image that the `ECSTask` block Task Definition uses.”
@Zanie your suggestion of
command=["bash", "-c", "pip install git+https://github.com/Arbol-Project/gridded-etl-tools@popen_IO && python -m prefect.engine"]
works, thank you very much! Hard coding the repo and package is not the optimum workflow here, so I’d love to get the `/opt/prefect/entrypoint.sh` command working. Since we’re using our own docker images, would you suggest replicating the script as part of our docker image setup so we can run the `pip install $EXTRA_PIP_PACKAGES` component of it?

Zanie

05/19/2023, 6:07 PM
If you use the new `ECSWorker` you can create custom template variables for those and they’ll show up in the UI. A little advanced, but might be what you need.
I presume that environment variables are also available in the command as you’ve set it now

Robert Banick

05/19/2023, 6:10 PM
Where in the UI would they show up — under the ECSTask or Deployments? I’m guessing this is for an ECS Worker?

Zanie

05/19/2023, 6:27 PM
What do you mean?

Robert Banick

05/19/2023, 7:49 PM
Hi, sorry, I don’t fully understand how the workers interact with the UI so am not well equipped to use what you said, but thank you. We’ll put this on our road map to move over to a worker when time allows. For now we recreated the essential functionality by coding the relevant if/fi conditions from `entrypoint.sh` into the command of our ECSTask, like so
command=["bash", "-c", 'if [ ! -z "$EXTRA_PIP_PACKAGES" ]; then pip install $EXTRA_PIP_PACKAGES; fi && python -m prefect.engine'],
This allows us to use `$EXTRA_PIP_PACKAGES` overrides from within the UI as if the script were being run. Thank you so much for your support @Zanie, I would have lost days on this before I figured it out otherwise!
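The final setup can be sketched as two small pieces: the shell-form command from the message above, and an `EXTRA_PIP_PACKAGES` value built from a branch name. The helper names `branch_requirement` and `engine_command` are made up for illustration, and how the resulting dictionary is attached to a deployment or block is left out because the thread does not show it.

```python
# Repository URL taken from the thread; the helper names below are made up.
REPO_URL = "https://github.com/Arbol-Project/gridded-etl-tools"


def branch_requirement(branch: str, repo_url: str = REPO_URL) -> str:
    """Build a pip requirement that points at one branch of the repo."""
    return f"git+{repo_url}@{branch}"


def engine_command(packages_var: str = "EXTRA_PIP_PACKAGES") -> list:
    """Shell-form command: install extras if the variable is set, then start the engine."""
    script = (
        f'if [ ! -z "${packages_var}" ]; then pip install ${packages_var}; fi'
        " && python -m prefect.engine"
    )
    return ["bash", "-c", script]


# Environment override for a test run of the branch used in the thread.
env_override = {"EXTRA_PIP_PACKAGES": branch_requirement("popen_IO")}
command = engine_command()
print(env_override)
print(command)
```

With this shape, an empty or unset variable makes the `if` block exit with status 0 so the engine starts as usual, while a failed `pip install` makes the `if` block return non-zero and `&&` stops the engine from starting on a half-installed package.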