Robert Banick
05/19/2023, 3:50 PM
[…] `main`.
Now setting up Prefect I’m having trouble designing a similar system. I’m using Prefect 2 on AWS Elastic Container Service (Fargate) tasks to implement ETL runs. We install our ETL repo as a library onto a docker image that the `ECSTask` block Task Definition uses.
I’ve tried replicating our previous system by running `pip install --upgrade git@<repo>@<branch>` to upgrade the package in question at the very beginning of my flow. This works and I’m even able to see that the function I’m modifying is indeed updated in `/usr/local/lib/python3.10/dist-packages/<package>`.
Nevertheless, when my flow run reaches the crucial step I’m testing, it very clearly uses (and fails on) the “old” function currently in the ETL repo `main`. Forcing a reload of the repository in question via `importlib.reload(<package>)` does not appear to resolve the problem.
My questions therefore are:
1. Is it possible to change a library mid-flow like this or is it a hard limitation of Python / Prefect?
2. Is Python code used by flows somehow installed somewhere different from `/usr/local/lib/python3.10/dist-packages/` on the container? Such that `pip` would install in the wrong place…
3. If it’s not possible to change a library mid-flow, is it possible to have the Prefect Agent/Worker run the `pip` installs prior to spinning up the flow? Could the Agent even read the desired branch names from the flow parameters?
4. Any other ideas?
The nuclear option here is manually changing the branches on docker images but that’s very clunky and will make iterative testing extremely time consuming. So we’d really like to avoid that path.
All help and suggestions most appreciated,
Robert
Zanie
05/19/2023, 3:59 PM
[…] `EXTRA_PIP_PACKAGES` option and it’ll get installed before your flow is loaded
Robert Banick
05/19/2023, 4:03 PM
[…]
Austin Weisgrau
05/19/2023, 4:08 PM
[…] `importlib`, but probably easier to delay the import until after the correct source code is in place
Robert Banick
05/19/2023, 4:12 PM
[…] `EXTRA_PIP_PACKAGES` variable within the `Infrastructure Overrides` of a deployment? Or better to modify `Environment Variables` of the ECSTask Block?
Austin Weisgrau
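05/19/2023, 4:13 PM
from prefect import flow, task

@task
def my_task():
    # Import inside the task so it runs only after the reinstall has finished
    import mypackage
    mypackage.foobar()

@flow
def myflow():
    reinstall_package()  # not defined in this snippet; stands for the pip upgrade step
    my_task()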
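A small, self-contained sketch of why delaying the import can matter. It does not involve Prefect or pip: the module name `reload_demo_mod` and the helper `demo` are hypothetical, and rewriting a file in a temporary directory stands in for the `pip install --upgrade` step. It shows one known Python behaviour (a name bound before the change keeps pointing at the old function even after `importlib.reload`), which may or may not be the cause of the problem described in this thread.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, used only for this demonstration.
MODULE_NAME = "reload_demo_mod"


def demo() -> tuple:
    """Return (result via a name bound early, result via the reloaded module)."""
    previous_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep a cached .pyc from masking the rewrite
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / f"{MODULE_NAME}.py"
        source.write_text("def foobar():\n    return 'old'\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(MODULE_NAME)
            # Same effect as `from reload_demo_mod import foobar` at import time.
            early_binding = module.foobar

            # Stand-in for the upgrade: the source on disk changes mid-run.
            source.write_text("def foobar():\n    return 'updated'\n")
            importlib.invalidate_caches()
            importlib.reload(module)

            # The early name still references the old function object;
            # looking the attribute up on the module gives the new one.
            return early_binding(), module.foobar()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
            sys.dont_write_bytecode = previous_flag


if __name__ == "__main__":
    print(demo())
```

Importing inside the task, as in the snippet above, avoids holding any early reference, provided nothing else in the process imported the package first.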
Zanie
05/19/2023, 4:21 PM
[…] `subprocess.run("pip" ...)` […] `import ...` […] `pip install` yourself in the command at that point.
Robert Banick
05/19/2023, 4:25 PM
[…] `Infrastructure Override` to deployment does not work
[…] `subprocess` method won’t work: Python seems to line up a snapshot of all the libraries it’s going to import at runtime and changes afterwards don’t really register
[…] `EXTRA_PIP_PACKAGES` route is more promising but we can’t get it working with `Infra Overrides` or `Environment Variables`; possibly we are mis-specifying, so could be user error
@Zanie could you explain in a bit more detail what you meant w/ regards to entrypoints not being respected? Where would I modify the container command, on the agent? Sorry if this question is naive, I’m quite new to both Prefect and AWS land.
Zanie
05/19/2023, 4:40 PM
[…] `ECSTask` lets you configure the command
[…] run `python -m prefect.engine` to enter our engine
[…] `pip install … && python -m prefect.engine`
[…] `/opt/prefect/entrypoint.sh python -m prefect.engine`
[…] `bash -c "…"` or something)
Robert Banick
05/19/2023, 4:44 PM
[…] `Command` now
[…] `command=["/opt/prefect/entrypoint.sh","python","-m","prefect.engine"]` in the ECS Task code I’m now getting the below error
Submission failed. RuntimeError: Timed out after 120.72852396965027s while watching task for status RUNNING
[…] `pip install` runs
`command=["pip","install","git+https://github.com/Arbol-Project/gridded-etl-tools@popen_IO","&&","python","-m","prefect.engine"]`: Flow run infrastructure exited with non-zero status code 2.
`command=["/opt/prefect/entrypoint.sh","python","-m","prefect.engine"]`: `Submission failed. prefect_aws.ecs.TaskFailedToStart: CannotStartContainerError: ResourceInitializationError: failed to create new container runtime task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/opt/prefect/entrypoint.sh": stat /opt/prefect/entrypoint.sh: no such file or directory: unknown`
`command=["bash","-c","\"","pip","install","git+https://github.com/Arbol-Project/gridded-etl-tools@popen_IO","&&","python","-m","prefect.engine","\""]`: Flow run infrastructure exited with non-zero status code 2.
Zanie
05/19/2023, 5:28 PM
["bash", "-c", "pip install git+https://github.com/Arbol-Project/gridded-etl-tools@popen_IO && python -m prefect.engine"]
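A rough illustration of why the single-string `bash -c` form behaves differently from the list forms tried earlier. This sketch assumes the container runtime executes the command list directly, without a shell, which is how `subprocess.run` treats a list as well; `echo` stands in for both `pip install` and `python -m prefect.engine`, and the helper `run` is hypothetical. It needs `bash` and `echo` on the PATH.

```python
import subprocess


def run(command: list) -> subprocess.CompletedProcess:
    """Run a command list with no shell in between, capturing its output."""
    return subprocess.run(command, capture_output=True, text=True)


# List form without a shell: "&&" is handed to echo as an ordinary argument,
# so nothing is chained and the second command never runs.
exec_form = run(["echo", "one", "&&", "echo", "two"])

# One string after `bash -c`: bash parses it, so "&&" chains the two commands.
shell_form = run(["bash", "-c", "echo one && echo two"])

# Splitting that string into separate list items hands bash only the lone
# quote character as its script, which it rejects as a syntax error.
broken_form = run(["bash", "-c", '"', "echo", "one", "&&", "echo", "two", '"'])

if __name__ == "__main__":
    print(repr(exec_form.stdout))
    print(repr(shell_form.stdout))
    print(broken_form.returncode)
```

In the first case the stand-in program simply prints the stray arguments; a real `pip install` given `&&`, `python` and `-m` as arguments would be expected to reject them instead.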
Robert Banick
05/19/2023, 5:29 PM
[…] `subprocess` commands, but here that’s not the case
[…] `ECSTask` block Task Definition uses.”
`command=["bash", "-c", "pip install git+https://github.com/Arbol-Project/gridded-etl-tools@popen_IO && python -m prefect.engine"]` works, thank you very much!
Hard coding the repo and package is not the optimum workflow here so I’d love to get the `/opt/prefect/entrypoint.sh` command working. Since we’re using our own docker images, would you suggest replicating the script as part of our docker image setup so we can run the `pip install $EXTRA_PIP_PACKAGES` component of it?
Zanie
05/19/2023, 6:07 PM
[…] `ECSWorker` you can create custom template variables for those and they’ll show up in the UI. A little advanced but might be what you need.
Robert Banick
05/19/2023, 6:10 PM
[…]
Zanie
05/19/2023, 6:27 PM
[…]
Robert Banick
05/19/2023, 7:49 PM
[…] `entrypoint.sh` into the command of our ECSTask, like so
command=["bash", "-c", 'if [ ! -z "$EXTRA_PIP_PACKAGES" ]; then pip install $EXTRA_PIP_PACKAGES; fi && python -m prefect.engine'],
This allows us to use `$EXTRA_PIP_PACKAGES` overrides from within the UI as if the script were being run.
Thank you so much for your support @Zanie, I would have lost days on this before I figured it out otherwise!
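For anyone wanting to check the shell logic of that final command before deploying it, here is a minimal dry-run sketch. The helpers `build_command` and `dry_run` are hypothetical and not part of Prefect; by default `build_command` reproduces the command string from the message above, and the dry run swaps `pip install` and `python -m prefect.engine` for `echo` so nothing is actually installed or started. It assumes `bash` is available locally.

```python
import os
import subprocess
from typing import Optional


def build_command(install: str = "pip install",
                  engine: str = "python -m prefect.engine") -> list:
    """Build the bash command list; install/engine can be swapped for testing."""
    script = (
        'if [ ! -z "$EXTRA_PIP_PACKAGES" ]; '
        f"then {install} $EXTRA_PIP_PACKAGES; fi && {engine}"
    )
    return ["bash", "-c", script]


def dry_run(extra_packages: Optional[str]) -> str:
    """Run the script with echo stand-ins and return what it would execute."""
    # Start from the current environment minus any existing value.
    env = {k: v for k, v in os.environ.items() if k != "EXTRA_PIP_PACKAGES"}
    if extra_packages is not None:
        env["EXTRA_PIP_PACKAGES"] = extra_packages
    command = build_command(install="echo install", engine="echo engine")
    return subprocess.run(command, env=env, capture_output=True, text=True).stdout


if __name__ == "__main__":
    print(dry_run("pkg-a pkg-b"))  # install step runs, then the engine
    print(dry_run(None))           # install step is skipped
```

Because `$EXTRA_PIP_PACKAGES` is left unquoted in the install step, the shell splits it on whitespace, so several packages can be passed in one variable.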