Michael Warnock
07/17/2021, 4:12 PM
feature-generator, which contains both worker/orchestration logic and the code for doing the work. I added task and flow definitions to it, but with GitHub storage, the flow can't find the other modules in that repo (I've seen https://github.com/PrefectHQ/prefect/discussions/4776 and understand this is intentional).
My question is how best to structure things so that my flow can use that repo's code, but also execute a parameterized run from feature-generator, on commit, through CI (because that's how we start the job right now). Obviously, I can make feature-generator a package and depend on it from a new flows repo, but having feature-generator start the run would create a circular dependency. Would you split it into three repos, with one of them just being responsible for executing the flow? I don't love that idea, but maybe that's best practice?
Kevin Kho
feature-generator available to the flow. Because you already have a module, we normally recommend putting it in a Docker container (as a module): copy it over and use pip install -e . to install it. I have a minimal example for this that might help if you haven’t done it yet.
After the image is hosted somewhere, the flow will pull that image and run on top of that container. I think the CI/CD process can start the run on commit. You would register the flow and then run it using the Prefect CLI (prefect register …., prefect run ...). This would give you the registration and the one-time run.
A very common complaint is that this approach requires rebuilding the Docker image. It’s not particularly bad with the Docker cache, but some users rely on the image hash being the same every time. If you ever want a setup that doesn't require a build every time, you can put all of the dependencies in the Docker image and leave it static. When the flow runs, it will pull that image and run on top of it. In this setup, the dependencies are decoupled from the flow.
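A minimal sketch of the static-image setup described above, assuming Prefect 1.x-style imports (prefect.storage.GitHub and prefect.run_configs.DockerRun). The repo, path, image, flow, and task names are hypothetical placeholders, not taken from this thread. The flow file is pulled from GitHub at run time, while the image only carries the dependencies.

```python
# Hypothetical placeholders: replace with your own repo, flow file, and image.
REPO = "my-org/flows"
FLOW_PATH = "flows/feature_flow.py"
IMAGE = "my-registry/feature-generator-deps:latest"


def build_flow():
    # Prefect is imported inside the function only so this sketch can be
    # loaded without Prefect installed; a real flow file would usually
    # import at the top.
    from prefect import Flow, Parameter, task
    from prefect.run_configs import DockerRun
    from prefect.storage import GitHub

    @task
    def generate_features(dataset):
        # Code here can rely on whatever is installed in the static image.
        return f"features for {dataset}"

    with Flow("feature-generator-flow") as flow:
        dataset = Parameter("dataset", default="sample")
        generate_features(dataset)

    # The flow file comes from GitHub; the dependencies come from the image.
    flow.storage = GitHub(repo=REPO, path=FLOW_PATH)
    flow.run_config = DockerRun(image=IMAGE)
    return flow
```

Since GitHub storage loads the flow object from the file itself, the real flow file would also need a module-level line such as flow = build_flow(). Only the flow file changes between commits, so the image hash stays the same until the dependencies change.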
But it seems that if you want them side by side, you need to do the rebuilds.
Michael Warnock
07/17/2021, 4:38 PM
Kevin Kho
DockerRun as the Run Configuration seen here to specify that image.
Michael Warnock
07/17/2021, 4:45 PM
prefect register & run as CLI utils?
Kevin Kho
Michael Warnock
07/17/2021, 4:50 PM
Kevin Kho
Client to create a flow run with client.create_flow_run(flow_id). And then run the Python script with python ____.py in your CI/CD.
Michael Warnock
07/17/2021, 4:51 PM
Kevin Kho
Michael Warnock
07/17/2021, 4:54 PM
Kevin Kho
docker-py doesn’t expose the flag to add it. I feel like in the case of GPU, maybe we should rely on Coiled to have the Docker image and dependencies. You can then just continue to use GitHub storage, and that would run on top of the Coiled image for the cluster.
software_environment. I am not seeing anything immediate in their docs. Want me to ask on their Slack for you?
Michael Warnock
07/17/2021, 5:05 PM
Kevin Kho
RunConfiguration or LocalRun, have all of the Docker stuff be handled by Coiled’s software_environment. Install your custom module with the link above. This makes your library available on all Dask workers, along with the other dependencies. Specify that software environment when you choose an executor. All of this lets you stay with GitHub storage.
Michael Warnock
07/17/2021, 5:11 PM
Kevin Kho
Michael Warnock
07/17/2021, 5:13 PM
Kevin Kho
map ever becomes a bottleneck to efficient resource management, we would have to move to Dask code and some Dask mechanisms, such as annotations, that would allow you to specify resources. Prefect would then orchestrate the Dask code.
4. You used GitHub storage and ran into module issues previously. You would need to install the module on the Dask workers and scheduler. To do that, package your scripts as a Python module and install it as shown in this thread on the Coiled Slack. (Coiled can help you with that if you need more advice.)
5. The agent would need the modules installed. This can be avoided by importing the modules inside your tasks so that the import is deferred.
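A minimal sketch of the deferred import from point 5, in plain Python. The module name feature_generator and its build function are hypothetical stand-ins for your own library; in a Prefect flow, the function would additionally be decorated with @task.

```python
def generate_features(rows):
    # Deferred import: resolved only when the function body runs (on the
    # worker), not when the file defining the flow is loaded (on the agent).
    import feature_generator  # hypothetical module name

    return feature_generator.build(rows)  # hypothetical function
```

Loading the file that defines this function does not require feature_generator to be installed. An ImportError would only surface when the function is actually called on a machine without the module.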
Let me know if you have any questions.
Michael Warnock
07/19/2021, 4:58 PM
Kevin Kho
annotations and move away from the Prefect map.
Michael Warnock
07/19/2021, 5:06 PM