# prefect-community
Brett Naul
when using docker storage and/or remote executors, what’s considered the best practice for shipping code around? does it make sense to rely on cloudpickle to send any required non-third party code via the flow itself, or should everything run from the same base image that has the needed internal functions?
Chris White
Based on my own experience, I recommend:
- putting the code in a common base image
- ideally converting the code into an “installable” package so that import-path issues don’t arise

The reason is that it’s hard to know a priori what the working directory for the Flow process will be (although we could document this and enforce a common location). Additionally, any code that is imported based on file location will be interpreted by cloudpickle as a module, which can cause headaches for newer users if the flow ends up running in a place with different relative file paths
if your code isn’t packaged already, you could also just add the files to the Docker image’s Python path (PYTHONPATH) so that they are universally importable, but either way I think it’s easier to customize Docker than to customize cloudpickle (rough sketch below)
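To make that second option concrete, here is a minimal sketch of shipping un-packaged helper files via Docker storage. The exact import path and keyword arguments depend on your Prefect version (in 0.x releases Docker storage lives under `prefect.environments.storage` and accepts `files` and `env_vars`), and the registry, image, and `internal_helpers` module names are all hypothetical:

```python
from prefect import Flow, task
from prefect.environments.storage import Docker  # prefect.storage.Docker in later 0.x releases

@task
def say_hello():
    # helper code that lives in the image, not in the flow script
    from internal_helpers import greet  # hypothetical module copied in below
    return greet("world")

storage = Docker(
    registry_url="registry.example.com",                     # hypothetical registry
    base_image="registry.example.com/internal-base:latest",  # common base image with shared deps
    # copy the un-packaged helper file into the image...
    files={"/local/path/internal_helpers.py": "/opt/lib/internal_helpers.py"},
    # ...and put that directory on PYTHONPATH so it is importable from anywhere
    env_vars={"PYTHONPATH": "/opt/lib"},
)

with Flow("hello-flow", storage=storage) as flow:
    say_hello()
```

Installing the code as a proper package in the base image (e.g. a `pip install .` step in the Dockerfile) avoids even this, since the import then works regardless of working directory.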
Brett Naul
thanks @Chris White! all our code is already in one rather large Docker image that I’m using as my `Storage` base, so for now everything works fine. just trying to think through the case where a worker cluster stays running and users submit flows that could have slightly different versions of some of the involved functions… the safest bet is definitely to always rebuild the base image and redeploy everything, but it’d be nice to avoid that in some cases when the changes are really small
Chris White
oh that makes a lot of sense; I’m sure someone could tell us why this isn’t the best idea, but if your cluster is long-standing and not adaptive, you could exec into the workers and update the code manually. Because of how cloudpickle works, your tasks should actually import the new code (this might warrant a test to make sure)
👍 1
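A quick way to see why that works: cloudpickle serializes functions that live in installed modules by reference (just the dotted name), so the worker resolves them against whatever code it has locally, while functions defined in the flow script itself are pickled by value and travel with the flow. A runnable sketch, using `math` as a stand-in for an internal package installed in the worker image:

```python
import math
import cloudpickle

# A function from an installed module is pickled *by reference*: only the
# dotted name ("math.sqrt") is serialized, and the worker looks it up in its
# own installed copy -- the same applies to an internal package baked into
# the worker image (e.g. a hypothetical "internal_helpers" package).
by_reference = cloudpickle.dumps(math.sqrt)

# A function defined in the flow script (__main__) is pickled *by value*:
# its bytecode travels with the flow, so updating code on the worker
# doesn't affect it.
def inline_transform(x):
    return x + 1

by_value = cloudpickle.dumps(inline_transform)

print(len(by_reference), len(by_value))  # the by-value payload is noticeably larger
```

One caveat worth covering in that test: a long-running worker process that has already imported the module will keep the in-memory version until it restarts or reloads the module, even after the files on disk are updated.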
Hi @Brett Naul - I’ve been thinking a little more about this, as I think it could end up being a common situation: if you are very conscious of the versions you require, you could tag your dask workers with resources identifying their versions, and then tag your corresponding Prefect Tasks so they are only submitted to the appropriately versioned workers (see the sketch after this message). Additionally, for continually upgrading code on the workers, it seems there are hooks that could possibly achieve this: https://github.com/dask/distributed/issues/2767
not 100% sure how robust / easy this would all be to pull off in practice, but there might be something there
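A rough sketch of the resource-tagging idea, assuming the workers carrying the new code were started with a matching Dask resource (e.g. `dask-worker tcp://scheduler:8786 --resources "code-v2=1"`) and that your Prefect version translates `dask-resource:KEY=N` task tags into Dask resource requirements when running on a `DaskExecutor`; the `code-v2` resource name and `internal_helpers` module are hypothetical:

```python
from prefect import Flow, task

# Only workers started with `--resources "code-v2=1"` are allowed to run this
# task when the flow executes on a DaskExecutor, because the tag below is
# translated into a Dask resource requirement.
@task(tags=["dask-resource:code-v2=1"])
def needs_new_code():
    import internal_helpers  # hypothetical package; version 2 only exists on tagged workers
    return internal_helpers.transform(41)

with Flow("version-aware-flow") as flow:
    needs_new_code()
```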