Sean Harkins
11/18/2020, 5:12 PM
I have a question about how cloudpickle serializes my Flow's dependencies with S3 storage. Based on some information outlined here https://prefect-community.slack.com/archives/C014Z8DPDSR/p1605200879483400 I've got S3 storage configured and working, but it seems that my Flow's upstream dependencies are not pickled when I register the flow as indicated in https://docs.prefect.io/core/advanced_tutorials/task-guide.html#task-inputs-and-outputs. Specifically, I receive the following dependency error: Unexpected error: ModuleNotFoundError("No module named 'h5netcdf'"). I can build an image with the necessary dependencies for use with the DaskExecutor and everything works correctly, but our goal is to decouple our execution environment from Flows. Am I misunderstanding how Flow dependencies should be serialized by cloudpickle? Is there another approach I should be considering in this case? Thanks in advance.
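(For reference, a minimal sketch of the kind of setup being described, assuming Prefect 0.14-style module paths — earlier 0.13.x releases used slightly different import locations. The bucket name, file path, project name, and task body (xarray with the h5netcdf engine) are illustrative placeholders, not details from the thread:)

```python
from prefect import Flow, task
from prefect.storage import S3


@task
def open_dataset(path):
    # xarray's "h5netcdf" engine needs the h5netcdf package installed
    # wherever this task actually runs.
    import xarray as xr
    return xr.open_dataset(path, engine="h5netcdf")


with Flow("netcdf-example") as flow:
    ds = open_dataset("example.nc")  # placeholder path

# Registering pickles the flow with cloudpickle and uploads it to S3 --
# only the flow code, not the packages it imports.
flow.storage = S3(bucket="my-prefect-flows")  # placeholder bucket
flow.register(project_name="example")

# At run time the flow is downloaded and unpickled inside whatever image the
# agent/executor provides; if that image lacks h5netcdf, the run fails with
# ModuleNotFoundError("No module named 'h5netcdf'").
```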
Jim Crist-Harif
11/18/2020, 5:16 PM
There are two separate things here:
• Your flow code itself. The Storage class manages getting the contents of this to an execution environment.
• The Python dependencies your code relies on (for example h5netcdf). These need to be handled separately by you using e.g. a docker image, conda environments, etc...
If you want to decouple "execution environment" from flows I recommend having a superset of all dependencies needed by all flows in an environment - a Prefect storage class has no way of moving around anything except your flow code itself.

One option is to build a single image containing that superset of dependencies and use S3 storage to manage only your flow code. Then all flows share the same image.

Alternatively, use a per-flow S3 storage + image pair to configure a flow. Up to you.
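(A sketch of the shared-image approach under the same assumptions: Prefect 0.14-style run configs and a Kubernetes agent, with placeholder image and bucket names; other run configs such as DockerRun follow the same pattern:)

```python
from prefect import Flow, task
from prefect.storage import S3
from prefect.run_configs import KubernetesRun
from prefect.executors import DaskExecutor


@task
def check_deps():
    # Succeeds because the shared image already contains h5netcdf.
    import h5netcdf
    return h5netcdf.__version__


with Flow("shared-image-example") as flow:
    check_deps()

# Flow code travels via S3 storage; re-registering re-uploads it.
flow.storage = S3(bucket="my-prefect-flows")  # placeholder bucket

# Dependencies travel via the image: one image built with a superset of
# every flow's requirements, shared by all flows.
flow.run_config = KubernetesRun(image="my-registry/prefect-runtime:latest")  # placeholder image

# The executor only parallelizes task execution; it installs nothing.
# It could equally point at a dask-gateway cluster.
flow.executor = DaskExecutor()

flow.register(project_name="example")
```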
Sean Harkins
11/18/2020, 5:25 PM
Thanks. I've used dask-gateway on some other projects so I'm psyched to see you are working with Prefect. Thanks again.