# prefect-community
Hi There, in Prefect 1 we exposed our code to a Docker container via a volume. Does anyone know if this is possible in Prefect 2? Everything seems to point to wanting to pull code down from a storage block.
This documentation makes me think it is possible.
However, when I create my deployment like:
```shell
prefect deployment build /workspace/flows/endpoints.py:run_flow --path /workspace --name endpoints -q dev -ib docker-container/buddy -o endpoints.yaml
```
And then try to run it, I get an error that makes it seem like the flow code is still trying to be copied from somewhere.
```
20:31:19.125 | ERROR   | Flow run 'loyal-jaguarundi' - Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 269, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
  File "/usr/local/lib/python3.10/site-packages/prefect/client/utilities.py", line 47, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect/deployments.py", line 175, in load_flow_from_flow_run
    await storage_block.get_directory(from_path=deployment.path, local_path=".")
  File "/usr/local/lib/python3.10/site-packages/prefect/filesystems.py", line 147, in get_directory
    copytree(from_path, local_path, dirs_exist_ok=True)
  File "/usr/local/lib/python3.10/shutil.py", line 559, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
  File "/usr/local/lib/python3.10/shutil.py", line 513, in _copytree
    raise Error(errors)
```
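(Editor's note: in Prefect 2, mounting host code into the run container is typically configured on the DockerContainer infrastructure block rather than via storage; the block accepts Docker-style volume strings. A minimal stdlib sketch of building that list follows; the paths are examples and the commented Prefect call is an untested assumption about the API:)

```python
# Build Docker-style volume strings ("host:container[:mode]") suitable for
# the volumes field of a DockerContainer infrastructure block.
# The paths below are examples, not taken from this thread.

def volume_specs(mounts: dict[str, str], read_only: bool = False) -> list[str]:
    """Turn {host_path: container_path} into Docker volume strings."""
    suffix = ":ro" if read_only else ""
    return [f"{host}:{container}{suffix}" for host, container in mounts.items()]

print(volume_specs({"/workspace": "/workspace"}))
# ['/workspace:/workspace']

# With Prefect installed, this would feed the block roughly like (untested sketch):
#   from prefect.infrastructure import DockerContainer
#   DockerContainer(image="my-image",
#                   volumes=volume_specs({"/workspace": "/workspace"})).save("buddy")
```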
Hi Leo, I’m not sure what you’re looking to accomplish here - is there an issue with adding the code into the Docker container directly? If there is, where is your code located normally? Typical practice is to store your code somewhere and load it via blocks if it’s not in the Docker container: https://docs.prefect.io/concepts/storage/. Not to say you can’t do it that way, but I don’t think that’s either a common or a documented usage pattern.
I’m not sure what you’re looking to accomplish here - is there an issue with adding the code into the docker container directly?
Our team leverages feature-branch-based development, so our individual data projects and bug fixes live on short- or long-lived Git branches. Our data engineers work locally on their laptops. We use GitLab CI for automated testing, and we have some automated QA that runs all of our pipelines and compares output between branches. Right now we use the same image for local development, CI/CD, and production. This gives us really great parity between environments and makes debugging and dependency management pretty easy. We don't bake the code into the container because we'd need to create a copy of the image every time we create a feature branch. It would also make running flows from different branches a bit more laborious, and I'm not sure how that would work for local development.
If there is, where is your code located normally?
Our code is stored in a GitLab repository. It's checked out locally for development, during CD, and on our production server.
We're open to suggestions on how we might change our workflow to match the established patterns better.
To me, it just seems a little clunky to manage multiple images for different environments and branches, when we don't really have to.
I suppose we could checkout the code dynamically in the container.
I know this is supported on some level.
You should be able to use GitLab for your storage: https://discourse.prefect.io/t/how-to-use-gitlab-storage-block/2217
I presume it's possible to swap branches with run parameters?
So it can't change dynamically?
We'd need to create a deployment per branch basically.
The flow has to be loaded from somewhere when the deployment runs - you can create a block dynamically without saving it
So it can be run via Deployment.build_from_flow, passing in a dynamic GitLab block you create on the fly with the branch you want.
But the deployment object itself requires:
- a flow
- infrastructure to execute on (defaults to Process if nothing provided)
- storage (defaults to None if nothing provided)
Ok, that seems reasonable. I guess I'm not clear on how the storage block gets created dynamically. Would that be included in the flow definition?
It would look like:
```python
storage = S3.load("dev-bucket")  # load a pre-defined block
deployment = Deployment.build_from_flow(
    flow=my_flow,
    name="s3-example",
    version="2",
    tags=["aws"],
    storage=storage,
    infra_overrides={"env.PREFECT_LOGGING_LEVEL": "DEBUG"},
)
deployment.apply()
```
But instead of S3.load(“dev-bucket”) you can create one on the fly to use as your storage
if a block is defined (e.g. storage = <block_definition>) but not saved, it just becomes an anonymous block that’s thrown away
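(Editor's note: to make the per-branch idea concrete, here is a minimal sketch of computing the arguments CI would pass when building one deployment per feature branch. The helper name, repo URL, and the commented Prefect calls are hypothetical, not from this thread:)

```python
# Sketch: derive per-branch deployment arguments so CI can build one
# deployment per feature branch. All names and URLs here are examples.

def branch_deployment_args(repo_url: str, branch: str) -> dict:
    """Arguments we would pass to Deployment.build_from_flow for a branch."""
    return {
        "name": f"endpoints-{branch}",  # one deployment per branch
        "work_queue_name": "dev",
        "storage": {"repository": repo_url, "reference": branch},
    }

args = branch_deployment_args("https://gitlab.com/example/pipelines.git", "feature-x")
print(args["name"])
# endpoints-feature-x

# With prefect and prefect-gitlab installed, roughly (untested sketch):
#   from prefect.deployments import Deployment
#   from prefect_gitlab import GitLabRepository
#   storage = GitLabRepository(repository=repo_url, reference=branch)
#   Deployment.build_from_flow(flow=run_flow, name=args["name"],
#                              storage=storage, work_queue_name="dev").apply()
```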
Ok, I'll give this a shot. Thanks for helping me with all of this! Really appreciate it.