I want to know what exactly does the prefect agent...
# ask-community
t
I want to know what exactly the Prefect agent pulls from storage. I'm asking because in the main flow file, which is
with Flow(...) as flow:
    a = first_task()
    b = second_task()
say there are tasks which are defined in and imported from other files (which in turn call other tasks, and so on). There is no way I can look at just this main file and tell what the entire flow is going to look like: what dependencies the tasks will have, their retries, etc. Basically there isn't much info I can get from it. So what is being pulled here (and why)?
a
which Storage type do you use?
How the entire flow is gonna look like ?
you can use flow.visualize() for that
in general, Prefect pulls only the flow file or pickled flow from storage, nothing else. All other dependencies should be included in the execution environment, e.g. your Docker image that you pass to the run config.
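A rough illustration of why imported task modules have to exist in the execution environment: a pickled object that came from an imported module carries only a reference to that module, not its code. The sketch below uses the standard-library pickle as a stand-in (Prefect 1.x, as far as I know, serializes flows with cloudpickle, which treats imported modules in a similar way). The module name mini_tasks_demo and the helper roundtrip_without_module are hypothetical and exist only for this demo.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical helper module, standing in for something like /app/mini_tasks.py
MODULE_SOURCE = "def first_task():\n    return 'hello'\n"


def roundtrip_without_module() -> str:
    """Pickle an imported function, then load it where the module is missing."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "mini_tasks_demo.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("mini_tasks_demo")
            # The payload stores only the reference "mini_tasks_demo.first_task"
            payload = pickle.dumps(module.first_task)
        finally:
            # Forget the module, as if we were in a different environment
            sys.path.remove(tmp)
            sys.modules.pop("mini_tasks_demo", None)

    # The temp directory is gone now, so the module cannot be imported again
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return f"failed: {exc.name}"
    return "loaded"


if __name__ == "__main__":
    print(roundtrip_without_module())
```

Loading fails because the module is not importable, which is the same situation a flow run is in when the image lacks the files the flow imports from.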
t
Github Storage
Yeah I do have an execution environment wrapped up in a docker image
But can the execution environment contain prefect tasks ?
a
with Github, Prefect in theory clones the entire repo, but don’t rely on that, you should treat it as if Prefect pulls only the relevant flow file. Everything else must be in the execution env
your tasks should normally be defined within your flow file, the same flow file that gets pulled from storage
t
But can I not have them all over in folders and subfolders as files as imports ?
(basically I want to import tasks from other files into the main flow file) because I like it organized
a
You should ideally install all your folders and subfolders as modules and have them baked into your Docker image
an alternative is using a local agent where you can simply point Prefect to a specific directory that contains the files you need
t
Yes, I have spread normal functions in folders and subfolders. But can I spread prefect task functions the same way and bake it into the docker file?
a
then you would need to use e.g. Docker storage and build the image upon registration - this way your tasks living somewhere else are still copied into the storage image
with Github Storage you can’t do that
t
okay.. how do I migrate to Docker Storage ?
a
t
hmmm. you don't have Docker storage with the Vertex agent
I use VertexRun to which I supply a docker image of the current environment.
a
the Storage works the same way really regardless of which container-based agent you use (Docker, ECS, Kubernetes, Vertex)
t
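from prefect.storage import Docker

# Build the storage image from a local Dockerfile at registration time
docker_storage = Docker(
    image_name="community",
    image_tag="latest",
    dockerfile="/Users/anna/repos/packaging-prefect-flows/Dockerfile",
)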
Hey @Anna Geller, two questions. First, is it possible for it to pull from DockerHub? Second, how does it know where all the files are in the Docker image? For example, I would have mine at /app/flow.py, and some of the @task functions that it calls would be imported from, say, /app/mini_tasks.py
a
it can be pushed to Dockerhub 🙂 Prefect will be pulling the image during flow run. the flow is pickled on top of your image during registration - this is how Prefect knows how to find the flow
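For the /app/mini_tasks.py case from the question, one possible layout is to copy the helper module into the image via the files argument of Docker storage and extend PYTHONPATH so the import resolves at run time. This is only a sketch under assumptions: the local directory, the /app destination, and splitting the registry prefix into registry_url are mine, not from the thread, and docker_storage_kwargs and build_storage are hypothetical helper names.

```python
from typing import Any, Dict


def docker_storage_kwargs(local_dir: str, image_dir: str = "/app") -> Dict[str, Any]:
    """Keyword arguments for prefect.storage.Docker (Prefect 1.x)."""
    return {
        # Assumption: registry prefix kept separate from the image name
        "registry_url": "gcr.io/prefect-community/demos",
        "image_name": "community",
        "image_tag": "latest",
        # Maps an absolute local path to its destination inside the image
        "files": {f"{local_dir}/mini_tasks.py": f"{image_dir}/mini_tasks.py"},
        # Puts the destination directory on the import path inside the image
        "env_vars": {"PYTHONPATH": f"$PYTHONPATH:{image_dir}/"},
    }


def build_storage(local_dir: str):
    """Create the Docker storage object; requires Prefect 1.x to be installed."""
    # Imported lazily so the helper above can be inspected without Prefect
    from prefect.storage import Docker

    return Docker(**docker_storage_kwargs(local_dir))
```

With something like flow.storage = build_storage("/home/me/project") set before registration, the flow file can keep a plain "from mini_tasks import ..." line, since the module is copied into the image when it is built.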
t
~it can be pushed to Dockerhub~ Prefect will be pulling the image during flow run
1. @Anna Geller Show me how I can do that. I have a Docker image on the hub, but what arguments do I pass? "Docker('user/repo:tag')"?
the flow is pickled on top of your image during registration
2. I don't understand what that means(and I know you know that too 😉), but I am gonna try it whatever and will get back to you
a
You only need to pass the image really, plus your dependencies - either as a list of python_dependencies or dockerfile
from prefect.storage import Docker

Docker(
    image_name="gcr.io/prefect-community/demos/community",
    image_tag="latest",
    python_dependencies=["pandas", "scikit-learn"],
)
or:
from prefect.storage import Docker

Docker(
    image_name="gcr.io/prefect-community/demos/community",
    image_tag="latest",
    dockerfile="/path/to/Dockerfile",
)
t
Docker(
    image_name="gcr.io/prefect-community/demos/community",
    image_tag="latest",
    dockerfile="/path/to/Dockerfile",
)
@Anna Geller one doubt: if it can pull the image from Docker Hub, why does it need the path to a Dockerfile?
I would already have built the image and pushed it to Docker Hub, no?
a
when you define a Docker storage and then register your flow, e.g. using the CLI “prefect register --project yourproject -p your_flow.py”, this triggers a Docker image build process. Your Docker image gets built locally and gets pushed to GCR, to the repository we specified under image_name and image_tag. When Prefect then runs the flow, it pulls the image. Is it clearer now? 🙂