# prefect-community
a
how does Prefect 2.0 deal with dependencies? Does the agent need all the deps that the flow requires? Also, what if the flow file depends on local files? I.e., here is my flow directory:
(venv) ➜  src git:(main) ✗ ls -la flow                                 
drwxr-xr-x  6 avysotsky  staff   192 May  9 08:47 .
drwxr-xr-x  5 avysotsky  staff   160 May  9 08:34 ..
-rw-r--r--  1 avysotsky  staff  3301 May  9 08:47 flow.py
-rw-r--r--  1 avysotsky  staff   870 May  8 10:11 graphql.py
-rw-r--r--  1 avysotsky  staff   611 May  9 08:47 prefect_client.py
And here is how I create the deployment:
from prefect.deployments import DeploymentSpec
from prefect.orion.schemas.schedules import CronSchedule

d = DeploymentSpec(
    flow_location="./flow/flow.py",
    name=name,
    schedule=CronSchedule(cron=schedule),
    tags=[
        f"user_id:{user_id}",
        f"job_id:{job_id}",
    ],
    # parameter keys must be strings matching the flow's argument names
    parameters={
        "user_id": user_id,
        "job_id": job_id,
    },
)
Is Prefect smart enough to pull in all the deps that flow.py needs?
a
Does the agent need all the deps that the flow requires?
not at all! There is a separation of concerns now: the flow_runner is responsible for deploying the relevant infrastructure, such as a Docker container or a Kubernetes job. The agent is only responsible for picking up scheduled runs from the work queue; the flow_runner takes care of the entire infrastructure work.
Is Prefect smart enough to pull in all the deps that flow.py needs?
it depends on the flow_runner you choose - you may create a virtual environment with conda and point your flow runner at the relevant environment that has all the dependencies - see the sketch below
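A minimal sketch of that approach, assuming the Prefect 2.0 beta's SubprocessFlowRunner and its condaenv argument (the deployment name and the environment name "etl" are hypothetical):

from prefect.deployments import DeploymentSpec
from prefect.flow_runners import SubprocessFlowRunner

DeploymentSpec(
    flow_location="./flow/flow.py",
    name="conda-example",
    # run the flow inside a pre-built conda environment that already
    # contains every dependency flow.py imports
    flow_runner=SubprocessFlowRunner(condaenv="etl"),
)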
a
So, let me rephrase it. Say my flow.py has a required dependency file graphql.py, and I point DeploymentSpec to flow.py. In this case, DeploymentSpec is smart enough to package graphql.py and then the agent is smart enough to use that dependency?
I think I actually don’t understand how exactly Prefect sends a given flow to an agent
a
maybe you can check the relevant flow runner code? I think the virtual environment sounds like a good approach to test out in your use case - if graphql.py is available in that conda environment, you should be good to go, but the easiest would be to give it a try and see which flow runner works best for you. My main point was that the flow runner is the right place for you to explore dependency management
a
I see
I have to explicitly specify the runner
a
is smart enough to package graphql.py
by default, Prefect doesn't package any code for you; you would need to do it yourself, e.g. as part of CI/CD, and point at it in your flow runner
a
Do you have an example?
a
Orion docs have the best examples so far, but it depends on the flow runner type. E.g., with the Docker flow runner, you need to build an image and point the runner at this image - see the sketch below
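For instance - a sketch, not official docs: the image name is hypothetical, and the image is assumed to have been built and pushed beforehand (e.g. via docker build / docker push) with graphql.py and all pip dependencies baked in:

from prefect.flow_runners import DockerFlowRunner

# point the runner at a pre-built image that contains the flow's dependencies
flow_runner = DockerFlowRunner(image="my-registry/my-flow-deps:latest")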
a
I can’t find anywhere in the docs where it says that I have to build a docker image https://orion-docs.prefect.io/tutorials/docker-flow-runner/
a
well, you don't always have to do that - if you don't require any dependencies other than Prefect, then just doing:

flow_runner=DockerFlowRunner()

will be enough. To be fair, though, your point is totally valid, and we are working on an easier way of packaging code dependencies - it will get easier in the future 🤞 LMK if you need more help building a Docker image, I can try to build an example
a
So, let me clarify why I got confused. Since the storage configuration is a required step to run a Prefect flow, I assumed that packaging is a solved problem. Why else do you need the storage then? I.e., what is the point of creating a Docker image AND storing the flow file on a blob store? Why not just package the entire flow into a Docker image?
a
Why else do you need the storage then?
mainly due to the hybrid execution model, to respect your privacy; think of Storage as a map to where your flow is located - it can point to an object in S3, to a local file in Local storage, or to a flow file on GitHub (not available yet, on the roadmap)
Storage = flow code
Flow runner = infrastructure and code dependencies
what is the point of creating a Docker image AND storing the flow file on a blob store?
I can understand the confusion, but think of a use case where your flow code may change very frequently while your code dependencies don't - you may always rely on the same Snowflake, Pandas, scikit-learn, and dbt package versions, but your definition of the data flow (your data transformations, ML models, etc.) may change frequently as your business use case evolves. It also allows you to reuse the same image across multiple flows, which is often required by teams who don't wish to maintain one image per flow - per-flow images can get "heavy" and storage-intensive. See the sketch below.
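To make the split concrete, here is a sketch (all names are hypothetical) of one shared dependency image reused by several deployments, while Storage supplies each flow's code:

from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner

# one dependency image shared by many flows; only the flow *code*
# (resolved via Storage) differs between the deployments
shared_runner = DockerFlowRunner(image="my-registry/team-deps:2022.05")

DeploymentSpec(
    flow_location="./flows/ingest.py",
    name="ingest",
    flow_runner=shared_runner,
)
DeploymentSpec(
    flow_location="./flows/transform.py",
    name="transform",
    flow_runner=shared_runner,
)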