Thread
#prefect-community
    Artem Vysotsky

    4 months ago
    how does Prefect 2.0 deal with dependencies? Does the agent need all the deps that the flow requires? Also, what if the flow file depends on local files? I.e., here is my flow directory:
    (venv) ➜  src git:(main) ✗ ls -la flow                                 
    drwxr-xr-x  6 avysotsky  staff   192 May  9 08:47 .
    drwxr-xr-x  5 avysotsky  staff   160 May  9 08:34 ..
    -rw-r--r--  1 avysotsky  staff  3301 May  9 08:47 flow.py
    -rw-r--r--  1 avysotsky  staff   870 May  8 10:11 graphql.py
    -rw-r--r--  1 avysotsky  staff   611 May  9 08:47 prefect_client.py
    And here is how I create the deployment:
    d = DeploymentSpec(
            flow_location="./flow/flow.py",
            name=name,
            schedule=CronSchedule(
                cron=schedule
            ),
            tags=[
                f"user_id:{user_id}",
                f"job_id:{job_id}"
            ],
        parameters={
            # parameter names must be strings
            "user_id": user_id,
            "job_id": job_id
        }
        )
    Is Prefect smart enough to pull in all the deps that flow.py needs?
    Anna Geller

    4 months ago
    Does the agent need all the deps that the flow requires?
    not at all! There is a separation of concerns now: the flow_runner is responsible for deploying the relevant infrastructure, such as a Docker container or a Kubernetes job. The agent is only responsible for picking up scheduled runs from the work queue; the flow_runner takes care of the entire infrastructure work.
    Is Prefect smart enough to pull in all the deps that flow.py needs?
    That depends on the flow_runner you choose - you can create a virtual environment with conda and point your flow runner at the environment that has all the dependencies.
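    A minimal sketch of that conda approach, assuming the Orion-beta SubprocessFlowRunner and its condaenv argument; the environment name is hypothetical, not from this thread:

    ```python
    # Sketch (Prefect 2.0 / Orion beta API): run the flow in an existing conda
    # environment that already has the flow's dependencies installed.
    # "my-conda-env" is an assumed name.
    from prefect.deployments import DeploymentSpec
    from prefect.flow_runners import SubprocessFlowRunner

    DeploymentSpec(
        flow_location="./flow/flow.py",
        name="conda-example",
        # The agent itself does not need the flow's dependencies; the runner
        # starts a subprocess inside the named conda environment.
        flow_runner=SubprocessFlowRunner(condaenv="my-conda-env"),
    )
    ```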
    Artem Vysotsky

    4 months ago
    So, let me rephrase it. Say my flow.py has a required dependency file graphql.py, and I point DeploymentSpec to flow.py. In this case, is DeploymentSpec smart enough to package graphql.py, and is the agent then smart enough to use that dependency?
    I think I actually don’t understand how exactly Prefect sends a given flow to an agent
    Anna Geller

    4 months ago
    maybe you can check the relevant flow runner code? I think the virtual environment sounds like a good approach to test out in your use case - if graphql.py is available in that conda environment, you should be good to go. But the easiest would be to give it a try and see which flow runner works best for you. My main point was that the flow runner is the right place for you to explore dependency management.
    Artem Vysotsky

    4 months ago
    I see
    I have to explicitly specify the runner
    Anna Geller

    4 months ago
    is smart enough to package graphql.py
    by default, Prefect doesn't package any code for you; you would need to do it yourself, e.g. as part of CI/CD, and point at it in your flow runner
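    One hedged way such a CI/CD step could look - the registry and image names are hypothetical, and `prefect deployment create` is the Orion-beta CLI command:

    ```shell
    # Hypothetical CI step: bake the flow's local modules (flow.py, graphql.py,
    # prefect_client.py) and third-party deps into an image, push it, then
    # register the deployment whose flow runner references that image.
    docker build -t registry.example.com/my-flows:latest .
    docker push registry.example.com/my-flows:latest
    prefect deployment create ./flow/flow.py
    ```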
    Artem Vysotsky

    4 months ago
    Do you have an example?
    Anna Geller

    4 months ago
    The Orion docs have the best examples so far, but it depends on the flow runner type -
    e.g., with the Docker flow runner, you need to build an image and point your deployment at this image
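    For instance, a minimal Dockerfile sketch; the base-image tag and paths below are assumptions, not from this thread:

    ```dockerfile
    # Start from an official Prefect base image so the runtime is present
    # (the exact tag here is an assumed example)
    FROM prefecthq/prefect:2.0b5-python3.9
    # Copy the flow's local helper modules into the image
    COPY flow/graphql.py flow/prefect_client.py /opt/prefect/flow/
    # Install any extra third-party packages the flow needs
    RUN pip install requests
    ```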
    Artem Vysotsky

    4 months ago
    I can’t find anywhere in the docs where it says that I have to build a docker image https://orion-docs.prefect.io/tutorials/docker-flow-runner/
    Anna Geller

    4 months ago
    well, you don't always have to do that - if you don't require any dependencies other than Prefect, then just doing:
    flow_runner=DockerFlowRunner()
    will be enough. But to be fair, your point is totally valid, and we are working on an easier way of packaging code dependencies - it will get easier in the future 🤞 LMK if you need more help building a Docker image, I can try to build an example
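    And when the flow does need extra dependencies, a sketch of pointing the Docker flow runner at a pre-built custom image - the image name is an assumption, and `image` is a DockerFlowRunner argument per the Orion-beta docs:

    ```python
    # Sketch (Prefect 2.0 / Orion beta API): with no arguments, DockerFlowRunner
    # uses the default Prefect image; pass image= when the flow needs extra
    # dependencies baked in. The image name below is an assumed example.
    from prefect.deployments import DeploymentSpec
    from prefect.flow_runners import DockerFlowRunner

    DeploymentSpec(
        flow_location="./flow/flow.py",
        name="docker-example",
        flow_runner=DockerFlowRunner(image="registry.example.com/my-flows:latest"),
    )
    ```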
    Artem Vysotsky

    4 months ago
    So, let me clarify why I got confused. Since the storage configuration is a required step to run a Prefect flow, I assumed that packaging is a solved problem. Why else do you need the storage then? I.e., what is the point of creating a Docker image AND storing the flow file on a blob store? Why not just package the entire flow into a Docker image?
    Anna Geller

    4 months ago
    Why else do you need the storage then?
    mainly due to the hybrid execution model, to respect your privacy. Think of Storage as a map to where your flow is located - it can point to an object in S3, to a local file in Local storage, or to a flow file on GitHub (not available yet, on the roadmap).
    Storage = flow code
    Flow runner = infrastructure and code dependencies
    what is the point of creating docker image AND storing the flow file on a blob store?
    I can understand the confusion, but think of a use case where your flow code changes very frequently while your code dependencies don't - you may always rely on the same Snowflake, Pandas, scikit-learn, and dbt package versions, but your definition of the data flow (your data transformations, ML models, etc.) may change frequently as your business use case evolves. It also allows you to reuse the same image across multiple flows, which many teams require - they don't wish to have one image per flow, which can also get "heavy" and storage-intensive.