
    Luis Muniz

    2 years ago
    Hi, I thought I was being smart by modularizing the types of tasks and having one module where I construct my flow:
    from tasks.collect.games import *
    from tasks.collect.streamers import *
    from tasks.collect.streams import *
    from tasks.enrich.game import *
    from tasks.enrich.streamer import *
    from tasks.enrich.stream import *
    from tasks.store.game import *
    from tasks.store.streamer import *
    from tasks.store.stream import *
    from tasks.util.common import *
    
    with Flow("STRDATA POC") as strdata:
        collected_games = collect_games()
        enriched_games = enrich_game.map(collected_games)
    
        collected_streamers = collect_streamers()
        enriched_streamers = enrich_streamer.map(collected_streamers)
    
        collected_streams = collect_streams_per_game.map(enriched_games, unmapped(enriched_streamers))
        enriched_streams = enrich_stream.map(flatten(collected_streams))
    
        store_game.map(enriched_games)
        store_stream.map(enriched_streams)
        store_streamer.map(enriched_streamers)
    The Flow runs OK when I run it standalone, but when I register it in my local prefect server, I can see the following error in the dashboard:
    Failed to load and execute Flow's environment: ModuleNotFoundError("No module named 'tasks'")
    It seems similar to an issue I found about not being able to submit flows to prefect cloud because of some peculiarity with pickle: https://github.com/PrefectHQ/prefect/issues/1742. But that issue was about packaging the flow in a docker image, so I can't apply its solution to my case. The layout of my project is the following:
    deploy
    |_
      prod
      |_
        register.py (contains flow.register)
    flows
    |_
      strdata_poc.py (contains flow definition - see above)
    tasks
    |_
      collect
      |_
        games.py
        streamers.py
        streams.py
      enrich
      |_
        ...
    Chris White

    2 years ago
    Hi Luis, whether the Flow is stored as a pickle or as a script, the imports you run to build your flow need to also be accessible at runtime. The most common way of achieving this in Docker is to write your own Dockerfile that adds your entire project to the container and adds the project location to your importable PATH. We're working on making this a little easier to manage within Prefect's API, but ultimately we have to be able to recreate your Flow object at runtime (which requires that your imports be accessible).
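    For illustration, a minimal Dockerfile along these lines (the base image tag and the /opt/project path are placeholders, not Prefect-specific requirements):
    # base image is illustrative -- pin whichever Prefect image/tag you actually use
    FROM prefecthq/prefect:latest
    # copy the whole project (tasks/, flows/, deploy/) into the image
    COPY . /opt/project
    WORKDIR /opt/project
    # put the project root on the import path so `from tasks.collect.games import *` resolves at runtime
    ENV PYTHONPATH="/opt/project:${PYTHONPATH}"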

    Luis Muniz

    2 years ago
    Ah, that's something that I seem to have missed, then. When I run my flow with the local executor, is the flow containerized too?
    I guess if it uses dask... hm, yes
    Chris White

    2 years ago
    Ah I’m sorry, are you running this via a local agent + dask?

    Luis Muniz

    2 years ago
    yes
    all with the default execution environment
    Chris White

    2 years ago
    OK, gotcha; still a very similar story, but in this case you additionally need to ensure that each dask worker can import your flow's dependencies.
    The dask workers essentially need to be able to recreate the python code that is submitted to them, which requires that your dependencies be importable.

    Luis Muniz

    2 years ago
    If I'm not mistaken, the default environment spawns an ephemeral dask cluster when a flow is submitted, right?
    How do I tell these workers where my code is?
    Chris White

    2 years ago
    Yes, that's true; in this case you might be able to get away with adding the import path to your local agent, something like:
    prefect agent start -p /path/to/my/project -p /maybe/another/path/within/my/project
    (you can use as many -p flags as you need; each directory you pass is added to the import path for your process, and in your case that should also cover the ephemeral dask workers)
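    In a layout like yours, the directory to pass would be the project root, i.e. the directory that contains tasks/ (the path below is a placeholder for wherever that lives on your machine):
    prefect agent start -p /path/to/your/project/root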

    Luis Muniz

    2 years ago
    aha
    For production, when you use a base docker image and an S3 storage object to specify non-docker storage for container environments (don't know if I'm getting the vocabulary right), might this be fixed if the full project is deployed on S3?
    Chris White

    2 years ago
    Hmmm yea, but typically we recommend converting your project into a small python package and installing that package into the base image. That said, we are actively working on making this simpler from the user's perspective; this issue is just the tip of the iceberg for making Prefect infer your import paths in an intelligent way: https://github.com/PrefectHQ/prefect/issues/2857
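    For illustration, a minimal setup.py for packaging the tasks modules might look like this (the package name and version are made up):
    # setup.py -- minimal packaging sketch; name and version are placeholders
    from setuptools import setup, find_packages

    setup(
        name="strdata-tasks",
        version="0.1.0",
        packages=find_packages(include=["tasks", "tasks.*"]),
    )
    Running pip install . in your Dockerfile then makes the tasks package importable wherever the image runs.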

    Luis Muniz

    2 years ago
    Ok, thanks for the big push
    you're awesome, guys
    Chris White

    2 years ago
    Thank you!! We’re always trying to make things as intuitive as possible, and we appreciate all the feedback the community gives us 😄

    Luis Muniz

    2 years ago
    I confirm that your workaround helped for local execution
    Chris White

    2 years ago
    awesome, glad to hear
    Anish Chhaparwal

    1 year ago
    hey, i'm facing a similar issue. is there any update on this? or are docker images the only way?