Thread
#prefect-community
    Jacques

    2 years ago
    Hi all, I'm making multiple flow.run() calls in different threads, but when I do this, output caching breaks. I'm trying to debug, and I think the issue is that the Prefect context isn't shared between runs in multiple threads. The docs here: https://docs.prefect.io/api/latest/utilities/context.html#context-2 state that context is thread safe, so I tried creating a context using
    shared_context = prefect.utilities.context.Context()
    and then passing that to run with
    my_flow.run(context=shared_context)
    but this doesn't seem to solve my problem. Would appreciate any pointer in the right direction!
    Jim Crist-Harif

    2 years ago
    The context is thread local, so it's thread safe in that you can have prefect run in multiple threads, but the context isn't shared between threads.
    So that explains the issue you're running into. I'm not sure how to handle your actual problem of cross-thread caching though.
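
    To illustrate the thread-local behavior described above, here is a minimal sketch using only the standard library. It assumes nothing about Prefect's internals: `ctx` is a plain `threading.local()` used as a hypothetical stand-in for a thread-local context, and `worker` and `run_demo` are illustrative names.

```python
import threading

# Hypothetical stand-in for a thread-local context; this is NOT Prefect's object.
ctx = threading.local()


def worker(name, seen):
    # Record whether this thread can see a "cache" attribute set elsewhere.
    seen[name] = hasattr(ctx, "cache")
    # Setting the attribute here only affects this thread's view of ctx.
    ctx.cache = {"task": name}


def run_demo():
    # The main thread stores a value on the thread-local object.
    ctx.cache = {"task": "main"}
    seen = {}
    threads = [
        threading.Thread(target=worker, args=(f"t{i}", seen)) for i in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return what the workers saw and what the main thread still holds.
    return seen, ctx.cache
```

    Each worker starts with an empty view of `ctx`, and nothing a worker writes leaks back to the main thread, which is consistent with cached outputs stored on a thread-local context not being visible across threads.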
    nicholas

    2 years ago
    @Jacques - is the issue that the cached outputs of tasks in one thread aren't available to the tasks running in the other threads?
    Jacques

    2 years ago
    Exactly that yes
    If I'm following, then the output is going to be cached if I call the cached task multiple times in the same flow in the same thread; however, there is no cross-thread cache sharing mechanism. I imagine implementing a cross-thread caching mechanism would be tricky, but perhaps I could "cheat" a bit by doing something like reading the cache out of previous runs' contexts and pre-populating the context of future threads with past caches? This would miss the cache if runs started too close together, but would miss less over time.
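
    The "cheat" described above could be sketched roughly as follows. This is a generic, standard-library illustration and does not touch Prefect's context or caching API: `SharedCache`, `snapshot`, `merge`, and `run_with_shared_cache` are all hypothetical names. Each run copies the shared cache before it starts and merges its results back afterwards, so runs that start at nearly the same time can still both miss.

```python
import threading


class SharedCache:
    """Hypothetical process-wide cache guarded by a lock (not a Prefect class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def snapshot(self):
        # Copy the current contents so a run can work on its own private dict.
        with self._lock:
            return dict(self._data)

    def merge(self, results):
        # Publish a finished run's results for later runs to pick up.
        with self._lock:
            self._data.update(results)


def run_with_shared_cache(shared, key, compute):
    # Pre-populate this run's local cache from whatever earlier runs published.
    local = shared.snapshot()
    hit = key in local
    if not hit:
        local[key] = compute()
    # Merge back so that future runs (in any thread) can reuse the value.
    shared.merge(local)
    return local[key], hit
```

    The lock is only held while copying or merging, not while computing, so concurrent runs are never blocked on each other's work; the cost is the occasional duplicate computation mentioned above.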
    nicholas

    2 years ago
    Cached outputs are still stored on the context, meaning the results would still be thread-specific. Are you using Core Server or Cloud, or are you working with Core only?
    I think for this use case you'd need some sort of persistence layer, which can't be achieved with Core-only. This is possible with result handlers and caching with Cloud/Server; here's an example of @Laura Lorenz (she/her) working on this in the PyData talks from a couple of weeks ago:

    https://youtu.be/FETN0iivZps?t=5160
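
    As a rough picture of what a persistence layer buys here, the sketch below stores results as JSON files on disk so that any thread, or a later process, can read them. It is a plain standard-library illustration under stated assumptions, not Prefect's result handler mechanism: `cached_call`, `cache_dir`, and the file naming scheme are hypothetical, and values are assumed to be JSON-serializable.

```python
import hashlib
import json
import os
import tempfile


def cached_call(cache_dir, key, compute):
    """Return (value, hit); values are persisted as JSON files under cache_dir."""
    # Hash the key so arbitrary strings map to safe file names.
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json"
    path = os.path.join(cache_dir, name)
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh), True
    except FileNotFoundError:
        pass  # Cache miss: fall through and compute the value.

    value = compute()
    # Write to a temporary file first, then rename it into place, so readers
    # in other threads never observe a partially written file.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(value, fh)
    os.replace(tmp_path, path)
    return value, False
```

    Because the state lives outside any one thread's context, it does not matter which thread wrote the value; two threads that miss at the same moment will both compute it, and the later rename simply overwrites the earlier file.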

    Jacques

    2 years ago
    I'm using Core only at the moment. I'm running inside a Lambda, so I need to keep things light.
    Thanks for this, going to watch now!