    Preston Marshall

    2 years ago
    Also, is there a way to pass streams across task boundaries? I'd like to stream data from SFTP to GCS if possible, so that it doesn't all need to be downloaded to disk first.

    Chris White

    2 years ago
    Hi Preston, this is not possible. As a workflow tool, Prefect has a strong notion of dependencies between tasks: a task must finish completely before any downstream task can begin running. In your case, you might consider combining your two tasks into a single task to get the streaming behavior you're after.
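To illustrate combining the two steps into one task: the sketch below copies any readable binary file object into any writable one in fixed-size chunks, so only one chunk is held in memory at a time and nothing is written to local disk. It is plain Python with no Prefect dependency; the name stream_copy and the 8 MiB default chunk size are my own choices, not part of any library. In a real flow you would presumably call it from inside a single task, with an SFTP file as the source and a GCS blob writer as the destination (the wiring shown in the comments is an untested assumption).

```python
from typing import BinaryIO


def stream_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> int:
    """Copy src to dst chunk by chunk and return the number of bytes copied.

    Only one chunk (at most chunk_size bytes) is held in memory at a time,
    so the full file is never buffered in memory or written to local disk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Assumed wiring inside ONE task (not executed here): paramiko's
# SFTPClient.open() as the source and google-cloud-storage's Blob.open("wb")
# as the destination. `sftp`, `bucket`, `remote_path` and `blob_name` are
# placeholders you would set up yourself.
#
# with sftp.open(remote_path, "rb") as src, bucket.blob(blob_name).open("wb") as dst:
#     stream_copy(src, dst)

if __name__ == "__main__":
    import io

    source = io.BytesIO(b"x" * 20)
    sink = io.BytesIO()
    print(stream_copy(source, sink, chunk_size=8))  # prints 20
```

Keeping both ends inside one task is what makes this work: the open SFTP handle never has to cross a task boundary.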

    Preston Marshall

    2 years ago
    Gotcha, that's what I landed on. I'm trying to just pull the file down and send it up to GCS using the GCSUpload task, and it seems to expect the whole file as a string? That seems like it would cause a lot of problems when serializing multi-gigabyte files and sending them over the wire. Am I missing something?

    Chris White

    2 years ago
    No, you're correct; that task is probably better used as a template than for large datasets.
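If you do adapt that task as a template for large files, one detail worth knowing: as far as I know, setting chunk_size on a blob makes the google-cloud-storage Python client upload in chunks of that size, and the client requires the value to be a multiple of 256 KiB. The hypothetical helper below (gcs_chunk_size is my name, not a library function) rounds a desired chunk size up to a valid value, which you could then pass as bucket.blob(name, chunk_size=...) before uploading from a file object or path instead of a string.

```python
# Chunked (resumable) uploads to GCS must use a chunk size that is a
# multiple of 256 KiB, per the google-cloud-storage client docs.
GCS_CHUNK_MULTIPLE = 256 * 1024  # 262144 bytes


def gcs_chunk_size(target_bytes: int) -> int:
    """Round target_bytes up to the nearest multiple of 256 KiB.

    Hypothetical helper; the result is meant to be passed as
    bucket.blob(name, chunk_size=...).
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # Ceiling division gives the number of 256 KiB units needed.
    units = -(-target_bytes // GCS_CHUNK_MULTIPLE)
    return units * GCS_CHUNK_MULTIPLE


if __name__ == "__main__":
    print(gcs_chunk_size(10_000_000))  # prints 10223616
```

Larger chunks mean fewer requests but more memory per request; either way, memory use is bounded by the chunk size rather than the file size.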

    Preston Marshall

    2 years ago
    Got it, thanks.
    As far as data locality goes, I can only depend on a task having access to the same filesystem, right? So outside of those boundaries, all bets are off.

    Chris White

    2 years ago
    Actually, that largely depends on which environment and executor you run with. For example, if you ran your Flow on a Dask cluster, your tasks could each run on a different machine; if you use the LocalExecutor and run your flows in a non-Dockerized environment, then all tasks will run in the same process on the same machine, so it's relatively easy to reason about locality in that scenario.
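One way to make the "same filesystem" assumption explicit: have the upstream task return a small handle recording which host wrote the file, and have the downstream task check it before reading. Under the LocalExecutor the check always passes; on a Dask cluster it fails fast instead of reading a missing or stale path. This is a hypothetical pattern (LocalFileRef, make_ref and resolve are my names, not Prefect APIs), and hostname equality is only a rough proxy for a shared filesystem: it would reject a network mount that really is shared across machines.

```python
import os
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalFileRef:
    """Hypothetical handle for a file written by an upstream task."""

    host: str  # hostname of the machine that wrote the file
    path: str  # absolute path on that machine


def make_ref(path: str) -> LocalFileRef:
    """Record the current host alongside the file's absolute path."""
    return LocalFileRef(host=socket.gethostname(), path=os.path.abspath(path))


def resolve(ref: LocalFileRef) -> str:
    """Return the path only if we are on the host that wrote it."""
    here = socket.gethostname()
    if ref.host != here:
        raise RuntimeError(
            f"{ref.path} was written on {ref.host!r}, "
            f"but this task is running on {here!r}"
        )
    return ref.path


if __name__ == "__main__":
    ref = make_ref("download.bin")  # upstream task would return this
    print(resolve(ref))  # downstream task on the same host gets the path back
```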