It seems the flow itself, rather than a task, is failing. This is using `Docker` storage and `LocalEnvironment`/`DaskExecutor`, with Dask running on Kubernetes. It looks like `lz4` is somehow not present where the job is started? I do install `pyarrow` using `python_dependencies` in the `Docker` storage, so I'd expect `lz4` to be there. I'm not sure where else `lz4` could be missing.
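One way to test the missing-`lz4` hypothesis is to check which modules are importable in each place the code runs (the flow's Docker image, the Dask scheduler, and the Dask workers). The sketch below uses only the standard library. The helper name `missing_modules` and the module list are illustrative assumptions, not part of Prefect or Dask. Note also that installing `pyarrow` may not pull in the separate `lz4` Python package, so it is worth checking both.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list for this thread; adjust to the packages you care about.
    wanted = ["pyarrow", "lz4"]
    missing = missing_modules(wanted)
    print("missing:", missing if missing else "none")
```

Running this in each environment and comparing the output should show whether `lz4` is absent on one side only.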
Isaac Brodsky
10/26/2020, 7:29 PM
Alternatively, where is `CloudFlowRunner` being run? I assume in the flow's Docker image?
Kyle Moon-Wright
10/26/2020, 7:53 PM
Hey @Isaac Brodsky,
Yes, I believe the `CloudFlowRunner` is run in the pulled image. Otherwise, the `KeyError` is interesting. How did you set up your Docker storage?
Kyle Moon-Wright
10/26/2020, 7:56 PM
Probably something like this?:
    with Flow(
        storage=Docker(
            python_dependencies=["pyarrow"]
        )
    ) as flow: