Beginner’s Question:
• _I have a number of flows which ingest one or more JSON files and transform them into a single dataframe_:
• The flows are currently wired to write their output to a target path that is parameterized (via a block, but this is still TBD).
• But I also like to unit-test, and some use cases do not need to land the data at all.
In effect I have reusable flows.
What is the best pattern for this? Optionally persist as an input parameter, optionally return the DF or None? Control behavior via blocks?
Kevin Grismore
11/20/2023, 8:21 PM
Maybe have a parameter (let's say it's called `target_path`) to the flow that controls output destination, but the param is optional so output writing logic is behind `if target_path:`
And then you could have multiple deployments of that flow, with `target_path` set/not set for convenience in the UI
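A minimal sketch of that optional-parameter pattern, under some loud assumptions: the function name `combine_json` is hypothetical, each input file is assumed to hold a JSON list of records, and a plain list of dicts stands in for the dataframe so the example needs only the standard library. In a real Prefect project the function would presumably carry the `@flow` decorator, which is left out here.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def combine_json(json_paths, target_path: Optional[str] = None):
    """Merge records from several JSON files; persist only if target_path is set.

    Hypothetical stand-in for the flow body (no Prefect decorator here).
    """
    rows = []
    for p in json_paths:
        # Assumption: every file contains a JSON list of record dicts.
        rows.extend(json.loads(Path(p).read_text()))

    if target_path:
        # Output writing lives behind the optional parameter.
        Path(target_path).write_text(json.dumps(rows))

    # Always return the result so unit tests can inspect it directly.
    return rows


# Usage: one call without landing data, one call that persists.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.json"
    b = Path(tmp) / "b.json"
    a.write_text(json.dumps([{"id": 1}]))
    b.write_text(json.dumps([{"id": 2}]))

    in_memory = combine_json([a, b])  # nothing written
    out = Path(tmp) / "out.json"
    persisted = combine_json([a, b], target_path=str(out))
    written = json.loads(out.read_text())
```

Returning the rows in both cases keeps the function reusable: a test can assert on the return value, while a deployment that sets `target_path` also gets the file.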
David Parmenter
11/20/2023, 8:33 PM
ok, that makes sense, thanks!
David Parmenter
11/21/2023, 5:29 PM
I am trying to have the tasks write to an AbstractFileSystem, so that I can use S3 for production and a memory-based fs for unit testing. This seems to work!
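A rough sketch of that filesystem-injection idea, assuming `AbstractFileSystem` refers to the fsspec class. To keep the example free of third-party dependencies, `InMemoryFS` below is a hypothetical stand-in that implements only `open()`, the one method the task functions use. With fsspec installed, the same functions should accept `fsspec.filesystem("memory")` in tests, or an S3-backed filesystem in production. The names `write_rows` and `read_rows` are also made up for illustration.

```python
import io
import json


class _CapturingBuffer(io.StringIO):
    """Text buffer that saves its contents into a dict when closed."""

    def __init__(self, store, path):
        super().__init__()
        self._store = store
        self._path = path

    def close(self):
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


class InMemoryFS:
    """Hypothetical stand-in for a memory-based filesystem (open() only)."""

    def __init__(self):
        self.files = {}

    def open(self, path, mode="r"):
        if "w" in mode:
            return _CapturingBuffer(self.files, path)
        return io.StringIO(self.files[path])


def write_rows(fs, path, rows):
    """Task body: write rows as JSON through whatever filesystem is injected."""
    with fs.open(path, "w") as f:
        json.dump(rows, f)


def read_rows(fs, path):
    """Read rows back through the same filesystem object."""
    with fs.open(path, "r") as f:
        return json.load(f)


# Usage in a unit test: nothing touches disk or S3.
rows = [{"id": 1}, {"id": 2}]
fs = InMemoryFS()
write_rows(fs, "bucket/out.json", rows)
round_trip = read_rows(fs, "bucket/out.json")
```

Because the tasks depend only on the object passed in as `fs`, swapping the storage backend is a matter of which filesystem the caller constructs.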