John Jacoby

05/03/2022, 11:48 PM
Hi all, this is a great channel idea. I just came here specifically to ask a best practices question and the first thing I saw was the announcement about this new channel! I'm wondering if anyone else has been thinking about what the best practice is for tasks that produce and/or consume file paths. Without going into too much detail, each input into my main flow comes with a unique ID. Each task uses this ID to construct file paths for its persistent outputs. The issue came up when I realized that I needed the path constructed by one task in another task further down the flow. I can edit the upstream task to return the needed path, but then I need to re-run all the mapped iterations of that lengthy task just to return a file path, which on its own is a very small and quick operation. I can think of a few ways to get around this and I'm wondering if one of them is considered standard or best practice:
1. Don't bother passing the paths down the flow and just reconstruct the required file paths in each individual task.
2. Have one task at the start of the flow that quickly constructs metadata that all the other tasks can use. That way, if I need a new file path, I just have to re-run this quick task.
3. Same as 2, but write the paths and other metadata to a persistent file such as a JSON file instead of passing them down the flow. Other tasks can then read from this JSON.
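To make the three options above concrete, here is a minimal sketch in plain Python, not tied to any particular orchestration library. Everything named in it is an assumption made for illustration: the `outputs` base directory, the file names, and the helpers `build_paths`, `write_manifest`, and `read_manifest` are all hypothetical. The idea is that a single pure function derives every path from the ID, so option 1 means calling it inside each task, option 2 means calling it once in a quick first task and passing the result along, and option 3 means persisting the same result as JSON.

```python
import json
from pathlib import Path
from typing import Dict

# Hypothetical base directory; the real layout is not given in the thread.
BASE_DIR = Path("outputs")


def build_paths(run_id: str, base_dir: Path = BASE_DIR) -> Dict[str, Path]:
    """Derive every output path for one input ID. Pure function, no I/O."""
    root = base_dir / run_id
    # The file names below are placeholders for illustration.
    return {
        "raw": root / "raw.csv",
        "features": root / "features.parquet",
        "report": root / "report.json",
    }


def write_manifest(run_id: str, manifest_path: Path, base_dir: Path = BASE_DIR) -> None:
    """Option 3: persist the derived paths as JSON for other tasks to read."""
    paths = {name: str(p) for name, p in build_paths(run_id, base_dir).items()}
    manifest_path.write_text(json.dumps({"id": run_id, "paths": paths}, indent=2))


def read_manifest(manifest_path: Path) -> Dict[str, Path]:
    """Load the paths back from the JSON manifest as Path objects."""
    data = json.loads(manifest_path.read_text())
    return {name: Path(p) for name, p in data["paths"].items()}
```

Because `build_paths` does no I/O, adding a new path means editing one function and re-running only the cheap step, not the lengthy mapped task.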

Anna Geller

05/03/2022, 11:54 PM
Hi John, this is an interesting question, but it's hard for me to answer without knowing what problem you are trying to solve with this approach. Could you explain your use case in more detail? My answer will differ depending on it. This "consulting" answer may seem unsatisfactory, but "it depends" really is the only valid answer based on your description so far.