# ask-community
b
Hi all! I have a very specific technical question about Prefect related to task configuration.
I would like a series of tasks to feel like a unix pipeline, in the sense that metadata about the previous job in the series could be fed into the next job.
My use case: let’s say I have Spark jobs A -> B -> C. Each one reads from a data lake, does some processing, then writes back to the same data lake. I would like the output directory of A to automagically become the input directory of B. How would you do something like this with Prefect?
n
Hi @brian - if I understand your question correctly, you should be able to do this:
from prefect import Flow, Parameter, task

@task
def A(dir: str):
  # .. do stuff
  return dir

@task
def B(dir: str):
  # .. do stuff
  return dir

@task
def C(dir: str):
  # .. do stuff
  return dir

with Flow("Unix-like Pipeline") as flow:
  dir = Parameter("directory")

  # each task's return value is passed as the input of the next task
  a = A(dir=dir)
  b = B(dir=a)
  c = C(dir=b)
b
Yes! Or maybe even something like
with Flow("foo") as flow:
  in_dir = Parameter("input_dir")
  out_dir = Parameter("output_dir")
  with Chain(in_dir, out_dir) as chain:
    chain(A, B, C)
I think my issue is that unix pipelines are not graphs! What I’m wanting might be a fantasy
I just love `|` so much….
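For what it's worth, the chaining idea can be sketched in plain Python with no Prefect involved. `chain` below is a hypothetical helper (not a Prefect API), and the job functions and directory names are made up for illustration; each job just returns the directory it pretends to write to.

```python
from functools import reduce
from typing import Callable


def chain(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Compose steps left to right, like `a | b | c` in a shell."""
    def run(input_dir: str) -> str:
        # Each step receives the directory returned by the previous one.
        return reduce(lambda current, step: step(current), steps, input_dir)
    return run


# Hypothetical jobs: each one "writes" to a subdirectory and returns its path.
def job_a(input_dir: str) -> str:
    return f"{input_dir}/a"


def job_b(input_dir: str) -> str:
    return f"{input_dir}/b"


def job_c(input_dir: str) -> str:
    return f"{input_dir}/c"


pipeline = chain(job_a, job_b, job_c)
final_dir = pipeline("lake/raw")
```

This only covers the strictly linear case, which is exactly why it feels like a unix pipeline and not like a graph.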
n
😆 it is really really nice to use, I’m not sure there’s a python equivalent 😞
(in JS we can use method chaining though … 😇 )
b
It’s more just me wanting unix pipelines everywhere, and also that I hate configuration
z
We have `|` implemented for tasks in Prefect
def __ror__(self, other: object) -> "Task":
        """
        Creates a state dependency between `self` and `other`:
            `other | self --> self.set_dependencies(upstream_tasks=[other])`

        Args:
            - other (object): An object that will be converted to a Task and set as an
                upstream dependency of this Task.

        Returns:
            - Task
        """
        self.set_dependencies(upstream_tasks=[other])
        return self
You could implement a `Task` subclass that binds the return value of the upstream to the first argument of the downstream by overriding this method.
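As a rough illustration of the operator-overloading idea only, here is a toy sketch in plain Python. `PipeStep` and `make_job` are hypothetical names and are not part of Prefect; the class is a stand-in for a task, and its `__ror__` passes whatever is on the left of `|` as the first argument of the wrapped function.

```python
from typing import Any, Callable


class PipeStep:
    """Toy stand-in for a task (not Prefect's Task class)."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __ror__(self, upstream: Any) -> Any:
        # `upstream | self`: pass the upstream value as the first argument.
        return self.fn(upstream)


def make_job(name: str) -> PipeStep:
    # Hypothetical job: pretends to write to a subdirectory named after
    # the job and returns that path.
    return PipeStep(lambda input_dir: f"{input_dir}/{name}")


A, B, C = make_job("A"), make_job("B"), make_job("C")

# Each `|` hands the left-hand result to the step on the right.
output_dir = "lake/raw" | A | B | C
```

This toy version runs each step eagerly, whereas a real `Task` subclass would presumably wire up the flow graph and defer execution. Also note that Python normally tries the left operand's `__or__` before the right operand's `__ror__`, so when both sides are task objects the left side's `__or__` may need the same treatment.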
n
whoa! TIL 🙂
b
Thx for sharing that @Zanie it sounds incredibly slick
I typically hate operator overloading because I think it leads to ungrokable code, but I’ll have to mull it over for this case…
k
Wow TIL also