# ask-community
l
Hello! Is it possible to set a template name (using target) for a task with multiple return values? I'm doing something like this:
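from prefect import task
from sklearn.model_selection import train_test_split

@task(checkpoint=True, target="{task_name}.pkl", nout=4)
def split_train_test(dataset):
    # Columns 0-3 are the features, column 4 is the label
    array = dataset.values
    X = array[:, 0:4]
    y = array[:, 4]
    # Hold out 20% for validation; the fixed seed keeps the split reproducible
    X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)
    return X_train, X_validation, Y_train, Y_validation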
Using LocalResult on my flow, I noticed that it saved a split_train_test.pkl, which was what I expected, but also 4 files with Prefect default names (prefect-result-...). Is it possible to define a template for these files too? Or just not save them separately? This is important because, with these names, the cache will never work for them (as they contain the current timestamp).
z
Hi! Does the first pickle contain all 4 results then there are 4 files containing each of the subitem pickles as well?
l
Yes, the "main pickle" contains all 4 results (as a tuple), so Prefect seems to be duplicating them across 4 different files too
Prefect seems to treat them as different tasks, which are never cached, given the default names
@Zanie Is this the expected behaviour?
z
Hey Luiz, this is expected from the implementation of nout. I'm still considering the best solution.
I think it'd make sense to only cache the parent or the child tasks but I'm not sure which is feasible.
l
I see, thanks. My concern is that if this task is really expensive, the child tasks could become a major bottleneck, since they can't be cached. Are these child tasks just for saving the outputs, or are they running part of the method too?
z
They're just placeholders for splitting the outputs, unless I misunderstand your flow. They could still be cached by loading from the parent task's cache.
l
Yes, I've tested it too by adding a print to the task, and it ran just once. I'm thinking Prefect does this so it can pass part of the outputs to other tasks, right? (In my case 2 outputs will go to taskX and 2 to taskY.) I'm okay with that, so it would be nice if I could set a default name for them too, so I could avoid creating new files on each execution.
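The "placeholder" behaviour described above can be sketched in plain Python. This is a simplified model written for illustration, not Prefect's code: the classes `Parent` and `Child` and the functions `task_x` and `task_y` are hypothetical. The parent computes its tuple once, and each child only indexes into that result so that pieces can be routed to different downstream tasks:

```python
class Parent:
    """Hypothetical stand-in for a task declared with nout=4."""

    def __init__(self, fn, nout):
        self.fn = fn
        self.nout = nout
        self.calls = 0
        self._value = None

    def run(self):
        # The expensive function runs only on the first request.
        if self.calls == 0:
            self._value = self.fn()
            self.calls += 1
        return self._value

    def __iter__(self):
        # Unpacking the parent yields one index placeholder per output.
        return (Child(self, i) for i in range(self.nout))


class Child:
    """Placeholder that selects one element of the parent's result."""

    def __init__(self, parent, index):
        self.parent = parent
        self.index = index

    def run(self):
        # No part of the parent's function is re-run here; this only indexes.
        return self.parent.run()[self.index]


def task_x(a, b):
    return ("taskX", a, b)


def task_y(a, b):
    return ("taskY", a, b)


parent = Parent(lambda: ("X_train", "X_validation", "Y_train", "Y_validation"), nout=4)
x_train, x_validation, y_train, y_validation = parent

# Two outputs go to one downstream task and two to another.
out_x = task_x(x_train.run(), y_train.run())
out_y = task_y(x_validation.run(), y_validation.run())
```

Under this model the children are cheap, which matches the observation that the print inside the task appeared only once. The separate files come from each child persisting its own slice of the result.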
z
@Marvin open "Allow target names for n_out tasks or avoid duplicating cached results"