If you return a python set from a FunctionTask and...
# ask-community
h
If you return a python set from a FunctionTask and try to use it in another task B, Prefect fails B silently without reason.
[2021-09-20 10:26:55+0200] INFO - prefect.TaskRunner | Task 'found eligible apps': Finished task run for task with final state: 'Success'
[2021-09-20 10:26:55+0200] INFO - prefect.TaskRunner | Task 'fetch_mmm_data': Starting task run...
[2021-09-20 10:26:55+0200] DEBUG - prefect.TaskRunner | Task 'fetch_mmm_data': Handling state change from Pending to Failed
[2021-09-20 10:26:55+0200] INFO - prefect.TaskRunner | Task 'fetch_mmm_data': Finished task run for task with final state: 'Failed'
[2021-09-20 10:26:55+0200] INFO - prefect.FlowRunner | Flow run FAILED: some reference tasks failed.
[2021-09-20 10:26:55+0200] DEBUG - prefect.FlowRunner | Flow 'nightly_mmm': Handling state change from Running to Failed
deleting tmpfiles dir: /var/folders/cc/70yk9qg16hj0kx5_5r_vq_y40000gn/T/tmpc8f7ggrq
If you remove the `list(...)` conversions in this code, it crashes with no message.
e
I couldn't reproduce this, at least in Prefect Core. Am I missing something?
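from prefect import Flow, task

@task
def printr(x):
    # Assumed helper: the original snippet did not include its definition.
    print(x)

@task(nout=3)
def setstuff(x, y):
    # Return three sets: the intersection and both one-sided differences.
    xs, ys = set(x), set(y)
    return ys.intersection(xs), xs.difference(ys), ys.difference(xs)

with Flow("aaaaa") as f:
    a, b, c = setstuff([1, 2, 3], [1, 3, 5])
    printr(a)
    printr(b)
    printr(c)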
h
I'm using Tuple annotations instead of `nout`. Also, `x` and `y` are `@dataclass` values.
e
Converted to dataclass and tuple annotations, still works. Are you running on Core or Server? Also, is there mapping involved? Can you share a snippet where you are setting up your task dependencies in the flow, specifically for `fetch_mmm_data`?
h
Yes, there's mapping
and it's running locally
eligible_apps, missing_from_result, missing_from_metrics = app_set_intersection(
    metrics_eligible_apps, result_eligible_apps
)

print_list(eligible_apps, task_args={"name": "found eligible apps"})
print_list(missing_from_result, task_args={"name": "missing from result"})
print_list(missing_from_metrics, task_args={"name": "missing from metrics"})

dataset = fetch_mmm_data.map(eligible_apps)
All prints are passing
e
I see, now I am getting the same issue. AFAIK mapped tasks need their mapped inputs as lists, other iterables won't work.
h
Yes, that's basically the issue!
Makes no sense
e
Just checked the TaskRunner code. To be mappable, your task result needs to be subscriptable, i.e. implement `__getitem__`, so the collection needs to support `x[0]`-like operations. Sets don't do that because, as a data structure, they do not guarantee any ordering of their elements.
h
but they are iterable and I as a user don't care about indexing.
e
In this case, sure, you don't. But what if you had a mapped task taking 2 inputs? The first elements need to be called together, the second elements need to be called together, and so on. If Prefect supported sets in this case, you could get random pairings in every run.
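A small plain-Python sketch of that pairing concern (the app names and regions are made up for illustration): with lists, element i of one input is always paired with element i of the other, whereas a set has no position i to pair on.

```python
apps = ["app_a", "app_b", "app_c"]
regions = ["eu", "us", "asia"]

# Index-based pairing, which is what mapping over two inputs relies on.
pairs_from_lists = [(apps[i], regions[i]) for i in range(len(apps))]

# Converting to sets discards position. Sorting imposes a deterministic
# order again, but it is not the original order, so the pairs change.
apps_from_set = sorted(set(apps))
regions_from_set = sorted(set(regions))
pairs_after_sets = list(zip(apps_from_set, regions_from_set))
```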
h
Doesn't prefect know if the task takes two inputs and can fail then? And isn't prefect able to check for the subscriptable behaviour and warn about it?
If you have an ABI/API, prefer to make it total if you can, rather than making it partial. If you have an API, prefer failing explicitly rather than failing implicitly.
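One way to picture the explicit failure being asked for is a guard that runs before mapping. This is a sketch in plain Python, not something Prefect provides, and `ensure_mappable` is a hypothetical name:

```python
def ensure_mappable(value, name="upstream result"):
    """Raise a descriptive error instead of failing silently (hypothetical helper)."""
    if not hasattr(type(value), "__getitem__"):
        raise TypeError(
            f"{name} of type {type(value).__name__} is not subscriptable and "
            "cannot be mapped over; convert it with list(...) or sorted(...) first"
        )
    return value

# A list passes through unchanged.
checked = ensure_mappable([1, 3])

# A set is rejected with a message that names the offending value.
try:
    ensure_mappable({1, 3}, name="eligible_apps")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```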
e
I can't really speak to their design decisions; you could open an issue on GitHub and argue your point. About the implicit failure: apparently the error is stored in a state message.
new_state = Failed("At least one upstream state has an unmappable result.")
Sadly, this is not logged to stdout. You would probably see this message in Prefect Server directly, but in Core it's a little buried.
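A rough sketch of surfacing those messages locally. It assumes, as in Prefect 0.x/1.x, that `flow.run()` returns a state whose `.result` maps each task to its final state, and that states expose `is_failed()` and `message`. `failed_messages` is a hypothetical helper, and the stand-in classes exist only so the sketch runs without Prefect installed:

```python
def failed_messages(task_states):
    """Collect (task name, state message) for every failed task state."""
    return [
        (getattr(task, "name", repr(task)), state.message)
        for task, state in task_states.items()
        if state.is_failed()
    ]

# Stand-ins for illustration only; they are not Prefect classes.
class FakeTask:
    def __init__(self, name):
        self.name = name

class FakeState:
    def __init__(self, message, failed):
        self.message = message
        self._failed = failed

    def is_failed(self):
        return self._failed

states = {
    FakeTask("found eligible apps"): FakeState("Task run succeeded.", False),
    FakeTask("fetch_mmm_data"): FakeState(
        "At least one upstream state has an unmappable result.", True
    ),
}
report = failed_messages(states)
```

With a real flow, and under the same assumption, this would be called as `failed_messages(flow.run().result)`.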
h
It's not really a design decision, it fails with no reason, so it's a bug.
@emre Thanks for debugging it though
e
np, you should probably carry this over to a GitHub issue and see the core team's opinion about it.