If you return a Python set from a FunctionTask and try to use it in another task | Prefect Community #ask-community

haf

09/20/2021, 8:30 AM

If you return a Python set from a FunctionTask and try to use it in another task B, Prefect fails B silently, without giving a reason.

haf

09/20/2021, 8:32 AM


[2021-09-20 10:26:55+0200] INFO - prefect.TaskRunner | Task 'found eligible apps': Finished task run for task with final state: 'Success'
[2021-09-20 10:26:55+0200] INFO - prefect.TaskRunner | Task 'fetch_mmm_data': Starting task run...
[2021-09-20 10:26:55+0200] DEBUG - prefect.TaskRunner | Task 'fetch_mmm_data': Handling state change from Pending to Failed
[2021-09-20 10:26:55+0200] INFO - prefect.TaskRunner | Task 'fetch_mmm_data': Finished task run for task with final state: 'Failed'
[2021-09-20 10:26:55+0200] INFO - prefect.FlowRunner | Flow run FAILED: some reference tasks failed.
[2021-09-20 10:26:55+0200] DEBUG - prefect.FlowRunner | Flow 'nightly_mmm': Handling state change from Running to Failed
deleting tmpfiles dir: /var/folders/cc/70yk9qg16hj0kx5_5r_vq_y40000gn/T/tmpc8f7ggrq

haf

09/20/2021, 8:33 AM

If you remove the list(...) conversions in this code, it would crash with no message.

emre

09/20/2021, 8:47 AM

I couldn't reproduce this, at least in Prefect Core. Am I missing something?


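from prefect import Flow, task

@task
def printr(x):
    # Stand-in for the print helper used here (assumed to just print its input)
    print(x)

@task(nout=3)
def setstuff(x, y):
    xs, ys = set(x), set(y)
    # Three set outputs: common items, only in x, only in y
    return ys.intersection(xs), xs.difference(ys), ys.difference(xs)

with Flow("aaaaa") as f:
    a, b, c = setstuff([1, 2, 3], [1, 3, 5])
    printr(a)
    printr(b)
    printr(c)

f.run()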

haf

09/20/2021, 8:48 AM

I'm using Tuple annotations instead of nout.
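haf declares the number of outputs through a Tuple return annotation rather than nout. The sketch below shows how an output count can be read off a fixed-length Tuple annotation. infer_nout is a hypothetical helper illustrating the idea, not Prefect's own code, and app_set_intersection is a stand-in for haf's task, whose real body isn't shown in the thread.

```python
from typing import Optional, Set, Tuple, get_args, get_origin, get_type_hints


def infer_nout(fn) -> Optional[int]:
    """Hypothetical helper: return the output count implied by a
    fixed-length Tuple return annotation, or None if there is none."""
    hint = get_type_hints(fn).get("return")
    if get_origin(hint) is not tuple:
        return None
    args = get_args(hint)
    # Tuple[X, ...] is variable-length, so it implies no fixed count.
    if not args or (len(args) == 2 and args[1] is Ellipsis):
        return None
    return len(args)


def app_set_intersection(
    metrics_apps: Set[str], result_apps: Set[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    # Hypothetical stand-in: eligible, missing from result, missing from metrics.
    return (
        metrics_apps & result_apps,
        metrics_apps - result_apps,
        result_apps - metrics_apps,
    )


print(infer_nout(app_set_intersection))  # 3
```

Note that the annotation only fixes how many outputs there are; each output here is still a set.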

haf

09/20/2021, 8:48 AM

Also, … and … are @dataclass values.

emre

09/20/2021, 8:58 AM

Converted to a dataclass and Tuple annotations, and it still works. Are you running on Core or Server? Also, is there mapping involved? Can you share a snippet where you set up your task dependencies in the flow, specifically for fetch_mmm_data?

haf

09/20/2021, 8:58 AM

Yes, there's mapping

haf

09/20/2021, 8:58 AM

and it's running locally

haf

09/20/2021, 8:58 AM


eligible_apps, missing_from_result, missing_from_metrics = app_set_intersection(
    metrics_eligible_apps, result_eligible_apps
)

print_list(eligible_apps, task_args={"name": "found eligible apps"})
print_list(missing_from_result, task_args={"name": "missing from result"})
print_list(missing_from_metrics, task_args={"name": "missing from metrics"})

dataset = fetch_mmm_data.map(eligible_apps)

haf

09/20/2021, 9:02 AM

All prints are passing

emre

09/20/2021, 9:03 AM

I see, now I'm getting the same issue. AFAIK, mapped tasks need their mapped inputs to be lists; other iterables won't work.

haf

09/20/2021, 9:04 AM

Yes, that's basically the issue!

haf

09/20/2021, 9:04 AM

Makes no sense

emre

09/20/2021, 9:08 AM

Just checked the TaskRunner code. To be mappable, your task result needs to be subscriptable, i.e. implement __getitem__, so the collection needs to support x[0]-like operations. Sets don't, because as a data structure they do not guarantee any ordering of their elements.
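A minimal pure-Python sketch of the subscriptability test emre describes, assuming it reduces to checking whether the upstream result defines __getitem__. is_mappable and the eligible_apps value are hypothetical, chosen to mirror the thread.

```python
def is_mappable(value):
    # Assumed simplification of the check: the upstream result only
    # has to support value[i]-style indexing, i.e. define __getitem__.
    return hasattr(value, "__getitem__")


eligible_apps = {"app_b", "app_a", "app_c"}  # hypothetical task result

print(is_mappable(eligible_apps))          # False: sets have no __getitem__
print(is_mappable(list(eligible_apps)))    # True, but element order is arbitrary
print(is_mappable(sorted(eligible_apps)))  # True, and the order is deterministic
print(sorted(eligible_apps))               # ['app_a', 'app_b', 'app_c']
```

sorted(...) is usually a better conversion than list(...) here, because it gives the mapped children the same order on every run. Strings and dicts also define __getitem__, so passing this kind of check does not by itself make a value a sensible list of items to map over.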

haf

09/20/2021, 9:09 AM

But they are iterable, and as a user I don't care about indexing.

emre

09/20/2021, 9:12 AM

In this case, sure, you don't. But what if you had a mapped task taking two inputs? The first elements need to be passed together, and the second elements need to be passed together. If Prefect supported sets here, you could get random pairings on every run.
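The pairing argument can be made concrete with a simplified emulation of mapping over two inputs. emulate_map is a hypothetical helper, not Prefect's implementation: call i receives element i of every mapped input, so each input must be indexable and its order must mean something. Python randomizes string hashing per interpreter process by default, so iterating a set of strings can yield a different order on each run, which is exactly what would scramble the pairs.

```python
def emulate_map(fn, *inputs):
    """Hypothetical, simplified emulation of mapping over several inputs:
    call i receives inputs[0][i], inputs[1][i], ... (positional pairing)."""
    lengths = {len(items) for items in inputs}
    if len(lengths) != 1:
        # This sketch's own choice; it makes no claim about Prefect's behavior.
        raise ValueError("all mapped inputs must have the same length")
    n = lengths.pop()
    # items[i] is the step that requires __getitem__ on every input.
    return [fn(*(items[i] for items in inputs)) for i in range(n)]


apps = ["app_a", "app_b", "app_c"]
regions = ["eu", "us", "apac"]
print(emulate_map(lambda app, region: f"{app}@{region}", apps, regions))
# ['app_a@eu', 'app_b@us', 'app_c@apac']

# With sets there is no "element i" to pair up, so indexing raises TypeError.
try:
    emulate_map(lambda app, region: (app, region), set(apps), set(regions))
except TypeError as exc:
    print(f"TypeError: {exc}")
```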

haf

09/20/2021, 9:13 AM

Doesn't Prefect know whether the task takes two inputs, and can't it fail then? And isn't Prefect able to check for the subscriptable behaviour and warn about it?

haf

09/20/2021, 9:14 AM

If you have an ABI/API, prefer to make it total if you can, rather than making it partial. If you have an API, prefer failing explicitly rather than failing implicitly.
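In the spirit of failing explicitly, one user-side workaround is a guard called at the end of the upstream task. With it, a non-indexable result raises a descriptive error there, instead of the mapped task failing with a state message that is never printed. ensure_mappable is a hypothetical helper, and its __getitem__ test is an assumed simplification of Prefect's check.

```python
from collections.abc import Set


def ensure_mappable(value, *, name="value"):
    """Hypothetical guard: return value unchanged if it supports indexing,
    otherwise raise a TypeError that names the offending type."""
    if hasattr(value, "__getitem__"):
        return value
    hint = ""
    if isinstance(value, Set):  # matches both set and frozenset
        hint = " Convert it with sorted(...) or list(...) first."
    raise TypeError(
        f"{name} is a {type(value).__name__}, which cannot be mapped over "
        f"because it does not support indexing.{hint}"
    )


print(ensure_mappable(sorted({"app_b", "app_a"}), name="eligible_apps"))
# ['app_a', 'app_b']

try:
    ensure_mappable({"app_b", "app_a"}, name="eligible_apps")
except TypeError as exc:
    print(exc)
```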

emre

09/20/2021, 9:21 AM

I can't really speak to their design decisions; you could open an issue on GitHub and argue your point. As for the implicit failure, the error is apparently stored in a state message:


new_state = Failed("At least one upstream state has an unmappable result.")

Sadly, this is not logged to stdout. You would probably see this message directly in Prefect Server, but in Core it's buried a little deeper.
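One way to dig the buried message out in Core is to inspect the task states after the run. The sketch below assumes Prefect 1.x, where flow.run() returns a State whose .result maps each Task to its own State, and each State provides is_failed() and .message. failure_messages is a hypothetical helper, and StubState stands in for Prefect's states so the sketch runs on its own; the stub data reuses the message quoted above.

```python
def failure_messages(task_states):
    """Hypothetical helper: map each failed task to its state message.

    Written against the assumed Prefect 1.x shape where
    flow.run().result is a dict of Task -> State.
    """
    return {
        str(task): state.message
        for task, state in task_states.items()
        if state.is_failed()
    }


class StubState:
    # Minimal stand-in exposing the two State members the helper uses.
    def __init__(self, failed, message=None):
        self._failed = failed
        self.message = message

    def is_failed(self):
        return self._failed


task_states = {
    "found eligible apps": StubState(failed=False),
    "fetch_mmm_data": StubState(
        failed=True,
        message="At least one upstream state has an unmappable result.",
    ),
}
for name, message in failure_messages(task_states).items():
    print(f"{name}: {message}")
# fetch_mmm_data: At least one upstream state has an unmappable result.
```

Under the same assumption, the real call would be failure_messages(flow.run().result); the keys would then be the string form of each Task object rather than bare task names.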

haf

09/20/2021, 9:22 AM

It's not really a design decision: it fails without giving a reason, so it's a bug.

haf

09/20/2021, 12:06 PM

@emre Thanks for debugging it though

emre

09/20/2021, 12:09 PM

np, you should probably carry this over to a GitHub issue and see the core team's opinion about it.
