Prefect Community #ask-community


Jeremy Phelps

09/28/2021, 6:28 PM

Hi again, I was wondering if there's anything special about the way intermediate values from mapped tasks are stored. My tasks are all defined with `task(result=OurGCSResult(bucket='our-bucket'))`, where `OurGCSResult` is a copied-and-modified version of the `GCSResult` class found in Prefect. The difference is that `OurGCSResult` is compatible with the old version of the Google Cloud Storage library that our code uses; we would not be able to use the standard `GCSResult` without first rewriting a significant portion of our code. This class works fine for tasks that are not mapped, but something goes wrong for mapped tasks. I dug around using the GraphQL client and noticed that every task run with a non-negative `map_index` seems to be missing its storage location:


{
              "id": "eba36675-c15a-43ea-ad5b-2540936477b5",
              "map_index": 0,
              "name": null,
              "serialized_state": {
                "type": "Success",
                "_result": {
                  "type": "Result",
                  "location": null,  // WAT?!
                  "__version__": "0.14.22+9.g61192a3ee"
                },
                "context": {
                  "tags": []
                },
                "message": "Task run succeeded.",
                "__version__": "0.14.22+9.g61192a3ee",
                "cached_inputs": {}
              }
            }

As far as I can tell from the logs, no errors are happening, and the task in question can only either return a (possibly empty) array or raise an exception (which would be logged).
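
The check described above (successful runs with a non-negative `map_index` but a `null` location) can be sketched as a small filter over task-run records. This is a minimal sketch that assumes the records were already fetched (for example with the GraphQL client) and parsed from JSON; the field names come from the output in this thread, and `runs_missing_location` is a hypothetical helper.

```python
import json

# Excerpt of the two task-run records shown in this thread, trimmed to the
# fields the check needs (JSON null becomes Python None when parsed).
TASK_RUNS_JSON = """
[
  {"id": "eba36675-c15a-43ea-ad5b-2540936477b5", "map_index": 0,
   "serialized_state": {"type": "Success",
                        "_result": {"type": "Result", "location": null}}},
  {"id": "1ba64aab-482a-4e81-93c9-f01287cff21d", "map_index": -1,
   "serialized_state": {"type": "Success",
                        "_result": {"type": "Result",
                                    "location": "2021/9/24/d13c57e3-63d0-4b00-b8ad-8899b843aac5.prefect_result"}}}
]
"""


def runs_missing_location(task_runs):
    """Return IDs of successful mapped runs (map_index >= 0) whose result location is null."""
    missing = []
    for run in task_runs:
        state = run.get("serialized_state") or {}
        result = state.get("_result") or {}
        is_mapped = run.get("map_index", -1) >= 0  # -1 marks a non-mapped run
        if is_mapped and state.get("type") == "Success" and result.get("location") is None:
            missing.append(run["id"])
    return missing


runs = json.loads(TASK_RUNS_JSON)
print(runs_missing_location(runs))  # ['eba36675-c15a-43ea-ad5b-2540936477b5']
```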

Kevin Kho

09/28/2021, 6:32 PM

I’ll give this a try myself and look around. I suspect it’s just rendering as null because Prefect doesn’t recognize the result type.

Jeremy Phelps

09/28/2021, 6:44 PM

For comparison, here's what gets stored for a non-mapped task:


{
  "id": "1ba64aab-482a-4e81-93c9-f01287cff21d",
  "map_index": -1,
  "name": null,
  "serialized_state": {
    "type": "Success",
    "_result": {
      "type": "Result",
      "location": "2021/9/24/d13c57e3-63d0-4b00-b8ad-8899b843aac5.prefect_result",
      "__version__": "0.14.22+9.g61192a3ee"
    },
    "context": {
      "tags": [
        "mysql-write"
      ]
    },
    "message": "Task run succeeded.",
    "__version__": "0.14.22+9.g61192a3ee",
    "cached_inputs": {}
  }
}

Kevin Kho

09/28/2021, 6:49 PM

So I tried this with a copy of the `LocalResult` class, which I named `LocalResultCopy`, and I did get a location:


"serialized_state": {
              "type": "Success",
              "_result": {
                "type": "Result",
                "location": "/Users/kevinkho/Work/scratch/4.txt",
                "__version__": "0.14.22+9.g61192a3ee"
              },

But if the task returns nothing:


@task(result=LocalResultCopy(location='/Users/kevinkho/Work/scratch/{x}.txt'))
def abc(x):
    if x == 2:
        return
    return x

The result is not persisted and the location ends up as `null`:


"serialized_state": {
              "type": "Success",
              "_result": {
                "type": "Result",
                "location": null,
                "__version__": "0.14.22+9.g61192a3ee"
              },
              "context": {
                "tags": []
              },

Meanwhile, the mapped runs that do return a value get a location:


"serialized_state": {
              "type": "Success",
              "_result": {
                "type": "Result",
                "location": "/Users/kevinkho/Work/scratch/3.txt",
                "__version__": "0.14.22+9.g61192a3ee"
              },
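
The behavior Kevin observed can be modeled in a few lines: a run that returns `None` is never written, so its location stays `None`, while any other value gets a location filled in from the template. This is a sketch of the observed behavior under that assumption, not Prefect's source; `ResultRecord` and `persist` are hypothetical names, and `abc` is a plain function standing in for the task above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultRecord:
    """Hypothetical stand-in for the `_result` entry of a serialized state."""
    location: Optional[str] = None
    value: Any = None


def persist(template: str, value: Any, **inputs: Any) -> ResultRecord:
    """Model of the observed behavior: returning None means nothing is written."""
    if value is None:  # an identity check, not a truthiness check
        return ResultRecord(location=None, value=None)
    # Fill the location template with the task's inputs, e.g. '{x}.txt' -> '3.txt'.
    return ResultRecord(location=template.format(**inputs), value=value)


def abc(x):
    # Plain-function version of the example task: returns nothing for x == 2.
    if x == 2:
        return None
    return x


for x in [1, 2, 3, 4]:
    print(x, persist("{x}.txt", abc(x), x=x).location)
# 1 1.txt
# 2 None
# 3 3.txt
# 4 4.txt
```

Because the guard is `value is None` rather than `not value`, an empty list would still get a location in this model, which matches the later test in this thread where `[]` was persisted.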

Kevin Kho

09/28/2021, 6:49 PM

Did you template the result of the mapped task?

Jeremy Phelps

09/28/2021, 6:50 PM

What does that mean?

Kevin Kho

09/28/2021, 6:51 PM

You can template the result location of mapped tasks so that each run is stored in a different place. Doing something like:


@task(result=LocalResultCopy(location='/Users/kevinkho/Work/scratch/{x}.txt'))
def abc(x):
    if x == 2:
        return
    return x

and running `abc.map([1,2,3,4])` will create `1.txt`, `2.txt`, `3.txt`, and `4.txt` (except where a run returns nothing, as with `x == 2` above), because `{x}` in `{x}.txt` is filled in with `x`, the input of the task.
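
The templating described above amounts to string formatting of the location with the task's inputs. A minimal sketch, assuming a plain `str.format` call over the input `x` plus a few context-like keys; `render_locations` and the `task_name`/`map_index` keys here are illustrative assumptions, not a statement of which keys Prefect exposes.

```python
from typing import Any, Dict, Iterable, List


def render_locations(template: str, inputs: Iterable[Any], context: Dict[str, Any]) -> List[str]:
    """Render one location per mapped child from a format-string template.

    Assumption: the keys available are the entries of `context`, the child's
    position as `map_index`, and the task input `x`.
    """
    locations = []
    for map_index, x in enumerate(inputs):
        keys = {**context, "map_index": map_index, "x": x}
        locations.append(template.format(**keys))
    return locations


ctx = {"task_name": "abc"}  # hypothetical context values
print(render_locations("scratch/{x}.txt", [1, 2, 3, 4], ctx))
# ['scratch/1.txt', 'scratch/2.txt', 'scratch/3.txt', 'scratch/4.txt']
print(render_locations("scratch/{task_name}-{map_index}.txt", [1, 2, 3, 4], ctx))
# ['scratch/abc-0.txt', 'scratch/abc-1.txt', 'scratch/abc-2.txt', 'scratch/abc-3.txt']
```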

Jeremy Phelps

09/28/2021, 6:52 PM

No, I don't do anything like that. I didn't see the need.

Kevin Kho

09/28/2021, 6:53 PM

I think if you don’t template it, each mapped run will overwrite the file written by the previous one (because they all write to the same location).
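
The overwrite concern can be illustrated with a toy simulation: mapped children write their output to a store keyed by location, so a fixed location keeps only the last write while a templated one keeps every write. The dict stands in for a filesystem or bucket, and `run_mapped` is a hypothetical helper for illustration only.

```python
from typing import Any, Dict, Iterable


def run_mapped(template: str, inputs: Iterable[Any]) -> Dict[str, Any]:
    """Simulate mapped children writing their output to a key-value store.

    A later write to the same location silently replaces the earlier one.
    """
    store: Dict[str, Any] = {}
    for x in inputs:
        # str.format ignores unused keyword arguments, so a fixed path stays fixed.
        store[template.format(x=x)] = x
    return store


print(run_mapped("scratch/result.txt", [1, 3, 4]))
# {'scratch/result.txt': 4}
print(run_mapped("scratch/{x}.txt", [1, 3, 4]))
# {'scratch/1.txt': 1, 'scratch/3.txt': 3, 'scratch/4.txt': 4}
```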

Jeremy Phelps

09/28/2021, 6:54 PM

The location includes a UUID, so I doubt that applies outside of `LocalResult`.

Jeremy Phelps

09/28/2021, 6:55 PM

The location passed in by Prefect also includes a date.

Jeremy Phelps

09/28/2021, 6:55 PM

For example, `2021/9/24/d13c57e3-63d0-4b00-b8ad-8899b843aac5.prefect_result`
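
A location of that shape (unpadded date path plus a random UUID) can be sketched as below. This mimics the example path in this thread and is not Prefect's own implementation; `default_style_location` is a hypothetical name.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def default_style_location(now: Optional[datetime] = None) -> str:
    """Build a location shaped like '2021/9/24/<uuid4>.prefect_result'.

    Month and day are not zero-padded, matching the example path above.
    """
    now = now or datetime.now(timezone.utc)
    return f"{now.year}/{now.month}/{now.day}/{uuid.uuid4()}.prefect_result"


loc = default_style_location(datetime(2021, 9, 24, tzinfo=timezone.utc))
print(loc)  # e.g. 2021/9/24/<random uuid>.prefect_result
```

Because a fresh UUID is drawn on every call, two mapped runs writing on the same day still get distinct locations in this sketch, which is the point Jeremy makes about overwriting.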

Kevin Kho

09/28/2021, 6:57 PM

Ah, I see what you mean. You just specify the bucket and let Prefect generate the filename. That looks fine, so I would guess the tasks are returning nothing (`None`), which would leave the location `null`.

Jeremy Phelps

09/28/2021, 6:58 PM

The emptiest possible thing that could be returned would be `[]`.

Kevin Kho

09/28/2021, 6:58 PM

Will try that with my example

Kevin Kho

09/28/2021, 7:14 PM

My empty list still persists a result.

Jeremy Phelps

09/28/2021, 7:15 PM

As expected.

Jeremy Phelps

09/30/2021, 6:25 PM

Any new developments?

Kevin Kho

09/30/2021, 6:27 PM

Sorry, I don’t know what is causing the `null` location, since empty lists work for persisting results. I would need a minimal example to really be able to look into it.

Jeremy Phelps

09/30/2021, 7:09 PM

I analyzed the `Result` class and found several ways that the SDK can ask for a `Result` with a null location. I'll try overriding the behavior in `OurGCSResult` so it's impossible to have a null location.
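
The override idea can be sketched against a stand-in class: a subclass wraps the step that produces a location and substitutes a fallback whenever that step would leave it `None`. `StandInResult`, `NonNullLocationResult`, and the fallback naming are hypothetical and do not reproduce Prefect's `Result` API; a real override would need to cover each code path in the actual class that can yield a null location.

```python
import uuid
from typing import Any, Optional


class StandInResult:
    """Hypothetical stand-in for a Result base class; not Prefect's actual API."""

    def __init__(self, value: Any = None, location: Optional[str] = None) -> None:
        self.value = value
        self.location = location

    def format(self, **kwargs: Any) -> "StandInResult":
        # Copy self (keeping the subclass) and fill a location template if one was given.
        new = type(self)(self.value, self.location)
        if isinstance(new.location, str):
            new.location = new.location.format(**kwargs)
        return new  # location may still be None here


class NonNullLocationResult(StandInResult):
    """Sketch of the override: format() never returns a result without a location."""

    def format(self, **kwargs: Any) -> "StandInResult":
        new = super().format(**kwargs)
        if new.location is None:
            # Hypothetical fallback name; raising an error here is the stricter alternative.
            new.location = f"{uuid.uuid4()}.prefect_result"
        return new


print(StandInResult().format().location)                               # None
print(NonNullLocationResult(location="{x}.txt").format(x=3).location)  # 3.txt
print(NonNullLocationResult().format().location)  # e.g. <random uuid>.prefect_result
```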
