
Ben Ayers-Glassey

03/30/2023, 3:56 PM
Hello! I have a v1 flow which ingests data via SFTP. I have a task which lists the files to be downloaded, and then a mapped task which downloads each file. The flow completed successfully, but seems to have not downloaded a few files. There were 3,202 files, but only 3,197 seem to have been downloaded. So we're missing 5. There are no errors associated with this; rather, it appears that only 3,197 mapped runs were ever scheduled, even though it says "Expected Runs: 3,202". When I mouse over the "3,202", a popup says:
The number of mapped children expected to run. Note that the number of active mapped runs may be less than this if some have not yet entered a Pending state.
This seems like it would make sense while the flow was still running, but not once it has completed. 🤔
👀 1
The logs for the mapped task itself don't show any issues:
If I click on its "Mapped Runs", I see:
1-25 of 3198
...i.e. the 3197 mapped runs, plus the parent. So I can't find any mention of the 5 missing expected runs.

Bianca Hoch

03/30/2023, 4:54 PM
Hey Ben! If possible, I'd try running the following query to see if you can get the IDs of those task runs:
query {
  task_run(
    where: {
      flow_run_id: {_eq: "insert_flow_run_id"},
      state: {_neq: "Success"}
      }
  ) {
    name
    id
    state
  }
}
^ or something similar to this query. Essentially, it's looking for any task runs from that flow run which do not have a status of Success.
Other than that, it's hard to say for certain what happened here if the logs don't tell the story of the missing work

Ben Ayers-Glassey

03/30/2023, 5:03 PM
In [11]: from prefect.client import Client
    ...: 
    ...: c = Client()
    ...: 
    ...: c.graphql("""query {
    ...:   task_run(
    ...:     where: {
    ...:       flow_run_id: {_eq: "45ddf3e0-8d23-460e-a244-28fcb7b96f52"},
    ...:       state: {_neq: "Success"}
    ...:       }
    ...:   ) {
    ...:     name
    ...:     id
    ...:     state
    ...:   }
    ...: }""")
    ...: 
Out[11]: 
{
    "data": {
        "task_run": [
            {
                "name": null,
                "id": "ebad8ee0-51af-49e3-a519-a53e8b4b1366",
                "state": "Mapped"
            }
        ]
    }
}
I think this makes sense... that's presumably the mapped task itself. Its status says "Mapped" on the UI as well:
...but the query doesn't seem to have found any of its children, which presumably means they all have "Success" status. Which again matches what I see in the UI.
So then, there is no clue why the number of children ended up less than "Expected Runs"?
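One way to narrow this down might be to look at the children's `map_index` values rather than their states. A minimal sketch, assuming `task_run` exposes a `map_index` field that is 0-based for mapped children, and assuming results may be capped per query so paging is needed. The helper names are hypothetical and the schema details are assumptions, not something confirmed in this thread:

```python
import uuid


def mapped_runs_query(flow_run_id: str, limit: int, offset: int) -> str:
    """Build a task_run query for one page of mapped children of a flow run."""
    uuid.UUID(flow_run_id)  # raises ValueError if the ID is malformed
    # Assumption: map_index >= 0 selects mapped children only (the parent is excluded).
    # If the flow has more than one mapped task, an extra filter would be needed.
    return f"""
    query {{
      task_run(
        where: {{
          flow_run_id: {{_eq: "{flow_run_id}"}},
          map_index: {{_gte: 0}}
        }},
        order_by: {{map_index: asc}},
        limit: {limit},
        offset: {offset}
      ) {{
        id
        map_index
        state
      }}
    }}
    """


def fetch_mapped_runs(client, flow_run_id: str, page_size: int = 100) -> list:
    """Page through all mapped children using any object with a .graphql(query) method."""
    runs = []
    offset = 0
    while True:
        result = client.graphql(mapped_runs_query(flow_run_id, page_size, offset))
        page = result["data"]["task_run"]
        runs.extend(page)
        if len(page) < page_size:  # a short page means there is nothing left
            return runs
        offset += page_size


def missing_map_indices(runs: list, expected_runs: int) -> list:
    """Return the map indices in [0, expected_runs) that have no recorded task run."""
    seen = {run["map_index"] for run in runs}
    return [i for i in range(expected_runs) if i not in seen]
```

With a real client this would be roughly `missing_map_indices(fetch_mapped_runs(Client(), flow_run_id), 3202)`. Any indices it returns could then be matched against the file list to see which 5 files have no recorded run.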

Bianca Hoch

03/30/2023, 5:56 PM
Hmm... maybe we need to try reconfiguring the query to get more details here.
If you go to the interactive API, there is a "Query.mapped_children" option. It takes a task run ID as a search parameter.
👀 1

Ben Ayers-Glassey

03/30/2023, 5:57 PM
This mapped task is supposed to store the files it downloads in GCS, and I'm fairly sure I see them all there -- all 3,202 files. So it seems like maybe all of the task's mapped runs did actually run, just Prefect didn't learn about all of them?..
👀 1
🤔 1

Bianca Hoch

03/30/2023, 5:58 PM
Odd, but it's nice that the files are there after all

Ben Ayers-Glassey

03/30/2023, 5:58 PM
Yes 🙂
query {
  mapped_children(task_run_id:"45ddf3e0-8d23-460e-a244-28fcb7b96f52") {
    min_start_time
    max_end_time
    state_counts
  }
}
...gives:
{
  "data": {
    "mapped_children": {
      "min_start_time": null,
      "max_end_time": null,
      "state_counts": {}
    }
  }
}
🤷
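Possibly relevant: the `task_run_id` in the query above is the same value that was used as `flow_run_id` in the earlier `task_run` query, whereas the mapped task run returned there had ID `ebad8ee0-51af-49e3-a519-a53e8b4b1366`. That might explain the empty `state_counts`. A sketch of retrying with the task run ID, assuming `state_counts` comes back as a mapping from state name to count. The helper names are hypothetical, and what this query would actually return is not known from the thread:

```python
import uuid

# ID of the parent "Mapped" task run, taken from the earlier task_run query output.
MAPPED_TASK_RUN_ID = "ebad8ee0-51af-49e3-a519-a53e8b4b1366"


def mapped_children_query(task_run_id: str) -> str:
    """Build the mapped_children query for a given parent task run ID."""
    uuid.UUID(task_run_id)  # raises ValueError if the ID is malformed
    return f"""
    query {{
      mapped_children(task_run_id: "{task_run_id}") {{
        min_start_time
        max_end_time
        state_counts
      }}
    }}
    """


def total_children(state_counts: dict) -> int:
    """Sum the per-state counts (assumed shape: {"Success": 3197, ...})."""
    return sum(state_counts.values())


def fetch_total_children(client, task_run_id: str) -> int:
    """Run the query with any object exposing .graphql(query) and total the counts."""
    result = client.graphql(mapped_children_query(task_run_id))
    return total_children(result["data"]["mapped_children"]["state_counts"])
```

Calling `fetch_total_children(Client(), MAPPED_TASK_RUN_ID)` and comparing the total against 3,202 would show whether the API agrees with the "Expected Runs" figure or with the 3,197 runs listed in the UI.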