Hello! I've got a load of `invalid duration format` errors showing up in my Orion API (Prefect Community #ask-community)


Tom Manterfield

04/28/2022, 8:41 PM

Hello! I've got a load of `invalid duration format` errors showing up in my Orion API. Just checking: is this a bug or a misconfiguration on my part?

Tom Manterfield

04/28/2022, 8:41 PM


20:39:34.009 | ERROR   | uvicorn.error - Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 366, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 78, in __call__
    await self.app(scope, inner_receive, inner_send)
  File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 261, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 92, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 147, in simple_response
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 408, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 261, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 61, in app
    response = await func(request)
  File "/usr/local/lib/python3.9/site-packages/prefect/orion/utilities/server.py", line 87, in handle_response_scoped_depends
    response = await default_handler(request)
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 227, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 160, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.9/site-packages/prefect/orion/api/flow_runs.py", line 118, in flow_run_history
    return await run_history(
  File "/usr/local/lib/python3.9/site-packages/prefect/orion/database/dependencies.py", line 112, in async_wrapper
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/prefect/orion/api/run_history.py", line 158, in run_history
    return pydantic.parse_obj_as(List[schemas.responses.HistoryResponse], records)
  File "pydantic/tools.py", line 38, in pydantic.tools.parse_obj_as
  File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for ParsingModel[List[prefect.orion.schemas.responses.HistoryResponse]]
__root__ -> 28 -> states -> 0 -> sum_estimated_lateness
  invalid duration format (type=value_error.duration)
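
The last three lines are the key part: pydantic failed to coerce the `sum_estimated_lateness` field of record 28 (zero-indexed) in the flow run history response into a duration. As a rough illustration of what that kind of check involves, here is a simplified, hypothetical stand-in written with only the standard library. It accepts a `timedelta`, a number of seconds, or a string in the `str(timedelta)` style such as `3 days, 1:02:03.5`, and raises `ValueError("invalid duration format")` for anything else. It is not pydantic's actual parser (which also accepts ISO 8601 durations, among other forms), and the thread does not reveal which value Orion actually produced.

```python
import re
from datetime import timedelta
from typing import Union

# Simplified pattern for strings such as "12", "1:02", "1:02:03" or
# "3 days, 1:02:03.5", roughly what str(timedelta(...)) produces.
# Hypothetical illustration only; this is NOT pydantic's own regex.
_DURATION_RE = re.compile(
    r"^(?:(?P<days>-?\d+) days?, )?"
    r"(?:(?P<hours>\d+):(?=\d+:\d+))?"  # hours only when H:M:S is present
    r"(?:(?P<minutes>\d+):)?"
    r"(?P<seconds>\d+)(?:\.(?P<micro>\d{1,6}))?$"
)


def parse_duration(value: Union[timedelta, int, float, str]) -> timedelta:
    """Coerce value to a timedelta or raise ValueError("invalid duration format")."""
    if isinstance(value, timedelta):
        return value
    # Plain numbers are treated as seconds; bools are rejected on purpose.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return timedelta(seconds=value)
    if isinstance(value, str):
        match = _DURATION_RE.match(value.strip())
        if match:
            parts = match.groupdict()
            # Pad the fractional part to six digits: "5" -> 500000 microseconds.
            micro = (parts["micro"] or "0").ljust(6, "0")
            return timedelta(
                days=int(parts["days"] or 0),
                hours=int(parts["hours"] or 0),
                minutes=int(parts["minutes"] or 0),
                seconds=int(parts["seconds"]),
                microseconds=int(micro),
            )
    # Anything that is not a recognised shape fails the same way the log does.
    raise ValueError("invalid duration format")
```

Calling `parse_duration("5 minutes")` raises a `ValueError` carrying the same message as the log, while `parse_duration(90)` returns `timedelta(seconds=90)`.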

Tom Manterfield

04/28/2022, 8:42 PM

☝️ the stack trace from one of them. There are lots; they seem to come in batches of 2-3 every few minutes.

Tom Manterfield

04/28/2022, 8:43 PM

It might be happening every time I view the flow runs list in the UI, but I wasn’t able to confirm that behaviour 100%

Anna Geller

04/28/2022, 8:43 PM

can you share the `prefect version` output?

Tom Manterfield

04/28/2022, 8:43 PM

If that's the case: I have two flow runs that are pending and therefore have no duration. But all of this is wild-ass guessing rather than proper debugging.

Tom Manterfield

04/28/2022, 8:44 PM

Sure, one sec

Tom Manterfield

04/28/2022, 8:45 PM


Version:             2.0b3
API version:         0.3.0
Python version:      3.9.12
Git commit:          58a401bc
Built:               Wed, Apr 13, 2022 11:21 AM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         ephemeral
Server:
  Database:          postgresql

Anna Geller

04/28/2022, 8:48 PM

To check whether this is a misconfiguration or a bug: when exactly did you see this error showing up in your API logs? Was it when you triggered a specific run? You said it might be when viewing flow runs from the UI; could you confirm that? If it happens when running a flow, could you share the flow code or deployment spec? That would help troubleshoot.

Tom Manterfield

04/28/2022, 8:48 PM

It’s definitely not when running a flow, I can confirm that part

👍 1

Anna Geller

04/28/2022, 8:49 PM

so far I don't have enough info to reproduce the issue or suggest anything helpful to fix it

Tom Manterfield

04/28/2022, 8:49 PM

One moment, I’ll see if I can reliably get it to happen.

Anna Geller

04/28/2022, 8:50 PM

that would be helpful, thanks a lot! If I can't reproduce or help further, I'll open an issue and ask the team

Tom Manterfield

04/28/2022, 8:51 PM

Right, I just refreshed the UI three times with the flow runs tab selected and had it appear each time.

Tom Manterfield

04/28/2022, 8:51 PM

I’m not sure how to delete flow runs. If I can do that, I can nuke them all and see whether it still happens.

Anna Geller

04/28/2022, 8:51 PM

so it looks like if I start ephemeral Orion with Postgres and view the UI anywhere, I should be able to reproduce?

Anna Geller

04/28/2022, 8:52 PM

sorry, but I think the only reliable way to nuke old flow runs would be to reset the DB entirely

Tom Manterfield

04/28/2022, 8:52 PM

Okay. I can do that. Let me gather more info from the current state first

🙌 1

Tom Manterfield

04/28/2022, 9:00 PM

Dammit, hadn’t realised nuking would wipe the different storage options

Tom Manterfield

04/28/2022, 9:01 PM

As in the fixtures, rather than the user configured ones

Tom Manterfield

04/28/2022, 9:03 PM

Well, now that it’s totally nuked, refreshing no longer causes the error, so that’s good info.

Anna Geller

04/28/2022, 9:04 PM

nice work!

Tom Manterfield

04/28/2022, 9:04 PM

I want to recreate my flows and have some that aren’t pending to see if that’s the cause but I don’t know how to set storage up again without that seed data. I’d have thought DB recreation would have included those entries but apparently not.

Anna Geller

04/28/2022, 9:07 PM

sorry to hear that you lost some of your configurations. We tried to make the storage setup easy with the CLI.

Tom Manterfield

04/28/2022, 9:07 PM

Oh no, it’s not my storage config I’m worried about, that’s dead easy

Anna Geller

04/28/2022, 9:08 PM

some engineers are working on adding storage to the DeploymentSpec, so if that's of any consolation, you won't need to rely on a global storage config in the near future

🙌 1

Tom Manterfield

04/28/2022, 9:08 PM

it’s the storage configs that come baked into the DB that allow you to set storage up

Tom Manterfield

04/28/2022, 9:08 PM

Without those, the create storage command won’t work; it just tells me to select an option from an empty list.

Anna Geller

04/28/2022, 9:09 PM

Can you make sure that all your browser windows with Orion are closed before doing that?

Anna Geller

04/28/2022, 9:09 PM

I had a similar issue: closing all browser tabs, then resetting the DB and starting Orion/creating storage should work. We have an open issue for that.

✅ 1

Tom Manterfield

04/28/2022, 9:10 PM

> some engineers are working on adding storage to the DeploymentSpec, so if that’s of any consolation, you won’t need to rely on a global storage config in the near future

This will be awesome. Storage has been by far the hardest part to fully automate deployment for.

👍 1

upvote 1

Anna Geller

04/28/2022, 9:10 PM

100% agreed

Tom Manterfield

04/28/2022, 9:14 PM

Hmmm, something isn’t right here. I closed all UI windows and reset the DB again but via the CLI

Tom Manterfield

04/28/2022, 9:15 PM

it confirmed the reset against a local SQLite database, but I’m using a Kubernetes cluster and PostgreSQL

Anna Geller

04/28/2022, 9:15 PM

I actually meant resetting via the CLI reset command

Tom Manterfield

04/28/2022, 9:15 PM

Checked config and the API URL is still set to the proper location, so it should be looking at/interacting with the deployed version.
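
The mismatch above suggests the CLI reset talks to whatever database the local settings point at, independently of `PREFECT_API_URL`, which only tells clients where the API server lives. A minimal pre-flight check, sketched under two assumptions: that the database location is read from `PREFECT_ORION_DATABASE_CONNECTION_URL` (assumed here to be the setting name in the 2.0 betas; confirm it for your version), and that an unset value means the default local SQLite file. The function name is hypothetical.

```python
import os
from typing import List, Mapping, Optional

# PREFECT_API_URL comes from the thread; the database variable name is an
# assumption about Prefect 2.0 betas and should be checked for your version.
API_URL_VAR = "PREFECT_API_URL"
DB_URL_VAR = "PREFECT_ORION_DATABASE_CONNECTION_URL"


def reset_target_warnings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return warnings when a local DB reset would likely miss the remote server's DB."""
    env = os.environ if env is None else env
    warnings = []
    api_url = env.get(API_URL_VAR, "")
    db_url = env.get(DB_URL_VAR, "")
    if api_url and not db_url:
        # Client points at a remote API, but no database URL is configured locally.
        warnings.append(
            f"{API_URL_VAR} points at {api_url}, but {DB_URL_VAR} is unset: "
            "a local reset will probably target the default SQLite database."
        )
    elif api_url and db_url.startswith("sqlite"):
        # A database URL is configured, but it is a local SQLite file.
        warnings.append(f"{DB_URL_VAR} is a SQLite URL while the API lives at {api_url}.")
    return warnings


if __name__ == "__main__":
    for message in reset_target_warnings():
        print("WARNING:", message)
```

Run it in the same shell you run the reset from; if it prints a warning, the reset is unlikely to touch the PostgreSQL database behind the Kubernetes deployment.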

Anna Geller

04/28/2022, 9:16 PM

in that case you would need to recreate a Kubernetes deployment?

Tom Manterfield

04/28/2022, 9:17 PM

I’ll try doing that. It seemed to still be running happily post-reset, but it can’t have been quite as happy as it seemed.

👍 1

Tom Manterfield

04/28/2022, 9:21 PM

Well I didn’t cover myself in glory there. Was bouncing the wrong pod over and over wondering why nothing was happening.

😂 1

Anna Geller

04/28/2022, 9:25 PM

still, looks like everything is working now?

Anna Geller

04/28/2022, 9:26 PM

I'll be away for a bit now, LMK if you still have any issues, I can check later

Tom Manterfield

04/28/2022, 9:26 PM

Okay, everything is restored and I have deployments back in. Refreshing isn’t causing the error. Let’s test my theory

👍 1

Tom Manterfield

04/28/2022, 9:26 PM

No worries. I’ll leave a note so people can debug. It’s not urgent on my side really, I just wanted cleaner logs to debug other stuff.

👍 1

Tom Manterfield

04/28/2022, 9:43 PM

• Tried refreshing the UI with one failed flow run in place. The exception didn’t occur.
• Tried refreshing the UI with a failed and a pending flow run in place. The exception didn’t occur.
• The error eventually returned after letting the system run for a while (scheduling a few flow runs, normal UI usage, etc.).
• I can now trigger the error reliably again by refreshing the UI.

Sorry, this isn’t a great starting point for debugging, but I can’t seem to pin down a reliable set of steps to get into this state. The only config I added post-reset was storage. I also have the `PREFECT_API_URL` and `PREFECT_AGENT_QUERY_INTERVAL` env vars set.

Anna Geller

04/29/2022, 12:23 AM

I could try to replicate this with a Kubernetes setup. Just to let you know, if this doesn't work for you, you can always try these two alternatives:
• running Orion without Kubernetes, e.g. on a single VM instance, which you could set up on EC2
• using Cloud 2.0

How did you set up your Orion instance on Kubernetes? Did you follow this tutorial here?

Tom Manterfield

04/29/2022, 3:31 PM

I did follow that tutorial. Everything seems to work, it’s just there’s stacks of errors in the logs that make it hard to find errors I care about.

Tom Manterfield

04/29/2022, 3:32 PM

Mostly timeouts and then the ones I describe above. I figure the timeouts are fairly self-explanatory, but the invalid duration seemed like it might be a bug.
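
Until the underlying errors are fixed, one way to make a noisy log usable is to count error blocks by their final exception, so the known ones (timeouts, the duration validation error) can be set aside. The sketch below assumes the format of the traceback posted earlier: each block starts with a line containing `Exception in ASGI application`, and the exception class appears at the start of an unindented line. Pydantic's `(type=...)` tag is appended to the key when present. The function name and the log file name in the usage note are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed block marker, e.g.
# "20:39:34.009 | ERROR   | uvicorn.error - Exception in ASGI application"
BLOCK_START = "Exception in ASGI application"
# Unindented "some.module.SomeError: message" or a bare "some.module.SomeError".
EXC_LINE = re.compile(r"^(?P<cls>[A-Za-z_][\w.]*(?:Error|Exception))(?::|$)")
# Pydantic v1 validation details end with e.g. "(type=value_error.duration)".
PYDANTIC_TYPE = re.compile(r"\(type=(?P<type>[\w.]+)\)")


def summarize_errors(lines: Iterable[str]) -> Counter:
    """Count ASGI error blocks keyed by exception class (plus pydantic error type)."""
    counts: Counter = Counter()
    key: Optional[str] = None
    for line in lines:
        if BLOCK_START in line:
            if key:
                counts[key] += 1  # close the previous block
            key = "unknown"
            continue
        if key is None:
            continue  # not inside an error block yet
        match = EXC_LINE.match(line)
        if match:
            key = match.group("cls")  # the last exception line wins
        type_match = PYDANTIC_TYPE.search(line)
        if type_match:
            key = f"{key} [{type_match.group('type')}]"
    if key:
        counts[key] += 1  # close the final block
    return counts
```

For example, `with open("orion-api.log") as fh: print(summarize_errors(fh).most_common())` lists each error kind with its count, most frequent first.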

Tom Manterfield

04/29/2022, 3:33 PM

I wouldn’t want to use a different deployment mechanism like EC2, as that’s going to bring a lot of additional maintenance burden. I’d be okay with using Cloud (I’m considering it now), but I’d still need to know I could run it successfully on my own stack first as a safety net, since it’s such a core part of our system.

Tom Manterfield

04/29/2022, 3:35 PM

That safety net isn’t just hypothetical either. There’s a decent chance our product will need to be self-hostable for an enterprise version in future. We’d be able to use the cloud offering for the main SaaS solution, but couldn’t use it for those self-hosted instances.

Anna Geller

04/29/2022, 3:36 PM

I can 100% understand that, and I'm sure this is a solvable problem. Hard to say what exactly went wrong there, but I can reassure you that we are building Prefect 2.0 so that you can use the OSS product for your production deployments.

Tom Manterfield

04/29/2022, 3:38 PM

Oh yeah for sure, I don’t mind there being some teething problems. You’re pretty up front about this being a beta and the product has lots of features that offset that

👍 1

Tom Manterfield

04/29/2022, 3:39 PM

For a chunk of the problems I’ve faced, I’m hoping to wrap the solutions up in a Kubernetes operator and make it open source so others can pick it up. I just want to make sure I’ve got the right solutions internally before I do that.

Tom Manterfield

04/29/2022, 3:40 PM

I’d also need to wait on the new storage interface as I wouldn’t want to hack around the current one and then immediately replace it. Everything else is pretty workable as-is.

Anna Geller

04/29/2022, 3:47 PM

Nice! If you want a sneak peek of the storage interface: it's based on the `fsspec` interface, which provides a lot of flexibility in that regard. You can check it out here: https://github.com/PrefectHQ/prefect/blob/orion/src/prefect/blocks/storage.py

👍 1

Tom Manterfield

04/29/2022, 4:06 PM

@Anna Geller I might have just found the issue, mea culpa. I’m on an M1 Mac locally, so it’s emulating. I only just noticed Prefect doesn’t have an arm64 image. My experience of emulation with Docker is that it’s pretty ropey, and the errors you can get are… weird. Are there any plans for an arm build of the image?
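
One quick way to confirm emulation is to compare the architecture a container reports (for example the `OS/Arch:` line of `prefect version`, which showed `linux/x86_64` earlier in this thread) with the host's. The sketch below normalizes common aliases so that `x86_64`/`amd64` and `aarch64`/`arm64` compare equal; the alias table and function names are assumptions covering only those cases. Run it on the host, not inside the container, and note that a Python interpreter running under Rosetta reports `x86_64` even on an M1 Mac.

```python
import platform

# Common spellings of the same CPU architecture (assumed, not exhaustive).
_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(name: str) -> str:
    """Map 'linux/x86_64', 'aarch64', 'arm64', ... to a canonical name."""
    arch = name.strip().lower().rsplit("/", 1)[-1]  # drop an "os/" prefix if present
    return _ALIASES.get(arch, arch)


def is_emulated(container_arch: str, host_arch: str = "") -> bool:
    """True when the container's architecture differs from the host's."""
    host = host_arch or platform.machine()  # defaults to the current machine
    return normalize_arch(container_arch) != normalize_arch(host)
```

With the version output above, `is_emulated("linux/x86_64")` run on a native arm64 Python on an M1 Mac would return `True`.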

Tom Manterfield

04/29/2022, 4:09 PM

I can build a local version in the meantime, but would be awesome if the official one had both.

Anna Geller

04/29/2022, 4:13 PM

Nice to hear you found out the root cause of the issue, good work! Let me open an issue for that. @Marvin open "Consider adding arm64 base images for Prefect 2.0"

🙌 1

Marvin

04/29/2022, 4:14 PM

https://github.com/PrefectHQ/prefect/issues/5732

Tom Manterfield

04/29/2022, 4:51 PM

Never mind. I just rebuilt locally using arm64, build was successful and the app does seem more performant, but the errors described above still exist. So it’s worth doing, but not the cause of this problem.

👍 1
