We are busy trying out different deployment strategies at th Prefect Community #ask-community

Bernard Greyling

07/10/2020, 8:47 AM

We are busy trying out different deployment strategies at the moment. We initially went the Dask executor route, using Prefect to create workflows and K8s/DaskGateway to provide the cluster interface. That worked well for some tasks, but we ran into cluster memory issues with more complex flows. We still need to sort that approach out, but we can't afford to spend more time on it right now. To resolve the Dask worker memory issues, we are considering running a Prefect k8s agent (using either S3 or Docker storage). That way runs are self-contained and easier to manage memory for. I have two questions:

1. In the documentation/GitHub, the issue of limited agent resources

cpu: 100m & memory: 128Mi

is mentioned but not explained. What is the reasoning behind this limit?

2. We've successfully set up and authenticated a k8s runner and scheduled both S3 and Docker runs. After a custom image pull error on k8s, the k8s Prefect agent seems to be stuck in a feedback loop, announcing that it can see flows:

Found 2 flow run(s) to submit for execution.

But it is not executing them. Note: I did manually terminate the k8s job via kubectl. Not sure if this messed up the Prefect Cloud state. EDIT: Before this feedback loop state, we managed to run both S3 and Docker runs with the vanilla examples.
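For reference on the limits quoted in question 1, here is a minimal sketch of what the two values mean in Kubernetes resource units: `100m` is 100 millicores (a tenth of one CPU) and `128Mi` is 128 mebibytes. The helper names `cpu_to_cores` and `memory_to_bytes` are made up for illustration and handle only a subset of the suffixes Kubernetes accepts; they are not part of Prefect or of any Kubernetes client.

```python
# Hypothetical helpers (not from Prefect or a Kubernetes client library).
# Only a subset of Kubernetes quantity suffixes is handled here.
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,  # binary units
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,     # decimal units
}


def cpu_to_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '100m' or '2' to a number of cores."""
    if quantity.endswith("m"):
        # The 'm' suffix means millicores: 1000m == 1 core.
        return float(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    # No recognised suffix: the value is already in bytes.
    return int(float(quantity))


if __name__ == "__main__":
    print(cpu_to_cores("100m"))      # the agent CPU value quoted above
    print(memory_to_bytes("128Mi"))  # the agent memory value quoted above
```

These figures apply to the agent pod only. Whether flow-run jobs inherit the same limits depends on the job spec the agent submits, which this thread does not confirm.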

Jenny

07/10/2020, 1:07 PM

Hi Bernard - thanks for the questions. Sounds like the feedback loop is your most pressing issue? I don't know the answer off the top of my head but let me see if I can get some advice from the rest of the team and get back to you.

nicholas

07/10/2020, 1:20 PM

@Bernard Greyling - that sounds like your agent can't access the registry where you've stored your flows, can you confirm if this is the case? The daemon would need to be authenticated prior to your agent starting, I believe

Bernard Greyling

07/10/2020, 1:28 PM

Hey guys, managed to get it working. @nicholas You're right. Fixed this by including the registry URL in the image specification, i.e.

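from prefect.environments import LocalEnvironment

# `flow` is an existing prefect.Flow; the image value is a placeholder.
flow.environment = LocalEnvironment(
    metadata={
        # Include the registry URL so the image is pulled from the right registry
        "image": "registry_url/org/repo:tag"
    },
    labels=["s3-flow-storage"],
)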
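As a rough illustration of why the registry prefix matters, here is a minimal sketch assuming Docker's usual naming rule: the first path component of an image reference counts as a registry host only if it contains a `.` or a `:`, or is exactly `localhost`. Otherwise the reference is resolved against the default registry (Docker Hub). The function name `has_explicit_registry` and the host `registry.example.com` are hypothetical and not part of Prefect or Docker tooling.

```python
def has_explicit_registry(image: str) -> bool:
    """Return True if the image reference names a registry host.

    Hypothetical helper that mirrors Docker's usual convention; it is
    not a full parser for image references.
    """
    first, sep, _rest = image.partition("/")
    if not sep:
        # No "/" at all (e.g. "python:3.8"): resolved against the default registry.
        return False
    return "." in first or ":" in first or first == "localhost"


if __name__ == "__main__":
    for ref in (
        "org/repo:tag",                       # no registry host
        "registry.example.com/org/repo:tag",  # hypothetical private registry
        "localhost:5000/repo:tag",            # local registry with a port
    ):
        print(ref, has_explicit_registry(ref))
```

A check like this could be run before registering a flow to catch an image name that would silently be looked up on the default registry instead of a private one.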

@Jenny The feedback loop persisted until I deleted the project with all its flows from Cloud and created a new one. This is fine for now as we are still getting to know the ecosystem, but it seems problematic for mission-critical deployments. Is there some internal state management in Prefect Cloud that might help explain the loop?

Bernard Greyling

07/10/2020, 1:29 PM

I'm sure I can reproduce this behavior in case we need some testing.

nicholas

07/10/2020, 1:32 PM

@Bernard Greyling if you can create a minimal reproducible example, let's move this to a bug ticket. It sounds like non-ideal behavior in a case like this.

Bernard Greyling

07/10/2020, 1:33 PM

sure thing

Bernard Greyling

07/10/2020, 1:34 PM

congrats on the eco-system. it's pretty kickass🔥

nicholas

07/10/2020, 1:34 PM

💪 💪 💪 😄

Bernard Greyling

07/10/2020, 2:45 PM

https://github.com/PrefectHQ/prefect/issues/2943

nicholas

07/10/2020, 2:45 PM

Thanks for that @Bernard Greyling!
