Does anyone have any experience guidance as to the wisdom of Prefect Community #ask-community

Sean Talia

07/23/2021, 1:40 PM

Does anyone have any experience / guidance as to the wisdom of running Prefect flows as part of a GitHub Actions CI/CD pipeline? My org's machine learning team would like to start setting up some GitHub workflows that might run some potentially complicated DAGs – some of the nodes/tasks would need to be able to query our data warehouse for some sample data and run a machine learning model against it. They want these workflows to run on opening a PR, and to trigger a PR check failure in the event that some step of the DAG fails. This workflow sounds to me like it might be suited to using Prefect's GraphQL API for launching and monitoring the status of such a flow, but I'm not sure if that would be trying to use Prefect for something it might not really be meant for. Bringing Prefect into the picture sounds like it could be more overhead than necessary, but I think it might make our lives easiest in the end. Any thoughts on this would be much appreciated!

Kevin Kho

07/23/2021, 2:04 PM

Hey @Sean Talia, the only related thing I know is this? Not exactly related, but have you seen it? Let’s see if the community has any thoughts. I think it’s right that you would have to use the GraphQL API to achieve this.

Sean Talia

07/23/2021, 2:19 PM

oh yes we're using a similar setup extensively for registering flows to our cloud instance! it's been working beautifully. I just haven't given any consideration to the idea of trying to execute flows as part of a GH workflow

Kevin Kho

07/23/2021, 2:22 PM

Yeah you should be able to with

prefect run

right? Or I suppose if you have more involved things, you can use a Python script. With what you described though, I think using the API makes your life easier there, not harder.
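
For the Python script route, a minimal sketch of the monitoring half might look like the following. It is not taken from Prefect's docs: `wait_for_flow_run` and `fetch_state` are hypothetical names, `fetch_state` stands in for whatever API call returns the flow run's current state name, and the set of finished state names is an assumption to check against your Prefect version.

```python
import time
from typing import Callable

# Assumption: state names that mean the flow run has stopped.
# Verify these against the Prefect version you are running.
FINISHED_STATES = {"Success", "Failed", "Cancelled", "TimedOut"}


def wait_for_flow_run(
    fetch_state: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
) -> int:
    """Poll a flow run until it finishes; return 0 on success, 1 otherwise.

    `fetch_state` is a hypothetical callable that asks the API for the
    current state name of one flow run.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = fetch_state()
        if state in FINISHED_STATES:
            return 0 if state == "Success" else 1
        time.sleep(poll_seconds)
    # Treat a run that never finishes in time as a failed check.
    return 1
```

Passing the return value to `sys.exit` at the end of the script would make the CI step, and so the PR check, fail whenever the flow run does not end in success.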

Sean Talia

07/23/2021, 2:26 PM

I actually haven't really considered executing flows by doing

prefect run

in production environments; I always assumed (maybe wrongly?) that it makes more sense to register your flow so that you can take advantage of all the nice visualization the UI will give you – I guess this is a separate question since you said "using the API makes your life easier", but are there specific use cases where you think it makes more sense to do

prefect run

than go through the registration/agent setup?

Zanie

07/23/2021, 2:32 PM

Just to weigh in here since I made a lot of changes to

prefect run

-- You can use this with a registered flow and get all of the benefits of UI visualization. Just

prefect run --name "your-flow-name" --watch

will create a flow run that will be submitted to an agent then stream logs to local stdout. The error code will reflect the success of the flow run (which sounds like what you'd want in CI). You can also do

prefect run --name "your-flow" --execute

and it will run with API reporting (i.e. you can see it in the UI) but it will execute directly in your CI process without being sent to an agent.
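
A minimal sketch of wrapping those two commands in a Python script for a CI step, assuming Prefect 0.15.0 or later is installed and authenticated in the CI environment. The function names and the `runner` parameter are hypothetical (the latter exists only so the wrapper can be exercised without Prefect); the only CLI flags used are the ones quoted above.

```python
import subprocess
from typing import Callable, List


def build_prefect_run_command(flow_name: str, execute: bool = False) -> List[str]:
    """Build the `prefect run` argument list for a registered flow."""
    cmd = ["prefect", "run", "--name", flow_name]
    # --watch: submit the run to an agent and stream its logs to stdout.
    # --execute: run the flow in this process while still reporting to the API.
    cmd.append("--execute" if execute else "--watch")
    return cmd


def run_flow_as_check(
    flow_name: str,
    execute: bool = False,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run the flow and return the CLI exit code (non-zero if the run failed)."""
    completed = runner(build_prefect_run_command(flow_name, execute))
    return completed.returncode
```

A CI step could end with `sys.exit(run_flow_as_check("your-flow-name"))`, so that a failed flow run surfaces as a failed PR check.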

Sean Talia

07/23/2021, 2:35 PM

oh wow @Zanie – are those features available in

0.14.17?

Kevin Kho

07/23/2021, 2:36 PM

This came in 0.15.0

Sean Harkins

07/23/2021, 7:02 PM

@Sean Talia We have similar requirements for an open source project. Your timing is impeccable as I just spent the last day battling with the GraphQL API in order to implement some of this 😆. You can see a high-level overview of the approach we use here https://github.com/pangeo-forge/roadmap/blob/master/doc/adr/0001-github-workflows.md. On the CI side our GitHub Actions are slightly more complicated because they are often triggered by PR events from forks, which have restrictions around access to GitHub secrets, but in your case this may be simpler. This is the module we use for flow registration and webhook creation https://github.com/pangeo-forge/pangeo-forge-prefect. There is a lot of overhead here for how we register flows in different clusters but the relevant logic for you is here https://github.com/pangeo-forge/pangeo-forge-prefect/blob/master/pangeo_forge_prefect/flow_manager.py#L345-L350 and https://github.com/pangeo-forge/pangeo-forge-prefect/blob/master/pangeo_forge_prefect/automation_hook_manager.py
