Does anyone have any experience / guidance as to t...
# ask-community
s
Does anyone have any experience / guidance as to the wisdom of running Prefect flows as part of a GitHub Actions CI/CD pipeline? My org's machine learning team would like to start setting up some GitHub workflows that might run some potentially complicated DAGs – some of the nodes/tasks would need to be able to query our data warehouse for some sample data and run a machine learning model against it. They want these workflows to run on opening a PR, and to trigger a PR check failure in the event that some step of the DAG fails. This workflow sounds to me like it might be suited to using Prefect's GraphQL API for launching and monitoring the status of such a flow, but I'm not sure if that would be trying to use Prefect for something it might not really be meant for. Bringing Prefect into the picture sounds like it could be more overhead than necessary, but I think it might make our lives easiest in the end. Any thoughts on this would be much appreciated!
k
Hey @Sean Talia, the only related thing I know is this? Not exactly related, but have you seen it? Let’s see if the community has any thoughts. I think it’s right that you would have to use the GraphQL API to achieve this.
s
oh yes we're using a similar setup extensively for registering flows to our cloud instance! it's been working beautifully. I just haven't given any consideration to the idea of trying to execute flows as part of a GH workflow
k
Yeah you should be able to with `prefect run`, right? Or I suppose if you have more involved things, you can use a Python script. With what you described though, I think using the API makes your life easier there, not harder.
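A minimal sketch of the Python-script route, assuming Prefect 0.x's `Client` (its `create_flow_run` and `get_flow_run_info` methods) and a flow that is already registered. The names `wait_for_flow_run` and `main`, the polling interval, and the timeout are illustrative assumptions, not anything from this thread.

```python
import time


def wait_for_flow_run(client, flow_run_id, poll_seconds=10.0,
                      timeout_seconds=3600.0, sleep=time.sleep):
    """Poll a flow run until it finishes; return True if it succeeded.

    `client` is expected to behave like prefect.Client from Prefect 0.x:
    get_flow_run_info(flow_run_id) returns an object whose `.state` has
    is_finished() and is_successful().
    """
    waited = 0.0
    while True:
        state = client.get_flow_run_info(flow_run_id).state
        if state.is_finished():
            return state.is_successful()
        if waited >= timeout_seconds:
            raise TimeoutError(f"flow run {flow_run_id} did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds


def main(flow_id):
    """Create a flow run for a registered flow and return a CI exit code."""
    # Imported lazily so the polling helper can be used without Prefect installed.
    from prefect import Client  # Prefect 0.x import path (assumption)

    client = Client()
    flow_run_id = client.create_flow_run(flow_id=flow_id)
    return 0 if wait_for_flow_run(client, flow_run_id) else 1
```

In a CI step this could be driven by something like `sys.exit(main(flow_id))`, so a failed flow run turns into a failed PR check.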
s
I actually haven't really considered executing flows by doing `prefect run` in production environments; I always assumed (maybe wrongly?) that it makes more sense to register your flow so that you can take advantage of all the nice visualization the UI will give you – I guess this is a separate question since you said "using the API makes your life easier", but are there specific use cases where you think it makes more sense to do `prefect run` than go through the registration/agent setup?
z
Just to weigh in here since I made a lot of changes to `prefect run`: you can use this with a registered flow and get all of the benefits of UI visualization. Just `prefect run --name "your-flow-name" --watch` will create a flow run that will be submitted to an agent, then stream logs to local stdout. The exit code will reflect the success of the flow run (which sounds like what you'd want in CI). You can also do `prefect run --name "your-flow" --execute` and it will run with API reporting (i.e. you can see it in the UI), but it will execute directly in your CI process without being sent to an agent.
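As a rough sketch of how a CI step might wrap those two invocations, assuming a Prefect version with these flags (0.15.0 or later) and the `prefect` CLI on the runner's PATH. The helper names `build_prefect_run_command` and `run_flow_check` are hypothetical; only the `--name`, `--watch`, and `--execute` flags come from the message above.

```python
import subprocess


def build_prefect_run_command(flow_name, in_process=False):
    """Build the `prefect run` argument list for a registered flow.

    in_process=False -> --watch: submit to an agent and stream logs.
    in_process=True  -> --execute: run directly in the CI process.
    """
    mode = "--execute" if in_process else "--watch"
    return ["prefect", "run", "--name", flow_name, mode]


def run_flow_check(flow_name, in_process=False, runner=subprocess.run):
    """Run the flow and return the CLI's exit code (0 means success)."""
    result = runner(build_prefect_run_command(flow_name, in_process))
    return result.returncode
```

A PR check could then end with `sys.exit(run_flow_check("your-flow-name"))`, so a failing flow run fails the workflow step.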
s
oh wow @Zanie – are those features available in `0.14.17`?
k
This came in 0.15.0
s
@Sean Talia We have similar requirements for an open source project. Your timing is impeccable as I just spent the last day battling with the GraphQL API in order to implement some of this 😆. You can see a high-level overview of the approach we use here https://github.com/pangeo-forge/roadmap/blob/master/doc/adr/0001-github-workflows.md. On the CI side our GitHub Actions are slightly more complicated because they are often triggered by PR events from forks, which have restrictions around access to GitHub secrets, but in your case this may be simpler. This is the module we use for flow registration and webhook creation https://github.com/pangeo-forge/pangeo-forge-prefect. There is a lot of overhead here for how we register flows in different clusters but the relevant logic for you is here https://github.com/pangeo-forge/pangeo-forge-prefect/blob/master/pangeo_forge_prefect/flow_manager.py#L345-L350 and https://github.com/pangeo-forge/pangeo-forge-prefect/blob/master/pangeo_forge_prefect/automation_hook_manager.py