# prefect-server
s
Hey folks, I’m trying to set up a workflow where we spin up a GKE cluster on demand, run a Kubernetes + Dask workflow on it, and then tear down the GKE cluster when the flow is finished. Is this the kind of thing that I’d want to do with a flow inside of a flow? In other words, have a parent flow that manages the cluster, while the child flow handles the data ETL work I’m performing on K8S+Dask? If so, are there any drawbacks to this approach I should be aware of (e.g. visibility into potential errors in the child flow)?
k
I don’t think there are drawbacks. There are mechanisms to wait for child flow runs to finish so you don’t overload your resources. One thought: if those subflows pass data between one another, the Dask spin-up may add overhead, but overall I think it should be fine.
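Roughly, the parent flow could look something like this. Just a sketch (not tested): the gcloud commands, cluster name, and flow/project names are placeholders.
from prefect import Flow
from prefect.tasks.prefect import StartFlowRun
from prefect.tasks.shell import ShellTask
from prefect.triggers import always_run

# Shell tasks that create/delete the GKE cluster; always_run makes sure the
# teardown happens even if the ETL child flow fails.
create_cluster = ShellTask(name="create-gke-cluster")
delete_cluster = ShellTask(name="delete-gke-cluster", trigger=always_run)

# wait=True blocks the parent until the child flow run finishes.
run_etl = StartFlowRun(flow_name="etl-flow", project_name="scraping", wait=True)

with Flow("gke-cluster-lifecycle") as parent_flow:
    up = create_cluster(command="gcloud container clusters create scrape-cluster --zone us-central1-a")
    etl = run_etl(upstream_tasks=[up])
    delete_cluster(
        command="gcloud container clusters delete scrape-cluster --zone us-central1-a --quiet",
        upstream_tasks=[etl],
    )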
w
A thought that occurs to me: You’re going to want to make sure your logging story is very strong. The setup you describe will make your logs very short-lived unless you ship them to something else.
s
@Wilson Bilkovich Aha. Ok, this is something I was worried about. We’re using the Cloud UI, and logging of stack traces is one of the key features I don’t want to lose as we scale up a distributed web scraping operation. It sounds like we’ll potentially lose logs from the child flow in the Cloud UI? If so, do we need to make sure we log errors to some external source such as a cloud bucket or database?
w
You’ll get the usual flow logs, but my (very limited) experience so far is that a lot of the interesting logs come from things with the `dask-worker` label, and those are all deleted along with the scheduler + cluster.
If you use something off-cluster to store logs, like Papertrail or similar, this won’t be an issue… but `kubectl logs` won’t help you
s
fwiw, another approach we considered was having a separate flow manage the cluster (i.e. keep it running for some window of time), and then instrumenting the flow that runs the ETL to make sure the cluster is available before running. That’ll likely be less resource-efficient since we’d risk idle time on the cluster, but I’d be willing to pay that price if it means retaining the logs from the ETL flow in Cloud…
k
Wilson is right that the logs may be lost. This is because Dask doesn’t ship logs natively to the scheduler, so they don’t make it to Prefect (child flow or main flow). If something goes wrong, you sometimes do need to go into the workers.
Having a long-running cluster may certainly help. You could go to the Dask dashboard and look for logs there. Some people ship these out to third-party services to retain the worker logs somewhere
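If you want the worker-side logs to survive the teardown, one option is attaching a Cloud Logging handler inside the task itself. Rough sketch (assumes the pods have GCP credentials, e.g. via Workload Identity; the task and logger name are placeholders):
import logging

from google.cloud import logging as gcp_logging
from prefect import task

@task
def scrape_site(url: str):
    # Attach a Cloud Logging handler on whichever Dask worker runs this task,
    # so its log lines land in GCP instead of only in the pod's stdout.
    client = gcp_logging.Client()
    logging.getLogger().addHandler(client.get_default_handler())
    logging.getLogger("scraper").info("scraping %s", url)
    # ... actual scraping work goes here ...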
s
Aha. Ok. But it sounds like I’ll face that loss-of-logs issue with an ephemeral GKE cluster regardless of whether I do a “flow-in-flow” approach, right?
e
any reason you don’t want a persistent cluster with autoscaling instead of a temporary cluster?
s
@Evan Curtin I would love a persistent cluster, but alas, my organization has very limited budget so we need to scale to zero to save on cloud costs 😕
e
😞
s
For a persistent + autoscaling cluster, we’d need to have at least one long-running node, right?
e
I imagine you’re gonna at least need one for the Kubernetes API server
s
We do have the funds to run one small(ish) node (e.g. n2-standard) on GCP, so perhaps we could look into that. I’ve been designing our ETL flow around the assumption that we’ll need to choose the number of VM nodes in the cluster when spinning up. I’m a little fuzzy on how autoscaling would work with an ETL flow that uses KubeCluster and DaskExecutor to spin up a scalable Dask cluster. Any advice on that front would be greatly appreciated!
e
At my last place we did an autoscaling cluster on AWS and it worked great with both dask and spark
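With KubeCluster you can also let Dask adapt the worker count and have the GKE autoscaler add nodes to fit the pods. Rough, untested sketch (the image, sizes, namespace, and min/max are placeholders):
from dask_kubernetes import KubeCluster, make_pod_spec
from prefect.executors import DaskExecutor

# Placeholder pod template for the Dask scheduler/workers.
pod_spec = make_pod_spec(
    image="daskdev/dask:latest",
    memory_limit="4G",
    memory_request="4G",
    cpu_limit=1,
    cpu_request=1,
)

# adapt_kwargs lets Dask grow/shrink the worker pods with the workload;
# the GKE cluster autoscaler then adds or removes nodes to fit those pods.
executor = DaskExecutor(
    cluster_class=KubeCluster,
    cluster_kwargs=dict(namespace="prefect", pod_template=pod_spec),
    adapt_kwargs=dict(minimum=1, maximum=20),
)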
s
@Evan Curtin Not sure if this helps, but the approach I’ve been taking is to spin up the GKE cluster manually and deploy a single-container pod with the Prefect Agent, and then let KubeCluster + Dask handle the ETL flow on that cluster
k
Yes to your question on losing worker logs. For the best way to go about it, I think Evan may have better thoughts than me
w
Is your work amenable to scheduling? A scheduled cluster could be a middle ground between persistent and ephemeral.
e
Whichever way you go, I recommend persisting logs outside the cluster.
This kind of stuff is easier to set up once, i.e. you can configure the k8s cluster with a fluentd sidecar and grab stdout from all pods. IDK if GCP has something better
s
@Wilson Bilkovich Yes, the work could be scheduled. It’s a web scrape of a few thousand sites that needs to run daily. We’ve never run the full scrape (testing on a small subset right now), but we’re planning to test the run with different levels of scaling (i.e. number of nodes in the K8S cluster) to get a sense of how long a single run will take. Once we have that, it sounds like we could look into a scheduled cluster that runs for a set window and just spins up and down without having to manage it via script?
@Evan Curtin Interesting. I’ll explore the fluentd sidecar option as well. Thanks!
w
Yeah; I’m more familiar with Amazon’s solutions for this, but it looks like there’s an analogous thing you should be able to do on GCP https://cloud.google.com/compute/docs/autoscaler/scaling-schedules
Basically you write a `cron` expression to describe when you want it to be up
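On the Prefect side, the flow’s own schedule can take the same kind of expression, so the runs line up with the window the cluster is up. Sketch (the cron string is a placeholder):
from prefect import Flow
from prefect.schedules import Schedule
from prefect.schedules.clocks import CronClock

# Placeholder cron: daily at 02:00 UTC, while the scheduled cluster is up.
daily = Schedule(clocks=[CronClock("0 2 * * *")])

with Flow("etl-flow", schedule=daily) as flow:
    ...  # scraping tasks go here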
s
@Wilson Bilkovich Very interesting! I’ll definitely look into that. Thanks so much!
w
See also https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler for what happens inside the Kubernetes layer
s
This is perfect. Sounds exactly like what we need!
w
Yeah, skimming it, this blog post does exactly what I was proposing:
“We'll use a node selector in our pods to make sure they run in the burst node pool. For this, we'll add a label to the nodes in that pool.”
s
Yep, this really does sound like exactly what we need. I’ll give it a close look and try testing it out. Thanks again!
@Wilson Bilkovich The blog post seems pretty straightforward. Just wondering if you think I’d need special handling to get this to play nice with KubeCluster, which appears to use a default pod template for the Dask scheduler and workers: https://kubernetes.dask.org/en/latest/_modules/dask_kubernetes/core.html#KubeCluster Would the node-pool labels (and a few other YAML customizations for the pods) require providing custom pod templates for the Dask scheduler and workers to KubeCluster, to make sure the job runs on the burst pool rather than the default node pool (which would presumably run the Prefect agent)?
w
It should ‘just work’ in the sense that it will bring up the nodes in response to the pods that get scheduled
You will want to specify custom pod templates for Dask, I’d say, so you can do the “anti-affinity” stuff the blog post mentions
Something like this but with different numbers and names
from dask_kubernetes import KubeCluster, make_pod_spec
from prefect.executors import DaskExecutor

# Pod template used for the Dask scheduler and workers.
pod_spec = make_pod_spec(
    image="daskdev/dask:latest",
    memory_limit="4G",
    memory_request="4G",
    cpu_limit=1,
    cpu_request=1,
    env={"EXTRA_PIP_PACKAGES": "prefect dask-kubernetes"},
)

# The flow's executor spins up a KubeCluster on demand using that template.
executor = DaskExecutor(
    cluster_class=KubeCluster,
    cluster_kwargs=dict(
        namespace="prefect",
        pod_template=pod_spec,
        n_workers=1,
    ),
)
s
Aha. Ok, so I’ll need to use `make_pod_spec` or roll something custom if I’m unable to configure using that function. Either way, it sounds like I need to make sure some of those other relevant pod configs are present. Thanks!
w
Hopefully `make_pod_spec` makes it easy to set the other labels etc
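For the node-pool targeting, I’d guess you can set a nodeSelector on the object `make_pod_spec` returns, something like this (untested; `cloud.google.com/gke-nodepool` is the label GKE puts on its nodes, and the pool name is a placeholder):
# pod_spec is the V1Pod returned by make_pod_spec above; "burst-pool" is
# whatever you named the second node pool.
pod_spec.spec.node_selector = {"cloud.google.com/gke-nodepool": "burst-pool"}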
Some of the things in the blog post are optional; for example, the anti-affinity that keeps multiple workers off the same node is something you could probably live without.
I think this conversation has talked me into doing it this way as well… I’ll try configuring a second node group on my k8s cluster, for Dask
s
Right. I don’t think I’ll need the anti-affinity bit, but I’ll go over all the options carefully. Awesome! If you get it working, can you share? I probably won’t have time to really dig into this until the week after next, but I was just trying to sort through the roadmap
w
I’ll definitely share, yeah. Hopefully I can get it working tomorrow.
s
You rock. Many thanks for all your help!
w
My pleasure; good luck.