Greg Desmarais

07/02/2020, 10:55 PM
Hi all, newcomer to Prefect - happy so far. I'm a bit stuck in a Dask-integrated Prefect environment. I have a simple flow that looks something like this:
```python
from datetime import datetime

from distributed import get_worker
from prefect import Flow, Parameter, task


@task(log_stdout=True)
def say_hello(name):
    print(f'{datetime.now()}: workflow hello {name}', flush=True)
    # get_worker() only works when the task runs inside a Dask worker
    worker = get_worker()
    return f'done on {worker.name}, scheduler at {worker.scheduler.address}'

name = Parameter('name')
with Flow("Simple parallel hello") as flow:
    # Since there is no return value dependency, we end up with possible parallel operations
    for i in range(10):
        say_hello(name)
```
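If I run the flow from my script, targeting a particular Dask cluster, I can hit the right Dask workers:
```python
from prefect.engine.executors import DaskExecutor

# dask_scheduler_address is the scheduler's address, e.g. "tcp://<host>:8786"
executor = DaskExecutor(address=dask_scheduler_address)
# 'name' is a required Parameter, so it has to be supplied ("world" is just an example value)
flow.run(parameters={'name': 'world'}, executor=executor)
```
My question is about registering this flow and running it, say, from the Prefect UI. I can easily register the flow with:
```python
flow.register()
```
But then trying to run it from the UI just hangs. I'm pretty sure it is because the executor isn't registered with the flow. Am I missing something? Thanks in advance...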

Chris White

07/02/2020, 10:56 PM
Hi @Greg Desmarais - can you confirm that you have a Prefect Agent running that has access to your locally hosted API?

Greg Desmarais

07/03/2020, 10:59 PM
Thanks for the quick reply @Chris White, and sorry for my slow response. I don't have an agent running. I will give that a shot, but does that mean that when you register a flow from a particular bit of code running on a particular machine (call it the creator), you can only execute that flow when the creator is available and running an agent? What is the model for registering flows that operate independently?

Chris White

07/04/2020, 12:57 AM
No worries; when you register your Flow, it will be placed in a configurable “Storage” location which your agent will need access to. Popular storage choices for deploying Flows are Docker and S3.

Note that your Agent(s), the creator machine, and your Dask cluster will also need access to the Prefect API. It appears that you are using the open source API, which is (currently) best suited for single-machine deployments. Configuring this API to be accessible from within Docker networks / on other machines requires some networking knowledge. Many people start off their Prefect POCs by using a free Prefect Cloud API account so that they don’t need to bother with the networking issues that arise in exposing the local API on a network.

Here are a few relevant links to learn more:
- execution overview: https://docs.prefect.io/orchestration/execution/overview.html
- a simple deployment tutorial: https://docs.prefect.io/orchestration/tutorial/configure.html
- more about agents: https://docs.prefect.io/orchestration/agents/overview.html
- get a free Cloud account: https://cloud.prefect.io/
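Since the agent, the creator machine, and the Dask workers all need to reach the Prefect API, it can help to check that from each host before debugging a hanging run. A minimal stdlib sketch: `api_is_reachable` is a hypothetical helper (not part of Prefect), and the `http://localhost:4200/graphql` address is an assumption based on a typical local Server setup; substitute whatever host and port your API is actually exposed on. It only tests that a TCP connection opens, not that the GraphQL API answers or authenticates you.

```python
import socket
from urllib.parse import urlparse


def api_is_reachable(api_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the host/port in api_url succeeds.

    Hypothetical helper: it proves the port is open from this machine,
    nothing more (no GraphQL query, no authentication).
    """
    parsed = urlparse(api_url)
    host = parsed.hostname
    if host is None:
        raise ValueError(f"no host in URL: {api_url!r}")
    # Fall back to the scheme's default port when the URL has none.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, etc.
        return False


if __name__ == "__main__":
    # Assumed address: replace with the host your Server API listens on.
    print(api_is_reachable("http://localhost:4200/graphql"))
```

Running it on the agent host and on a Dask worker host is a quick way to tell a networking problem apart from a missing agent.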

Greg Desmarais

07/07/2020, 4:34 AM
Thanks for the reading, @Chris White, it is good stuff and I definitely learned more about the deployment model. Sorry for the slow response; I'm actually on vacation this week, but everyone is asleep now. If I could ask, I have a few somewhat pointed questions:

1. I would really rather not use the Prefect Cloud service; I prefer to have my critical infrastructure under my own management. I'm not 100% set on that, it is just my starting point. Is Prefect Core ready for a production deployment, or will I be constantly fighting gaps in functionality? I don't mind a bit of blood from the leading edge...
2. We are pretty good with network deployments and are an AWS shop. In my reading, I came across the different storage options for environments. Does it seem reasonable for me to deploy the different parts of Prefect Core in AWS ECS and use ECR for the Docker storage?
3. Is there a network diagram showing what is needed to deploy a self-hosted Prefect Core environment?

FWIW, I'm planning on using the DaskExecutor to point to a cluster of Dask workers I'll have set up. Thanks in advance...

Laura Lorenz (she/her)

07/09/2020, 2:18 PM
Hi! Just wanted to poke in here too, @Greg Desmarais, to +1 Chris’ suggestion to try Cloud’s free developer tier just for early development, so that you can get a feel for the execution model without having to self-host first. You can definitely switch over to the open source, self-hosted version whenever you like. Prefect Core has a version of Cloud that is usually called “Server”, and though it doesn’t have every feature of Cloud, plenty of people in the community are using it in production now (to your point 1). To your point 2, AWS ECS + ECR sounds perfectly reasonable. The only thing I would add is that more folks are using EKS/Fargate for their task workers and keeping them ephemeral, but you may still want to use ECS for the orchestration layer once you get to self-hosting Server. To point 3: basically no, but there are diagrams of some other setups, including Cloud, which should be analogous; I have them handy here:

https://youtu.be/FETN0iivZps

https://docs.google.com/presentation/d/1TfOsYmsjgbwXRkiItb2ZeTW_oYxXWAWKMtEnEFOyPiA/edit#slide=id.g7453fd8b20_0_129

Conveniently, I’m hosting a stream with our DevOps guy tomorrow about deploying Server, which you might be interested in; you can tune in live or watch the recording afterwards at this link:

https://youtu.be/yjORjWHyKhg


Greg Desmarais

07/09/2020, 10:23 PM
Thanks for the info, @Laura Lorenz (she/her). I think I've moved past the 'kick the tires' phase and am looking at the nuts and bolts of a self-hosted deployment. We have a decent amount of machinery around ECS, so deploying in there makes sense for us. We intend to use the DaskExecutor and segment workers by host resources (GPU, CPU, big memory, big disk, etc.), and have the Dask workers spun up dynamically as ECS tasks. Those Dask schedulers will also be used directly for Dask/pandas work, so owning the infrastructure makes things simpler to customize. I appreciate the link to the DevOps talk; I'll definitely put it on my calendar.
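Segmenting workers by host resources maps onto Dask's worker resources: workers advertise what they have (e.g. `dask-worker <scheduler> --resources "GPU=1"`) and tasks request it (e.g. `client.submit(fn, resources={"GPU": 1})`). Prefect 0.x's DaskExecutor, as I understand it, lets a task make that request through tags of the form `dask-resource:KEY=NUM`. The sketch below only illustrates that tag-to-resources mapping in plain Python; `resources_from_tags` is a hypothetical helper, not Prefect's internal code, so check the DaskExecutor docs for your version before relying on the tag format.

```python
from typing import Dict, Iterable

# Tag prefix assumed from the Prefect 0.x DaskExecutor convention.
PREFIX = "dask-resource:"


def resources_from_tags(tags: Iterable[str]) -> Dict[str, float]:
    """Collect Dask resource requests from tags like 'dask-resource:GPU=1'.

    Hypothetical helper: tags without the prefix are ignored, and a
    malformed resource tag raises ValueError.
    """
    resources: Dict[str, float] = {}
    for tag in tags:
        if not tag.startswith(PREFIX):
            continue
        key, sep, value = tag[len(PREFIX):].partition("=")
        if not sep or not key:
            raise ValueError(f"malformed resource tag: {tag!r}")
        resources[key] = float(value)
    return resources


# e.g. a task meant for the GPU / big-memory worker pool
print(resources_from_tags(["gpu-pool", "dask-resource:GPU=1", "dask-resource:MEMORY=64e9"]))
```

The resulting dict has the same shape as the `resources=` argument to Dask's `Client.submit`, which is what restricts a task to workers advertising enough of each resource.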
As an aside, I have Docker storage mostly working with custom flows and am just dealing with authentication to ECR right now. I found some good sample code in the different channels here. Any tips on that bit are appreciated, but I will also be pinging folks in other channels.
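For the ECR authentication piece, two bits of plumbing usually come up: the private registry hostname (presumably what you would pass as the Docker storage `registry_url`) and turning the response of ECR's `GetAuthorizationToken` call (boto3's `get_authorization_token()`) into Docker login credentials. A minimal sketch; the helper names are hypothetical, while the token format (base64 of `AWS:<password>`) and the `<account>.dkr.ecr.<region>.amazonaws.com` hostname pattern are ECR's own conventions. The boto3 and `docker login` steps are left as comments so the block runs without AWS access.

```python
import base64
from typing import Any, Dict, Tuple


def ecr_registry_url(account_id: str, region: str) -> str:
    """Hostname of the private ECR registry for an account/region (hypothetical helper)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def docker_credentials(auth_response: Dict[str, Any]) -> Tuple[str, str, str]:
    """Turn a GetAuthorizationToken response into (username, password, registry).

    The token base64-decodes to 'AWS:<password>'; proxyEndpoint is the
    registry URL to log in to. Hypothetical helper.
    """
    data = auth_response["authorizationData"][0]
    decoded = base64.b64decode(data["authorizationToken"]).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password, data["proxyEndpoint"]


# Usage sketch (needs AWS credentials, so left commented out):
# import boto3
# resp = boto3.client("ecr", region_name="us-east-1").get_authorization_token()
# user, password, registry = docker_credentials(resp)
# Then log the Docker daemon in, e.g. by piping the password to:
#   docker login --username AWS --password-stdin <registry>
```

Logging in through the `docker` CLI writes the credentials to the local Docker config, which is generally where Docker client libraries look when pushing; ECR tokens expire (after 12 hours by default), so the login typically needs refreshing before long-running builds.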