Preston Marshall
06/09/2020, 1:56 PM
Joe Schmid
06/09/2020, 1:59 PM
DaskCloudProviderEnvironment to dynamically create a distributed Dask cluster on Fargate that scales uniquely for specific Flows and parameter sets, e.g. for a large data engineering Flow that benefits from parallelism, that environment spins up a Dask cluster with 10 workers, etc.
Preston Marshall
06/09/2020, 2:00 PM
Jim Crist-Harif
06/09/2020, 2:02 PM
Executor, this would be more an Environment and/or Agent IMO:
• Executor - where to run tasks in a flow
• Environment - spec for deploying a flow (e.g. in a Fargate task)
• Agent - process that watches the Prefect API and kicks off flow runs using the Environment.
Joe Schmid
06/09/2020, 2:03 PM
"Fargate basically just gives you the ability to run compute indefinitely with a docker image, right?"
Yes. Maybe another way to say it would be "run containers without allocating compute resources, i.e. EC2 instances."
Preston Marshall
06/09/2020, 2:10 PM
An Hoang
06/09/2020, 2:11 PM
Joe Schmid
06/09/2020, 2:19 PM
"does EC2 have the ability to launch a docker container with their API?"
Yes, you can use ECS (Elastic Container Service) in two modes:
1. Fargate -- serverless, i.e. AWS manages resources for you
2. EC2 -- the traditional ECS approach where containers run on EC2 instances that you create, potentially using auto-scaling groups.
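A rough sketch of how the two modes differ at the API level, assuming boto3's ECS run_task call is what launches the container. The helper name, cluster name, task definition, and subnet ID are hypothetical placeholders, and the public-IP setting is an assumption about the network setup. The code only builds the request, so it runs without AWS credentials.

```python
from typing import Any, Dict, List, Optional


def build_run_task_request(
    cluster: str,
    task_definition: str,
    launch_type: str,
    subnets: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's ECS run_task call (hypothetical helper)."""
    if launch_type not in ("FARGATE", "EC2"):
        raise ValueError(f"unsupported launch type: {launch_type}")

    request: Dict[str, Any] = {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": launch_type,
        "count": 1,
    }

    if launch_type == "FARGATE":
        # Fargate tasks use awsvpc networking, so subnets have to be supplied.
        if not subnets:
            raise ValueError("Fargate tasks need at least one subnet")
        request["networkConfiguration"] = {
            "awsvpcConfiguration": {
                "subnets": subnets,
                # Assumption: the task sits in a public subnet and needs a public IP.
                "assignPublicIp": "ENABLED",
            }
        }
    # For EC2 with bridge or host networking, no network configuration is needed:
    # the container lands on an EC2 instance already registered with the cluster.
    return request


# Placeholder names, for illustration only.
fargate_request = build_run_task_request(
    "demo-cluster", "demo-task:1", "FARGATE", ["subnet-00000000"]
)
ec2_request = build_run_task_request("demo-cluster", "demo-task:1", "EC2")
```

With boto3 installed and credentials configured, either dict could be passed as boto3.client("ecs").run_task(**request). The only difference the caller sees is the launch type and the networking block; in EC2 mode the instances themselves still have to exist beforehand.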
Pedro Machado
06/09/2020, 3:38 PM
Does it take long to start a new cluster with DaskCloudProviderEnvironment? Can the cluster be created in advance and just scale up when needed? Can the cluster be scaled down to zero?
I've been wondering if AWS Lambda or similar could be used to run an agent for infrequent flows. I was envisioning a function that would start every so often, run the agent, and go back to sleep if no work needs to be done. Would this make sense at all or are the time limits a deal killer?
What other cloud services could be a good fit for a serverless execution environment that doesn't require a lot of administration?
Joe Schmid
06/09/2020, 3:58 PM
"Does it take long to start a new cluster with DaskCloudProviderEnvironment?"
There is startup latency with Fargate in general (nothing to do with Prefect) and when using DaskCloudProviderEnvironment with Fargate. (This makes sense, i.e. the whole point of serverless is to avoid pre-allocating compute resources and let the platform allocate them, so you're trading off some startup latency for dynamic allocation.) However, for many scheduled Flows startup latency would be a non-issue, e.g. when running a nightly scheduled Flow.
"Can the cluster be created in advance and just scale up when needed? Can the cluster be scaled down to zero?"
Yes and yes. In addition to running Prefect on Fargate, we also run on a Kubernetes cluster in this mode, i.e. the cluster is created in advance then scaled up and down as needed. Like anything stateful, this comes with lifecycle management issues, e.g. if we have a new version of a Docker image with updated Flows we need to update the existing cluster -- but if Flow(s) are currently running we need to wait for them to finish & gracefully update the cluster with new Docker image versions, etc. The good news is there are plenty of options, e.g. Fargate, k8s, etc. so that you can make choices to optimize for Flow startup latency vs. lifecycle management, etc.
"I've been wondering if AWS lambda or similar could be used to run an agent for infrequent flows."
We find the Prefect Fargate Agent to be incredibly lightweight. We run it as an ECS Service (long-running task) with Fargate using the lowest resources possible (1/4 vCPU & 512MB RAM) which costs about $9 per month running continuously. (All the Agent does is poll for Flows that are ready to run and launch an ECS task for each Flow run.) At that rate, I'd say it isn't even worth trying to optimize cost by running it periodically in Lambda, etc.
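The $9 figure can be sanity-checked with a quick back-of-the-envelope calculation. The per-hour rates below are an assumption (on-demand Fargate pricing for Linux in us-east-1 around that time) and the function name is hypothetical; check current AWS pricing before relying on the numbers.

```python
# Assumed on-demand Fargate rates (us-east-1, Linux); not taken from the thread.
VCPU_PER_HOUR_USD = 0.04048
GB_PER_HOUR_USD = 0.004445
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_fargate_cost(vcpu: float, memory_gb: float, hours: float = HOURS_PER_MONTH) -> float:
    """Estimate the cost of one Fargate task running for the given number of hours."""
    hourly = vcpu * VCPU_PER_HOUR_USD + memory_gb * GB_PER_HOUR_USD
    return hourly * hours


# The smallest Fargate size mentioned above: 1/4 vCPU and 512MB RAM, running continuously.
agent_cost = monthly_fargate_cost(vcpu=0.25, memory_gb=0.5)
print(f"Estimated agent cost: ${agent_cost:.2f} per month")
```

Under those assumed rates the estimate comes out at roughly $9 per month, in line with the figure quoted above.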
Pedro Machado
06/09/2020, 4:28 PM
Jim Crist-Harif
06/09/2020, 4:33 PM
Storage classes to support storing the flow information on e.g. S3, and having a static image shared by all flows.
Pedro Machado
06/09/2020, 4:46 PM