# prefect-community
c
I'm running flows on ECS using Fargate; is it normal to see `RUNNING` tasks like those shown below which have been running for days? Are those truly still running? Am I being billed for idle compute here?
a
I think that the `RUNNING` state indeed indicates that those ECS tasks are still in progress. Can you cross-check and match the ECS tasks with the flow runs in your Prefect Cloud UI? Did the corresponding flow runs finish without any issues? If you are on Prefect Cloud, you can send us the flow run ID so that we can cross-check on our end as well. Regarding billing, you can attach cost allocation tags and then check in your AWS Billing dashboard exactly what you are billed for and how much. To attach tags, I think you would need to either modify the existing cluster or create a new one; with the CLI it can be done using:
```bash
aws ecs create-cluster --cluster-name prefectEcsCluster --tags key=keyname,value=actualValue
```
Then, you would need to use the --propagateTags flag when starting an ECS service for the agent.
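As a rough sketch of that setup with the AWS CLI (the cluster, service, tag names, and network IDs below are placeholders; the CLI spelling of the flag is `--propagate-tags`, and the tags only show up in billing reports once they are activated as cost allocation tags in the Billing console):
```bash
# Tag the cluster that the agent and flow runs will use (placeholder names).
aws ecs create-cluster \
  --cluster-name prefectEcsCluster \
  --tags key=project,value=prefect

# Start the agent as an ECS service and propagate tags from the task
# definition onto its running tasks so they can be tracked in cost reports.
aws ecs create-service \
  --cluster prefectEcsCluster \
  --service-name prefect-agent \
  --task-definition prefect-agent:1 \
  --desired-count 1 \
  --launch-type FARGATE \
  --propagate-tags TASK_DEFINITION \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=ENABLED}"

# Once the tags are activated as cost allocation tags, costs can be filtered
# by tag, e.g. with Cost Explorer (dates and tag values are placeholders).
aws ce get-cost-and-usage \
  --time-period Start=2022-02-01,End=2022-03-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --filter '{"Tags": {"Key": "project", "Values": ["prefect"]}}'
```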
also: can you send a summary of your setup? 1. Prefect Cloud or Server? 2. Do you use the Fargate or EC2 launch type? 3. How did you start the agent and the flow runs? 4. Any chance you can share one full flow definition - one of those that hangs in a running state on ECS?
c
1. Prefect Cloud 2. Fargate 3. we manage the agent through terraform
```json
{
  "ipcMode": null,
  "executionRoleArn": "arn:aws:iam::792470144447:role/prefect-ecs-execution-role",
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "environmentFiles": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "secretOptions": null,
        "options": {
          "awslogs-group": "/ecs/prefect-tasks",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "constantino_schillebeeckx-salesforce_extract"
        }
      },
      "entryPoint": null,
      "portMappings": [],
      "command": null,
      "linuxParameters": null,
      "cpu": 0,
      "environment": [
        {
          "name": "PREFECT__CONTEXT__IMAGE",
          "value": "<http://792470144447.dkr.ecr.us-west-2.amazonaws.com/dwh:cleanup_iam|792470144447.dkr.ecr.us-west-2.amazonaws.com/dwh:cleanup_iam>"
        }
      ],
      "resourceRequirements": null,
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [],
      "workingDirectory": null,
      "secrets": null,
      "dockerSecurityOptions": null,
      "memory": null,
      "memoryReservation": null,
      "volumesFrom": [],
      "stopTimeout": null,
      "image": "<http://792470144447.dkr.ecr.us-west-2.amazonaws.com/dwh:cleanup_iam|792470144447.dkr.ecr.us-west-2.amazonaws.com/dwh:cleanup_iam>",
      "startTimeout": null,
      "firelensConfiguration": null,
      "dependsOn": null,
      "disableNetworking": null,
      "interactive": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "pseudoTerminal": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "systemControls": null,
      "privileged": null,
      "name": "flow"
    }
  ],
  "placementConstraints": [],
  "memory": "16384",
  "taskRoleArn": "arn:aws:iam::792470144447:role/prefect-ecs-task-role",
  "compatibilities": [
    "EC2",
    "FARGATE"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-west-2:792470144447:task-definition/prefect-salesforce-extract:88",
  "family": "prefect-salesforce-extract",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.execution-role-awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.ecr-auth"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.task-iam-role"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.execution-role-ecr-pull"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.task-eni"
    }
  ],
  "pidMode": null,
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "runtimePlatform": null,
  "cpu": "2048",
  "revision": 88,
  "status": "INACTIVE",
  "inferenceAccelerators": null,
  "proxyConfiguration": null,
  "volumes": [],
  "statusString": "(INACTIVE)"
}
```
Note we're using `prefecthq/prefect:0.15.13-python3.8` for the agent.
👍 1
```
PREFECT__CONTEXT__FLOW_ID      e9aaf08c-4beb-4b36-b40d-0a73700f03e7
PREFECT__CONTEXT__FLOW_RUN_ID  352ba0ba-2ea8-4ec2-8acb-fad120376b8d
```
for the above definition
uh oh
a
Do you happen to know why someone tried to cancel this flow run? It looks like someone or some process tried to cancel it but it didn't work - the flow run stayed in a Cancelling state and it still doesn't have an end time... Something went wrong here for sure, and good catch that you found it now rather than after months.
c
I just cancelled it because it had been running for 13 days (as shown above)
👍 1
a
maybe you could cancel those runs manually, and in the worst case set the state to Cancelled via the API and manually stop those zombie ECS tasks. And to avoid this in the future, maybe you can add an Automation to automatically cancel a flow run if it doesn't finish within X time (the max duration of your normal flow run, e.g. 4 hours)
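For the manual cleanup part, a minimal AWS CLI sketch for finding and stopping the lingering ECS tasks might look like the following (cluster name and task ARN are placeholders):
```bash
# List tasks still reported as RUNNING on the cluster (placeholder name).
aws ecs list-tasks \
  --cluster prefectEcsCluster \
  --desired-status RUNNING

# Inspect when each task started, to spot the ones running for days.
aws ecs describe-tasks \
  --cluster prefectEcsCluster \
  --tasks arn:aws:ecs:us-west-2:123456789012:task/prefectEcsCluster/abcdef1234567890 \
  --query 'tasks[].{arn:taskArn,started:startedAt,status:lastStatus}'

# Stop a zombie task once its flow run has been marked Cancelled.
aws ecs stop-task \
  --cluster prefectEcsCluster \
  --task arn:aws:ecs:us-west-2:123456789012:task/prefectEcsCluster/abcdef1234567890 \
  --reason "Prefect flow run cancelled manually"
```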
oh sorry, I must have been misled because the timestamp of the Cancelling state is the 8th of February rather than today
c
I was just gonna ask if there's a way to configure the max run time of a flow. What type of automation are you suggesting? Another flow that checks in on ECS?
a
We have flow SLA failure automations that allow you to cancel a flow run if it doesn't finish within e.g. 4 hours: https://docs.prefect.io/orchestration/concepts/automations.html#flow-sla-failure
so it looks like only one of the mapped tasks got stuck for some reason
but you need to configure such an SLA Automation for each flow separately; there's no way to set it once for all flows
c
sadness - ok thanks for all the help - I'll have to build around this
a
Understandable, sorry to hear about this issue and good that you found it!
🙌 1
k
Was this really running for 13 days? I think you can check for open database connections, because those tend to keep containers running even after flow execution. But normally it would be completed on the Prefect end. This looks like there was some activity; otherwise Prefect would mark it as failed (no heartbeat).
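As a hypothetical illustration of that check, if ECS Exec were enabled on the task (the task definition above does not enable it, so this is purely a sketch), you could open a shell in the still-running container and look for established database connections:
```bash
# Open a shell inside the running flow container. Requires the task to be
# launched with --enable-execute-command, the task role to allow SSM
# messages, and the Session Manager plugin installed locally. The container
# name "flow" matches the task definition above; cluster and task ARN are
# placeholders.
aws ecs execute-command \
  --cluster prefectEcsCluster \
  --task arn:aws:ecs:us-west-2:123456789012:task/prefectEcsCluster/abcdef1234567890 \
  --container flow \
  --interactive \
  --command "/bin/sh"

# Inside the container (if ss is available), list established TCP connections,
# e.g. a lingering connection to a database on port 5432.
ss -tnp state established
```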