I feel like am so close to getting my ECS Task to run w the Prefect Community #ask-community

Sean Talia

03/03/2021, 9:43 PM

I feel like I am so close to getting my ECS Task to run w/ the ECS Agent, but I'm having an odd issue where my flow that's calling this task is just hanging forever in a scheduled/pending state. I can see in the AWS console that the ECS task is being run, and prefect is indeed passing a lot of ENV variables to the container that's getting spun up, but nonetheless my flow is always stuck in a "Submitted" state and I'm not seeing any logs in the Cloud UI

Sean Talia

03/03/2021, 9:43 PM

in the AWS console:

Sean Talia

03/03/2021, 9:44 PM

all I ever see in Prefect UI:

Zanie

03/03/2021, 9:50 PM

Hey @Sean Talia, can you start your agent with the setting `PREFECT__CLOUD__AGENT__LEVEL=DEBUG` and provide the logs of it deploying the flow?
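
One way to apply that setting, sketched in Python: copy the current environment, add the variable from the message above, and hand the result to the agent process. The helper name `agent_debug_env` is made up for illustration, and the commented-out launch line assumes the agent is started with `prefect agent ecs start`.

```python
import os


def agent_debug_env(base=None):
    """Return a copy of the environment with agent debug logging enabled."""
    env = dict(os.environ if base is None else base)
    env["PREFECT__CLOUD__AGENT__LEVEL"] = "DEBUG"
    return env


if __name__ == "__main__":
    env = agent_debug_env()
    print(env["PREFECT__CLOUD__AGENT__LEVEL"])
    # Assumed launch command; run it where the agent is installed:
    # import subprocess
    # subprocess.run(["prefect", "agent", "ecs", "start"], env=env, check=True)
```

Exporting the variable in the shell before starting the agent would have the same effect; the function only makes the override explicit and leaves the parent environment untouched.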

Sean Talia

03/03/2021, 9:51 PM

certainly, one second

Sean Talia

03/03/2021, 9:52 PM

[2021-03-03 21:51:20,615] DEBUG - Sean ECS Agent | Found flow runs ['220201ce-ad50-4a59-b4ec-d08e2f7c639e']
[2021-03-03 21:51:20,615] DEBUG - Sean ECS Agent | Querying flow run metadata
[2021-03-03 21:51:20,798] INFO - Sean ECS Agent | Found 1 flow run(s) to submit for execution.
[2021-03-03 21:51:20,798] DEBUG - Sean ECS Agent | Updating states for flow run 220201ce-ad50-4a59-b4ec-d08e2f7c639e
[2021-03-03 21:51:20,803] DEBUG - Sean ECS Agent | Next query for flow runs in 0.25 seconds
[2021-03-03 21:51:20,804] DEBUG - Sean ECS Agent | Flow run 220201ce-ad50-4a59-b4ec-d08e2f7c639e is in a Scheduled state, updating to Submitted
[2021-03-03 21:51:21,033] INFO - Sean ECS Agent | Deploying flow run '220201ce-ad50-4a59-b4ec-d08e2f7c639e'
[2021-03-03 21:51:21,033] DEBUG - Sean ECS Agent | Using task definition prefect-test-task-stage:5 for flow 447bf955-c4b3-464b-a4b6-b9e15c3497a5
[2021-03-03 21:51:21,059] DEBUG - Sean ECS Agent | Querying for flow runs
[2021-03-03 21:51:21,244] DEBUG - Sean ECS Agent | No flow runs found
[2021-03-03 21:51:21,245] DEBUG - Sean ECS Agent | Next query for flow runs in 0.5 seconds
[2021-03-03 21:51:21,748] DEBUG - Sean ECS Agent | Querying for flow runs
[2021-03-03 21:51:22,084] DEBUG - Sean ECS Agent | No flow runs found
[2021-03-03 21:51:22,084] DEBUG - Sean ECS Agent | Next query for flow runs in 1.0 seconds
[2021-03-03 21:51:22,298] DEBUG - Sean ECS Agent | Started task 'arn:aws:ecs:us-east-2:<ACCOUNT-ID>:task/<CLUSTER-NAME>/91f00f616a094bef9985dc10adfe5d49' for flow run '220201ce-ad50-4a59-b4ec-d08e2f7c639e'
[2021-03-03 21:51:22,465] DEBUG - Sean ECS Agent | Completed flow run submission (id: 220201ce-ad50-4a59-b4ec-d08e2f7c639e)

Zanie

03/03/2021, 9:54 PM

Hmm okay so everything looks good on the agent's end. Perhaps your flow task is unable to communicate with the Cloud API?

Zanie

03/03/2021, 9:55 PM

I'm no ECS expert -- can you pull logs from the container?

Sean Talia

03/03/2021, 9:55 PM

yeah that was also my suspicion, I was hoping someone might have been like "oh yeah i've had this happen easy fix"

Sean Talia

03/03/2021, 9:56 PM

i also am no ECS expert, this is my first time working with it so sadly my debugging skills here are quite lacking

Zanie

03/03/2021, 9:56 PM

Haha understandable. I've pinged someone on our devops team.

Zanie

03/03/2021, 9:56 PM

Are you using a custom task definition or the default?

Sean Talia

03/03/2021, 9:57 PM

i'm using a custom task that I registered to ECS through terraform; my org has a framework for quickly spinning up tasks that have all kinds of bells and whistles attached to them that we don't want to have to manually configure

Zanie

03/03/2021, 9:58 PM

Are you running ECS on Fargate?

Sean Talia

03/03/2021, 9:58 PM

which i'm sure will make it more difficult for me to provide insight into how the task has been configured 😇

Sean Talia

03/03/2021, 9:58 PM

yep

Sean Talia

03/03/2021, 9:59 PM

Zanie

03/03/2021, 9:59 PM

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html

Zanie

03/03/2021, 9:59 PM

I think the best next step is to get ahold of the container logs

💯 1
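
For reference, a sketch of what sending container logs to CloudWatch involves on a custom task definition, following the `awslogs` page linked above. The log group, region, and stream prefix values are placeholders, and the helper names are hypothetical. The stream name layout (`prefix/container-name/task-id`) is the one the `awslogs` driver uses when a stream prefix is set.

```python
def awslogs_configuration(group, region, stream_prefix):
    """Build the logConfiguration entry for an ECS container definition."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }


def log_stream_name(stream_prefix, container_name, task_arn):
    """Derive the CloudWatch stream name for one container of one task."""
    task_id = task_arn.rsplit("/", 1)[-1]
    return f"{stream_prefix}/{container_name}/{task_id}"


def fetch_container_logs(group, stream, region):
    """Read log lines with boto3 (needs AWS credentials; not called here)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("logs", region_name=region)
    response = client.get_log_events(logGroupName=group, logStreamName=stream)
    return [event["message"] for event in response["events"]]


if __name__ == "__main__":
    # Placeholder ARN, not taken from the thread.
    arn = "arn:aws:ecs:us-east-2:123456789012:task/my-cluster/0123456789abcdef"
    print(awslogs_configuration("/ecs/prefect-flows", "us-east-2", "prefect"))
    print(log_stream_name("prefect", "flow", arn))
```

If nothing shows up in CloudWatch, the usual suspects are a log group that does not exist yet or an execution role without permission to write to it.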

Zanie

03/03/2021, 10:00 PM

Since you're already using a custom task it should be pretty straightforward?

Sean Talia

03/03/2021, 10:00 PM

yeah i think that's my only hope

Sean Talia

03/03/2021, 10:00 PM

ha that's what I would have thought, but I don't see my logs showing up in cloudwatch either

Sean Talia

03/03/2021, 10:00 PM

(classic)

Sean Talia

03/03/2021, 10:01 PM

I'll figure out what's going on and report back when I solve this

Sean Talia

03/03/2021, 10:38 PM

okay, we're getting somewhere

Sean Talia

03/03/2021, 10:38 PM

Sean Talia

03/03/2021, 10:38 PM

it's odd, the image that i'm using for the flow is a custom one that uses `prefecthq/prefect:0.14.5-python3.8` as its base

Sean Talia

03/03/2021, 10:40 PM

it seems like the command on the container isn't getting set or overridden or something

Sean Talia

03/03/2021, 10:40 PM

https://github.com/PrefectHQ/prefect/blob/bf7a6b95ab592ad4808415f295163a64e38f1419/src/prefect/agent/ecs/agent.py#L439

Zanie

03/03/2021, 10:44 PM

Huh that's weird. Can you inspect the `command` on the actual task definition?

Sean Talia

03/03/2021, 10:51 PM

i actually didn't specify one on the task definition itself because i assumed that the `ECSRun` config was going to override it

Sean Talia

03/03/2021, 10:52 PM

but what's interesting is that i just changed the `image` in my ECS task definition itself to be something different from what my flow requires, and i'm seeing that the image is actually not being overridden either

Sean Talia

03/03/2021, 10:54 PM

but actually yes i see that thing i was just referencing

Sean Talia

03/03/2021, 10:55 PM

is happening in the `register_task_definition` function and not in the `deploy_flow` function

Sean Talia

03/03/2021, 11:07 PM

that's the problem then i think

Sean Talia

03/03/2021, 11:11 PM

I think that `container["command"] = ["/bin/sh", "-c", get_flow_run_command(flow_run)]` needs to get passed to `containerOverrides`
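
To make the suggestion concrete, here is a rough sketch of placing a command under `overrides.containerOverrides` in the keyword arguments for the ECS `run_task` call, instead of on the task definition. This is not the agent's actual code: `with_command_override` is a hypothetical helper, the container name `flow` is assumed, and the command string stands in for whatever `get_flow_run_command(flow_run)` returns.

```python
import copy


def with_command_override(run_task_kwargs, shell_command, container_name="flow"):
    """Return new run_task kwargs whose named container runs shell_command."""
    kwargs = copy.deepcopy(run_task_kwargs)  # leave the caller's dict untouched
    overrides = kwargs.setdefault("overrides", {})
    containers = overrides.setdefault("containerOverrides", [])
    for container in containers:
        if container.get("name") == container_name:
            break  # reuse the existing override entry for this container
    else:
        container = {"name": container_name}
        containers.append(container)
    container["command"] = ["/bin/sh", "-c", shell_command]
    return kwargs


if __name__ == "__main__":
    kwargs = with_command_override({"cluster": "my-cluster"}, "prefect execute flow-run")
    print(kwargs["overrides"]["containerOverrides"])
```

Because the override is applied per run, the registered task definition can stay generic while each flow run still gets its own command.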

Zanie

03/03/2021, 11:42 PM

I've pinged our run config expert 🙂 we'll look into this

Zanie

03/03/2021, 11:45 PM

It looks like right now you'd have to include the flow run command in your custom task definition

Zanie

03/03/2021, 11:46 PM

Jim said he'd look at fixing this tomorrow, basically setting a default command if you haven't. I think it's written as is so you can define more complex commands if you want.

Dylan

03/03/2021, 11:48 PM

@Marvin open “ECSRun with custom task definition does not set default container options”

Marvin

03/03/2021, 11:48 PM

https://github.com/PrefectHQ/prefect/issues/4204

Sean Talia

03/04/2021, 4:00 AM

awesome, thanks for opening this up! just to follow up on this, i've made some adjustments to my flow and got it to successfully run! it's a fairly trivial example, but it is executing the flow body and writing the logs, the only issue is that it's never getting out of the `Submitted` state, despite the logs showing that the flow actually finished

Sean Talia

03/04/2021, 4:02 AM

it's a little odd, I guess the flow states are never getting communicated back to the agent; I also see that the task results aren't getting published to S3 as I'd expect

Sean Talia

03/04/2021, 6:16 PM

would any of the prefect ECS experts have an idea of what might be happening here? I spent most of the morning on this and am pretty stumped...it's weird to me that the cloud instance is having the logs communicated back to it, logs which show the tasks starting and succeeding, and yet my Flow as a whole is never moving past the "Submitted" state and none of the task results are being written to S3 (as they are just fine when I use the DockerAgent / DockerRun pair)

Zanie

03/04/2021, 6:22 PM

What command are you using to run the flow in your task definition?

Sean Talia

03/04/2021, 6:23 PM

I'm passing these in via my flow's run config:

"containerOverrides": [
    {
        "name": "flow",
        "command": ["/bin/sh", "-c", "prefect execute flow-run"],
    }
]

Sean Talia

03/04/2021, 6:26 PM

which I think is just the command that would be getting run if prefect had been responsible for creating/registering the ECS task from scratch, right?

Zanie

03/04/2021, 6:38 PM

Yeah.. hmm.

Zanie

03/04/2021, 6:44 PM

It's missing some environment variables that are also static in the original definition. This will also be addressed in that issue -- the custom task definition was a user-contributed feature and it doesn't do any setup for you as is.

👍 1

Sean Talia

03/04/2021, 6:45 PM

oh that's interesting – can you tell me which ENV variables those would be?

Sean Talia

03/04/2021, 6:46 PM

I'm basically just doing a POC right now so if I can manually cobble some stuff together to get it going I'd be ecstatic

Zanie

03/04/2021, 6:54 PM

Jim says:

env = {
    "PREFECT__CLOUD__USE_LOCAL_SECRETS": "false",
    "PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS": "prefect.engine.cloud.CloudFlowRunner",
    "PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS": "prefect.engine.cloud.CloudTaskRunner",
}
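
As a stopgap for a custom task definition, those settings could be supplied per run as container environment overrides. The sketch below converts the mapping Jim listed into the `name`/`value` pairs that ECS expects and combines it with the command from earlier in the thread. The helper names are hypothetical, and passing the result through the run config's `run_task_kwargs` is an assumption, as is how the agent merges it with the variables it already sends.

```python
CLOUD_RUNNER_ENV = {
    "PREFECT__CLOUD__USE_LOCAL_SECRETS": "false",
    "PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS": "prefect.engine.cloud.CloudFlowRunner",
    "PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS": "prefect.engine.cloud.CloudTaskRunner",
}


def to_ecs_environment(env):
    """Convert a plain mapping into ECS's list of name/value pairs."""
    return [{"name": key, "value": value} for key, value in sorted(env.items())]


def flow_container_override(env, container_name="flow"):
    """Build one containerOverrides entry with the command and environment."""
    return {
        "name": container_name,
        "command": ["/bin/sh", "-c", "prefect execute flow-run"],
        "environment": to_ecs_environment(env),
    }


if __name__ == "__main__":
    run_task_kwargs = {
        "overrides": {"containerOverrides": [flow_container_override(CLOUD_RUNNER_ENV)]}
    }
    for pair in run_task_kwargs["overrides"]["containerOverrides"][0]["environment"]:
        print(pair["name"], "=", pair["value"])
```

Setting the same three variables directly on the container in the task definition would be an equivalent fix.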

Zanie

03/04/2021, 6:54 PM

It's being executed using the `FlowRunner` rather than the API connected one, I presume
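
A quick way to test that guess from inside the container is to check whether the runner settings are present in the environment before the flow starts. The sketch reads only environment variables, so it does not see values set in a config file, and the function name is made up.

```python
import os

EXPECTED_RUNNERS = {
    "PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS": "prefect.engine.cloud.CloudFlowRunner",
    "PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS": "prefect.engine.cloud.CloudTaskRunner",
}


def runner_setting_problems(environ=None):
    """List the runner settings that are missing or have an unexpected value."""
    environ = os.environ if environ is None else environ
    problems = []
    for name, expected in EXPECTED_RUNNERS.items():
        actual = environ.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, found {actual!r}")
    return problems


if __name__ == "__main__":
    for line in runner_setting_problems() or ["runner settings look fine"]:
        print(line)
```

An empty result means both runner classes point at the Cloud variants, which is what lets state changes reach the API.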

Sean Talia

03/04/2021, 6:56 PM

ohhhh now that's very interesting

Sean Talia

03/04/2021, 6:58 PM

for what it's worth i do have

[cloud]
use_local_secrets = false

in the image's `~/.prefect/config.toml` but obviously overriding that w/ the ENV var is better
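
The two forms line up because Prefect environment variables use a `PREFECT__` prefix, with double underscores separating the config section from the key. A small sketch of that naming convention follows; it only illustrates how the names map onto each other and is not Prefect's own config loader.

```python
def env_var_to_config_path(name, prefix="PREFECT__"):
    """Split an environment variable name into lowercase config path parts."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return tuple(part.lower() for part in name[len(prefix):].split("__"))


def config_path_to_env_var(*path, prefix="PREFECT__"):
    """Join config path parts back into an environment variable name."""
    return prefix + "__".join(part.upper() for part in path)


if __name__ == "__main__":
    print(env_var_to_config_path("PREFECT__CLOUD__USE_LOCAL_SECRETS"))
    # ('cloud', 'use_local_secrets')
    print(config_path_to_env_var("engine", "flow_runner", "default_class"))
    # PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS
```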

Sean Talia

03/04/2021, 6:59 PM

i'm going to add these and see what happens 😄

Sean Talia

03/04/2021, 7:02 PM

wow

Sean Talia

03/04/2021, 7:02 PM

@Jim Crist-Harif can I buy you a coffee

Sean Talia

03/04/2021, 7:02 PM

that did it

Sean Talia

03/04/2021, 7:18 PM

also i now see those were things being set as part of the agent's `generate_task_definition()` ... sigh

Sean Talia

03/04/2021, 7:18 PM

okay, thanks for your help everyone, this is awesome

🙌 1

Sean Talia

03/04/2021, 7:47 PM

is there anything I can help with in terms of compiling all of these issues in one place and maybe making a feature suggestion/request about it?

Sean Talia

03/04/2021, 7:47 PM

I feel like my use case isn't so crazy that it wouldn't be helpful to address some of these difficulties I was having in a future release

Zanie

03/04/2021, 7:53 PM

If you'd like to open an issue that explains how you set up the logging/inspecting the container logs that may be nice. Otherwise I think Jim plans to resolve all of the `generate_task_definition()` issues in a single PR.

Jim Crist-Harif

03/04/2021, 11:14 PM

See https://github.com/PrefectHQ/prefect/pull/4211
