m

Mike Vanbuskirk

05/17/2022, 8:32 PM
I’ve got an issue with prefect v1 and the ECS agent: I’ve got a flow that’s spinning on
Submitted for execution: Task arn:<task-arn>
with no further logs generated. In cloudwatch, it only shows:
2022-05-17T16:18:52.575-04:00	[2022-05-17 20:18:52,575] INFO - agent | Deploying flow run f2f45aac-ccfb-4bb9-88db-fb1d00426989 to execution environment...
	2022-05-17T16:18:53.805-04:00	[2022-05-17 20:18:53,805] INFO - agent | Completed deployment of flow run f2f45aac-ccfb-4bb9-88db-fb1d00426989
k

Kevin Kho

05/17/2022, 8:38 PM
These CloudWatch logs are from the specific ECS Task? Does it say anything if you go to the task page?
m

Mike Vanbuskirk

05/17/2022, 8:39 PM
they are from the agent log group, and reference that flow run ID correctly
if I click through the logs link in the task page, it shows me the info logs
2022-05-17T16:18:52.575-04:00	[2022-05-17 20:18:52,575] INFO - agent | Deploying flow run f2f45aac-ccfb-4bb9-88db-fb1d00426989 to execution environment...
	2022-05-17T16:18:53.805-04:00	[2022-05-17 20:18:53,805] INFO - agent | Completed deployment of flow run f2f45aac-ccfb-4bb9-88db-fb1d00426989
	2022-05-17T16:38:47.650-04:00	[2022-05-17 20:38:47,650] INFO - agent | Deploying flow run f2f45aac-ccfb-4bb9-88db-fb1d00426989 to execution environment...
	2022-05-17T16:38:48.748-04:00	[2022-05-17 20:38:48,748] INFO - agent | Completed deployment of flow run f2f45aac-ccfb-4bb9-88db-fb1d00426989
k

Kevin Kho

05/17/2022, 8:41 PM
Ohh are you using a LocalDaskExecutor?
m

Mike Vanbuskirk

05/17/2022, 8:42 PM
data engineer is submitting a flow run to the ECS Agent via prefect?
it shows up in the Prefect Cloud UI, tied to the ECS agent
k

Kevin Kho

05/17/2022, 8:43 PM
I know but what is the executor on the Flow?
m

Mike Vanbuskirk

05/17/2022, 8:43 PM
where can I see that information?
k

Kevin Kho

05/17/2022, 8:55 PM
Ah, if you didn’t specify any, it should just be the default LocalExecutor. I was asking because the LocalDaskExecutor with processes was not sending logs recently. If the Flow is still stuck in Submitted on the Prefect UI, I think we have a different issue though. Looking at this again, it feels like the container may not have started right. This is the log associated with the agent. Do you have logs associated with the Flow?
m

Mike Vanbuskirk

05/17/2022, 8:57 PM
I never get a log entry beyond “submitted”
k

Kevin Kho

05/17/2022, 8:58 PM
I get that. On the Prefect side, it likely means there was an error even before the Flow started running. Is the Flow configured to log to CloudWatch too?
m

Mike Vanbuskirk

05/17/2022, 8:59 PM
how do you configure that?
I basically set up the TF module Prefect provided, and AFAIK the Prefect Cloud config is fairly standard
k

Kevin Kho

05/17/2022, 9:02 PM
This is a good example.
So this will add logging to the Flow so you can get visibility into errors that happen between Flow spin-up and execution (the logger is not spun up yet, which is why we don’t get visibility)
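For reference, a minimal sketch of what such a logging section could look like, assuming a Fargate task and the awslogs log driver. The helper name, log group, region, and resource sizes are hypothetical placeholders, and the container is named flow on the assumption that this is the name the Prefect 1 ECS agent expects for the container that runs the Flow.

```python
def flow_task_definition(log_group, region, stream_prefix="flow"):
    """Build an ECS task definition dict whose flow container logs to CloudWatch.

    All values are illustrative assumptions, not Prefect defaults.
    """
    return {
        # Typical Fargate settings (assumed; adjust to your cluster).
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "512",
        "memory": "1024",
        "containerDefinitions": [
            {
                # Assumed container name for the Flow process.
                "name": "flow",
                # Sends the container's stdout/stderr to CloudWatch Logs,
                # including errors raised before the Prefect logger starts.
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": stream_prefix,
                    },
                },
            }
        ],
    }


# Hypothetical log group and region.
task_definition = flow_task_definition("/ecs/prefect-flows", "us-east-1")
```

A dict like this would be passed to the Flow's run config as ECSRun(task_definition=task_definition), and the Flow re-registered so the change takes effect.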
m

Mike Vanbuskirk

05/17/2022, 9:05 PM
ah
and that should work out of the box, since we already configured cloudwatch permissions via the provided TF module, correct?
k

Kevin Kho

05/17/2022, 9:09 PM
Yes as long as the cloudwatch log group already exists
m

Mike Vanbuskirk

05/17/2022, 9:15 PM
ok, I’ll give that a try and report back, one moment
just to clear up…. that’s a config for a Flow run right, not an agent?
k

Kevin Kho

05/17/2022, 9:23 PM
Yes that is attached to the Flow so you need to re-register
m

Mike Vanbuskirk

05/17/2022, 9:54 PM
lol now I don’t even get the “Submitted” log
just spins on “scheduled”
k

Kevin Kho

05/17/2022, 9:56 PM
Uhh that shouldn’t be the case. You just added the logging section to the Flow, right? Is the agent still on to pick it up?
m

Mike Vanbuskirk

05/17/2022, 9:57 PM
agent is live
I just copy/pasted the basic tutorial flow to eliminate other variables
@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("Hi from Prefect %s from flow %s", prefect.__version__, FLOW_NAME)
    return


with Flow(FLOW_NAME, run_config=RUN_CONFIG) as flow:
    hello_task()

flow.register(project_name="mike-test")
k

Kevin Kho

05/17/2022, 9:58 PM
Ah this looks like it used the default local storage so it will add a local host label by default. So now you have a label mismatch between Flow and agent so the agent won’t pick it up.
m

Mike Vanbuskirk

05/17/2022, 10:00 PM
ah, so labels aren’t partial
interestingly enough my data engineer’s flow had the same problem but he got “submitted”
k

Kevin Kho

05/17/2022, 10:01 PM
I think they must have used another storage without the default labels. Only the default local storage has the default labels. Local storage will also not work on ECS because the Flow is in your local machine and the container won’t have access to it. Agent labels must be a superset of Flow labels to be able to pick it up.
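The superset rule can be illustrated with plain sets. This is only a sketch of the matching logic as described here, not Prefect's actual implementation; the function name and label values are hypothetical.

```python
def agent_can_pick_up(agent_labels, flow_labels):
    """Return True when every Flow label is also present on the agent."""
    return set(flow_labels) <= set(agent_labels)


agent_labels = {"ecs", "prod"}

# A Flow whose labels are all present on the agent is picked up.
matching_flow = {"ecs"}

# Default local storage adds the registering machine's hostname as a label
# (hypothetical hostname here), which the ECS agent does not carry.
local_storage_flow = {"ecs", "mikes-laptop.local"}

print(agent_can_pick_up(agent_labels, matching_flow))
print(agent_can_pick_up(agent_labels, local_storage_flow))
```

Under this rule a partial overlap is not enough: a single Flow label missing from the agent leaves the run sitting in Scheduled.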
m

Mike Vanbuskirk

05/17/2022, 10:01 PM
I suspect he had other config abnormalities tho
k

Kevin Kho

05/17/2022, 10:04 PM
Typically with ECS, you use S3 storage or Docker storage hosted in ECR. What I really suspect happened with your initial error, though, is that the agent could not pull the image (missing the appropriate IAM roles) or the image just can’t run on ECS (an architecture issue, like being built on an M1 Mac)
m

Mike Vanbuskirk

05/17/2022, 10:04 PM
hmm
is all of this in the docs for running on ECS?
if so, totally missed it
also, is this the error that indicates local/cloud mismatch?
An error occurred (ServerException) when calling the RunTask operation (reached max retries: 2): Service Unavailable. Please try again later.
k

Kevin Kho

05/17/2022, 10:07 PM
Not exactly? I think the docstring here has a bunch but not specifically this
I believe that is an ECS issue specifically, not a Prefect log
m

Mike Vanbuskirk

05/17/2022, 10:08 PM
FWIW I do not see anything re: the contents of this discussion on: https://docs.prefect.io/orchestration/agents/ecs.html#ecs-agent
in fact in a couple of places it seems to imply local storage is ok:
https://docs.prefect.io/orchestration/agents/ecs.html#ecs-agent
To provide your own task definition template, you can use the --task-definition flag. This takes a path to a job template YAML file. The path can be local to the agent, or stored in cloud storage on S3.
k

Kevin Kho

05/17/2022, 10:11 PM
Ah well, more like, you can use anything but local (CodeCommit, Bitbucket, Github, etc.).
That is not the Flow storage. That is for the task definition for the ECS task
m

Mike Vanbuskirk

05/17/2022, 10:12 PM
k
so pretty clearly no reference to this requirement on that page then
k

Kevin Kho

05/17/2022, 10:13 PM
If you do use local Flow storage, you get an error that’s described in the FAQ. You can actually get it if you do:
with Flow(...) as flow:
    ...

flow.storage.add_default_labels = False
m

Mike Vanbuskirk

05/17/2022, 10:16 PM
ok, so you actually need a remote… storage mechanism somewhere files are hosted?
k

Kevin Kho

05/17/2022, 10:18 PM
Any besides local. Well, you can actually use local if the Flow file already lives inside the container that the ECS task is using. Prefect will look for it relative to the container’s file paths
This is an example where you can use local storage with Kubernetes and the flow file will live inside the image specified
m

Mike Vanbuskirk

05/17/2022, 10:22 PM
ok
do I need to upload the flow file to the bucket first, or does that occur as part of registration?
k

Kevin Kho

05/17/2022, 10:23 PM
It occurs as part of registration for you
m

Mike Vanbuskirk

05/17/2022, 10:32 PM
kk, still getting that “Service Unavailable”, which in a random SO thread seems to indicate a config value is null somewhere?
k

Kevin Kho

05/17/2022, 10:36 PM
I saw that. Hard to tell what causes that. If you just want to test a working setup, you could use the Prefect image. What kind of things are you setting in ECSRun or do you have your own task definition?
m

Mike Vanbuskirk

05/17/2022, 10:37 PM
literally just copied that github file link
including using the Prefect image
only changes are a couple of hard coded values
k

Kevin Kho

05/17/2022, 10:40 PM
Yeah that seems like it should work. Have not encountered that specific error message before myself.
m

Mike Vanbuskirk

05/17/2022, 10:48 PM
where’s the best place to post issues for official response?
k

Kevin Kho

05/17/2022, 10:49 PM
From Prefect or from AWS? That specific error is hard to help with without a reproducible example
m

Mike Vanbuskirk

05/17/2022, 11:00 PM
Prefect
if this keeps happening without some kind of status change from AWS re: service availability I think we can reasonably eliminate that as the actual cause
k

Kevin Kho

05/17/2022, 11:05 PM
That’s not entirely true though. We do see some ECS containers fail to spin up because they fail to get resources intermittently (dunno if the AWS service is really down here). The StackOverflow post you mentioned suggests something wrong with the Task Definition, which we just pass through. I don’t know what more official response you can get. You can post a Github issue but we really need a reproducible example for us to debug it. You could also reach out to Professional Services and they will help debug.
m

Mike Vanbuskirk

05/17/2022, 11:08 PM
ok, no problem, thank you again, I’ll take this back to our group, I know we’re in evaluation stage with some dataflow products so this is valuable information
k

Kevin Kho

05/17/2022, 11:10 PM
Of course! If you find a path to reproduce that, you could open an issue on the repo too
m

Mike Vanbuskirk

05/17/2022, 11:21 PM
well it still happens on that flow….
I dunno if that qualifies as reproducible
k

Kevin Kho

05/17/2022, 11:22 PM
If you DM me the code I can take a look
m

Mike Vanbuskirk

05/18/2022, 12:29 AM
sent