# ask-community
s
Hey everyone – has anybody ever run into an issue where you're running flows with a
DockerAgent
, and for some reason, the agent seems to keep a flow's container running long after the flow actually finished?
k
Hey @Sean Talia, did the flow succeed? or exit abruptly?
s
let me take a look
I was just doing some docker system cleanup on our EC2 instance that runs some of our flows and noticed this
k
Cancelling may cause that
s
sorry this is taking me a little while to figure out, we apparently have multiple flows that use this particular image
I see that some of them are failing, but they weren't canceled
yeah, it's hard to say; I see some failed flows that use the image I'm seeing in the
docker ps -a
output, but the flow start times don't line up with the container launch times
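For context, a minimal sketch of the same kind of check using the docker-py SDK (an assumption, not something mentioned in the thread): it lists every container created from a given image along with its start time, the equivalent of filtering `docker ps -a` by ancestor image. `<IMAGE_NAME>` is a placeholder.
```python
# Sketch: cross-check container launch times for one image via docker-py.
import docker

client = docker.from_env()

# List every container (running or exited) created from the image in question.
for container in client.containers.list(all=True, filters={"ancestor": "<IMAGE_NAME>"}):
    started_at = container.attrs["State"]["StartedAt"]
    print(container.short_id, container.status, started_at)
```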
k
Was the failure something aggressive like a sys.exit()? Something that kills the Prefect process?
Ah I see, could have been development?
s
it's somewhat aggressive – the failure is:
```
]: Unexpected error: ClientError('An error occurred (404) when calling the HeadObject operation: Not Found')
```
but I think that's coming from the code of the eng. on my team who wrote the flow, and not from prefect itself
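For reference, a minimal sketch of how that error typically arises in boto3 (not the team's actual flow code; the bucket and key names are placeholders): HeadObject on a key that doesn't exist raises a ClientError carrying a 404.
```python
# Sketch: boto3 raises ClientError (404) when HeadObject targets a missing key.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

try:
    s3.head_object(Bucket="my-bucket", Key="path/that/does/not/exist")  # placeholders
except ClientError as err:
    if err.response["Error"]["Code"] == "404":
        print("Object not found")  # the error surfaced in the flow logs
    else:
        raise
```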
k
I don't imagine that would prevent a container from turning off though
s
yeah that's what I figured as well
well...I'm sorry I don't have any better information here! I'll monitor this a little more actively going forward and let you know if I find out anything of note
k
Sounds good! Sorry I don't have better ideas either 😆
a
@Sean Talia regarding that error, it looks like it came from the flow code and it's a boto3 error.
s
yeah it is, and it's coming from my colleague's code that interfaces with S3, so nothing prefect-specific 😄
hey @Kevin Kho I wanted to resurface this now that it's been a couple of weeks. There was a fix put in on our end for those flows that were failing, but the containers are still piling up on the EC2 instance that our Docker agent is running on
k
Will ask the team about this
So even if the flow succeeds, the containers are still running?
s
yep, we haven't had any failures with any of these flows and the containers are just sticking around
k
What Prefect version are you on? Will try it out myself
s
we're on 0.14.17
but the thing is, this isn't happening with the vast majority of our flows, it's only for one very specific set of flows that all use this 1 image
```
CONTAINER ID   IMAGE           COMMAND                  CREATED        STATUS        PORTS     NAMES
5d2231abda89   <IMAGE_NAME>    "tini -g -- entrypoi…"   4 hours ago    Up 4 hours              funny_neumann
9995b5ac9338   <IMAGE_NAME>    "tini -g -- entrypoi…"   6 hours ago    Up 6 hours              funny_mayer
b69e07926b50   <IMAGE_NAME>    "tini -g -- entrypoi…"   14 hours ago   Up 14 hours             vibrant_dewdney
1a043c6b6525   <IMAGE_NAME>    "tini -g -- entrypoi…"   28 hours ago   Up 28 hours             stoic_shaw
709dbfc174bd   <IMAGE_NAME>    "tini -g -- entrypoi…"   30 hours ago   Up 30 hours             happy_hypatia
c2bb328ea70d   <IMAGE_NAME>    "tini -g -- entrypoi…"   38 hours ago   Up 38 hours             gracious_khayyam
f025ac41e9ca   <IMAGE_NAME>    "tini -g -- entrypoi…"   2 days ago     Up 2 days               sharp_burnell
f37b34d070be   <IMAGE_NAME>    "tini -g -- entrypoi…"   2 days ago     Up 2 days               upbeat_borg
6899e1d12911   <IMAGE_NAME>    "tini -g -- entrypoi…"   2 days ago     Up 2 days               crazy_lumiere
0bad739a2f0b   <IMAGE_NAME>    "tini -g -- entrypoi…"   2 days ago     Up 2 days               busy_jemison
1c31b03aeae3   <IMAGE_NAME>    "tini -g -- entrypoi…"   2 days ago     Up 2 days               keen_villani
ea480f12ba18   <IMAGE_NAME>    "tini -g -- entrypoi…"   2 days ago     Up 2 days               charming_kalam
395cf2e9ea80   <IMAGE_NAME>    "tini -g -- entrypoi…"   3 days ago     Up 3 days               admiring_torvalds
```
k
Ok, I'll try a bunch of things and see what I find
Man uhh…I tried both 0.14.17 and 0.15.6 and it clears the container after the flow run either fails or succeeds. Do you have any more clues for me to try? Anything unique about your container?
s
hahah yes, I figured it was not going to be easy to replicate 😄
let me take a look at this image / dockerfile and see if there's anything of note going on in here
k
Even doing
sys.exit(0)
inside a task is able to clear the container
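For reference, a minimal sketch of the kind of test being described, using the Prefect 0.x task/Flow API; the flow name is arbitrary.
```python
# Sketch: a task that hard-exits the Python process, to check whether the
# agent still cleans up the flow-run container afterwards (Prefect 0.x API).
import sys

from prefect import Flow, task


@task
def hard_exit():
    # Terminates the Python process inside the container mid-run.
    sys.exit(0)


with Flow("container-cleanup-test") as flow:
    hard_exit()
```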
s
I'll be honest, I'm not seeing anything immediately obvious in this Dockerfile that would raise alarms; the only thing being done here that isn't done in all our other images (where we're not seeing this) is that the
$PATH
and
$PYTHONPATH
env vars are being touched, but I don't think that should really cause issues
@Kevin Kho can I ask what version of docker you're running?
k
Good question one sec
```
Server: Docker Engine - Community
 Engine:
  Version:          20.10.6
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8728dd2
  Built:            Fri Apr  9 22:44:13 2021
  OS/Arch:          linux/arm64
  Experimental:     false
```
s
that is precisely the same version I'm on lol
well @Kevin Kho this has been solved, and it's the culprit you might expect
a Snowflake connection not being closed out...
that took some digging, sorry for sending you down a rabbit hole
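As a hedged sketch of the kind of fix involved (not the team's actual code; the connection parameters are placeholders): some versions of snowflake-connector-python leave non-daemon background threads running until the connection is closed, which can keep the Python process, and therefore the flow's container, alive after the flow finishes.
```python
# Sketch: always close the Snowflake connection when the work is done.
import snowflake.connector


def query_snowflake(sql: str):
    conn = snowflake.connector.connect(
        account="my_account",    # placeholder
        user="my_user",          # placeholder
        password="my_password",  # placeholder
    )
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        # Without this, the process (and the container) can linger
        # long after the flow run has finished.
        conn.close()
```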
k
Oh nice! Is that from our task?
s
no no, that was from some code that one of our internal teams had written; I've had them fix it and now we're all good