# prefect-server
a
Hi Prefect community, I'm having trouble getting the Docker Agent working when it is deployed in a Docker container. I have registered a flow with Docker storage in the cloud. I'm building the Docker image in a slightly non-standard way, but the flow works fine as long as I execute it through a Docker agent running on my laptop (i.e. not in a Docker container). Next, I tried to deploy an agent using docker stack and a simple docker-compose.yml, but this agent seems unable to execute the same flow correctly. Here is what I see:
- There is a connection to the cloud; the agent says: "Found 1 flow run(s) to submit for execution."
- There is a connection to the private registry where the flow is stored; the agent says: "Successfully pulled image *****/template-flow:dbea2161"
- I think the flow container works; the agent says: "Docker container ***** started"
- Strangely:
  - the agent reports the job as complete (?): "Completed flow run submission (id: a507921f-b9c6-400a-943f-a984b99eadbc)"
  - the cloud UI shows both tasks for that flow run as pending.

If I enable "--show-flow-logs" for the Docker-agent-in-Docker, the agent prints the traceback below (after the output listed above). So my guess is that something inside my flow container is trying to talk to the cloud API but failing due to missing credentials. I don't understand why it fails in the Docker-agent-in-Docker case and not in the Docker-agent-from-terminal case. Any hints appreciated.
Traceback (most recent call last):
   File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 451, in _request
     json_resp = response.json()
   File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 900, in json
     return complexjson.loads(self.text, **kwargs)
   File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
     return _default_decoder.decode(s)
   File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
   File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
     raise JSONDecodeError("Expecting value", s, err.value) from None
 json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
 
 The above exception was the direct cause of the following exception:
 
 Traceback (most recent call last):
   File "/usr/local/bin/prefect", line 8, in <module>
     sys.exit(cli())
   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
     return self.main(*args, **kwargs)
   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
     rv = self.invoke(ctx)
   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
     return _process_result(sub_ctx.command.invoke(sub_ctx))
   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
     return _process_result(sub_ctx.command.invoke(sub_ctx))
   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
     return ctx.invoke(self.callback, **ctx.params)
   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
     return callback(*args, **kwargs)
   File "/usr/local/lib/python3.8/site-packages/prefect/cli/execute.py", line 49, in flow_run
     result = client.graphql(query)
   File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 298, in graphql
     result = self.post(
   File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 213, in post
     response = self._request(
   File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 454, in _request
     raise ClientError(
 prefect.utilities.exceptions.ClientError: Malformed response received from Cloud - please ensure that you have an API token properly configured.
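
For reference, the shape of the chained exception above can be reproduced without Prefect at all: a response body that is empty or not JSON (an HTML error page, for instance) makes the JSON decoder fail at character 0, and the client then wraps that failure in its own error. The following is only a minimal sketch under that reading of the traceback. `ClientError` here is a local stand-in class and `parse_api_response` is a hypothetical helper, not Prefect's actual implementation.

```python
import json


class ClientError(Exception):
    """Local stand-in for prefect.utilities.exceptions.ClientError (illustrative only)."""


def parse_api_response(body: str) -> dict:
    """Decode a response body as JSON, wrapping decode failures in ClientError."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # `from exc` produces the "direct cause" chaining seen in the traceback.
        raise ClientError(
            "Malformed response received from Cloud - please ensure "
            "that you have an API token properly configured."
        ) from exc


if __name__ == "__main__":
    # An empty body and an HTML body both fail at the very first character.
    for body in ("", "<html>Unauthorized</html>"):
        try:
            parse_api_response(body)
        except ClientError as err:
            print(repr(body), "->", err.__cause__)
```

The point of the sketch is that the error message only says the body was not JSON. It does not by itself prove that the token is missing; an unreachable or wrong endpoint that returns a non-JSON page would look the same.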
k
Hey @Anton Rasmussen, it looks like your Agent is pointing at the Cloud API endpoint rather than a local Server endpoint. I’d make sure your backend is set to your Server endpoint by running the
prefect backend server
command or by changing your backend.toml file to be configured to “server”. You can also specify your API endpoint in your config.toml file.
Just realized, I assumed you were working on a Local API setup because we are in #prefect-server, but just in case you did want to point your Agent at the Cloud API: I would check that your RUNNER token is accessible to this Agent and retrievable at runtime, and also check outbound network access to the Cloud API. Once that is confirmed, it might be a good time to refresh your tokens to ensure old tokens aren’t being used accidentally.
a
Hi @Kyle Moon-Wright Thanks for your responses. Yeah, I posted in the wrong channel for sure. Not sure if I can/should move? Where would be a better place for this kind of question? I'm pretty sure my agent can connect and authenticate with the cloud: I can see it printing "Found 1 flow..." when I manually start a quick run in the cloud UI. So the agent-in-docker seems to have connection, but then the flow-in-docker(?) produces the error message about not having an API token? I can understand that the flow may need an API token to communicate results to cloud, but not sure how it gets it (or not) from the agent....
k
Hmm, this is a tough one - typically we recommend a single flow to a single Docker image because of the complications that can arise. It does seem like something in the flow is calling out to Cloud, but that shouldn’t be the case - only the Agent needs outbound access to Cloud and will relay task metrics. Is there some query going on inside your flow at all? Maybe a stray CLI command?
a
The flow does almost nothing (flow_definition.py in its entirety below) and is indeed a single flow in a single Docker image. Notably, it does work when initiated from my laptop agent. Good to know that the flow shouldn't contact the cloud. But that makes me confused about the traceback I posted... It does seem to indicate a Prefect Client() trying to talk to a cloud(?) API with a call to the .graphql() method. I believe the traceback comes from inside the flow container, since it only appears with "--show-flow-logs"... Is this the flow container trying to talk to an API on the agent?
import datetime

import prefect
from prefect import task, Flow
from prefect.schedules import IntervalSchedule

FLOW_NAME = "hello-flow"

@task
def say_hello():
    logger = prefect.context.get("logger")
    logger.info("Hello, Cloud!")
    return "said hello"

@task
def say_goodbye(input):
    logger = prefect.context.get("logger")
    logger.info(f"I just {input}... Goodbye, Cloud!")
    return "said goodbye"

schedule = IntervalSchedule(interval=datetime.timedelta(minutes=2))

with Flow(FLOW_NAME, schedule) as flow:
    res1 = say_hello()
    res2 = say_goodbye(res1)
It may be worth pointing out that I'm trying to do this with Docker Swarm... Poking around with Docker networks and the agent --network option allows me to get the flow containers spawned by the agent to connect to a predefined network. I still get the traceback from above, though. How do Docker flows call back to the Docker agent? (It doesn't seem to be at the --agent-address.) Is it at the location specified by the PREFECT__CLOUD__API env var passed to the flow container in DockerAgent.populate_env_vars()? Some magic I don't fully understand appears to happen around here: https://github.com/PrefectHQ/prefect/blob/46afb3a394da1d9a3ff4481b66666f487a312041/src/prefect/agent/docker/agent.py#L472.
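
One way to narrow this down might be to look at what the flow container actually receives from the agent. The sketch below is a small diagnostic that reports which Prefect-related environment variables are present and masks any token value. Only PREFECT__CLOUD__API is named in this thread; the other two variable names are assumptions based on Prefect's PREFECT__SECTION__KEY convention, and `summarize_prefect_env` is a hypothetical helper, not part of Prefect.

```python
import os

# PREFECT__CLOUD__API comes from the question above. The other two names are
# assumptions that follow the same naming convention and may not match
# what a given Prefect version actually sets.
KEYS = (
    "PREFECT__CLOUD__API",
    "PREFECT__CLOUD__AUTH_TOKEN",
    "PREFECT__CONTEXT__FLOW_RUN_ID",
)


def summarize_prefect_env(environ):
    """Return a printable summary of the selected variables, hiding token values."""
    summary = {}
    for key in KEYS:
        value = environ.get(key)
        if not value:
            summary[key] = "<missing>"
        elif "TOKEN" in key:
            # Report only the length so the secret never reaches the logs.
            summary[key] = f"<set, {len(value)} chars>"
        else:
            summary[key] = value
    return summary


if __name__ == "__main__":
    for name, shown in summarize_prefect_env(os.environ).items():
        print(f"{name} = {shown}")
```

Running this once in a flow container started by the laptop agent and once in one started by the agent-in-Docker, then comparing the two outputs, would show whether the API address or the token differs between the two setups.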