Thread
#prefect-server
    Anton Rasmussen

    1 year ago
    Hi Prefect community, I'm having trouble getting the Docker agent to work when it is deployed in a Docker container. I have registered a flow with Docker storage in Cloud. I'm building the Docker image in a slightly non-standard way, but the flow works fine as long as I execute it through a Docker agent running on my laptop (i.e. not in a Docker container). Next, I tried to deploy an agent using docker stack and a simple docker-compose.yml, but that agent seems unable to execute the same flow correctly. Here is what I see:
    - There is a connection to Cloud; the agent says: "Found 1 flow run(s) to submit for execution."
    - There is a connection to the private registry where the flow is stored; the agent says: "Successfully pulled image *****/template-flow:dbea2161"
    - I think the flow container works; the agent says: "Docker container ***** started"
    - Strangely:
      - the agent reports the job complete (?): "Completed flow run submission (id: a507921f-b9c6-400a-943f-a984b99eadbc)"
      - the Cloud UI shows both tasks for that flow run as pending.
    If I enable "--show-flow-logs" for the Docker-agent-in-Docker, the agent prints the traceback below (after the output listed above). So my guess is that something inside my flow container is trying to talk to the Cloud API, but somehow failing due to missing credentials. I don't understand why it fails in the Docker-agent-in-Docker case and not in the Docker-agent-from-terminal case. Any hints appreciated.
    Traceback (most recent call last):
       File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 451, in _request
         json_resp = response.json()
       File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 900, in json
         return complexjson.loads(self.text, **kwargs)
       File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
         return _default_decoder.decode(s)
       File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
       File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
         raise JSONDecodeError("Expecting value", s, err.value) from None
     json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
     
     The above exception was the direct cause of the following exception:
     
     Traceback (most recent call last):
       File "/usr/local/bin/prefect", line 8, in <module>
         sys.exit(cli())
       File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
         return self.main(*args, **kwargs)
       File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
         rv = self.invoke(ctx)
       File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
         return _process_result(sub_ctx.command.invoke(sub_ctx))
       File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
         return _process_result(sub_ctx.command.invoke(sub_ctx))
       File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
         return ctx.invoke(self.callback, **ctx.params)
       File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
         return callback(*args, **kwargs)
       File "/usr/local/lib/python3.8/site-packages/prefect/cli/execute.py", line 49, in flow_run
         result = client.graphql(query)
       File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 298, in graphql
         result = self.post(
       File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 213, in post
         response = self._request(
       File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 454, in _request
         raise ClientError(
     prefect.utilities.exceptions.ClientError: Malformed response received from Cloud - please ensure that you have an API token properly configured.
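    For reference, the first exception in the traceback can be reproduced without Prefect: the JSON decoder raises this message when the response body is empty or begins with something that is not JSON (an HTML error page, for example). The following is a minimal sketch using only the standard library. The sample bodies are made up for illustration and were not captured from the agent or the API.

```python
import json

# Hypothetical response bodies for illustration only; they were not
# captured from the real agent, flow container, or Cloud API.
SAMPLE_BODIES = {
    "empty body": "",
    "HTML error page": "<html><body>401 Unauthorized</body></html>",
    "valid JSON": '{"data": {"flow_run": []}}',
}


def describe_body(body):
    """Return 'ok' if body parses as JSON, else the decoder's error message."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return str(err)
    return "ok"


results = {label: describe_body(body) for label, body in SAMPLE_BODIES.items()}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

    So the "Malformed response" ClientError only says that the reply was not JSON. On its own it does not show whether the cause is a missing token, a wrong API URL, or something on the network path answering in place of the API.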
    Kyle Moon-Wright

    1 year ago
    Hey @Anton Rasmussen, It looks like your Agent is looking to the Cloud API endpoint, rather than a local Server endpoint, I’d make sure your backend is set to your Server endpoint by running the
    prefect backend server
    command or changing your backend.toml file to be configured to “server”. You can also specify your API endpoint in your config.toml file.
    Just realized, I assumed you were working on a Local API setup because we are in #prefect-server, but just in case you did want to set your Agent towards the Cloud API: I would check to ensure your RUNNER token is accessible to this Agent and retrievable for runtime, in addition to checking outbound network access to the Cloud API. Once confirmed, it might be a good time to refresh your tokens to ensure old tokens aren’t being used accidentally.
    Anton Rasmussen

    1 year ago
    Hi @Kyle Moon-Wright Thanks for your responses. Yeah, I posted in the wrong channel for sure. Not sure if I can/should move? Where would be a better place for this kind of question? I'm pretty sure my agent can connect and authenticate with the cloud: I can see it printing "Found 1 flow..." when I manually start a quick run in the cloud UI. So the agent-in-docker seems to have connection, but then the flow-in-docker(?) produces the error message about not having an API token? I can understand that the flow may need an API token to communicate results to cloud, but not sure how it gets it (or not) from the agent....
    Kyle Moon-Wright

    1 year ago
    Hmm, this is a tough one - typically we recommend a single flow to a single Docker image because of the complications that can arise. It does seem like something in the flow is calling out to Cloud, but that shouldn’t be the case - only the Agent needs outbound access to Cloud and will relay task metrics. Is there some query going on inside your flow at all? Maybe a stray CLI command?
    Anton Rasmussen

    1 year ago
    The flow does almost nothing (flow_definition.py in its entirety below) and is indeed a single flow in a single Docker image. Notably, it does work when initiated from my laptop agent. Good to know that the flow shouldn't contact the cloud. But that makes me confused about the traceback I posted... It does seem to indicate a prefect Client() trying to talk to a cloud(?) API with a call to the .graphql() method. I believe the traceback comes from inside the flow container, since it only appears with "--show-flow-logs"... Is this the flow container trying to talk to an API on the agent?
    import datetime
    
    import prefect
    from prefect import task, Flow
    from prefect.schedules import IntervalSchedule
    
    FLOW_NAME = "hello-flow"
    
    @task
    def say_hello():
        logger = prefect.context.get("logger")
        logger.info("Hello, Cloud!")
        return "said hello"
    
    @task
    def say_goodbye(input):
        logger = prefect.context.get("logger")
        logger.info(f"I just {input}... Goodbye, Cloud!")
        return "said goodbye"
    
    schedule = IntervalSchedule(interval=datetime.timedelta(minutes=2))
    
    with Flow(FLOW_NAME, schedule) as flow:
        res1 = say_hello()
        res2 = say_goodbye(res1)
    May be worth pointing out I'm trying to do this with Docker Swarm... Poking around with docker networks and the agent --network option allows me to get the flow containers spawned by agent to connect to a predefined network. I still get the traceback from above, though. How do Docker flows call back to the Docker agent? (it doesn't seem to be at the --agent-address). Is it at location specified by PREFECT__CLOUD__API env passed to flow container in DockerAgent.populate_env_vars()? Some magic I don't fully understand appears to happen around here: https://github.com/PrefectHQ/prefect/blob/46afb3a394da1d9a3ff4481b66666f487a312041/src/prefect/agent/docker/agent.py#L472.
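    One way to narrow this down might be to compare the environment the flow container actually receives in the two cases (laptop agent vs. agent in Swarm). The sketch below uses only the standard library and lists every variable that starts with PREFECT__, masking anything whose name contains TOKEN. Only PREFECT__CLOUD__API comes from this thread. The prefix filter, the TOKEN marker, and the helper name summarize_prefect_env are assumptions made for illustration.

```python
import os

# Assumptions for illustration: only PREFECT__CLOUD__API is named in the
# thread; filtering on the "PREFECT__" prefix and treating any variable
# whose name contains "TOKEN" as a secret are guesses.
PREFIX = "PREFECT__"
SECRET_MARKER = "TOKEN"


def summarize_prefect_env(environ):
    """Return PREFECT__* variables, with token-like values masked."""
    summary = {}
    for name, value in sorted(environ.items()):
        if not name.startswith(PREFIX):
            continue
        if SECRET_MARKER in name:
            # Report only whether a secret is present, never its value.
            value = f"<set, {len(value)} chars>" if value else "<empty>"
        summary[name] = value
    return summary


if __name__ == "__main__":
    for name, value in summarize_prefect_env(os.environ).items():
        print(f"{name}={value}")
```

    If the output differs between a flow container started by the laptop agent and one started by the agent in Swarm (for example an empty token or a different API URL), that difference would be the first thing to look at.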