I am running into some issues deploying Docker Con...
# ask-community
n
I am running into some issues deploying Docker containers that I am not quite sure how to debug. I am using Bitbucket as my flow storage and Docker as my run environment. When I run the flow through my Docker agent, it successfully pulls the Docker image and then does nothing else for a long time, until it seems to fail silently and a Lazarus process restarts it. Any ideas on where to start looking to figure out the issue? The logs aren't any help and I can't tell why it is failing.
m
Hello Nelson! Could you try setting PREFECT__LOGGING__LEVEL=DEBUG on your agent? Or can you share a redacted flow you are trying to run?
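A minimal sketch of that suggestion, assuming the agent is started with the `prefect agent docker start` CLI command: the variable is added to the environment the agent process inherits. The helper name `debug_agent_launch` is hypothetical, and it only builds the command and environment without starting anything.

```python
import os


def debug_agent_launch(base_env=None):
    """Return (command, env) for starting a Docker agent with DEBUG logging.

    Hypothetical helper for illustration: it builds the arguments only.
    """
    # Copy the environment so the caller's mapping is not modified.
    env = dict(os.environ if base_env is None else base_env)
    # Prefect maps PREFECT__SECTION__KEY variables onto its config,
    # so this sets logging.level to DEBUG for the agent process.
    env["PREFECT__LOGGING__LEVEL"] = "DEBUG"
    command = ["prefect", "agent", "docker", "start"]
    return command, env


command, env = debug_agent_launch({"PATH": "/usr/bin"})
print(command, env["PREFECT__LOGGING__LEVEL"])

# To actually launch the agent (requires Prefect and Docker to be installed):
# import subprocess
# subprocess.run(command, env=env, check=True)
```

Exporting the variable in the shell before starting the agent has the same effect; the Python wrapper is only convenient when the agent is launched from a script.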
j
I have a similar issue with a Kubernetes run. The jobs are submitted for execution but nothing happens.
k
Hi @Jay Shah, did you resolve your issue? Saw you found the thread with AKS.
n
Sorry for the late reply. With logging levels on debug I still don't see much output.
[2021-05-10 14:29:21,578] DEBUG - agent | {'status': 'Pulling from drinvest/falcon-test', 'id': 'latest'}
[2021-05-10 14:29:21,584] DEBUG - agent | {'status': 'Digest: sha256:4d1e400e423496a0a12baf7bf17ffd478f2f941761ab2e52b2ae0b5c54fdbe67'}
[2021-05-10 14:29:21,591] DEBUG - agent | {'status': 'Status: Downloaded newer image for drinvest/falcon-test:latest'}
[2021-05-10 14:29:21,597] INFO - agent | Successfully pulled image drinvest/falcon-test:latest...
[2021-05-10 14:29:21,597] DEBUG - agent | Creating Docker container drinvest/falcon-test:latest
[2021-05-10 14:29:21,678] DEBUG - agent | Starting Docker container with ID 860fe847ccc4f76f296cc07b773846fe583777b40528c4f7cf0da2cf2b3d0d3f
[2021-05-10 14:29:22,037] DEBUG - agent | Docker container 860fe847ccc4f76f296cc07b773846fe583777b40528c4f7cf0da2cf2b3d0d3f started
[2021-05-10 14:29:22,205] DEBUG - agent | Querying for flow runs
[2021-05-10 14:29:22,244] DEBUG - agent | Completed flow run submission (id: f18d8593-8b85-43e9-bf1d-533abc017539)
It does this and then it dies again. I am happy to share my flow and Dockerfile if that would help? @George Coyne wondering if you have any suggestions for me?
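Since the agent log stops at "Completed flow run submission", the next place to look is the container itself. The sketch below, which is only an illustration and not part of Prefect, pulls the container ID and flow run ID out of agent output like the lines above so they can be passed to `docker logs` or looked up in the UI. The function name `extract_ids` and the regular expressions are assumptions based on the log format shown in this thread.

```python
import re

# Patterns matched against the DEBUG lines the Docker agent printed above.
CONTAINER_RE = re.compile(r"Docker container (?P<id>[0-9a-f]+) started")
FLOW_RUN_RE = re.compile(r"Completed flow run submission \(id: (?P<id>[0-9a-f-]+)\)")


def extract_ids(log_text):
    """Return (container_ids, flow_run_ids) found in agent log text."""
    containers = [m.group("id") for m in CONTAINER_RE.finditer(log_text)]
    flow_runs = [m.group("id") for m in FLOW_RUN_RE.finditer(log_text)]
    return containers, flow_runs


sample = """\
[2021-05-10 14:29:22,037] DEBUG - agent | Docker container 860fe847ccc4f76f296cc07b773846fe583777b40528c4f7cf0da2cf2b3d0d3f started
[2021-05-10 14:29:22,244] DEBUG - agent | Completed flow run submission (id: f18d8593-8b85-43e9-bf1d-533abc017539)
"""

containers, flow_runs = extract_ids(sample)
for container_id in containers:
    # Docker accepts an unambiguous ID prefix, so the short form is enough.
    print(f"docker logs {container_id[:12]}")
```

The printed command only helps if the container still exists when it is run; if it has already been removed, the flow's own logs have to be captured another way.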
g
@Nelson Griffiths Hit me up in a DM and we can take a look
👍 1
n
I have recreated a more basic version of my flow and Dockerfile, and it seems like the Docker container is failing when pulling my flow from Bitbucket Cloud. The local run_config pulls it successfully, but when run in a Docker environment it seems to look in Bitbucket Server instead of Bitbucket Cloud. Are there any suggestions on how I could point my Docker container to Bitbucket Cloud?
k
Hi @Nelson Griffiths! You mean from Bitbucket Storage? Have you seen the keyword arguments for Cloud? What version of Prefect are you on?
This change was in 0.14.16 -> changelog
n
So I am on 0.14.17. It seems like the most recent Dockerfiles for Prefect might not support Bitbucket Cloud yet. I am successfully pulling from Bitbucket Cloud when running a local agent, just not a Docker agent. So I think I may have gotten around that problem by specifying prefect[bitbucket] in my requirements.txt and forcing a reinstall with pip in the Dockerfile.
I seem to have fixed the issue. I was using a PyTorch base image before. I suspect that Prefect might not like pulling down a 4 GB Docker image and that this was causing the issues. With a smaller Docker image it is working great. Thanks for all the help!
k
Glad you fixed it! 🙂
d
I seem to be having a similar issue and I wonder how close it is.
[2021-05-10 23:31:51,705] DEBUG - Docker Agent | {'status': 'Pulling from sq-kp-infra-prod/prefect', 'id': 'latest'}
[2021-05-10 23:31:51,710] DEBUG - Docker Agent | {'status': 'Digest: sha256:1922df5b7a244613be41225cd6ac2f4c10532476698d935ae5e083468013eb3b'}
[2021-05-10 23:31:51,710] DEBUG - Docker Agent | {'status': 'Status: Image is up to date for gcr.io/sq-kp-infra-prod/prefect:latest'}
[2021-05-10 23:31:51,716] INFO - Docker Agent | Successfully pulled image ***...
[2021-05-10 23:31:51,716] DEBUG - Docker Agent | Creating Docker container ***
[2021-05-10 23:31:51,778] DEBUG - Docker Agent | Starting Docker container with ID 3c217898d484b0a2b29ae61b83279014d5a48f154f800f936687972c7524c25b
[2021-05-10 23:31:52,086] DEBUG - Docker Agent | Docker container 3c217898d484b0a2b29ae61b83279014d5a48f154f800f936687972c7524c25b started
[2021-05-10 23:31:52,208] DEBUG - Docker Agent | Completed flow run submission (id: f8ca3883-71dc-4c43-a182-cb9878d12457)
When I do docker ps -a I never actually find any running or stopped containers.
import prefect
import time
from prefect import task, Flow
from prefect.storage import GCS
from prefect.run_configs import DockerRun

@task
def say_hello():
    logger = prefect.context.get("logger")
    logger.info("Hello, Cloud!")
    time.sleep(200)

with Flow("hello-flow", storage=GCS(bucket="***")) as flow:
    say_hello()


flow.run_config = DockerRun(image="***")

flow.register(project_name="tutorial")
k
Hi @Damien Ramunno-Johnson, I'll try to replicate this later.
👍 1
d
To make it easier I switched to the base prefect image
import prefect
import time
from prefect import task, Flow
from prefect.storage import GCS
from prefect.run_configs import DockerRun


@task
def say_hello():
    logger = prefect.context.get("logger")
    logger.info("Hello, Cloud!")
    time.sleep(200)


with Flow(
    "hello-flow",
    storage=GCS(bucket="prefect-flows-poc2021"),
    run_config=DockerRun(image="prefecthq/prefect"),
) as flow:
    say_hello()

flow.register(project_name="tutorial")
With the output
[2021-05-10 23:53:07,830] DEBUG - Docker Agent | {'status': 'Pull complete', 'progressDetail': {}, 'id': '08f8340aa311'}
[2021-05-10 23:53:07,853] DEBUG - Docker Agent | {'status': 'Digest: sha256:79a59032175275a19ede749ce1512b2fafc59a6e6b105d38ef074a0ce6c4332f'}
[2021-05-10 23:53:07,863] DEBUG - Docker Agent | {'status': 'Status: Downloaded newer image for prefecthq/prefect:latest'}
[2021-05-10 23:53:07,868] INFO - Docker Agent | Successfully pulled image prefecthq/prefect...
[2021-05-10 23:53:07,868] DEBUG - Docker Agent | Creating Docker container prefecthq/prefect
[2021-05-10 23:53:08,013] DEBUG - Docker Agent | Running agent heartbeat...
[2021-05-10 23:53:08,016] DEBUG - Docker Agent | Sleeping heartbeat for 60.0 seconds
[2021-05-10 23:53:08,197] DEBUG - Docker Agent | Querying for flow runs
[2021-05-10 23:53:08,280] DEBUG - Docker Agent | No flow runs found
[2021-05-10 23:53:08,280] DEBUG - Docker Agent | Next query for flow runs in 8.0 seconds
[2021-05-10 23:53:10,941] DEBUG - Docker Agent | Starting Docker container with ID b7eb8f2aa76b2d9e55db42b4bd1dc0ad6c0c366f6d5ff42a8cce5170e8749b03
[2021-05-10 23:53:11,254] DEBUG - Docker Agent | Docker container b7eb8f2aa76b2d9e55db42b4bd1dc0ad6c0c366f6d5ff42a8cce5170e8749b03 started
[2021-05-10 23:53:11,329] DEBUG - Docker Agent | Completed flow run submission (id: c355af75-91cc-4544-bada-59f6c07f6b0f)
👍 1
Actually adding --show-flow-logs shows me that
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 465, in _request
    json_resp = response.json()
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 49, in flow_run
    result = client.graphql(query)
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 304, in graphql
    retry_on_api_error=retry_on_api_error,
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 220, in post
    retry_on_api_error=retry_on_api_error,
  File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 471, in _request
    ) from exc
prefect.utilities.exceptions.ClientError: Malformed response received from Cloud - please ensure that you have an API token properly configured.
🤔
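The two chained tracebacks can be read as follows: the response body was not valid JSON (an empty body produces exactly this "Expecting value: line 1 column 1 (char 0)" message), and the client re-raised that as a ClientError with `from exc`. The sketch below reproduces the pattern with the standard library only. The `ClientError` class and `parse_api_response` function here are stand-ins for illustration and are not Prefect's actual implementation.

```python
import json


class ClientError(Exception):
    """Stand-in for prefect.utilities.exceptions.ClientError (illustration only)."""


def parse_api_response(body: str) -> dict:
    """Decode a response body, wrapping JSON errors the way the traceback shows."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # "from exc" is what produces the line
        # "The above exception was the direct cause of the following exception".
        raise ClientError(
            "Malformed response received from Cloud - please ensure that "
            "you have an API token properly configured."
        ) from exc


try:
    parse_api_response("")  # an empty body, as an unauthenticated call might return
except ClientError as err:
    print(type(err.__cause__).__name__, "->", err.__cause__)
```

In other words, the JSONDecodeError is a symptom; the message on the final line, which points at the API token, is the part worth acting on.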
k
Oh I see. Where did you get the token for the agent?
I have Google Cloud Storage with the Prefect image working on my end. Will ping you tomorrow.
j
Yes, I had an issue on AKS and I was able to resolve it. I generated a new runner token using the CLI (as API tokens are deprecated) and used that token to register the kube agent on AKS. @Kevin Kho
@Kevin Kho I suspect the issue relates to the API tokens. All tokens were to migrate to Service Accounts. I have not tested running flows using the new service accounts.
@Damien Ramunno-Johnson Try generating a new runner API token (not a service account) using the CLI and check if you still have the issue.
👍 1
d
Thanks, it looks like passing in the token via --token and not the environment variable fixed it 🤔 It probably would also have worked if I had used the config. But it was easier to start figuring it out once I could see the error surfaced. Thanks for looking into it. I have been working out the POC, so I am poking around a fair amount.
👍 1
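When a token works via --token but not via the environment, it can help to check what the agent process actually sees without printing the secret. The following is a rough sketch under two assumptions: that the agent token variable in Prefect 0.14.x is `PREFECT__CLOUD__AGENT__AUTH_TOKEN`, and that a token passed on the command line takes precedence over it. The function name `describe_token_source` is hypothetical.

```python
import os

# Assumed name of the agent token variable in Prefect 0.14.x configuration.
AGENT_TOKEN_VAR = "PREFECT__CLOUD__AGENT__AUTH_TOKEN"


def describe_token_source(cli_token=None, env=None):
    """Report where a token would come from, without revealing its value."""
    env = os.environ if env is None else env
    if cli_token:
        # Assumption: a token given on the command line wins over the environment.
        return "--token flag"
    value = env.get(AGENT_TOKEN_VAR, "")
    if not value:
        return "no token found"
    if value != value.strip():
        # Stray quotes or newlines from a shell export are a common cause of
        # a token that looks set but is rejected.
        return f"{AGENT_TOKEN_VAR} (has surrounding whitespace)"
    return AGENT_TOKEN_VAR


print(describe_token_source())
```

Running this in the same shell that starts the agent shows whether the variable is visible there at all, which is a quick way to rule out an export that never reached the agent process.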
k
Thanks for that log @Damien Ramunno-Johnson. We had a recent change to authentication and that's good to know.
And thanks for the details @Jay Shah!