    Nelson Griffiths

    1 year ago
    I am running into some issues deploying Docker containers that I am not quite sure how to debug. I am using Bitbucket as my flow storage and Docker as my run environment. When I run the flow through my Docker agent, it successfully pulls the Docker image and then does nothing for a long time, until it seems to fail silently and a Lazarus process restarts it. Any ideas on where to start looking to figure out the issue? The logs aren't any help and I can't tell why it is failing.

    Mariia Kerimova

    1 year ago
    Hello Nelson! Could you try setting
    PREFECT__LOGGING__LEVEL=DEBUG
    on your agent? Or can you share a redacted flow you are trying to run?
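A minimal sketch of one way to apply that setting when the agent is launched from Python. The helper name `agent_launch_spec` is hypothetical, and the `prefect agent docker start` command line is an assumption based on the Prefect 0.14 CLI. Nothing is executed here; the function only builds the command and the environment.

```python
import os
from typing import Dict, List, Optional, Tuple


def agent_launch_spec(
    base_env: Optional[Dict[str, str]] = None, level: str = "DEBUG"
) -> Tuple[List[str], Dict[str, str]]:
    """Return (command, environment) for starting a Docker agent with verbose logs."""
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["PREFECT__LOGGING__LEVEL"] = level
    # Assumed CLI invocation for a Prefect 0.14 Docker agent.
    command = ["prefect", "agent", "docker", "start"]
    return command, env


if __name__ == "__main__":
    command, env = agent_launch_spec()
    print(command, env["PREFECT__LOGGING__LEVEL"])
    # To actually start the agent: subprocess.run(command, env=env, check=True)
```

Passing an explicit `base_env` keeps the helper easy to check without touching the real process environment.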

    Jay Shah

    1 year ago
    I have a similar issue with Kubernetes runs. The jobs are submitted for execution but nothing happens.

    Kevin Kho

    1 year ago
    Hi @Jay Shah, did you resolve your issue? Saw you found the thread with AKS.

    Nelson Griffiths

    1 year ago
    Sorry for the late reply. With logging levels on debug I still don't see much output.
    [2021-05-10 14:29:21,578] DEBUG - agent | {'status': 'Pulling from drinvest/falcon-test', 'id': 'latest'}
    [2021-05-10 14:29:21,584] DEBUG - agent | {'status': 'Digest: sha256:4d1e400e423496a0a12baf7bf17ffd478f2f941761ab2e52b2ae0b5c54fdbe67'}
    [2021-05-10 14:29:21,591] DEBUG - agent | {'status': 'Status: Downloaded newer image for drinvest/falcon-test:latest'}
    [2021-05-10 14:29:21,597] INFO - agent | Successfully pulled image drinvest/falcon-test:latest...
    [2021-05-10 14:29:21,597] DEBUG - agent | Creating Docker container drinvest/falcon-test:latest
    [2021-05-10 14:29:21,678] DEBUG - agent | Starting Docker container with ID 860fe847ccc4f76f296cc07b773846fe583777b40528c4f7cf0da2cf2b3d0d3f
    [2021-05-10 14:29:22,037] DEBUG - agent | Docker container 860fe847ccc4f76f296cc07b773846fe583777b40528c4f7cf0da2cf2b3d0d3f started
    [2021-05-10 14:29:22,205] DEBUG - agent | Querying for flow runs
    [2021-05-10 14:29:22,244] DEBUG - agent | Completed flow run submission (id: f18d8593-8b85-43e9-bf1d-533abc017539)
    It does this and then it dies again. I am happy to share my flow and Dockerfile if that would help? @George Coyne wondering if you have any suggestions for me?

    George Coyne

    1 year ago
    @Nelson Griffiths Hit me up on DM and we can take a look

    Nelson Griffiths

    1 year ago
    I have recreated a more basic version of my flow and Dockerfile, and it seems like the Docker container is failing when pulling my flow from Bitbucket Cloud. The local run_config pulls it successfully, but when run in a Docker environment it seems to look in Bitbucket Server instead of Bitbucket Cloud. Any suggestions on how I could point my Docker container to Bitbucket Cloud?

    Kevin Kho

    1 year ago
    Hi @Nelson Griffiths! You mean from Bitbucket Storage? Have you seen the keyword arguments for Cloud? What version of Prefect are you on?
    This change was in 0.14.16 -> changelog

    Nelson Griffiths

    1 year ago
    So I am on
    0.14.17
    and it seems like the most recent Dockerfiles for Prefect might not support Bitbucket Cloud yet. I am successfully pulling from Bitbucket Cloud when running a local agent, just not a Docker agent. I think I may have gotten around that problem by specifying prefect[bitbucket] in my requirements.txt and forcing a reinstall with pip in the Dockerfile.
    I seem to have fixed the issue. I was using a PyTorch base image before. I believe that Prefect might not like pulling down a 4 GB Docker image and that was causing the issues? With a smaller Docker image it is working great. Thanks for all the help!
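One way to confirm that an extra such as prefect[bitbucket] actually landed in the image is to check, inside the container, whether its modules are importable. This is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and `atlassian` as the module provided by the bitbucket extra is an assumption, not something confirmed in this thread.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "atlassian" is an assumed module name for the bitbucket extra.
    print(missing_modules(["prefect", "atlassian"]))
```

Running this with the image's own Python interpreter checks the environment the flow will actually use, rather than the host's.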

    Kevin Kho

    1 year ago
    Glad you fixed it! 🙂

    Damien Ramunno-Johnson

    1 year ago
    I seem to be having a similar issue and I wonder how close it is.
    [2021-05-10 23:31:51,705] DEBUG - Docker Agent | {'status': 'Pulling from sq-kp-infra-prod/prefect', 'id': 'latest'}
    [2021-05-10 23:31:51,710] DEBUG - Docker Agent | {'status': 'Digest: sha256:1922df5b7a244613be41225cd6ac2f4c10532476698d935ae5e083468013eb3b'}
    [2021-05-10 23:31:51,710] DEBUG - Docker Agent | {'status': 'Status: Image is up to date for gcr.io/sq-kp-infra-prod/prefect:latest'}
    [2021-05-10 23:31:51,716] INFO - Docker Agent | Successfully pulled image ***...
    [2021-05-10 23:31:51,716] DEBUG - Docker Agent | Creating Docker container ***
    [2021-05-10 23:31:51,778] DEBUG - Docker Agent | Starting Docker container with ID 3c217898d484b0a2b29ae61b83279014d5a48f154f800f936687972c7524c25b
    [2021-05-10 23:31:52,086] DEBUG - Docker Agent | Docker container 3c217898d484b0a2b29ae61b83279014d5a48f154f800f936687972c7524c25b started
    [2021-05-10 23:31:52,208] DEBUG - Docker Agent | Completed flow run submission (id: f8ca3883-71dc-4c43-a182-cb9878d12457)
    When I do
    docker ps -a
    I never actually find any running or stopped containers.
    import prefect
    import time
    from prefect import task, Flow
    from prefect.storage import GCS
    from prefect.run_configs import DockerRun
    
    @task
    def say_hello():
        logger = prefect.context.get("logger")
        logger.info("Hello, Cloud!")
        time.sleep(200)
    
    with Flow("hello-flow", storage=GCS(bucket="***")) as flow:
        say_hello()
    
    
    flow.run_config = DockerRun(image="***")
    
    flow.register(project_name="tutorial")

    Kevin Kho

    1 year ago
    Hi @Damien Ramunno-Johnson, I'll try to replicate this later.

    Damien Ramunno-Johnson

    1 year ago
    To make it easier I switched to the base prefect image
    import prefect
    import time
    from prefect import task, Flow
    from prefect.storage import GCS
    from prefect.run_configs import DockerRun
    
    
    @task
    def say_hello():
        logger = prefect.context.get("logger")
        logger.info("Hello, Cloud!")
        time.sleep(200)
    
    
    with Flow(
        "hello-flow",
        storage=GCS(bucket="prefect-flows-poc2021"),
        run_config=DockerRun(image="prefecthq/prefect"),
    ) as flow:
        say_hello()
    
    flow.register(project_name="tutorial")
    With the output
    [2021-05-10 23:53:07,830] DEBUG - Docker Agent | {'status': 'Pull complete', 'progressDetail': {}, 'id': '08f8340aa311'}
    [2021-05-10 23:53:07,853] DEBUG - Docker Agent | {'status': 'Digest: sha256:79a59032175275a19ede749ce1512b2fafc59a6e6b105d38ef074a0ce6c4332f'}
    [2021-05-10 23:53:07,863] DEBUG - Docker Agent | {'status': 'Status: Downloaded newer image for prefecthq/prefect:latest'}
    [2021-05-10 23:53:07,868] INFO - Docker Agent | Successfully pulled image prefecthq/prefect...
    [2021-05-10 23:53:07,868] DEBUG - Docker Agent | Creating Docker container prefecthq/prefect
    [2021-05-10 23:53:08,013] DEBUG - Docker Agent | Running agent heartbeat...
    [2021-05-10 23:53:08,016] DEBUG - Docker Agent | Sleeping heartbeat for 60.0 seconds
    [2021-05-10 23:53:08,197] DEBUG - Docker Agent | Querying for flow runs
    [2021-05-10 23:53:08,280] DEBUG - Docker Agent | No flow runs found
    [2021-05-10 23:53:08,280] DEBUG - Docker Agent | Next query for flow runs in 8.0 seconds
    [2021-05-10 23:53:10,941] DEBUG - Docker Agent | Starting Docker container with ID b7eb8f2aa76b2d9e55db42b4bd1dc0ad6c0c366f6d5ff42a8cce5170e8749b03
    [2021-05-10 23:53:11,254] DEBUG - Docker Agent | Docker container b7eb8f2aa76b2d9e55db42b4bd1dc0ad6c0c366f6d5ff42a8cce5170e8749b03 started
    [2021-05-10 23:53:11,329] DEBUG - Docker Agent | Completed flow run submission (id: c355af75-91cc-4544-bada-59f6c07f6b0f)
    Actually adding
    --show-flow-logs
    shows me that
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 465, in _request
        json_resp = response.json()
      File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 900, in json
        return complexjson.loads(self.text, **kwargs)
      File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
        return _default_decoder.decode(s)
      File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/local/lib/python3.7/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/usr/local/bin/prefect", line 8, in <module>
        sys.exit(cli())
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/prefect/cli/execute.py", line 49, in flow_run
        result = client.graphql(query)
      File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 304, in graphql
        retry_on_api_error=retry_on_api_error,
      File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 220, in post
        retry_on_api_error=retry_on_api_error,
      File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 471, in _request
        ) from exc
    prefect.utilities.exceptions.ClientError: Malformed response received from Cloud - please ensure that you have an API token properly configured.
    🤔
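The two-part traceback above is what Python prints when one exception is raised `from` another: the JSON decode failure is the cause, and the client error is what the caller sees. A minimal sketch of that pattern follows. The `ClientError` class and `parse_api_response` function are hypothetical stand-ins, not Prefect's actual code.

```python
import json
from typing import Any


class ClientError(Exception):
    """Hypothetical stand-in for the client error shown in the traceback."""


def parse_api_response(body: str) -> Any:
    """Decode a JSON body, re-raising decode failures as ClientError."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # "from exc" records the decode error as the direct cause.
        raise ClientError("Malformed response received") from exc


if __name__ == "__main__":
    try:
        parse_api_response("")  # an empty or non-JSON body
    except ClientError as err:
        print(type(err.__cause__).__name__, "->", err)
```

In other words, the error message about the API token is a guess made by the client after the response body failed to parse as JSON. The body itself is not shown in the traceback.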

    Kevin Kho

    1 year ago
    Oh I see. Where did you get the token for the agent?
    I have Google Cloud Storage with the Prefect image working on my end. Will ping you tomorrow.

    Jay Shah

    1 year ago
    Yes, I had an issue on AKS and I was able to resolve it. I generated a new runner token using the CLI (as API tokens are deprecated) and used that token to register the kube agent on AKS. @Kevin Kho
    @Kevin Kho I suspect the issue relates to the API tokens. All tokens were to migrate to Service Accounts. I have not tested running flows using the new service accounts.
    @Damien Ramunno-Johnson Try generating a new runner API token (not a service account) using the CLI and check if you still have the issue.

    Damien Ramunno-Johnson

    1 year ago
    Thanks, it looked like passing in the token via
    --token
    and not the environment variable fixed it 🤔 It probably also works if I had used the config. But it was easier to start figuring it out once I could see the error surfaced. Thanks for looking into it. Been working out the POC so poking around a fair amount.
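When a token passed by flag works but the environment variable does not, a quick check is whether the variable is visible to the process at all. This is a minimal sketch that reports the state of each variable without printing the secret. The helper name `token_var_status` is hypothetical, and the two variable names are assumptions about Prefect 0.14-era configuration, so adjust them to whatever your setup uses.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed variable names; not confirmed in this thread.
TOKEN_VARS = (
    "PREFECT__CLOUD__AGENT__AUTH_TOKEN",
    "PREFECT__CLOUD__AUTH_TOKEN",
)


def token_var_status(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Report whether each token variable is unset, empty, or set (value hidden)."""
    env = os.environ if env is None else env
    status = {}
    for name in TOKEN_VARS:
        value = env.get(name)
        if value is None:
            status[name] = "unset"
        elif not value.strip():
            status[name] = "empty"
        else:
            # Only the length is reported so the token never reaches the logs.
            status[name] = f"set ({len(value)} chars)"
    return status


if __name__ == "__main__":
    for name, state in token_var_status().items():
        print(f"{name}: {state}")
```

Running it both in the agent's shell and inside the flow-run container shows whether the value is lost somewhere between the two.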

    Kevin Kho

    1 year ago
    Thanks for that log @Damien Ramunno-Johnson. We had a recent change to authentication and that's good to know.
    And thanks for the details @Jay Shah!