Thread
#prefect-server
    Ryan Smith
    1 year ago
    Hi all, we're testing out running prefect server with an ECS Agent and Docker Containers for flow storage. Everything is working great so long as we use the LocalExecutor or LocalDaskExecutor for our flows. However, when we try to use the DaskExecutor to launch a temporary Fargate dask cluster, the worker nodes on that temporary dask cluster start complaining about not being authenticated with the cloud service. Any ideas about what is going wrong here? Has anyone else attempted this kind of setup before?
    Full error message, located in the dask worker task's CloudWatch log:
    ERROR - prefect.CloudTaskRunner | Failed to retrieve task state with error: AuthorizationError('Malformed response received from Cloud - please ensure that you are authenticated. See prefect auth login --help.')
    prefect.exceptions.AuthorizationError: Malformed response received from Cloud - please ensure that you are authenticated. See prefect auth login --help.
    Kevin Kho
    1 year ago
    Hi @Ryan Smith, are you using Prefect Server or Cloud, and what version of Prefect are you on?
    Ryan Smith
    1 year ago
    Hi @Kevin Kho. Using Prefect Server, with version 0.15.0 of the docker containers. The rest of the pieces are working fine; this looks like the error I hit earlier when I forgot to run prefect backend server, but I'm not sure where I should be looking for that here, since the DaskExecutor takes care of spinning itself up for the most part. Thinking maybe I need to use a custom Docker image as the base image when I'm building the flow storage (currently just referencing prefecthq/prefect:0.15.0), and then I could "bake" a call to prefect backend server in there? Let me know if this feels like the right track or if there might be an easier way.
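    (For reference, a minimal sketch of the custom-image idea, assuming Prefect 0.15's Docker storage and its extra_dockerfile_commands option; the registry URL is a placeholder, not a verified setup:)
    from prefect.storage import Docker

    # Bake the backend selection into the flow image itself;
    # "prefect backend server" flips the backend key in
    # ~/.prefect/config.toml inside the image at build time.
    storage = Docker(
        registry_url="my-registry.example.com",  # placeholder registry
        base_image="prefecthq/prefect:0.15.0",
        extra_dockerfile_commands=["RUN prefect backend server"],
    )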
    Kevin Kho
    1 year ago
    Yes, you are on the right track: the workers are being automatically configured to point to Cloud, and you need them to point to Server. I think you have to run prefect backend server, and then you just need a config.toml in that image that points to the right API:
    [server]
    endpoint = "http://<YOUR_VM_IP>:4200/graphql/"
    Or maybe you can set the environment variable PREFECT__SERVER__ENDPOINT.
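    (A hedged sketch of the env-var alternative, assuming Docker storage's env_vars option injects variables into the built image; the IP is a placeholder:)
    from prefect.storage import Docker

    # Point the image at the Server API via environment variables
    # rather than a baked-in config.toml.
    storage = Docker(
        base_image="prefecthq/prefect:0.15.0",
        env_vars={
            "PREFECT__BACKEND": "server",
            "PREFECT__SERVER__ENDPOINT": "http://<YOUR_VM_IP>:4200/graphql/",
        },
    )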
    Ryan Smith
    1 year ago
    @Kevin Kho, okay, going to give that a try. Does it make sense that I didn't need to do this to get things working when I'm not using the DaskExecutor? Everything works fine when I rely on either the LocalExecutor or the LocalDaskExecutor.
    Kevin Kho
    1 year ago
    Yes, because the LocalDaskExecutor and LocalExecutor use the configuration of your local machine; the DaskExecutor doesn't, so it will need to be configured. What I am not sure about is whether the agent configuration is expected to propagate to the Dask workers. If you use LocalRun, I think env variables are carried over, but I don't believe they are for the other RunConfigs. This is a common issue, though: even if you're able to send work to the Dask cluster somehow, the workers won't be able to update the state of the tasks if they don't point to the server correctly.
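    (Since the DaskExecutor has to be configured explicitly, one hedged sketch is to forward the backend settings through dask-cloudprovider's FargateCluster, assuming its environment kwarg sets env vars on the scheduler and worker containers; the image name and IP are placeholders:)
    from prefect.executors import DaskExecutor

    # Forward the Server settings to the temporary Fargate cluster so the
    # workers don't fall back to the default Cloud configuration.
    executor = DaskExecutor(
        cluster_class="dask_cloudprovider.aws.FargateCluster",
        cluster_kwargs={
            "image": "my-registry.example.com/flows:latest",  # placeholder image
            "environment": {
                "PREFECT__BACKEND": "server",
                "PREFECT__SERVER__ENDPOINT": "http://<YOUR_VM_IP>:4200/graphql/",
            },
        },
    )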
    Ryan Smith
    1 year ago
    Also, probably not related, but we're currently stuck on version 0.15.0, because when I try to build a docker image on anything newer than that, I get the following error when installing dask-cloudprovider[aws] via pip:
    ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
    
    We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
    
    boto3 1.18.23 requires botocore<1.22.0,>=1.21.23, but you'll have botocore 1.20.106 which is incompatible.
    I tried running the image regardless, but it fails to create the DaskCluster, complaining about aiobotocore being used improperly, so clearly there is a real incompatibility somewhere.
    Kevin Kho
    1 year ago
    If you've been doing this since last Thursday: aiobotocore released 1.4, which broke multiple Dask-related things, and you'd need to go down to 1.3.3 for that.
    Ryan Smith
    1 year ago
    @Kevin Kho okay great, building our flows from a custom base image that calls prefect backend server and sets the correct ENV var for our server did the trick!
    Also, it looks like if I explicitly install aiobotocore==1.3.3 and boto3==1.17.106, then I can get pip install to work against the prebuilt 0.15.4 prefect image. My only concern is that the base 0.15.4 prefect image came preinstalled with 1.18.XX of boto3; do you think I'm going to hit any weird issues by downgrading boto3 as well?
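    (For completeness, the pin set Ryan describes could be expressed in the flow's Docker storage; the versions are the ones from this thread, assuming python_dependencies passes straight through to pip:)
    from prefect.storage import Docker

    storage = Docker(
        base_image="prefecthq/prefect:0.15.4",
        python_dependencies=[
            "dask-cloudprovider[aws]",
            "aiobotocore==1.3.3",  # 1.4 broke several Dask-related packages
            "boto3==1.17.106",     # downgraded to stay compatible with aiobotocore 1.3.3
        ],
    )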
    Kevin Kho
    1 year ago
    I don't expect you to run into problems. pip itself is not a dependency version resolver, so you will likely need to manage those dependencies yourself; for more complex dependency resolution, you'd need something like conda.
    Vincent
    1 year ago
    I have been encountering a similar issue, where the version bump from 0.14.22 to 0.15.0 within the docker image alters the behavior of worker authorization. In 0.14.22, the PREFECT__BACKEND and PREFECT__SERVER__HOST configurations from the agent are passed successfully to the dask-scheduler and dask-workers, but in 0.15.0+ the workers no longer inherit this specification from the agent. I have found a workaround by setting said environment variables in the docker image, but I would prefer that the agent pass this info down to the workers. Any suggestions?
    Kevin Kho
    1 year ago
    Hey @Vincent, what type of agent are you using? I think only the Local agent passes env vars. Will bring this up to the team, though.
    Vincent
    1 year ago
    I am using a kubernetes agent
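    (A hedged per-flow sketch for the Kubernetes case: RunConfigs accept an env dict, so the backend settings can at least be set on the flow-run job pod explicitly; whether the dask workers then inherit them is exactly the open question in this thread. The host is a placeholder:)
    from prefect.run_configs import KubernetesRun

    run_config = KubernetesRun(
        env={
            "PREFECT__BACKEND": "server",
            "PREFECT__SERVER__HOST": "http://<YOUR_SERVER_IP>",
        }
    )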
    Kevin Kho
    1 year ago
    Did you use that before 0.15.0, and were the env vars passed from the agent?
    Vincent
    1 year ago
    I don't see environment variables set on the docker pod, but these pods mysteriously have some knowledge of which server to ping home to, and this changed between 0.14.22 and 0.15.0.
    Michael Adkins
    1 year ago
    Hey Vincent, I'm not sure what caused this change in behavior but I'd love to restore passing these settings to the workers 🙂 if you help track it down we can get a fix in quickly
    Vincent
    1 year ago
    Yes - I am still investigating the root cause, but the 0.15.0 bump introduced many changes. How were credentials passed to the workers prior to 0.15.0 (what files should I focus on)? Are they passed via the task, or are they present before any work gets started?
    Michael Adkins
    1 year ago
    I think we pass the context (which contains the populated settings object) to the TaskRunner.run method, which is submitted to the dask workers. This means that settings should be loaded from context.settings instead of prefect.settings where they need to be respected on workers. I suspect that this is a result of my changes in the Client, as I have no knowledge of setting environment variables on dask workers (although it could certainly be happening somewhere).
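    (An illustrative sketch of the pattern Michael describes, not the actual Prefect internals: the config travels with the serialized context, so worker-side code should read it from there rather than from the module-level global. The config lookup path is hypothetical:)
    import prefect

    def run_on_worker(context: dict):
        # Re-hydrate the context that was shipped with the task. Inside the
        # block, prefect.context reflects the agent's populated settings;
        # the module-level prefect.config still reflects whatever the
        # worker's own environment said at import time.
        with prefect.context(context):
            return prefect.context.config.server.endpoint  # hypothetical lookup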
    Vincent
    1 year ago
    Okay, I found the issue! It turns out that there is a missing self at line 156 for api_server: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/client/client.py#L156 Small bug, but it works now!
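    (Illustrative only, not the exact Prefect source: the class of bug Vincent found, where a missing "self." binds a local variable instead of the instance attribute, so the configured API endpoint is silently dropped:)
    DEFAULT_ENDPOINT = "http://localhost:4200/graphql"  # hypothetical default

    class Client:
        def __init__(self, api_server: str = None):
            # Buggy line: without "self.", this only created a local name and
            # the instance attribute never received the configured value:
            #     api_server = api_server or DEFAULT_ENDPOINT
            self.api_server = api_server or DEFAULT_ENDPOINT  # fixed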
    Michael Adkins
    1 year ago
    Wonderful! Want to PR?