# prefect-server
r
Hi all, we're testing out running prefect server with an ECS Agent and Docker Containers for flow storage. Everything is working great so long as we use the LocalExecutor or LocalDaskExecutor for our flows. However, when we try to use the DaskExecutor to launch a temporary Fargate dask cluster, the worker nodes on that temporary dask cluster start complaining about not being authenticated with the cloud service. Any ideas about what is going wrong here? Has anyone else attempted this kind of setup before?
Full error message located in the dask worker task cloudwatch log is:
ERROR - prefect.CloudTaskRunner | Failed to retrieve task state with error: AuthorizationError('Malformed response received from Cloud - please ensure that you are authenticated. See prefect auth login --help.')
prefect.exceptions.AuthorizationError: Malformed response received from Cloud - please ensure that you are authenticated. See prefect auth login --help.
k
Hi @Ryan Smith, are you using Prefect Server or Cloud, and what version of Prefect are you on?
r
Hi @Kevin Kho. Using Prefect Server, with version 0.15.0 of the docker containers. The rest of the setup is working fine; this seems like the error I hit earlier where I forgot to run
prefect backend server
, but I'm not really sure where I should be looking for that since DaskExecutor takes care of spinning itself up for the most part. Thinking maybe I need to use a custom Docker image as the base image when I'm building the flow storage (currently just referencing
prefecthq/prefect:0.15.0
), and then I could "bake" in a call to
prefect backend server
in there? Let me know if this feels like the right track or if there might be an easier way.
k
Yes, you are on the right track: the workers are being automatically configured to point to Cloud, and you need them to point to Server. I think you have to do
prefect backend server
and then I think you just need the
config.toml
in that image that points to the right API.
[server]
endpoint = "http://<YOUR_VM_IP>:4200/graphql/"
or maybe you can set the environment variable
PREFECT__SERVER__ENDPOINT
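For reference, a quick way to sanity-check the resulting image is to print the resolved config from inside the container. This is a minimal sketch assuming the config keys prefect.config.backend and prefect.config.server.endpoint, which should be verified against the Prefect version in use:

import prefect

# Expect "server" after `prefect backend server` has been run in the image,
# and the Server GraphQL URL if the config.toml / env var above took effect.
print(prefect.config.backend)
print(prefect.config.server.endpoint)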
r
@Kevin Kho, Okay, going to give that a try. Does it make sense that I didn't need to do this in order to get things working when I'm not using the DaskExecutor? Because everything is working fine when I just rely on either LocalExecutor
k
Yes, because LocalDaskExecutor and LocalExecutor use the configuration of your local machine. DaskExecutor doesn't, so it will need to be configured. But what I am not sure about is whether it's expected for the agent configuration to propagate to the Dask workers. If you use
LocalRun
, I think env variables are carried over, but I don't think they are for the other RunConfigs. Yes, this is a common issue though: even if you're able to send work to the Dask cluster somehow, the workers won't be able to update the state of the tasks if they don't point to the server correctly.
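One way to get the settings onto the temporary Fargate workers regardless of the agent is to pass them to the cluster itself through the executor. A minimal sketch, assuming dask-cloudprovider's FargateCluster accepts an environment mapping (check its docs for the exact keyword) and using placeholder image and endpoint values:

from prefect import Flow, task
from prefect.executors import DaskExecutor

@task
def say_hello():
    print("hello from a Fargate dask worker")

with Flow("fargate-dask-example") as flow:
    say_hello()

# Point the temporary Fargate scheduler/workers at Server instead of Cloud by
# shipping the backend settings as environment variables on the cluster.
flow.executor = DaskExecutor(
    cluster_class="dask_cloudprovider.aws.FargateCluster",
    cluster_kwargs={
        "image": "prefecthq/prefect:0.15.0",   # or your custom base image
        "n_workers": 2,
        "environment": {                       # assumed kwarg -- verify
            "PREFECT__BACKEND": "server",
            "PREFECT__SERVER__ENDPOINT": "http://<YOUR_VM_IP>:4200/graphql/",
        },
    },
)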
r
Also, probably not related, but we're currently stuck on version 0.15.0 because when I try to build a docker image on anything newer than that, I get the following error when installing
dask-cloudprovider[aws]
via pip:
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

boto3 1.18.23 requires botocore<1.22.0,>=1.21.23, but you'll have botocore 1.20.106 which is incompatible.
I tried running the image regardless, but it fails to create the Dask cluster, complaining about aiobotocore being used improperly, so clearly there is a real incompatibility somewhere.
k
If you have been doing this since last Thursday,
aiobotocore
had a 1.4 release that broke multiple Dask-related things, and you would need to go down to 1.3.3 for that.
r
@Kevin Kho okay great, building our flows from a custom base image that calls
prefect backend server
and sets the correct ENV var for our server did the trick!
👍 1
Also, looks like if I explicitly install
aiobotocore==1.3.3
and
boto3==1.17.106
, then I can get pip install to work against the prebuilt
0.15.4
prefect image. My only concern is that the base 0.15.4 prefect image came preinstalled with 1.18.XX of
boto3
. Do you think I'm going to hit any weird issues by downgrading boto3 as well?
k
I don't expect you to run into problems, and pip itself is not a full dependency resolver, so you will likely need to manage those dependencies yourself. For more complex dependency resolution, you'll need something like conda.
👍 1
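If it helps, the explicit pins can also be baked in when the flow image is built with Prefect's Docker storage. A sketch assuming the base_image and python_dependencies parameters of prefect.storage.Docker, with a placeholder registry URL:

from prefect.storage import Docker

# Pin the conflicting packages alongside dask-cloudprovider when the flow
# image is built, mirroring the versions mentioned above.
storage = Docker(
    registry_url="my-registry.example.com",   # placeholder registry
    base_image="prefecthq/prefect:0.15.4",
    python_dependencies=[
        "dask-cloudprovider[aws]",
        "aiobotocore==1.3.3",
        "boto3==1.17.106",
    ],
)
# then attach it with flow.storage = storage before registering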
v
I have been encountering a similar issue where a version bump from 0.14.22 to 0.15.0 within the docker image alters the behavior of the worker authorization. In 0.14.22, the
PREFECT__BACKEND
and
PREFECT__SERVER__HOST
configurations from the agent are passed successfully to the dask-scheduler and dask-workers, but in 0.15.0+ the workers no longer inherit this specification from the agent. I have been able to find a workaround by setting those environment variables in the docker image, but I would prefer for the agent to pass this info down to the workers. Any suggestions?
k
Hey @Vincent, what type of agent are you using? I think only the Local agent passes env vars. Will bring this up to the team though.
v
I am using a kubernetes agent
k
Did you use that before 0.15.0, and did the env vars pass through from the agent?
v
I don't see environment variables set on the docker pod, but these pods mysteriously have some knowledge of which server to ping home to, and this changed between 0.14.22 and 0.15.0.
👍 1
z
Hey Vincent, I'm not sure what caused this change in behavior but I'd love to restore passing these settings to the workers 🙂 if you help track it down we can get a fix in quickly
v
Yes - I am still investigating the root cause, but the 0.15.0 bump introduced many changes. How were credentials passed to the workers prior to 0.15.0? (What files should I focus on?) Are these passed via the task, or are they present before any work gets started?
z
I think we pass the
context
(which contains the populated
settings
object) to the
TaskRunner.run
method which is submitted to the dask workers. This means that settings should be loaded from the
context.settings
instead of
prefect.settings
where they need to be respected on workers. I suspect that this is a result of my changes in the
Client
as I am not aware of the settings environment variables being set on the dask workers (although it could certainly be happening somewhere).
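As a toy illustration of that distinction (not Prefect internals): settings that are serialized into the shipped context stay correct on remote workers, while module-level settings rebuilt from each worker's own environment can silently fall back to defaults such as Cloud:

import os
from dataclasses import dataclass, field

@dataclass
class Context:
    settings: dict = field(default_factory=dict)

# Module-level settings: rebuilt on every worker from its local environment.
module_settings = {"backend": os.environ.get("PREFECT__BACKEND", "cloud")}

def run_task(context: Context) -> str:
    # Reading from the shipped context respects the agent/flow configuration;
    # falling back to module_settings uses whatever the worker's env says.
    return context.settings.get("backend", module_settings["backend"])

ctx = Context(settings={"backend": "server"})
assert run_task(ctx) == "server"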
v
Okay. I found the issue! It turns out that there is a missing
self
at line 156 for
api_server
https://github.com/PrefectHQ/prefect/blob/master/src/prefect/client/client.py#L156 Small bug, but it works now!
z
Wonderful! Want to PR?
v