# ask-community
c
Hey folks 👋 Having some difficulties installing `dask-cloudprovider[aws]==2021.3.0` when I have `prefect[aws]==0.14.13` installed. It seems they have different requirements for botocore:
```
There are incompatible versions in the resolved dependencies:
  botocore<1.19.53,>=1.19.52 (from aiobotocore==1.2.2->dask-cloudprovider[aws]==2021.3.0->-r /var/folders/kf/93zlmdv15vz6sjhr2xd0j7y40000gn/T/pipenv_bvj4rpkrequirements/pipenv-1_o8bqwg-constraints.txt (line 6))
  botocore<1.21.0,>=1.20.38 (from boto3==1.17.38->prefect[aws]==0.14.13->-r /var/folders/kf/93zlmdv15vz6sjhr2xd0j7y40000gn/T/pipenv_bvj4rpkrequirements/pipenv-1_o8bqwg-constraints.txt (line 5))
```
Is there a specific version of `dask-cloudprovider` that `prefect` works with?
Also, the listed requirements (`prefect`, `dask`, `distributed`) don't mention `dask-cloudprovider` either
j
We don't require `dask-cloudprovider` for prefect; it's only needed if you want to use dask-cloudprovider with prefect.
As for the version incompatibility, this looks like `aiobotocore` is pinning `botocore` to a single release, which doesn't work with boto3's pins (see https://github.com/aio-libs/aiobotocore/issues/855). For now, if you disable the new pip resolver to ignore these issues, you can install things together. This isn't a great solution though.
c
hmmm I'm using `pipenv` so not sure how that plays with things
You don't require it, but the example in the documentation only specifies `dask` and `distributed`, when https://docs.prefect.io/orchestration/flow_config/executors.html#using-a-temporary-cluster is explicitly using `dask-cloudprovider`, so I think it probably should be listed
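For reference, the pattern on that docs page boils down to something like this (a rough sketch; the image and worker count are just illustrative), and it only works if `dask-cloudprovider` is installed alongside prefect:
```python
from prefect.executors import DaskExecutor

# The cluster class is passed as a dotted string, so nothing here imports
# dask_cloudprovider directly - but it still has to be importable at flow runtime.
executor = DaskExecutor(
    cluster_class="dask_cloudprovider.aws.FargateCluster",
    cluster_kwargs={
        "image": "prefecthq/prefect:0.14.13-python3.8",  # illustrative image
        "n_workers": 4,                                   # illustrative worker count
    },
)
```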
j
Ah, gotcha. Yeah, that text could be updated to mention the dependency requirement.
If there's not a way to forward pip flags through pipenv, downgrading pip to 20.2.4 might also work.
Hmmm, actually we're pretty flexible with our boto3 versioning in prefect (supporting back to 1.9); it looks like this might be an issue with the pip resolver not backtracking enough to find the boto3 version that goes with botocore. If you manually find a compatible boto3 version and add it to your install requirements, that might help pip out.
I think `boto3==1.16.52` might work for you.
c
consoleoutput.txt
Weird, as it appears that `pipenv` tried almost every version of botocore
But I'll try pinning boto3
j
It's moving botocore, but aiobotocore has pinned that. The thing it should be trying is different versions of boto3.
pip's resolver is a bit complicated since PyPI doesn't offer efficient queries for version info (frequently the package has to be downloaded to get this info), so it has to iteratively backtrack. conda can do a much better job here (solving for valid versions upfront) since all version metadata is available via a separate route.
c
Cool, pinning to that `boto3` version seems to work. How did you get to that? I'd love to be able to figure that out for the next time `pipenv` bites me in the backside 🤣 In terms of the issue (without pinning boto3), is this something to raise in `aiobotocore`? To bump their `botocore` version?
j
Glad to hear it. I don't really have any tips here. The conflict pip was reporting was between the botocore version `aiobotocore` pins to and the latest botocore (which no dependency pins to, but pip was using). Looking at the aiobotocore `setup.py`, you can see their pin for `botocore`, but they also have an optional dep on `boto3` (which I assumed was compatible). https://github.com/aio-libs/aiobotocore/blob/master/setup.py#L23
As for the fix: aiobotocore shouldn't be this strict - exact pins aren't friendly for users. The issue I linked above covers that.
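If it helps for next time, here's a rough sketch (nothing we ran here) of automating that check: query PyPI's JSON API for the newest boto3 release whose botocore requirement still allows 1.19.52, the single version aiobotocore 1.2.2 pins to. Assumes `requests` and `packaging` are installed.
```python
import requests
from packaging.requirements import Requirement
from packaging.version import Version

PINNED_BOTOCORE = Version("1.19.52")  # the only botocore release aiobotocore 1.2.2 allows

# Walk boto3 releases from newest to oldest (one HTTP request each, so it's slow)
# and stop at the first one whose botocore requirement admits the pinned version.
releases = requests.get("https://pypi.org/pypi/boto3/json").json()["releases"]
for version in sorted(releases, key=Version, reverse=True):
    info = requests.get(f"https://pypi.org/pypi/boto3/{version}/json").json()["info"]
    for dep in info.get("requires_dist") or []:
        req = Requirement(dep)
        if req.name == "botocore" and req.specifier.contains(PINNED_BOTOCORE):
            print(f"boto3=={version} accepts botocore=={PINNED_BOTOCORE}")
            raise SystemExit
```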
c
Cool okay, I'll just add fuel to that issue 🔥 Appreciate all your help!
Oh, sad times. I don't think botocore supports FARGATE with that version 🤣 😭
```
An error occurred (InvalidParameterException) when calling the RunTask operation: Task definition does not support launch_type FARGATE.
```
j
That's not the botocore version - that's an issue we've seen with several users' ECS setups. I haven't been able to reproduce it locally.
How did you create your ECS cluster in AWS?
c
With the CDK
It can definitely launch Fargate instances, the agent itself is a Fargate instance
j
Oh huh, never heard of a CDK (had to google). What I'm looking for is the description of your cluster. If you have the AWS CLI, this would be the output of
```
aws ecs describe-clusters --clusters <YOUR-CLUSTER-NAME>
```
c
Oh, AWS CDK is lovely. CloudFormation, but good
awsclioutput.json
j
Ah, you have no capacity providers. Cool, should be able to debug and work with that.
For now, if you add `FARGATE` to your capacity providers, things should work. If you create an ECS cluster using the AWS console this is added automatically for you (this difference has caused issues).
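Something like this, for example (a rough boto3 sketch; the cluster name is a placeholder, and in CDK/CloudFormation you'd set the capacity providers on the cluster definition instead):
```python
import boto3

ecs = boto3.client("ecs")

# Attach the Fargate capacity providers to an existing cluster.
# Note: put_cluster_capacity_providers needs a reasonably recent boto3/botocore;
# older releases only accept capacityProviders at create_cluster time.
ecs.put_cluster_capacity_providers(
    cluster="my-prefect-cluster",  # placeholder - your cluster name or ARN
    capacityProviders=["FARGATE", "FARGATE_SPOT"],
    defaultCapacityProviderStrategy=[
        {"capacityProvider": "FARGATE", "weight": 1},
    ],
)
```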
c
Ah okay. I assumed it defaulted to Fargate
I'll give that a go
Strange, as not using the DaskExecutor happily let me start up other Fargate tasks with ECSRun
Guess it's different API calls
j
Wait, what? Were you running against the same ECS cluster?
c
Yep
j
That makes no sense, the agent doesn't look at the executor at all.
c
In terms of error or choice 🤣
j
Did the above error message show up preventing your flow from starting? Or after your flow started but before the dask cluster started?
If it's the former, I'm confused. If it's only preventing your `FargateCluster` from being started, then that makes sense.
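One way to tell the two apart (a rough sketch, outside of prefect entirely; note that by default `FargateCluster` spins up its own ECS cluster and networking unless you point it at yours):
```python
from dask_cloudprovider.aws import FargateCluster

# Try to start a Dask Fargate cluster directly. If this raises the same
# RunTask error, the problem is in the dask-cloudprovider/executor path,
# not in the prefect agent.
cluster = FargateCluster(
    image="prefecthq/prefect:0.14.13-python3.8",  # same image the flow uses
    n_workers=1,
)
print("scheduler:", cluster.scheduler_address)
cluster.close()
```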
c
That's all I got on the UI
j
baffling.
maybe this is a change in the defaults sent by botocore.
c
The ECS agent is in the cluster, and I'm pointing FargateCluster to the same cluster
j
if things were working before but now aren't
c
If I take out the executor option, leave the prefect default, it runs fine
j
Now? After updating botocore?
c
Oh, good question
I'll try that.
With or without the FARGATE capacity provider?
Without I guess..
j
The agent never touches the executor, so the execution path in the agent code doesn't change if you set one or don't. Which is why I'm confused why removing it would fix things (it shouldn't change anything).
c
Removing the executor definition also fails
So I think it's botocore that doesn't allow FARGATE at that version, because before installing `boto3` and `dask-cloudprovider` it ran fine
j
I don't think it's not allowed; I bet the newer botocore sets some default field in the JSON blob that fixes things. Can you tell me what versions you had before and now? I'm hoping I can squash this issue for good.
boto3 & botocore versions.
c
Sure, let me rollback to the working versions
Now:
```
boto3==1.16.52
botocore==1.19.52
```
Before (without an executor specified and without using dask):
```
boto3==1.17.38
botocore==1.20.38
```
j
And can you confirm with the before (higher) versions things did work successfully?
c
Just registering & running the flow
Sorry, having to re-deploy the agent
j
No worries, thanks for helping to debug this.
c
Haha thanks yourself!
Now I'm confused. Same error.
`PREFECT_IMAGE` is just pointing to an ECR image built with:
```
FROM prefecthq/prefect:0.14.13-python3.8
ENTRYPOINT [ "prefect", "agent", "ecs", "start", "--agent-address", "http://:8080"]
```
Oh hang on, looks like my versions are still wrong locally
Okay. Got a running flow.
I had to change to `image="prefecthq/prefect:0.14.13-python3.8"` in `ECSRun`
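That change is roughly this (a sketch; the rest of the run-config options are omitted):
```python
from prefect.run_configs import ECSRun

# Point the flow run explicitly at the public prefect image
# (instead of the custom ECR image the agent was built from).
run_config = ECSRun(image="prefecthq/prefect:0.14.13-python3.8")
```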
j
Wait, what? That also doesn't make sense - the issue you're experiencing should only be specific to what the agent is running.
Anyway, with that working setup, what versions of `boto3`/`botocore`/`prefect` does your agent have running?
c
Two secs. I'm deploying it so that the agent and the ECSRun point to the same image. I think something funky happened with ECR, but it should be the versions in `prefecthq/prefect:0.14.13-python3.8`, as all I'm currently doing extra in the Dockerfile is adding an entrypoint
Okay, the failure I'm seeing with my rolled-back stuff is something f***y going on with ECR caching
But the working flow I had yesterday, with just the ECSRun and no Executor set, was the `prefecthq/prefect:0.14.13-python3.8` image - for both the Agent and the ECSRun image.
j
Sure, but I'm wondering if something else has changed. If you can't reproduce it, I'm skeptical that a change in the code (boto or otherwise) is causing it, as we haven't seen that before.
Anyway, I'll try to reproduce locally and see if we can prevent this issue from happening regardless of versions.
c
Cool, thanks. We're gonna cut out using CDK to handle Docker images; it's still experimental, so I think that's not helping the situation. I'll try to get to a state where I can confidently say it's working again. Then break it for you haha