Something I am struggling to get my head around is...
# prefect-community
e
Something I am struggling to get my head around is security with regard to `FargateCluster`. It seems like the cluster is being assigned a public IP address. Ideally I wouldn’t want that to be the case as I don’t want people snooping on my cluster / submitting jobs. However, when I pass `"fargate_private_ip": True` to `cluster_kwargs` my cluster fails to start with the error:
`Cluster failed to start: Timed out trying to connect to tcp://10.0.1.111:8786 after 30 s`
That makes sense. Something, somewhere, failed to connect to a private IP address, presumably from outside the subnet. What I don’t understand is how I can prevent people from arbitrarily accessing my cluster from the internet whilst allowing all the ‘right’ traffic through.
One thing that gave me a little confidence was that accessing the public IP of the scheduler on the dashboard port (:8787) timed out, although I wasn’t sure whether that was luck rather than security. Edit: nope, I can connect.
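For anyone wanting to repeat that check from outside the VPC, here is a minimal sketch using only the Python standard library. The helper name `port_is_reachable` is made up for illustration, and `<scheduler-public-ip>` is a placeholder; 8786 and 8787 are the scheduler and dashboard ports mentioned in this thread.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout or unreachable host
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (placeholder address, run from a machine outside the VPC):
# for port in (8786, 8787):
#     print(port, port_is_reachable("<scheduler-public-ip>", port))
```

A `True` result only shows that the port accepts TCP connections; it says nothing about whether a job submitted over that connection would run.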
k
You can whitelist IPs on the FargateCluster side right? I think in general though this would be a better question for the Dask Discourse. I don’t know on this one, but I am interested.
They seem to put the security groups in AWS here
a
What I don’t understand is how I can prevent people from arbitrarily accessing my cluster from the internet whilst allowing all the ‘right’ traffic through
Only authorized IAM users/processes would be able to submit ECS tasks to your cluster - having access to your subnet wouldn't be enough to submit an ECS task. Then, only ECS tasks with a valid execution role ARN (one that has a trust policy with Action `"sts:AssumeRole"`) are able to start any container - without a valid role, even if a task would get submitted to ECS, no container within this ECS task could be started. Here is how you can configure that with FargateCluster (full example here):
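```python
from dask_cloudprovider.aws import FargateCluster
from prefect import Flow
from prefect.executors import DaskExecutor

# AWS_ACCOUNT_ID, FLOW_NAME, STORAGE and RUN_CONFIG are assumed to be
# defined elsewhere (see the full example).


def fargate_cluster(
    n_workers=2, image: str = "annageller/prefect-dask-cloudprovider:latest"
):
    # The execution role is what allows ECS to start containers for this task.
    return FargateCluster(
        n_workers=n_workers,
        image=image,
        execution_role_arn=f"arn:aws:iam::{AWS_ACCOUNT_ID}:role/prefectECSAgentTaskExecutionRole",
    )


with Flow(
    FLOW_NAME,
    storage=STORAGE,
    run_config=RUN_CONFIG,
    executor=DaskExecutor(
        cluster_class=fargate_cluster,
        cluster_kwargs={
            "image": "annageller/prefect-dask-cloudprovider:latest",
            "n_workers": 2,
        },
        debug=True,
    ),
) as flow:
    ...
```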
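As a rough illustration of the trust policy mentioned above, this is the shape such a policy document usually takes for an ECS task execution role. The helper name is hypothetical, and the `ecs-tasks.amazonaws.com` principal is an assumption about this setup (it is the standard service principal for ECS tasks, but the thread does not state it).

```python
import json


def ecs_task_trust_policy() -> dict:
    """Build a trust policy that lets ECS tasks assume the role (illustrative sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: the role is assumed by the ECS tasks service
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# The JSON string is what you would attach to the role as its trust policy.
policy_json = json.dumps(ecs_task_trust_policy(), indent=2)
```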
e
Thank you Anna, that is very interesting (particularly since I spent all day today trying to get this working in a private subnet).
Only authorized IAM users/processes would be able to submit ECS tasks to your cluster - having access to your subnet wouldn’t be enough to submit an ECS task.
I am just testing this now by trying to connect to a running cluster with one set of authorised AWS creds and another set of unauthorised creds. Is there anywhere where I can read about this more / is there a reference to the Dask docs which I could include in my PR? I can imagine it will raise a few eyebrows if I submit a PR which looks like it exposes a cluster to the internet without justification
a
This is really cool, I would be curious to hear what you found out. Sure, here are some links you can reference: • DaskECS
e
I’m not sure whether this has worked. My test isn’t exactly thoroughly scientific, but thought you might be interested. My testing code:
```python
from dask_cloudprovider.aws import FargateCluster
kwargs = {
    "image": "my-image-containing-dask",
    "task_role_arn": "arn:aws:iam::xxxx",
    "execution_role_arn": "arn:aws:iam::xxxx",
    "n_workers": 10,
    "region_name": "ap-southeast-2"
}
cluster = FargateCluster(**kwargs)

print(cluster.scheduler_address)
# use breakpoint so we can easily close cluster when we're done
breakpoint()
```
Once I have the scheduler’s address I switch my AWS profile to another AWS account without perms and run the flow below using Prefect to connect to the existing cluster:
```python
from prefect import task, Flow
from prefect.executors import DaskExecutor


@task
def test_task():
    print("running")
    return 2 + 2


with Flow(
    "test_flow",
) as flow:
    test_task()


# Replace <ip-address> with the scheduler address printed by the previous script
flow.executor = DaskExecutor(address="<ip-address>")

flow.run()
```
Clearly the cluster receives a connection request, but it’s unclear to me whether it fails or not.
Sorry the terminal screenshot is noisy, but you can see `RuntimeWarning: coroutine 'rpc.close_rpc' was never awaited scheduler_comm.close_rpc()`
Not sure if that means much to you, but it seems to me like maybe the scheduler closed the connection without running the job?
a
Wow, this is so interesting! To me, it looks like you were able to connect to the cluster from another process. The `RuntimeWarning` only says that a coroutine was never awaited; the connection itself was successful. So in theory, anyone who knows the public IP of the cluster could submit something to it. Given that the FargateCluster is spun up on demand and only exists for the duration of the flow run (then it's shut down), the risk of someone submitting a job to your cluster is not very high, but it's not impossible. So this is really a question of how security-conscious and risk-averse you are. You could lock down your cluster with a private IP using `fargate_use_private_ip=True`, but I can't help with the details about that. This is something you could ask here: https://dask.discourse.group/
e
Yeah, you are protected a bit by moving-target defence. As it turns out, I’m clearly risk-averse enough to spend my Friday night getting everything working in a private subnet 😛 For anyone who has the same problem: you need to make sure the agent and the cluster use the same security group, and that this security group allows traffic between instances in the private subnet (this took me a long time to debug). If I’m feeling motivated enough I might put together a wee GitHub repo with some IaC to show how it works, because it feels like a very common use case.
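A hedged sketch of what "allow traffic within the security group" could look like as a self-referencing ingress rule. The helper name and the `sg-...` ID are placeholders, and opening all TCP ports inside the group is an assumption (Dask workers usually listen on ephemeral ports); narrow the range if you pin your ports. The boto3 call is left commented out so the snippet runs without AWS credentials.

```python
def intra_group_tcp_rule(security_group_id: str) -> list:
    """Build IpPermissions that allow TCP between members of the same security group."""
    return [
        {
            "IpProtocol": "tcp",
            # Assumption: allow every TCP port inside the group
            "FromPort": 0,
            "ToPort": 65535,
            # Self-reference: the source of the traffic is the group itself
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }
    ]


# Example usage with boto3 (placeholder group ID, requires AWS credentials):
# import boto3
# sg_id = "sg-0123456789abcdef0"
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId=sg_id, IpPermissions=intra_group_tcp_rule(sg_id)
# )
```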
a
Thanks so much for sharing! It doesn't even need to be IaC; sharing a Gist or a description here on Slack is already helpful.
I guess the security group is what you attach to `run_task_kwargs` as `networkConfiguration`?
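For reference, a minimal sketch of what that `networkConfiguration` could look like on the agent side. The helper name and the IDs are placeholders; the nested keys follow the ECS RunTask request shape, and passing the result to Prefect's `ECSRun(run_task_kwargs=...)` is shown only as a commented assumption about how it would be wired up.

```python
def private_run_task_kwargs(subnet_ids, security_group_ids) -> dict:
    """Build run-task kwargs that keep the flow-run task on private networking."""
    return {
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnet_ids),
                "securityGroups": list(security_group_ids),
                # No public IP for the task
                "assignPublicIp": "DISABLED",
            }
        }
    }


# Example usage (placeholder IDs):
# from prefect.run_configs import ECSRun
# run_config = ECSRun(
#     run_task_kwargs=private_run_task_kwargs(["subnet-aaaa"], ["sg-bbbb"])
# )
```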
e
Exactly right. Then all you need to do is specify the `security_groups` and `subnet_ids` in your `cluster_kwargs`.
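Putting the pieces from this thread together, a rough sketch of `cluster_kwargs` for a private-subnet cluster. The helper name and all IDs are placeholders. Note one assumption: the sketch uses the keyword `subnets` for the subnet IDs, which is the spelling I believe dask-cloudprovider expects, so check it against the version you have installed.

```python
def private_cluster_kwargs(image, subnet_ids, security_group_ids, n_workers=2) -> dict:
    """Build FargateCluster kwargs for a cluster that only uses private IPs."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("subnet_ids and security_group_ids must not be empty")
    return {
        "image": image,
        "n_workers": n_workers,
        "fargate_use_private_ip": True,
        # Assumption: dask-cloudprovider takes the subnet IDs as `subnets`
        "subnets": list(subnet_ids),
        "security_groups": list(security_group_ids),
    }


# Example usage (placeholder IDs; the agent must share the same security group):
# from prefect.executors import DaskExecutor
# executor = DaskExecutor(
#     cluster_class="dask_cloudprovider.aws.FargateCluster",
#     cluster_kwargs=private_cluster_kwargs(
#         "my-image-containing-dask", ["subnet-aaaa"], ["sg-bbbb"]
#     ),
# )
```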