# prefect-azure
p
I set up a deployment on ACI, and now I can run it with `run_deployment` or via the UI. I wanted to change the default instance location from "eastus" to something else with `run_deployment(..., job_variables={"location": "eastus2"})`, but the instance is still created in "eastus". Is it possible to change the location at the run level at all? Or can I only change this at the deployment level?
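For reference, the call looks roughly like this (the deployment name is just a placeholder):
Copy code
from prefect.deployments import run_deployment

# attempt to override the ACI location for this run only
run_deployment(
    name="my-flow/my-deployment",
    job_variables={"location": "eastus2"},
)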
Ok, so I'm starting to understand this a little better now: `job_variables` can only be used to set the predefined variables from the job template defined for a given pool. In the default ACI pool there is no location variable, and hence this gets ignored. So I added the variable to the template, updated the pool, and now I can change the location at the deployment level via `.deploy(..., job_variables={"location": ...})`. So this works. However, I still can't alter the location on a per-run basis...
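The template change was roughly this (only the relevant pieces shown; the exact shape of the rest of the template is as generated by the pool, and the value gets injected into the ARM template the worker submits):
Copy code
{
  "variables": {
    "properties": {
      "location": {
        "type": "string",
        "title": "Location",
        "default": "eastus"
      }
    }
  },
  "job_configuration": {
    "arm_template": {
      "resources": [
        {
          "location": "{{ location }}"
        }
      ]
    }
  }
}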
it seems that other parameters, like cpu or memory, are also ignored when set in `run_deployment(..., job_variables={...})` or via the UI.
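e.g. something like this has no visible effect on the created container group (values are just examples):
Copy code
# these overrides are silently ignored by the ACI push pool,
# even though cpu/memory are variables in its base job template
run_deployment(
    name="my-flow/my-deployment",
    job_variables={"cpu": 2.0, "memory": 4.0},
)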
could this be a bug in the ACI push pool? When I tried a similar experiment with a local Docker pool, all job_variables from run_deployment were correctly incorporated. I used a silly example where I tried to change the command in run_deployment like so
Copy code
from prefect import flow
from prefect.deployments import run_deployment


# minimal stub flow so the snippet is self-contained
@flow
def test_flow():
    print("Hello!")


if __name__ == "__main__":
    # deploy to a local Docker work pool (the image is assumed to exist locally)
    test_flow.deploy(
        name="test",
        work_pool_name="local-docker-pool",
        image="my-image",
        push=False,
    )
    # override the container command and environment for this run only
    run_deployment(
        name="test-flow/test",
        job_variables={
            "command": "echo 'A custom command!'",
            "env": {
                "TEST": "Hello, World!",
            },
        },
    )
and the worker printed "A custom command!", which would be the expected behavior. What's more, when I tried to use an incompatible data type for these fields, the client returned an error on the run_deployment call
Copy code
Response: {'detail': "Error creating flow run: Validation failed for field 'command'. Failure reason: ['echo', 'A custom command!'] is not valid under any of the given schemas"}
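That error came from deliberately passing the command as a list instead of a string, i.e. something like:
Copy code
# a list where the pool's template expects a string -> rejected by the server
run_deployment(
    name="test-flow/test",
    job_variables={"command": ["echo", "A custom command!"]},
)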
None of this happens when passing inconsistent argument types to the ACI push pool.
So there must be something wrong with the push pool. I started a normal ACI pool (not the push one) with a dedicated ACI worker, and it all works just fine...
h
We'd appreciate some control over the ACI region in Azure too
p
What worked for me was to implement a custom Prefect worker that is ACI region specific. I then created one worker per region, each attached to the same ACI work pool. This way I could specify how many instances each worker supports, and the workers would just poll the right number of flow runs from the server. The worker implementation is a bit hacky; I couldn't add more options, so I'm pulling the region name from the worker name 😞.
Copy code
import pprint
from typing import Optional

from prefect.client.schemas.objects import FlowRun
from prefect.workers.base import BaseJobConfiguration
from prefect_azure.workers.container_instance import AzureContainerWorker

# VALID_REGIONS: set of allowed Azure region names, defined elsewhere


class AzureContainerWorkerRegion(AzureContainerWorker):
    type: str = "azure-container-instance-region"

    def __init__(
        self,
        *args,
        name: Optional[str] = None,
        limit: Optional[int] = None,
        **kwargs,
    ):
        # the region is encoded in the worker name: [base-name]-[region]
        if name is None:
            raise ValueError("A name must be provided for the worker")
        if "-" not in name:
            raise ValueError(
                "A region must be provided as part of the name with [base-name]-[region]"
            )
        self._region = name.split("-")[-1]
        if self._region not in VALID_REGIONS:
            raise ValueError(
                f"Region {self._region} is not a valid Azure region. "
                f"Valid regions are {', '.join(VALID_REGIONS)}."
            )
        # the limit caps how many flow runs this worker handles concurrently
        if limit is None:
            raise ValueError("A limit must be provided for the worker.")
        super().__init__(*args, name=name, limit=limit, **kwargs)

    # inject the specified region into the job template
    async def _get_configuration(
        self,
        flow_run: FlowRun,
    ) -> BaseJobConfiguration:
        configuration = await super()._get_configuration(flow_run=flow_run)
        configuration.arm_template["resources"][0]["location"] = self._region
        self._logger.info(f"Configured container group with location {self._region}")
        pprint.pp(configuration)
        return configuration
Then you just create one worker per region.
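A rough sketch of how one of these gets launched from Python (the pool name, worker name, and limit are placeholders):
Copy code
import anyio

async def main():
    # one of these per region, all attached to the same work pool;
    # the region is parsed from the "-eastus2" suffix of the worker name
    async with AzureContainerWorkerRegion(
        work_pool_name="aci-pool",
        name="aci-worker-eastus2",
        limit=5,
    ) as worker:
        await worker.start()

anyio.run(main)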
This is far from ideal, but it was good enough to get me started computing on multi-region ACI. We used this setup to run jobs in ~35 regions. But we don't care which region runs which job; if you do, you'd probably have to work with the `job_variables`.
Oh, and I set up the workers to run on an AKS cluster.
h
Thanks