Thread
#prefect-community
    Keith Veleba

    2 months ago
    Hello! Running into an issue I don't fully understand. I have a series of flows that have been running for a couple of months without incident. We're running all of our production on 1.2.0-python3.8 for the ECS agent and for all ECSRun instances. Our flows use S3 storage. Nothing has been purposely or visibly changed in our Prefect installation: no new flow deployments, and no changes to our AWS infrastructure. Our flows, which do little more than kick off AWS Batch jobs via the BatchSubmit task, are now consistently failing with the following error on the BatchSubmit invocation:
    Unexpected error: AttributeError("'S3Result' object has no attribute 'upload_options'")
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/prefect/engine/runner.py", line 48, in inner
        new_state = method(self, state, *args, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 930, in get_task_run_state
        result = self.result.write(value, **formatting_kwargs)
      File "/usr/local/lib/python3.8/site-packages/prefect/engine/results/s3_result.py", line 89, in write
        ExtraArgs=self.upload_options,
    AttributeError: 'S3Result' object has no attribute 'upload_options'
    Attached is one of the flows that are failing. Does the task running code record execution state back to the storage bucket? Thanks in advance!
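    A minimal sketch of the kind of flow described here, assuming the Prefect 1.x AWS task library; the flow name, bucket, image, and Batch job parameters are placeholders rather than the values from the attached flow:

    from prefect import Flow
    from prefect.run_configs import ECSRun
    from prefect.storage import S3
    from prefect.tasks.aws.batch import BatchSubmit

    # All names below are illustrative placeholders.
    submit_batch_job = BatchSubmit(
        job_name="nightly-etl",            # hypothetical Batch job name
        job_definition="nightly-etl:3",    # hypothetical Batch job definition
        job_queue="prod-job-queue",        # hypothetical Batch job queue
    )

    with Flow("batch-kickoff") as flow:
        submit_batch_job()

    flow.storage = S3(bucket="my-prefect-flows")  # placeholder bucket
    flow.run_config = ECSRun(image="prefecthq/prefect:1.2.0-python3.8")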
    Kevin Kho

    2 months ago
    This looks like a version mismatch issue. upload_options was added in 1.2.3, so you are pulling a later image somewhere. Are you using DaskExecutor?
    I guess check the agent version, the version the flow was registered with, and the Prefect version in the execution container.
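    For example, a minimal check like the following, run in the agent container, in the flow-run image, and in the environment used for registration, would show whether the three versions actually agree:

    # Minimal sketch: print the Prefect version visible in the current environment.
    import prefect

    print(prefect.__version__)  # agent and registration here are 1.2.0; a "latest" image may report 1.2.3+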
    Keith Veleba

    2 months ago
    Ok. We are not using dask. Our agent is 1.2.0, and the ECSRun should be pinned at 1.2.0 based on my code. Everything was registered some time ago under 1.2.0.
    The run config in my flow run seems correct?
    {
      "cpu": null,
      "env": null,
      "type": "ECSRun",
      "image": "prefecthq/prefect:latest-python3.8",
      "labels": [],
      "memory": null,
      "__version__": "1.2.0",
      "task_role_arn": null,
      "run_task_kwargs": null,
      "task_definition": null,
      "execution_role_arn": null,
      "task_definition_arn": null,
      "task_definition_path": null
    }
    oh wait
    is that image the problem?
    Kevin Kho

    2 months ago
    Yeah, latest might pull 1.2.3.
    Keith Veleba

    2 months ago
    so the ECSRun declaration in my code is being ignored?
    ECSRUN_IMAGE="prefecthq/prefect:1.2.0-python3.8"
    
    if __name__ == "__main__":
        flow.run_config = ECSRun(image=ECSRUN_IMAGE)
        flow.register(project_name=PROJECT_NAME, labels=[ENVIRONMENT])
    that's what ran at registration
    and the agent version seems correct.
    How can I force 1.2.0?
    Kevin Kho

    2 months ago
    I doubt that would be ignored, but I can't immediately see what would override it to latest.
    Can I see the agent definition?
    Keith Veleba

    2 months ago
    [
        {
            "name": "${var.environment}-prefect-ecs-agent",
            "image": "${var.docker_image}",
            "essential": true,
            "command": [
                "prefect",
                "agent",
                "ecs",
                "start",
                "--cluster",
                "${var.ecs_cluster_arn}",
                "--run-task-kwargs",
                "s3://sgmt-${var.environment}-prefect-flows/run_config_private.yaml",
                "--task-role-arn",
                "${aws_iam_role.prefect-task-role.arn}"
            ],
            "environment": [
                {
                    "name": "PREFECT__CLOUD__AGENT__LABELS",
                    "value": "[${join(",", [for l in var.agent_labels : format("'%s'",l)])}]"
                },
                {
                    "name": "PREFECT__CLOUD__AGENT__LEVEL",
                    "value": "DEBUG"
                },
                {
                    "name": "PREFECT__CLOUD__API",
                    "value": "https://api.prefect.io"
                },
                {
                    "name": "OTHER_PREFECT__CONTEXT__SECRETS__AWS_CREDENTIALS",
                    "value": "{\"ACCESS_KEY\":\"${var.access_key}\",\"SECRET_ACCESS_KEY\":\"${var.secret_access_key}\"}"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "${aws_cloudwatch_log_group.log-group.name}",
                    "awslogs-region": "${var.region}",
                    "awslogs-stream-prefix": "ecs",
                    "awslogs-create-group": "true"
                }
            },
            "secrets": [
                {
                    "name": "PREFECT__CLOUD__API_KEY",
                    "valueFrom": "${var.api_key}"
                }
            ]
        }
    ]
    prefect_docker_image="prefecthq/prefect:1.2.0-python3.8"
    that's from our terraform
    Kevin Kho

    2 months ago
    How did you register? Just python myfile.py?
    Can you try pulling the flow and its run config from the GraphQL API, to see whether the image you set is really not being respected at registration?
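    A minimal sketch of that lookup with the Prefect 1.x Python Client; the flow name is a placeholder and the field names assume the Prefect Cloud GraphQL schema:

    from prefect.client import Client

    # Placeholder flow name; adjust to match the registered flow.
    query = """
    query {
      flow(
        where: { name: { _eq: "my-batch-flow" }, archived: { _eq: false } }
        order_by: { version: desc }
        limit: 1
      ) {
        name
        version
        core_version
        run_config
      }
    }
    """

    client = Client()             # picks up the API key from Prefect config / env
    print(client.graphql(query))  # check run_config["image"] in the response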
    Keith Veleba

    2 months ago
    yep, that's how we register. We're going to redeploy everything and make sure we're where we need to be. How can a patch release introduce such a breaking change?
    Kevin Kho

    2 months ago
    I'd need to ask around. I took a look at the PR and am confused about why it breaks, since a default is set.
    I don't think this is necessarily a breaking change, though. The storage here is pickle-based, so the flow is serialized under 1.2.0 and deserialized under 1.2.3, and a version mismatch between serializing and deserializing will frequently cause issues.
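    A contrived illustration of that failure mode with plain pickle (no Prefect involved; the class is made up): an instance serialized against an older class definition is deserialized against a newer one, and the newer code path then reads an attribute the old instance never set.

    import pickle

    class Result:                      # the class as it looked at registration time
        def __init__(self):
            self.bucket = "my-bucket"

    blob = pickle.dumps(Result())      # serialized under the "old" definition

    class Result:                      # the class as it looks in the newer image
        def __init__(self):
            self.bucket = "my-bucket"
            self.upload_options = {}   # new attribute, set only for new instances

        def write(self):
            return self.upload_options # newer code path expects the new attribute

    restored = pickle.loads(blob)      # restores only the old instance's __dict__
    restored.write()                   # AttributeError: no attribute 'upload_options'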
    Keith Veleba

    2 months ago
    My understanding has historically been that pickle compatibility is tied to the Python major version. Once we redeploy, we'll make sure all of that lines up. It's just a little frustrating to track down when things work for several weeks and then kaboom!
    thanks for the assist
    Kevin Kho

    2 months ago
    Thanks for reporting! We do have a special serializer specifically for the S3Result, so this would not have happened for other Result classes.