Ismail Cenik

    1 year ago
    Hello, What is the reason for the following error? Pod prefect-job-9e6e019e-2tpvl failed. Container 'flow' state: terminated Exit Code:: 139 Reason: Error
    Kevin Kho

    1 year ago
    Hey @Ismail Cenik, could you give me more details about this? Was it a flow that was working before? Are there any logs on the flow?
    What was the flow doing?
    Ismail Cenik

    1 year ago
    b'{"taskmanagers":5,"slots-total":35,"slots-available":3,"jobs-running":1,"jobs-finished":0,"jobs-cancelled":0,"jobs-failed":0,"flink-version":"1.11.1","flink-commit":"DeadD0d0"}'
    The flow starts AWS Kinesis Data Analytics applications and, when they finish their jobs, stops them. Basically, Prefect is calling the startApplication and stopApplication APIs of Kinesis Data Analytics
    Kevin Kho

    1 year ago
    This looks more like the flow failed. Any logs you see on the CloudWatch side?
    Ismail Cenik

    1 year ago
    Do you mean that Kinesis failed? Should I look at the Kinesis CloudWatch logs?
    Kevin Kho

    1 year ago
    Not necessarily but just something in the Flow.
    Is that flink-commit stuff from the Prefect logs?
    Ismail Cenik

    1 year ago
    Actually, this is not a good example. I showed the printout of one API call from the Prefect log
    I will try to find more meaningful logs. But there should be an explanation for "139". Is there any specific meaning?
    Kevin Kho

    1 year ago
    I see. Hopefully there is more info in CloudWatch?
    Ismail Cenik

    1 year ago
    I do not have direct access to the EKS where our agent is running. I will try to reach out.
    Kevin Kho

    1 year ago
    That’s the container error, not on Prefect so pretty much this
    Tyler Wanner

    1 year ago
    generally 139 means that the process in your container died from SIGSEGV (a segmentation fault): exit codes above 128 encode 128 + the signal number, and 139 = 128 + 11. A SIGKILL, which is generally due to running out of memory or failing a liveness check, would show up as 137 (128 + 9)
    I can't provide a lot of additional context without knowledge about your execution environment, but a segfault usually points at native code crashing inside the flow container, sometimes under memory pressure. In kubernetes, if memory turns out to be the cause, this is usually fixed by increasing the memory resource limit
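
As a cross-check on what an exit code like 139 encodes, here is a minimal sketch that applies the common 128 + N convention for processes terminated by a signal. The helper name `describe_exit_code` is hypothetical, and the signal names come from Python's standard `signal` module, so they reflect the platform the snippet runs on (Linux is assumed here).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name, e.g. 11 -> SIGSEGV on Linux.
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with application error code {code}"


print(describe_exit_code(139))  # on Linux: terminated by SIGSEGV
print(describe_exit_code(137))  # on Linux: terminated by SIGKILL
```

A 137 is the code typically seen when Kubernetes kills a container for exceeding its memory limit, while 139 indicates the process itself crashed with a segmentation fault.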
    Ismail Cenik

    1 year ago
    Hey guys, thank you for the valuable information
    Hello, there is no default value for the memory resource. Is there any standardization or recommendation for the memory resource limit?
    Kevin Kho

    1 year ago
    It’s hard for us to prescribe because that really varies case by case, but what we do mention is that the default Kubernetes specs tend to be low for what people do on Prefect.
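
Since no default memory value is set, one way to make the request and limit explicit is to patch the container entry of the Kubernetes job spec. The sketch below works on a plain dictionary shaped like a Kubernetes container spec. The helper `with_memory_resources`, the image name, and the `1Gi`/`2Gi` values are assumptions for illustration only, not recommendations; the right numbers depend on the flow. How the patched spec is handed to the agent depends on your Prefect version and setup.

```python
def with_memory_resources(container: dict, request: str, limit: str) -> dict:
    """Return a copy of a container spec with memory request and limit set."""
    updated = dict(container)
    # Copy any existing resources so the input spec is not mutated.
    resources = {k: dict(v) for k, v in container.get("resources", {}).items()}
    resources.setdefault("requests", {})["memory"] = request
    resources.setdefault("limits", {})["memory"] = limit
    updated["resources"] = resources
    return updated


# Hypothetical container entry; 'flow' matches the container name in the error above.
flow_container = {"name": "flow", "image": "example/flow:latest"}
patched = with_memory_resources(flow_container, request="1Gi", limit="2Gi")
print(patched["resources"])
```

Setting the request below the limit lets the scheduler place the pod while still capping how much memory the flow can consume before it is killed.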