• Liam England

    6 months ago
    Hi, wondering what the best practice would be for making secrets available to flow runs picked up by Docker agents? Specifically looking at SLACK_WEBHOOK_URL at the moment. I have tried setting it as an environment variable for the agent, as well as adding it to the config.toml on the machine hosting Prefect Server, but I'm still seeing:
    ValueError: Local Secret "SLACK_WEBHOOK_URL" was not found.
    2 replies
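    A minimal sketch of one way to approach the question above, assuming Prefect 1.x: a local secret has to be visible inside the flow-run container itself, so setting it only on the agent process or the server host is typically not enough. The env-var naming convention shown (PREFECT__CONTEXT__SECRETS__<NAME>) and the flow name and webhook placeholder are illustrative.
    from prefect import Flow
    from prefect.run_configs import DockerRun
    from prefect.tasks.notifications import SlackTask

    # SlackTask looks up the "SLACK_WEBHOOK_URL" secret at runtime, so the secret
    # must exist in the flow-run container's context, not just on the agent host.
    slack = SlackTask()

    with Flow(
        "notify-flow",  # illustrative flow name
        run_config=DockerRun(
            env={"PREFECT__CONTEXT__SECRETS__SLACK_WEBHOOK_URL": "<webhook-url>"}
        ),
    ) as flow:
        slack(message="hello from the flow run")
    If I recall correctly, the Docker agent also accepts --env KEY=VALUE flags that are forwarded to flow-run containers, which avoids hard-coding the value in the run config.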
  • madhav

    6 months ago
    Hi all, I have a task that should return a dataframe, but instead it's returning a FunctionTask. Could it be something with the task definition?
    6 replies
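    A minimal sketch related to the question above, assuming Prefect 1.x and that the dataframe is being inspected at flow-build time. Inside a with Flow(...) block, calling a @task-decorated function returns a Task object (a FunctionTask for decorated functions), not its return value; the dataframe only exists once the flow actually runs. The names below are illustrative.
    import pandas as pd
    from prefect import task, Flow

    @task
    def build_df() -> pd.DataFrame:
        return pd.DataFrame({"a": [1, 2, 3]})

    with Flow("df-flow") as flow:
        df = build_df()  # at build time this is a Task placeholder, not a DataFrame

    state = flow.run()
    # The real DataFrame is only available from the completed run's results:
    print(state.result[df].result)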
  • Saurabh Indoria

    6 months ago
    Hi all, there seems to be some issue with Prefect Cloud heartbeats. Randomly, some mapped tasks show an error:
    No heartbeat detected from the remote task; retrying the run. This will be retry 1 of 3.
    and then it never actually retries. I believe the Lazarus process should kick in every 10 minutes and reschedule the task, right? CC: @Christina Lopez @Kevin Kho @Anna Geller
    21 replies
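    A hedged sketch related to the heartbeat question above, assuming Prefect 1.x. One mitigation I have seen for spurious "No heartbeat detected" errors on long or resource-heavy mapped tasks is switching the heartbeat from a subprocess to a thread via the flow's run config; the exact setting name is worth verifying against your Prefect version, and the flow name is illustrative.
    from prefect import Flow
    from prefect.run_configs import UniversalRun

    with Flow(
        "mapped-flow",  # illustrative flow name
        # Assumed setting: run the heartbeat in a thread instead of a subprocess.
        run_config=UniversalRun(env={"PREFECT__CLOUD__HEARTBEAT_MODE": "thread"}),
    ) as flow:
        ...  # mapped tasks go here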
  • Martin Durkac

    6 months ago
    Hi all, we have a Prefect Server (version 0.15.4) running on EC2 (8 CPU, 64 GB of RAM) with approx. 15 Docker containers with specified flows. Each Docker container contains at least 1 running flow (max. 3). Every container has a local agent connected to the Prefect server via the --api parameter. We have 6 flows that run as infinite flows (when a flow finishes, a state handler starts the same flow again with the create_flow_run() function). The rest of the flows are scheduled to run only once an hour. The problem is that once every week or two, most of our infinite flows fail with the error:
    Failed to retrieve task state with error: ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host='apollo', port=4200): Read timed out. (read timeout=15)"))
    Sometimes the flow can continue without any problem, because we have the infinite flow set to continue even when a flow run fails:
    def never_ending_state_handler(obj, old_state, new_state):
        if (old_state.is_running() and new_state.is_failed()):
            send_message_to_email(obj)
        if (old_state.is_running() and new_state.is_successful()) or (old_state.is_running() and new_state.is_failed()):
            time.sleep(5)
            create_flow_run.run(flow_name="our_flow", project_name="our_project", run_name=str(uuid.uuid4()))
        return new_state
    But when we receive the error:
    A Lazarus process attempted to reschedule this run 3 times without success. Marking as failed.
    we are not able to continue... The flow run is failed, but the state handler no longer fires to reschedule it. Does anybody know what may cause the read timeouts or the Lazarus problem?
    3 replies
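    A hedged sketch related to the read-timeout question above, assuming Prefect 1.x. The read timeout=15 in the traceback matches the client's default request timeout (cloud.request_timeout, 15 seconds as I recall), so one option is to raise it in the environment the flow runs in; verify the setting name against version 0.15.4. The flow body is a placeholder.
    from prefect import Flow
    from prefect.run_configs import UniversalRun

    with Flow(
        "our_flow",
        # Assumed setting: give Apollo more time to respond before the client
        # raises ReadTimeout (value in seconds).
        run_config=UniversalRun(env={"PREFECT__CLOUD__REQUEST_TIMEOUT": "60"}),
    ) as flow:
        ...  # tasks go here
    Since each container runs its own local agent, the same variable could instead be exported directly in the container environment.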
  • Stéphan Taljaard

    6 months ago
    Hi. Are (some) values from config.toml exposed through the GQL API? I want to confirm the currently active GQL query timeout value, but I don't have direct access to the server to check the config.toml file.
    3 replies
  • Christian Nuss

    6 months ago
    Hey there again everyone! Hopefully a quick question about doing a create_flow_run.run(flow_name="flow-2") within a task created by a KubernetesRun... Assuming I have:
    with Flow("flow-1", run_config=KubernetesRun(...)):
        ... creates a task that does a create_flow_run.run(flow_name="flow-2") ...
    should flow-2 receive/inherit the run_config from flow-1?
    25 replies
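    A hedged sketch related to the question above, assuming Prefect 1.x. If the child run should not simply fall back to flow-2's own registered run config, create_flow_run accepts (as far as I recall) an explicit run_config argument, which sidesteps the inheritance question; the project name and image below are placeholders.
    from prefect import Flow
    from prefect.run_configs import KubernetesRun
    from prefect.tasks.prefect import create_flow_run

    with Flow("flow-1", run_config=KubernetesRun()) as flow:
        create_flow_run(
            flow_name="flow-2",
            project_name="my-project",                   # placeholder
            run_config=KubernetesRun(image="my-image"),  # passed explicitly rather than inherited
        )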
  • jack

    6 months ago
    How are others handling the occasional ECS/Fargate error "Timeout waiting for network interface provisioning to complete."?
    6 replies
  • Sharath Chandra

    6 months ago
    Hi, I am using Prefect to orchestrate my Spark jobs. The Spark jobs are submitted with spark-submit using Prefect's ShellTask; I have created a subclass of ShellTask to invoke spark-submit. The Spark jobs run on k8s. I am facing an issue where Prefect tasks are not completing and keep running. I see this happening on tasks that are slightly long-running (> 10 mins). The master flow maps over a list and orchestrates the Prefect ShellTask, e.g.
    K8sSparkSubmitTask.map(
        id=["id1", "id2", "idx"]
    )
    The task starts running for id1 and the pod status is observed to be successful. However, Prefect does not move on to the next task. Why are these tasks not getting completed?
    42 replies
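    A hedged sketch of the pattern described above, assuming Prefect 1.x; the subclass body, spark-submit arguments, and master URL are illustrative. Each mapped child runs one shell command and should finish as soon as that command exits, which is why it is worth checking whether the spark-submit process itself actually returns for the long-running jobs.
    from prefect import Flow
    from prefect.tasks.shell import ShellTask

    class K8sSparkSubmitTask(ShellTask):
        """Illustrative subclass that builds a spark-submit command per id."""

        def run(self, id: str = None) -> str:
            command = (
                "spark-submit --master k8s://https://<k8s-api-server> "
                f"--deploy-mode cluster my_job.py --id {id}"
            )
            # The mapped child completes when this shell command exits.
            return super().run(command=command)

    spark_submit = K8sSparkSubmitTask()

    with Flow("spark-orchestration") as flow:
        spark_submit.map(id=["id1", "id2", "idx"])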
  • Madhup Sukoon

    6 months ago
    Hi! Running this:
    helm pull --destination /tmp/c68c98cd-c919-46b6-9e17-924f22aa2dd3 --version 2022.01.25 --repo https://prefecthq.github.io/server prefecthq/prefect-server
    results in an error:
    Error: chart "prefecthq/prefect-server" version "2022.01.25" not found in https://prefecthq.github.io/server repository
    However, I do see 2022.01.25 as a version when I do helm search repo prefecthq/prefect-server --versions. Can someone please point out where I am going wrong?
    4 replies
  • Lukas Brower

    6 months ago
    Hey, I was wondering if there is a mechanism to send a notification when a flow run has been executing for a certain length of time. We have a state handler that notifies us of failed tasks, but a long-running flow may not necessarily change states, so I'm not sure if there is another mechanism we can use beyond state handlers.
    2 replies
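    A hedged sketch related to the question above, assuming Prefect 1.x and that the server's GraphQL schema exposes flow_run with state and start_time fields (worth confirming in the interactive API). Since state handlers only fire on state changes, one alternative is a small scheduled "watchdog" flow that looks up runs that have been Running longer than a threshold and sends the notification itself; the threshold and flow names are illustrative, and the alerting step is left as a placeholder.
    import pendulum
    from prefect import Flow, task
    from prefect.client import Client

    @task
    def find_long_running(max_hours: int = 2):
        cutoff = pendulum.now("UTC").subtract(hours=max_hours).isoformat()
        # Assumed Hasura-style query; adjust to the schema your server exposes.
        query = """
        query($cutoff: timestamptz) {
          flow_run(where: {state: {_eq: "Running"}, start_time: {_lte: $cutoff}}) {
            id
            name
            start_time
          }
        }
        """
        result = Client().graphql(query, variables={"cutoff": cutoff})
        runs = result["data"]["flow_run"]
        # Placeholder: forward these runs to whatever notifier you already use.
        return runs

    with Flow("long-run-watchdog") as flow:
        find_long_running()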