Paul Gierz
03/07/2022, 5:22 PM
Shuchita Tripathi
03/07/2022, 6:29 PM
Christian Nuss
03/07/2022, 6:55 PM
Can a @task be orchestrated as a job/pod on Kube?
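For context, orchestration in Prefect 1.x normally happens at the flow-run level rather than per @task: the agent submits the whole flow run as one Kubernetes job, and the tasks execute inside that job's pod. A minimal sketch, assuming a Kubernetes agent is running and the KubernetesRun run config applies (running each task in its own pod would instead require a distributed executor such as Dask on Kubernetes):
```python
from prefect import task, Flow
from prefect.run_configs import KubernetesRun

@task
def say_hello():
    print("hello from the flow-run pod")

with Flow("k8s-example") as flow:
    say_hello()

# The agent turns the whole flow run into a single Kubernetes job;
# individual @task calls run inside that job's pod, not as separate pods.
flow.run_config = KubernetesRun(image="prefecthq/prefect:latest")
```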
Shuchita Tripathi
03/07/2022, 10:19 PM
Austin Vecchio
03/08/2022, 1:29 AM
python3 workflow.py
My workflow only has one import: import prefect
Any thoughts on why this might be? Any assistance would be appreciated
Saurabh Indoria
03/08/2022, 8:11 AM
Paul Gierz
03/08/2022, 10:45 AM
Aniruddha Sengupta
03/08/2022, 1:55 PM
Donnchadh McAuliffe
03/08/2022, 2:55 PM
Shuchita Tripathi
03/08/2022, 5:07 PM
```
FROM prefecthq/prefect:latest
WORKDIR /app
RUN apt-get -y update && \
    apt-get install -y docker.io && \
    apt-get install -y docker-compose
RUN docker run --privileged -d docker:dind
RUN docker --version
COPY docker-compose.yml .
RUN prefect backend server && \
    prefect server start
```
Daniel Komisar
03/08/2022, 5:32 PM
Lana Dann
03/08/2022, 6:59 PM
log_stdout=True
those logs don’t show. any advice on how to include logs from those other loggers?
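If the missing records come from other Python loggers (rather than print), one Prefect 1.x option is the extra-loggers setting. A minimal sketch, where the logger name my_library is hypothetical and the assumption is that the environment executing the flow has PREFECT__LOGGING__EXTRA_LOGGERS set:
```python
import logging
from prefect import task, Flow

# Assumption: the flow-run environment has
#   PREFECT__LOGGING__EXTRA_LOGGERS="['my_library']"
# so records from this logger are forwarded to Prefect's logs.
logger = logging.getLogger("my_library")

@task(log_stdout=True)
def do_work():
    print("captured because log_stdout=True")
    logger.info("captured only if 'my_library' is listed as an extra logger")

with Flow("extra-loggers-example") as flow:
    do_work()
```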
Shuchita Tripathi
03/08/2022, 9:50 PM
YD
03/08/2022, 11:37 PM
flow.run()
and it runs OK
he can register it and see the flow in the dashboard, but it fails with
Failed to load and execute Flow's environment: ModuleNotFoundError("No module named '/home/<user_name>'",)
any thoughts?
does every user group need a different agent?
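One common cause of that ModuleNotFoundError in Prefect 1.x is that the flow was registered with the default Local storage, so the agent tries to load it from a path under the registering user's home directory. A minimal sketch of pinning the storage to a location the agent's user can read; the path below is purely hypothetical:
```python
from prefect import Flow
from prefect.storage import Local

with Flow("shared-flow") as flow:
    ...

# Assumption: /opt/flows is readable by whichever user runs the agent,
# so the flow is not tied to the registering user's home directory.
flow.storage = Local(
    path="/opt/flows/shared_flow.py",  # hypothetical shared path
    stored_as_script=True,
)
```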
KhTan
03/09/2022, 12:41 AM
Tony Liberato
03/09/2022, 2:00 PM
Liam England
03/09/2022, 2:38 PM
Alex F
03/09/2022, 4:58 PM
Alex F
03/09/2022, 7:56 PM
Shuchita Tripathi
03/09/2022, 8:01 PM
Devin Flake
03/09/2022, 11:36 PM
Failed to load and execute flow run: NotGitRepository()
Aaron Ash
03/10/2022, 5:05 AM
Is there a way to get prefect server create-tenant ...
to use a different apollo endpoint via environment variables or CLI args? It defaults to http://localhost:4200/graphql,
but I'm attempting to run this in a docker-compose file and need to use a different hostname than localhost.
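One workaround to try (an assumption on my part, not something confirmed in this thread) is to create the tenant through the Python client instead of the CLI and point it at the apollo service by its compose hostname; the api_server argument and the create_tenant call are assumed to be available in Prefect 1.x:
```python
from prefect import Client

# Assumption: inside the compose network the apollo service is reachable
# at http://apollo:4200, and Client accepts an explicit api_server URL.
client = Client(api_server="http://apollo:4200")
client.create_tenant(name="default", slug="default")
```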
Aaron Ash
03/10/2022, 7:14 AM
Tomer Cagan
03/10/2022, 10:45 AM
Slava Shor
03/10/2022, 1:24 PM
async def...
?
Christian Nuss
03/10/2022, 10:31 PM
If a state_handler is being used on a Flow, and a @task has a default_exception and the new_state is Failed... is it possible to get that Exception for inspection in the state handler?
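In Prefect 1.x a Failed state usually carries the exception on its result attribute. A minimal sketch, assuming the handler is attached at the task level (for a Flow-level handler, new_state.result is typically a mapping of tasks to their final states instead):
```python
from prefect import task, Flow
from prefect.engine.state import Failed

def inspect_exception(task_obj, old_state, new_state):
    # For a task-level handler, a Failed state's result generally
    # holds the exception that caused the failure.
    if isinstance(new_state, Failed):
        exc = new_state.result
        print(f"{task_obj.name} failed with: {exc!r}")
    return new_state

@task(state_handlers=[inspect_exception])
def boom():
    raise ValueError("something went wrong")

with Flow("state-handler-example") as flow:
    boom()
```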
Liam England
03/11/2022, 6:12 PM
SLACK_WEBHOOK_URL
at the moment, and have tried setting it as an environment variable for the agent, as well as adding it to the config.toml on the machine hosting Prefect Server, but I'm still seeing ValueError: Local Secret "SLACK_WEBHOOK_URL" was not found.
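For reference, a local secret in Prefect 1.x is resolved in the process that executes the flow, either from the [context.secrets] section of that machine's config.toml or from a PREFECT__CONTEXT__SECRETS__<NAME> environment variable. A minimal sketch, assuming the variable is set in the environment where the flow run actually happens (not only where the server runs):
```python
# Assumption: the flow-run environment has either
#   PREFECT__CONTEXT__SECRETS__SLACK_WEBHOOK_URL="https://hooks.slack.com/..."
# or, in ~/.prefect/config.toml on that machine:
#   [context.secrets]
#   SLACK_WEBHOOK_URL = "https://hooks.slack.com/..."
from prefect.client import Secret

webhook_url = Secret("SLACK_WEBHOOK_URL").get()
```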
madhav
03/11/2022, 10:17 PM
Saurabh Indoria
03/14/2022, 6:39 AM
No heartbeat detected from the remote task; retrying the run. This will be retry 1 of 3.
and then it never actually retries.. I believe the Lazarus process must kick in every 10 minutes and reschedule the task, right?
CC: @Christina Lopez @Kevin Kho @Anna Geller
Martin Durkac
03/14/2022, 8:38 AM
Failed to retrieve task state with error: ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host='apollo', port=4200): Read timed out. (read timeout=15)"))
Sometimes the flow can continue without any problem, because we have an infinite flow set up to continue even when the flow fails:
def never_ending_state_handler(obj, old_state, new_state):
    if (old_state.is_running() and new_state.is_failed()):
        send_message_to_email(obj)
    if (old_state.is_running() and new_state.is_successful()) or (old_state.is_running() and new_state.is_failed()):
        time.sleep(5)
        create_flow_run.run(flow_name="our_flow", project_name="our_project", run_name=str(uuid.uuid4()))
    return new_state
But when we receive the error: A Lazarus process attempted to reschedule this run 3 times without success. Marking as failed.
we are not able to continue... The flow has failed, but the state handler no longer runs to reschedule the failed flow.
Does anybody know what may cause the problem with Read timed out or Lazarus?
Anna Geller
03/14/2022, 10:36 AM
"Every container has a local agent"
This is quite an unusual setup - can you explain more about why you did it that way? Normally you would just spin up a docker agent on your VM:
prefect agent docker start --label yourlabel
and then the Prefect agent will take care of spinning up containers for the flow run. This could be part of the problem, especially because your Server is running within a container itself and the networking between containers becomes quite complicated here.
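For completeness, the flow side would then carry a label matching the agent; a small sketch, assuming Prefect 1.x DockerRun and that "yourlabel" matches the agent started above:
```python
from prefect import Flow
from prefect.run_configs import DockerRun

with Flow("long-running-flow") as flow:
    ...

# The label must match the one the Docker agent was started with
# (prefect agent docker start --label yourlabel).
flow.run_config = DockerRun(labels=["yourlabel"])
```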
With such long-running jobs you may try setting this env variable:
from prefect.run_configs import UniversalRun
flow.run_config = UniversalRun(env={"PREFECT__CLOUD__HEARTBEAT_MODE": "thread"})
Also, as a workaround, you could disable Lazarus for such a flow as described here.
But if you want to find the root cause of the issue:
do you happen to have some unclosed DB connections or other resources like HTTP clients that you use in your flow? I saw a similar issue occurring due to resources failing to close/shut down.
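As an illustration of that last point (plain Python, not a Prefect-specific API): keeping the connection in a closing context manager inside the task guarantees it is released even when the task raises, so nothing is left dangling that could interfere with heartbeats:
```python
import sqlite3
from contextlib import closing
from prefect import task

@task
def load_rows():
    # closing() shuts the connection down even if the query raises,
    # so no open resource outlives the task run.
    with closing(sqlite3.connect("/tmp/example.db")) as conn:
        return conn.execute("SELECT 1").fetchall()
```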
Martin Durkac
03/14/2022, 12:15 PM
Anna Geller
03/14/2022, 12:36 PM
DockerRun run configuration:
flow.run_config = DockerRun(image="prefect-oracle:latest")
And as long as you use the same image, your Docker client should be able to reuse it without having to pull the image every time at runtime - I didn't benchmark this though.
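Putting that together, a small sketch of how the prebuilt image might be reused on every run, assuming Prefect 1.x and reusing the flow/project names mentioned earlier in the thread:
```python
from prefect import task, Flow
from prefect.run_configs import DockerRun

@task
def extract():
    print("running inside the prebuilt prefect-oracle image")

with Flow("our_flow") as flow:
    extract()

# Point every flow run at the locally built image; as long as the tag stays
# the same, Docker can reuse it instead of pulling or rebuilding each time.
flow.run_config = DockerRun(image="prefect-oracle:latest")
flow.register(project_name="our_project")
```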