Trying to test out prefect + coiled with gpus. i h...
# prefect-server
b
Trying to test out Prefect + Coiled with GPUs. I have a model training as a task and got "No heartbeat detected from the remote task; marking the run as failed." Is this something that can happen if the worker becomes too bogged down resource-wise?
k
Hey @Brett Jurman, normally when heartbeats are not detected, it’s Prefect’s way of alerting you that something went wrong in your Flow. The heartbeat runs in a separate subprocess from the one your flow is running in. In 0.15.2, we added more logs around this, and from what we have seen so far, it seems to be related to memory issues. If you are confident your task will succeed, some users have had success separating the memory-intensive task into its own Flow, then turning off heartbeats for that flow and triggering it with create_flow_run or StartFlowRun. If heartbeats didn’t exist, the UI would show that the flow was running forever even if the underlying infrastructure died. Out of curiosity, have you succeeded with a GPU-based flow, and did you use the local agent or Docker agent to kick that off?
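To make the heartbeat idea concrete, here is a minimal standard-library sketch. It is not Prefect's implementation: `HeartbeatMonitor` and `run_with_heartbeat` are hypothetical names, and a background thread stands in for the separate subprocess to keep the example short. The point it illustrates is that once the worker stops sending beats (for example because it was killed for running out of memory), the monitor can no longer consider the run alive.

```python
import threading
import time


class HeartbeatMonitor:
    """Toy monitor (hypothetical): a run counts as alive only while beats keep arriving."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout  # seconds of silence tolerated before the run counts as dead
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()

    def beat(self) -> None:
        # Record the time of the most recent heartbeat.
        with self._lock:
            self._last_beat = time.monotonic()

    def is_alive(self) -> bool:
        # True while the last beat is no older than `timeout` seconds.
        with self._lock:
            return time.monotonic() - self._last_beat <= self.timeout


def run_with_heartbeat(work, monitor: HeartbeatMonitor, interval: float):
    """Run `work()` while a background thread sends a beat every `interval` seconds."""
    stop = threading.Event()

    def _beat_loop() -> None:
        while not stop.is_set():
            monitor.beat()
            stop.wait(interval)  # sleep, but wake early once the work has finished

    beater = threading.Thread(target=_beat_loop, daemon=True)
    beater.start()
    try:
        return work()
    finally:
        # Beats stop as soon as the work ends (or the worker dies).
        stop.set()
        beater.join()


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=0.2)
    print(run_with_heartbeat(lambda: "trained", monitor, interval=0.05))
    print("alive right after the run:", monitor.is_alive())
    time.sleep(0.3)
    print("alive after 0.3 s of silence:", monitor.is_alive())
```

Without a check like this, a monitor has no way to tell a long-running task apart from one whose infrastructure has died, which is why a run would otherwise appear to be running forever.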
b
I'm returning to that GPU-based flow now
The GPUs are there, but recently it died from the heartbeat issue. It may be running out of memory; I can test that.
Is there any resource tracking I can see in Prefect?
k
Not on Prefect, but maybe through the Dask dashboard on Coiled?
b
yeah you can see it through there
it would be cool to be able to pull that back into the prefect ui somehow
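One rough way to test the out-of-memory theory from inside the task itself is to measure peak allocations around the training call and log the number. The sketch below uses only Python's standard `tracemalloc` module; `run_and_measure` is a hypothetical helper, not a Prefect or Coiled API. Note the assumption baked in: `tracemalloc` only sees allocations made through Python's memory allocator, so GPU memory and most native-library buffers will not show up, and the Dask dashboard remains the better view of worker memory.

```python
import tracemalloc


def run_and_measure(func, *args, **kwargs):
    """Call `func` and return (result, peak_bytes) for Python-level allocations.

    Hypothetical helper for illustration. It does not capture GPU memory or
    memory allocated by native extensions outside Python's allocator.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    # Stand-in workload; replace with the real training function.
    result, peak = run_and_measure(lambda: sum(list(range(100_000))))
    print(f"result={result}, peak Python allocations: {peak / 1e6:.2f} MB")
```

Printing or logging the peak from within the task is one way to get a memory figure into the run's logs, though that is a workaround rather than real resource tracking in the UI.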