Zi Yuan
12/12/2022, 12:51 PM

Richard Alexander
12/16/2022, 3:13 PM

Richard Alexander
12/19/2022, 3:28 PM
I'm calling `run_deployment` with `timeout=0`. I can submit many flows to an empty queue, but none of the flows start until they are very late, as seen in the screenshot. Any idea what is going on here?
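A minimal sketch of that pattern, assuming Prefect 2's `run_deployment` and a placeholder deployment name and parameters:

```
from prefect.deployments import run_deployment

# Placeholder deployment name: "<flow-name>/<deployment-name>".
# timeout=0 returns as soon as the flow run is created on the queue,
# instead of blocking until the run finishes.
flow_run = run_deployment(
    name="my-flow/my-deployment",
    parameters={"example_param": 1},  # hypothetical parameters
    timeout=0,
)
print(flow_run.id, flow_run.state)
```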
Kelvin DeCosta
12/19/2022, 3:54 PM

Tim-Oliver
12/19/2022, 4:40 PM
I'm using the `DaskTaskRunner` with `dask_jobqueue.SLURMCluster`. In this setting, Dask requests compute resources that have a run-time limit. When the run-time limit is reached, the resources are taken away and new resources are acquired. If a task is running while its resource goes down, it crashes. What I would like to do is re-submit the crashed task run so that it executes on the newly acquired compute resources.
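A minimal sketch of this setup, assuming the `prefect-dask` collection and placeholder SLURM resources; the retry options are one way to have a crashed task run re-submitted, not necessarily the author's eventual solution:

```
from prefect import flow, task
from prefect_dask import DaskTaskRunner

# Placeholder SLURM resources; adjust cores/memory/walltime to the cluster.
slurm_runner = DaskTaskRunner(
    cluster_class="dask_jobqueue.SLURMCluster",
    cluster_kwargs={"cores": 4, "memory": "16GB", "walltime": "01:00:00"},
    adapt_kwargs={"minimum": 1, "maximum": 10},
)

# retries asks Prefect to re-run the task if it crashes,
# e.g. when a SLURM allocation hits its walltime and is reclaimed.
@task(retries=3, retry_delay_seconds=60)
def process(item):
    ...

@flow(task_runner=slurm_runner)
def my_flow(items):
    for item in items:
        process.submit(item)
```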
puneet jindal
12/20/2022, 10:02 AM

Kelvin DeCosta
12/21/2022, 11:42 PM
I tried calling `.map` for all of them (like the daredevil that I am) with a limit of 20 at a time, but it started hanging more often. Right now I've chosen to create batches of 20, run one batch of tasks concurrently, and then move on to the next batch.
• Deployments are run via ECS Tasks with 4 vCPUs and 8 GB memory
I’m aware of the ongoing concurrency refactor and I’m excited for it, but I want to build something reliable right now.
I’d really appreciate it if I could get some of the following questions answered:
• Would using `async` improve the reliability of the tasks?
• Will explicitly creating a new `ConcurrentTaskRunner` help?
• What do you think is causing the overhead that makes the tasks take much more time than they should?
I’m open to any suggestions and any help is really really appreciated!
Thanks 😊
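For the second bullet, a minimal sketch assuming Prefect 2, where `ConcurrentTaskRunner` is already the default task runner but can be passed explicitly:

```
from prefect import flow, task
from prefect.task_runners import ConcurrentTaskRunner

@task
async def my_task(input: str):
    ...

# ConcurrentTaskRunner is the default in Prefect 2;
# passing it explicitly only makes the choice visible.
@flow(task_runner=ConcurrentTaskRunner())
async def my_flow(inputs: list[str]):
    await my_task.map(input=inputs)
```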
Kelvin DeCosta
12/23/2022, 11:12 AM
from more_itertools import chunked

# Process the task inputs in batches of 25.
for tasks_inputs in chunked(long_list_of_tasks_inputs, 25):
    # Specifying return_state=True,
    # since I don't want a failed task to fail the whole flow immediately.
    await my_task.map(input=tasks_inputs, return_state=True)
This seems to work nicely.
However, after some time, the infrastructure runs out of its 8 GB of memory. (Prefect doesn't update the flow state, but that isn't the issue right now.)
Looking at the infra memory usage graph, there is an almost linear increase in usage over time (as new tasks run) until it reaches 89-99% and then crashes.
For more context, `long_list_of_tasks_inputs` is just a list of 22k strings. It shouldn't be an issue.
From what I can tell, `my_task` doesn't return anything, so it shouldn't be hogging RAM.
Ideally, the memory usage should only reflect the variables used by the flow and tasks, and in the case of the task, these should be dropped by the garbage collector.
What do you think I could do to solve this?
• Should I mark `my_task` with `cache_result_in_memory=False` and `persist_result=False` as well?
• `my_task` logs certain statements. Could this affect the memory usage?
• Would using the `_` buffer help, i.e. `_ = await my_task.map(input=tasks_inputs, return_state=True)`?
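A minimal sketch of the first option, assuming Prefect 2's task decorator options; whether this actually stops the memory growth is the open question here:

```
from prefect import task

# cache_result_in_memory=False asks Prefect not to keep the task's
# return value in the flow run's memory; persist_result=False also
# skips writing results to result storage.
@task(cache_result_in_memory=False, persist_result=False)
async def my_task(input: str):
    ...
```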
Bernardo Galvao
12/23/2022, 12:21 PM

Jon
12/30/2022, 2:33 PM
`init` or `exit`:
# Prefect has its own implementation of a context manager,
# which it calls a Resource Manager.
# pylint and mypy are unhappy with the implementation.
# pylint: disable=not-context-manager
#
# Creates a tmp directory for this workflow instance.
# This avoids collisions with any other flow runs and allows a clean delete.
with resource_managers.TemporaryDirectory(  # type: ignore[attr-defined]
    consumer_code=consumer_code_validated,  # type: ignore[arg-type]
    provider_code=provider_code_validated,  # type: ignore[arg-type]
    resource_type=resource_type_validated,  # type: ignore[arg-type]
) as tmp_dir:
Santhosh Solomon (Fluffy)
01/01/2023, 12:51 PM

Bernardo Galvao
01/06/2023, 11:34 AM
prefect_orion.1.vvr5ju6d6s5x@SEMRI01 | 11:00:23.861 | ERROR | prefect.orion.services.telemetry - Failed to send telemetry:
prefect_orion.1.vvr5ju6d6s5x@SEMRI01 | Shutting down telemetry service...
Is Orion taken down? Is `--analytics-off` the way to ensure this error does not happen?
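To check whether telemetry is enabled, a sketch assuming the setting is named `PREFECT_ORION_ANALYTICS_ENABLED` in this Prefect 2.x version (an assumption worth confirming with `prefect config view`); setting it to `false` in the profile or environment should keep the telemetry service from starting:

```
# Assumption: the analytics/telemetry setting is exposed in prefect.settings
# under this name in the Prefect 2.x version in use.
from prefect.settings import PREFECT_ORION_ANALYTICS_ENABLED

print(PREFECT_ORION_ANALYTICS_ENABLED.value())  # True means telemetry is enabled
```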
Bernardo Galvao
01/06/2023, 12:06 PM

Bernardo Galvao
01/06/2023, 4:12 PM

Sander
01/09/2023, 10:20 PM

Bernardo Galvao
01/10/2023, 2:44 PM
I'm getting `All connection attempts failed`. Is the Prefect agent trying to connect to `PREFECT_API_URL`?
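A quick way to see which API URL the agent's environment resolves to, assuming Prefect 2's settings module; run it in the same environment (container or host) as the agent:

```
# Print the API URL the current Prefect profile / environment resolves to.
from prefect.settings import PREFECT_API_URL

print(PREFECT_API_URL.value())
```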
Bernardo Galvao
01/12/2023, 9:35 AM

Bernardo Galvao
01/16/2023, 12:19 PM

Bernardo Galvao
01/18/2023, 12:33 PM

Julian Brendel
01/18/2023, 12:48 PM

Bernardo Galvao
01/18/2023, 2:56 PM

Bernardo Galvao
01/19/2023, 10:10 AM
Is there a way to pass an override to `prefect deployment build`?
(The same way you can pass an override to a docker container.)
Edit: corrected the command from git to prefect, my bad.
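A sketch of the Python equivalent, assuming Prefect 2's `Deployment.build_from_flow` with `infra_overrides`; the flow import, deployment name, and override key below are placeholders:

```
from prefect.deployments import Deployment
from my_project.flows import my_flow  # hypothetical import path

deployment = Deployment.build_from_flow(
    flow=my_flow,
    name="my-deployment",  # placeholder deployment name
    # infra_overrides passes key=value overrides to the infrastructure block,
    # much like overriding settings on a docker container.
    infra_overrides={"env.MY_SETTING": "value"},  # hypothetical override key
)
deployment.apply()
```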
Bernardo Galvao
01/19/2023, 10:45 AM
Do I need `prefect-gitlab` on the client side for this error not to occur? Or does it not match the slug name?

Wenceslao Negrete
01/24/2023, 7:47 PM

Kelvin DeCosta
01/25/2023, 7:49 AM
I call `.map` for all the tasks in a batch. While this works, it feels very hacky and isn't ideal for performance.
Any help is appreciated.

John Kang
01/27/2023, 3:28 PM

J
01/28/2023, 6:34 AM

Ha Pham
01/30/2023, 8:22 AM

Wenceslao Negrete
02/01/2023, 4:16 PM

Slackbot
02/01/2023, 5:42 PM