Prakash Rai
08/24/2022, 1:57 AM
I have a flow with two `task`s:
◦ The first task downloads a CSV.
◦ The second task downloads a PDF for every row in the CSV.
  ▪︎ It takes around 10s to download a PDF, and there are around 500 PDFs to be downloaded.
  ▪︎ Each task is named after the PDF it is downloading (I'm using `with_options` to assign task names on the fly).
  ▪︎ I've added a concurrency limit of 8 on this task.
Now, when my flow completes, I still see some 6-7 tasks in Running state on the UI. However, the corresponding PDFs are downloaded and saved on my disk.
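(For context, a minimal sketch of this kind of setup in Prefect 2 — the function names, URLs, paths, and the tag below are illustrative placeholders, not the actual code:)

```python
from prefect import flow, task

@task
def download_csv():
    # Placeholder: fetch the CSV and return one PDF URL per row
    return ["https://example.com/a.pdf", "https://example.com/b.pdf"]

@task(tags=["pdf-downloader"])  # tag targeted by the concurrency limit of 8
def download_pdf(url):
    # Placeholder: download the PDF and return the path it was saved to
    return f"/tmp/{url.rsplit('/', 1)[-1]}"

@flow
def download_all_pdfs():
    urls = download_csv()
    futures = [
        # with_options renames each task run after the PDF it downloads
        download_pdf.with_options(name=url.rsplit("/", 1)[-1]).submit(url)
        for url in urls
    ]
    return [f.result() for f in futures]
```

The concurrency limit itself is created separately against the tag, e.g. `prefect concurrency-limit create pdf-downloader 8`.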
I have three questions:
• Why is this happening? The fact that PDFs are downloaded means that the tasks are completed. Is prefect somehow failing to detect that the job ended?
• I'm using `prefect concurrency-limit inspect 'pdf-downloader'` to look for the running tasks. I am able to extract task IDs, but can't find a documented way of killing them. Is there a command which takes a task ID and kills it? If not, what is the preferred way of killing them?
• Is there a way to specify a maximum time limit for a task?
Thanks in advance 🙂

I'm using selenium with geckodriver to fetch the PDFs. Hence, whenever the second task is executed, it essentially starts a separate process under the hood.

Also, I'm seeing `sqlite3.OperationalError: database is locked` errors like the following:
...
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO task_run_state (id, created, updated, type, timestamp, name, message, state_details, data, task_run_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: ('b5f91989-fd17-439d-ba9c-c4d2fea3f98d', '2022-08-24 02:54:02.330280', '2022-08-24 02:54:02.330286', 'RUNNING', '2022-08-24 02:54:01.949339', 'Running', None, '{"flow_run_id": "55b0e054-2384-4609-b4c2-9376226ade52", "task_run_id": "c84d2773-0165-42a2-b459-0b6c2cf1d9c0", "child_flow_run_id": null, "scheduled_time": null, "cache_key": "/asx/statistics/displayAnnouncement.do?display=pdf&idsId=02557229", "cache_expiration": null}', None, 'c84d2773-0165-42a2-b459-0b6c2cf1d9c0')]
(Background on this error at: <https://sqlalche.me/e/14/e3q8>)
....
sqlalchemy.exc.InvalidRequestError: Can't operate on closed transaction inside context manager. Please complete the context manager before emitting further commands.
Emil Christensen
08/24/2022, 1:55 PM
1. Would you be comfortable sharing your code or a pared-down version of it?
2. Are you running the tasks asynchronously with `task_fn.submit(path)` or something like that?
3. Are you caching the result of the PDF downloader function? I've found that when caching it's more performant to return a path to a file rather than the content itself.
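(As an illustration of the pattern in item 3 — the cache expiration and paths below are placeholders, not the actual code — a cached task that returns a file path rather than the file contents might look like:)

```python
from datetime import timedelta
from pathlib import Path

from prefect import task
from prefect.tasks import task_input_hash

@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(days=1))
def download_pdf(url: str) -> str:
    target = Path("/tmp") / url.rsplit("/", 1)[-1]
    # ... fetch the PDF and write it to `target` ...
    # Returning the path keeps the cached result small; the file stays on disk.
    return str(target)
```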
> Why is this happening? The fact that PDFs are downloaded means that the tasks are completed. Is prefect somehow failing to detect that the job ended?
I wouldn't think so. More likely there's some background operation holding things up.
> Is there a command which takes task ID and kills it? If not, what is the preferred way of killing them?
To my knowledge, killing the flow should also kill any tasks running as part of the flow. Are you running locally or through an agent?
> Is there a way to specify a maximum time limit for a task?
Not on the task level, but you can set `timeout_seconds` on the flow.
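(A minimal sketch of the flow-level timeout mentioned here — the two-hour value is just an example:)

```python
from prefect import flow

# If the flow run exceeds the timeout, Prefect marks the run as failed instead
# of letting it run indefinitely. (At the time of this thread, timeouts were
# only available at the flow level, not per task.)
@flow(timeout_seconds=2 * 60 * 60)
def download_all_pdfs():
    ...
```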
Prakash Rai
08/24/2022, 2:08 PM
> Would you be comfortable sharing your code or a pared-down version of it?
I won't be able to share the exact code, but I'll try to share a version with dummy data. Not sure whether I'll be able to reproduce the problems that way.
> Are you running the tasks asynchronously with `task_fn.submit(path)` or something like that?
Yes.
> Are you caching the result of the PDF downloader function? I've found that when caching it's more performant to return a path to a file rather than the content itself.
Yes, and I am returning the path of the downloaded files.
> To my knowledge, killing the flow should also kill any tasks running as part of the flow. Are you running locally or through an agent?
I also expected that. Surprisingly, the tasks are still running after I kill the flow. Running the `prefect concurrency-limit inspect 'pdf-downloader'` command lists the active task run IDs, even if there are no flows running (am I interpreting it in a wrong way?). Attaching an image for your reference.
I'm running these tasks locally. Planning to shift to agents later.
Also, thanks for sharing the `timeout_seconds` link. It might be able to solve this issue.
Emil Christensen
08/24/2022, 3:27 PM
> I also expected that. Surprisingly, the tasks are still running after I kill the flow. Running the `prefect concurrency-limit inspect 'pdf-downloader'` command lists the active task run IDs, even if there are no flows running (am I interpreting it in a wrong way?)
It could be that the task states in the DB aren't updated since the flow is killed. If you're only running locally and you successfully kill the flow, then the tasks shouldn't be able to keep processing.
> Attaching an image for your reference.
I don't think it made it 😞
Prakash Rai
08/24/2022, 4:52 PM
`prefect flow-runs delete <id>`
Emil Christensen
08/24/2022, 8:24 PM
`prefect flow-runs delete` just deletes the metadata about the flow run. If there are no running flows then there won't be any actively running tasks. I'm fairly confident that the task states just haven't been updated.
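(Given that conclusion, one hedged way to inspect and clear the stale Running task runs from Python is sketched below. This assumes the Prefect 2.x client methods `read_concurrency_limit_by_tag` and `set_task_run_state`; import paths and the exact effect on concurrency slots vary across 2.x versions, so verify against your installation:)

```python
import asyncio

from prefect.client import get_client
from prefect.orion.schemas.states import Failed  # path may differ across 2.x versions

async def clear_stale_runs(tag: str = "pdf-downloader"):
    async with get_client() as client:
        # active_slots holds the task run IDs currently counted against the limit
        limit = await client.read_concurrency_limit_by_tag(tag=tag)
        for task_run_id in limit.active_slots:
            # Force the stale run out of Running; whether this also frees the
            # concurrency slot depends on the Prefect version in use.
            await client.set_task_run_state(task_run_id, state=Failed(), force=True)

if __name__ == "__main__":
    asyncio.run(clear_stale_runs())
```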
Prakash Rai
08/24/2022, 11:47 PM
`flow-run`?