Prefect Community #ask-community

Andor Tóth

04/06/2021, 4:01 PM

Hello. I'm still test driving Prefect (v0.14.15), but my flow gets stuck and I get zombie processes. Any ideas? Here's the code without imports:

SQL_DIR = Path('sql')

@task
def list_query_names():
    return [f.name for f in SQL_DIR.glob('*.sql')]

@task(log_stdout=True, timeout=15, task_run_name='{name}-{date:%F_%T}', checkpoint=False)
def exec_query(name: str):
    sql = Path(SQL_DIR / name).read_text()
    print('Query name: %s' % name)
    engine = sqla.create_engine(DSN)
    rs = engine.execute(sql)
    return dict(keys=rs.keys(), rows=rs.fetchall())

@task
def save_results(rs, name):
    with (OUT_DIR / name).with_suffix('.txt').open('w') as f:
        csv_writer = csv.writer(f, delimiter="\t")
        csv_writer.writerow(rs['keys'])
        csv_writer.writerows(rs['rows'])

with Flow("Queries") as flow:
    query_names = list_query_names()
    results = exec_query.map(query_names)
    save_results.map(results, query_names)
    
flow.executor = LocalDaskExecutor(num_workers=2, schedule='processes')
flow.run()

Andor Tóth

04/06/2021, 4:05 PM

the imports are:

from prefect import Flow, task
from prefect.executors import LocalDaskExecutor
import csv
import sqlalchemy as sqla
from pathlib import Path

Andor Tóth

04/06/2021, 4:06 PM

DSN is any database connection URL accepted by SQLAlchemy

Dylan

04/06/2021, 4:45 PM

Hey @Andor Tóth! Can you tell me a bit more about the error you see for the zombies?

Dylan

04/06/2021, 4:45 PM

How are you running this flow with an agent?

Dylan

04/06/2021, 4:45 PM

Sometimes zombies can result from resource starvation, so I’d like to rule that out first

Andor Tóth

04/06/2021, 4:56 PM

I am executing it locally as a Python script, like python -i queries.py, but the result is the same if it is run by an agent

Andor Tóth

04/06/2021, 4:56 PM

The process list shows the following:

prefect  3964559  0.3  1.2 585352 74788 pts/13   Sl+  17:21   0:02  |                   \_ python -i queries.py                                                               
prefect  3964730  0.0  0.1  53468 11980 pts/13   S+   17:21   0:00  |                       \_ /srv/prefect/venv/bin/python -c from multiprocessing.semaphore_tracker import main;main(8) 
prefect  3964732  0.2  0.0      0     0 pts/13   Z+   17:21   0:02  |                       \_ [python] <defunct>
andor.t+ 3936805  0.0  0.0  24112  3796 pts/15   Ss   17:14   0:00  \_ -bash
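
For context on the process list above: the Z+ state and the <defunct> label mark a zombie, meaning a child process that has already exited but whose exit status the parent has not yet collected. A minimal standalone sketch of that lifecycle, using only the standard library and a throwaway child process (none of this is Prefect code):

```python
import subprocess
import sys
import time

# Start a short-lived child process that exits almost immediately.
child = subprocess.Popen([sys.executable, "-c", "pass"])

# Give the child time to exit. Until the parent collects its exit status,
# POSIX systems list the child as a zombie (<defunct>) in ps output.
time.sleep(0.5)

# wait() collects the exit status, which removes the zombie entry.
exit_code = child.wait()
print("child", child.pid, "exited with", exit_code)
```

A zombie that never goes away therefore suggests that the parent is blocked somewhere before it gets to collect the child, which fits a script that hangs until interrupted.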

Andor Tóth

04/06/2021, 4:56 PM

and the script executes indefinitely

Andor Tóth

04/06/2021, 4:56 PM

until it is interrupted

Andor Tóth

04/06/2021, 4:57 PM

output of prefect diagnostics follows:

{
  "config_overrides": {},
  "env_vars": [],
  "system_information": {
    "platform": "Linux-4.18.0-240.10.1.el8_3.x86_64-x86_64-with-centos-8",
    "prefect_backend": "server",
    "prefect_version": "0.14.15",
    "python_version": "3.6.8"
  }
}

Andor Tóth

04/06/2021, 4:59 PM

the queries are small

Andor Tóth

04/06/2021, 4:59 PM

and the output is at most a few hundred records

Andor Tóth

04/06/2021, 4:59 PM

and the number of columns is below 10

Dylan

04/06/2021, 5:13 PM

How about the logs for the Flow Run?

Dylan

04/06/2021, 5:13 PM

Or an ID (you can find this in the URL of the Flow Run page)

Andor Tóth

04/06/2021, 5:15 PM

[2021-04-06 19:13:59+0200] INFO - prefect.FlowRunner | Beginning Flow run for 'Queries'
[2021-04-06 19:14:00+0200] INFO - prefect.TaskRunner | Task 'list_query_names': Starting task run...
[2021-04-06 19:14:00+0200] INFO - prefect.TaskRunner | Task 'list_query_names': Finished task run for task with final state: 'Success'
[2021-04-06 19:14:00+0200] INFO - prefect.TaskRunner | Task 'exec_query': Starting task run...
[2021-04-06 19:14:00+0200] INFO - prefect.TaskRunner | Task 'exec_query': Finished task run for task with final state: 'Mapped'
[2021-04-06 19:14:00+0200] INFO - prefect.TaskRunner | Task 'save_results': Starting task run...
[2021-04-06 19:14:00+0200] INFO - prefect.TaskRunner | Task 'save_results': Finished task run for task with final state: 'Mapped'
[2021-04-06 19:14:00+0200] INFO - prefect.TaskRunner | Task 'exec_query[0]': Starting task run...
[2021-04-06 19:14:00+0200] INFO - prefect.TaskRunner | Task 'exec_query[1]': Starting task run...
[2021-04-06 19:14:01+0200] INFO - prefect.exec_query[0] | Query name: menu_ab.sql
[2021-04-06 19:14:01+0200] INFO - prefect.exec_query[1] | Query name: zuzda_ct_by_url_last_hour_src.sql
[2021-04-06 19:14:04+0200] INFO - prefect.exec_query[1] | Columns: ['day', 'ts', 'source', 'campaign_id', 'row_id', 'url', 'ct']
[2021-04-06 19:14:04+0200] INFO - prefect.exec_query[0] | Columns: ['day', 'menu', 'submenu', 'page_version', 'ct', 'uc']
[2021-04-06 19:14:05+0200] INFO - prefect.TaskRunner | Task 'exec_query[0]': Finished task run for task with final state: 'Success'
[2021-04-06 19:14:05+0200] INFO - prefect.TaskRunner | Task 'save_results[0]': Starting task run...
[2021-04-06 19:14:05+0200] INFO - prefect.TaskRunner | Task 'save_results[0]': Finished task run for task with final state: 'Success'
[2021-04-06 19:14:05+0200] INFO - prefect.TaskRunner | Task 'exec_query[2]': Starting task run...
[2021-04-06 19:14:06+0200] INFO - prefect.exec_query[2] | Query name: aktualis_ct_by_url_15m.sql
[2021-04-06 19:14:08+0200] INFO - prefect.exec_query[2] | Columns: ['ct', 'url']
[2021-04-06 19:14:09+0200] INFO - prefect.TaskRunner | Task 'exec_query[2]': Finished task run for task with final state: 'Success'
[2021-04-06 19:14:09+0200] INFO - prefect.TaskRunner | Task 'save_results[2]': Starting task run...
[2021-04-06 19:14:09+0200] INFO - prefect.TaskRunner | Task 'save_results[2]': Finished task run for task with final state: 'Success'

Andor Tóth

04/06/2021, 5:16 PM

after that, nothing happens, until I hit CTRL+C

Andor Tóth

04/06/2021, 5:16 PM

I've tried these queries separately, and the last time I got a timeout for the query handle

Dylan

04/06/2021, 5:17 PM

Does that happen with a different executor?

Andor Tóth

04/06/2021, 5:17 PM

LocalExecutor works fine

Dylan

04/06/2021, 5:17 PM

How about the DaskExecutor?

Dylan

04/06/2021, 5:18 PM

Even if you run in a local setup (so not in a multi-machine/distributed environment) it still uses a different scheduler

Dylan

04/06/2021, 5:18 PM

My suspicion is that the LocalDaskExecutor may be having trouble with the timeout parameter

Dylan

04/06/2021, 5:18 PM

But I’m not sure

Andor Tóth

04/06/2021, 5:21 PM

yeah, it seems like that

Andor Tóth

04/06/2021, 5:21 PM

in the meantime, I'm testing different scenarios

Andor Tóth

04/06/2021, 5:22 PM

but nothing really changes

Dylan

04/06/2021, 5:22 PM

So, I think you might want to consider one of two paths

Dylan

04/06/2021, 5:23 PM

If that timeout isn’t that critical, I might suggest just removing it

Andor Tóth

04/06/2021, 5:23 PM

it is critical

Andor Tóth

04/06/2021, 5:23 PM

^C[2021-04-06 19:22:36+0200] INFO - prefect.LocalDaskExecutor | Attempting to interrupt and cancel all running tasks...



^CTraceback (most recent call last):
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/executors/dask.py", line 542, in start
    yield
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/engine/flow_runner.py", line 657, in get_flow_run_state
    s.map_states = executor.wait(mapped_children[t])
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/executors/dask.py", line 627, in wait
    futures, scheduler=self.scheduler, pool=self._pool, optimize_graph=False
  File "/srv/prefect/venv/lib/python3.6/site-packages/dask/base.py", line 565, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/srv/prefect/venv/lib/python3.6/site-packages/dask/threaded.py", line 84, in get
    **kwargs
  File "/srv/prefect/venv/lib/python3.6/site-packages/dask/local.py", line 476, in get_async
    key, res_info, failed = queue_get(queue)
  File "/srv/prefect/venv/lib/python3.6/site-packages/dask/local.py", line 133, in queue_get
    return q.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()
KeyboardInterrupt

Andor Tóth

04/06/2021, 5:24 PM

i got this stack trace

Andor Tóth

04/06/2021, 5:24 PM

the bottom is weird, because I thought that I'm using processes, not threads

Dylan

04/06/2021, 5:24 PM

We’re going to be rolling out a new feature in the soon-ish future called “Flow SLAs” where you can, for example, tell cloud to “For this Flow, set any Flow Run running longer than x time into a Cancelled State”

Dylan

04/06/2021, 5:24 PM

I believe the scheduler is still threaded even if the workers are processes

Andor Tóth

04/06/2021, 5:24 PM

ok, that sounds great

Dylan

04/06/2021, 5:25 PM

Meantime, I’d appreciate it if you’d open an issue for the timeouts problem you’re running into above

Dylan

04/06/2021, 5:25 PM

I fully expect it’s because of the LocalDaskExecutor + timeouts interaction since it’s totally fine running with the local executor

Dylan

04/06/2021, 5:26 PM

You can also try the DaskExecutor on its own (like I said, even in local mode it still uses a different scheduler)

Andor Tóth

04/06/2021, 5:26 PM

ok, if I can find a simple way to reproduce, then I will

Andor Tóth

04/06/2021, 5:26 PM

thanks for the tip

Dylan

04/06/2021, 5:26 PM

👍

Dylan

04/06/2021, 5:26 PM

And thank you for letting us know!

Andor Tóth

04/06/2021, 5:32 PM

It worked, DaskExecutor finished without a problem

Dylan

04/06/2021, 5:35 PM

Excellent

Dylan

04/06/2021, 5:35 PM

this is definitely an issue with the LocalDaskExecutor

Dylan

04/06/2021, 5:35 PM

I’ll open an issue from this thread, I think that’s sufficient

Dylan

04/06/2021, 5:36 PM

@Marvin open “Timeout Failing with LocalDaskExecutor”

Marvin

04/06/2021, 5:36 PM

https://github.com/PrefectHQ/prefect/issues/4362

👍 1

Zanie

04/06/2021, 6:09 PM

Hey @Andor Tóth -- I wrote the multiprocess task timeout code 🙂 to enforce a timeout for a task in a process, we run a thread to execute your task then retrieve the data from a queue (as you see in the traceback) with the given timeout. This means the data that's passed must be serializable by cloudpickle -- is it possible that is not the case?
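
A rough standalone sketch of the pattern described in this message: run the task in a worker thread, then wait on a queue for the result with a timeout. This is not Prefect's implementation. The helper name run_with_timeout and its signature are hypothetical, and only the standard library is used:

```python
import queue
import threading
import time


def run_with_timeout(fn, timeout, *args):
    """Run fn(*args) in a worker thread; raise TimeoutError if no result arrives in time."""
    results = queue.Queue(maxsize=1)

    def worker():
        # Hand either the return value or the raised exception back to the caller.
        try:
            results.put(("ok", fn(*args)))
        except Exception as exc:
            results.put(("error", exc))

    # Daemon thread, so a stuck task cannot keep the interpreter alive at exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    try:
        status, payload = results.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no result within {timeout}s")

    if status == "error":
        raise payload
    return payload


print(run_with_timeout(lambda x: x * 2, 1.0, 21))
```

One limitation of this approach is that a thread cannot be stopped from the outside, so after a timeout the worker keeps running in the background until the task itself returns.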

Zanie

04/06/2021, 6:27 PM

Here's the relevant code if you're interested https://github.com/PrefectHQ/prefect/blob/master/src/prefect/utilities/executors.py#L184

Zanie

04/06/2021, 6:29 PM

Oh actually, I think I may be wrong -- if you're running your tasks in processes it should just be using a thread (because we can spawn threads) whereas when your task is run in a thread we have to spawn a process to track it 🤦‍♂️ it's a bit confusing because it flip-flops

Zanie

04/06/2021, 6:30 PM

The queue get call appears to be in dask -- I'm not sure if this is a bug on our side as it looks like there's a waiter in dask itself that's blocking execution... hm..

Zanie

04/06/2021, 6:35 PM

@Andor Tóth -- step 2 of testing, I ran this locally and got a ton of errors from Dask about multiprocessing without a __main__ check. Does your issue go away if you do

if __name__ == "__main__":
    flow.run()

Andor Tóth

04/06/2021, 7:01 PM

let me check this out

Andor Tóth

04/06/2021, 7:01 PM

actually I'm already doing it

Andor Tóth

04/06/2021, 7:02 PM

so this does not matter

Andor Tóth

04/06/2021, 7:03 PM

records of data are passed

Andor Tóth

04/06/2021, 7:03 PM

which are tuples of primitive data types

Andor Tóth

04/06/2021, 7:04 PM

which should be serializable by cloudpickle (but i'm gonna test that)

Andor Tóth

04/06/2021, 7:05 PM

I have made a trial with timeouts (sql: select sleep(20000)), but it did not trigger the error

Andor Tóth

04/06/2021, 7:05 PM

so I suppose it must be data dependent

Andor Tóth

04/06/2021, 7:13 PM

cloudpickle could serialize the resultset without problem
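
A check of this kind can be sketched as a serialize and restore round trip. The sketch below rests on two assumptions: the rows are made-up sample data, and the standard library pickle module stands in for cloudpickle, which offers the same dumps/loads interface.

```python
import pickle

# Made-up stand-in for a query result: column names plus rows of primitives.
result = {
    "keys": ["day", "url", "ct"],
    "rows": [("2021-04-06", f"/page/{i}", i) for i in range(500)],
}

# Serialize, then restore, and compare with the original object.
payload = pickle.dumps(result)
restored = pickle.loads(payload)

print("round trip ok:", restored == result)
print("payload size in bytes:", len(payload))
```

Real SQLAlchemy result rows are row objects rather than plain tuples, so the same round trip would still need to be run on the value the task actually returns. Printing the payload size is also useful when a failure seems to depend on the number of rows.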

Zanie

04/06/2021, 7:16 PM

import csv
from time import sleep
from prefect import task, Flow
from prefect.executors import LocalDaskExecutor


SLEEP = 20


@task
def list_query_names():
    return ["a", "b", "c"]

@task(log_stdout=True, timeout=15, task_run_name='{name}-{date:%F_%T}', checkpoint=False)
def exec_query(name: str):
    sleep(SLEEP)
    return dict(keys={"d", "e", "f"}, rows=["x", "y", "z"])

@task
def save_results(rs, name):
    with open(name, 'w') as f:
        csv_writer = csv.writer(f, delimiter="\t")
        csv_writer.writerow(rs['keys'])
        csv_writer.writerows(rs['rows'])

with Flow("Queries") as flow:
    query_names = list_query_names()
    results = exec_query.map(query_names)
    save_results.map(results, query_names)
    
flow.executor = LocalDaskExecutor(num_workers=2, schedule='processes')
if __name__ == "__main__":
    flow.run()

errors with a sleep of 20 and runs fine with <15

Andor Tóth

04/06/2021, 7:20 PM

yeah, I'm also exchanging pieces to find out what triggers it

Andor Tóth

04/06/2021, 7:20 PM

with very simple queries, there are no problems

Andor Tóth

04/06/2021, 7:21 PM

like: select "a", sleep(10000)

Zanie

04/06/2021, 7:27 PM

Maybe the engine is leaving a hanging process, can you try putting it in a with block per https://docs.sqlalchemy.org/en/14/core/connections.html#basic-usage ?

Andor Tóth

04/06/2021, 7:32 PM

sure

Andor Tóth

04/06/2021, 7:35 PM

it's not that

@task(log_stdout=True, timeout=15, task_run_name='{name}-{date:%F_%T}', checkpoint=False)
def exec_query(name: str):
    sql = Path(SQL_DIR / name).read_text()

    print('Query name: %s' % name)
    engine = sqla.create_engine(DSN)
    with engine.connect() as conn:
        rs = conn.execute(sql)
        results = dict(keys=rs.keys(), rows=rs.fetchall())

    print('Columns: %s' % results['keys'])

    return results

Andor Tóth

04/06/2021, 7:38 PM

I have also tried with Python 3.9

Zanie

04/06/2021, 7:39 PM

Sorry if you've said this already, but when it hangs is it hanging after that print('Columns ... line or before?

Andor Tóth

04/06/2021, 7:39 PM

Here's the most recent output:

[2021-04-06 21:38:34+0200] INFO - prefect.FlowRunner | Beginning Flow run for 'Queries'
[2021-04-06 21:38:34+0200] INFO - prefect.TaskRunner | Task 'list_query_names': Starting task run...
[2021-04-06 21:38:34+0200] INFO - prefect.TaskRunner | Task 'list_query_names': Finished task run for task with final state: 'Success'
[2021-04-06 21:38:34+0200] INFO - prefect.TaskRunner | Task 'exec_query': Starting task run...
[2021-04-06 21:38:34+0200] INFO - prefect.TaskRunner | Task 'exec_query': Finished task run for task with final state: 'Mapped'
[2021-04-06 21:38:34+0200] INFO - prefect.TaskRunner | Task 'save_results': Starting task run...
[2021-04-06 21:38:34+0200] INFO - prefect.TaskRunner | Task 'save_results': Finished task run for task with final state: 'Mapped'
[2021-04-06 21:38:34+0200] INFO - prefect.TaskRunner | Task 'exec_query[0]': Starting task run...
[2021-04-06 21:38:34+0200] INFO - prefect.TaskRunner | Task 'exec_query[1]': Starting task run...
Query name: aktualis_ct_by_url_15m.sql
Query name: menu_ab.sql
Columns: ['ct', 'url']
Columns: ['day', 'menu', 'submenu', 'page_version', 'ct', 'uc']
[2021-04-06 21:38:37+0200] INFO - prefect.TaskRunner | Task 'exec_query[0]': Finished task run for task with final state: 'Success'
[2021-04-06 21:38:37+0200] INFO - prefect.TaskRunner | Task 'save_results[0]': Starting task run...
[2021-04-06 21:38:37+0200] INFO - prefect.TaskRunner | Task 'save_results[0]': Finished task run for task with final state: 'Success'
[2021-04-06 21:38:37+0200] INFO - prefect.TaskRunner | Task 'exec_query[1]': Finished task run for task with final state: 'Success'
[2021-04-06 21:38:37+0200] INFO - prefect.TaskRunner | Task 'exec_query[2]': Starting task run...
[2021-04-06 21:38:37+0200] INFO - prefect.TaskRunner | Task 'save_results[1]': Starting task run...
[2021-04-06 21:38:37+0200] INFO - prefect.TaskRunner | Task 'save_results[1]': Finished task run for task with final state: 'Success'
Query name: zuzda_ct_by_url_last_hour_src.sql
Columns: ['day', 'ts', 'source', 'campaign_id', 'row_id', 'url', 'ct']

Andor Tóth

04/06/2021, 7:39 PM

and no lines after that

Andor Tóth

04/06/2021, 7:40 PM

the process list

# ps faux | grep '[q]ueries' -A3 
prefect  4183313  1.5  0.9 556492 55784 pts/9    Sl+  21:38   0:01  |                   \_ python -i queries.py
prefect  4183320  0.0  0.2  53468 12172 pts/9    S+   21:38   0:00  |                       \_ /srv/prefect/venv/bin/python -c from multiprocessing.semaphore_tracker import main;main(7)
prefect  4183331  2.1  0.0      0     0 pts/9    Z+   21:38   0:01  |                       \_ [python] <defunct>
andor.t+  909727  0.0  0.0  24092     8 pts/3    Ss   Mar08   0:00  \_ -bash

Andor Tóth

04/06/2021, 7:41 PM

and after pressing CTRL+C

^C[2021-04-06 21:41:16+0200] INFO - prefect.LocalDaskExecutor | Attempting to interrupt and cancel all running tasks...
^CTraceback (most recent call last):
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/executors/dask.py", line 542, in start
    yield
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/engine/flow_runner.py", line 657, in get_flow_run_state
    s.map_states = executor.wait(mapped_children[t])
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/executors/dask.py", line 627, in wait
    futures, scheduler=self.scheduler, pool=self._pool, optimize_graph=False
  File "/srv/prefect/venv/lib/python3.6/site-packages/dask/base.py", line 565, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/srv/prefect/venv/lib/python3.6/site-packages/dask/threaded.py", line 84, in get
    **kwargs
  File "/srv/prefect/venv/lib/python3.6/site-packages/dask/local.py", line 476, in get_async
    key, res_info, failed = queue_get(queue)
  File "/srv/prefect/venv/lib/python3.6/site-packages/dask/local.py", line 133, in queue_get
    return q.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "queries.py", line 43, in <module>
    flow.run()
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/core/flow.py", line 1266, in run
    **kwargs,
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/core/flow.py", line 1087, in _run
    **kwargs,
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/engine/flow_runner.py", line 282, in run
    executor=executor,
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/utilities/executors.py", line 71, in inner
    return runner_method(self, *args, **kwargs)
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/engine/flow_runner.py", line 661, in get_flow_run_state
    assert isinstance(final_states, dict)
  File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/srv/prefect/venv/lib/python3.6/site-packages/prefect/executors/dask.py", line 552, in start
    self._pool.join()
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 550, in join
    p.join()
  File "/usr/lib64/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/usr/lib64/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

Andor Tóth

04/06/2021, 7:43 PM

I've got to go now

Andor Tóth

04/06/2021, 7:43 PM

tomorrow I am going to find this out

Zanie

04/06/2021, 7:44 PM

You can set PREFECT__LOGGING__LEVEL="DEBUG" and we'll get some more logs from the task runner
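
The variable can be exported in the shell before starting the script, or set from Python. A minimal sketch of the second option, assuming Prefect reads its configuration when it is first imported, so the assignment has to come before the import:

```python
import os

# Set the log level before Prefect is imported; this sketch assumes the
# configuration is read from the environment at import time.
os.environ["PREFECT__LOGGING__LEVEL"] = "DEBUG"

# import prefect  # import only after the variable has been set

print(os.environ["PREFECT__LOGGING__LEVEL"])
```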

Zanie

04/06/2021, 7:44 PM

Should look something like

[2021-04-06 14:43:36-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Calling task.run() method...
[2021-04-06 14:43:36-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Attaching process based timeout handler...
[2021-04-06 14:43:36-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Sending execution to a new process...
[2021-04-06 14:43:36-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[0]': Waiting for process to return with 15s timeout...
[2021-04-06 14:43:36-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Waiting for process to return with 15s timeout...
[2021-04-06 14:43:37-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[0]': Executing...
[2021-04-06 14:43:37-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Executing...
[2021-04-06 14:43:37-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[0]': Passing result back to main process...
[2021-04-06 14:43:37-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Passing result back to main process...
[2021-04-06 14:43:37-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[0]': Execution process closed, collecting result...
[2021-04-06 14:43:37-0500] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Execution process closed, collecting result...

Andor Tóth

04/06/2021, 7:44 PM

looks like this to me:

[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Passing result back to main process...
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[0]': Execution process closed, collecting result...
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[0]': Handling state change from Running to Success
[2021-04-06 21:44:28+0200] INFO - prefect.TaskRunner | Task 'exec_query[0]': Finished task run for task with final state: 'Success'
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Execution process closed, collecting result...
[2021-04-06 21:44:28+0200] INFO - prefect.TaskRunner | Task 'save_results[0]': Starting task run...
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'save_results[0]': Handling state change from Pending to Running
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'save_results[0]': Calling task.run() method...
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[1]': Handling state change from Running to Success
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'save_results[0]': Handling state change from Running to Success
[2021-04-06 21:44:28+0200] INFO - prefect.TaskRunner | Task 'exec_query[1]': Finished task run for task with final state: 'Success'
[2021-04-06 21:44:28+0200] INFO - prefect.TaskRunner | Task 'save_results[0]': Finished task run for task with final state: 'Success'
[2021-04-06 21:44:28+0200] INFO - prefect.TaskRunner | Task 'save_results[1]': Starting task run...
[2021-04-06 21:44:28+0200] INFO - prefect.TaskRunner | Task 'exec_query[2]': Starting task run...
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'save_results[1]': Handling state change from Pending to Running
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[2]': Handling state change from Pending to Running
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'save_results[1]': Calling task.run() method...
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[2]': Calling task.run() method...
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[2]': Attaching process based timeout handler...
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'save_results[1]': Handling state change from Running to Success
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[2]': Sending execution to a new process...
[2021-04-06 21:44:28+0200] INFO - prefect.TaskRunner | Task 'save_results[1]': Finished task run for task with final state: 'Success'
[2021-04-06 21:44:28+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[2]': Waiting for process to return with 15s timeout...
[2021-04-06 21:44:29+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[2]': Executing...
Query name: zuzda_ct_by_url_last_hour_src.sql
Columns: ['day', 'ts', 'source', 'campaign_id', 'row_id', 'url', 'ct']
[2021-04-06 21:44:32+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[2]': Passing result back to main process...
[2021-04-06 21:44:43+0200] DEBUG - prefect.TaskRunner | Task 'exec_query[2]': Execution process closed, collecting result...

Andor Tóth

04/06/2021, 7:46 PM

it has something to do with this query: zuzda_ct_by_url_last_hour_src.sql

Andor Tóth

04/06/2021, 7:46 PM

but standalone, LocalExecutor and DaskExecutor could execute it

Andor Tóth

04/06/2021, 7:47 PM

each query runs in 2-3 seconds

Andor Tóth

04/06/2021, 7:47 PM

so this should not be a timeout issue

Andor Tóth

04/06/2021, 7:52 PM

okay, I have replaced the potentially bad query with a simple one: select * from tmp.zuzda_test

Andor Tóth

04/06/2021, 7:52 PM

where tmp.zuzda_test contains the resultset of the original query

Andor Tóth

04/06/2021, 7:52 PM

and nothing has changed

Zanie

04/06/2021, 7:54 PM

Looks like it executes and returns to the main process without hanging... hmm

Andor Tóth

04/06/2021, 7:54 PM

I have also replaced the other 2 queries with simple ones: select "A", sleep(1000)

Andor Tóth

04/06/2021, 7:56 PM

with only 10 rows, it's working

Andor Tóth

04/06/2021, 7:57 PM

with 100 it also does

Andor Tóth

04/06/2021, 7:57 PM

300 succeeds

Andor Tóth

04/06/2021, 7:58 PM

but 500 doesn't

Andor Tóth

04/06/2021, 8:07 PM

it dies over 433+ rows
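
Narrowing down a row-count boundary like this by hand (10, 100, 300, 500, ...) can also be done with a binary search. A minimal sketch, assuming the behaviour is monotonic (every count above the first failing one also fails). The function fake_run is a hypothetical stand-in for "run the flow with a row limit of n and report whether it finished", and its cutoff is a placeholder rather than a measured value:

```python
def first_failing_count(succeeds, low, high):
    """Return the smallest n in (low, high] for which succeeds(n) is False.

    Assumes succeeds(low) is True, succeeds(high) is False, and that every
    count above the first failure also fails.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if succeeds(mid):
            low = mid   # mid still works, so the boundary is above it
        else:
            high = mid  # mid fails, so the boundary is at or below it
    return high


# Hypothetical stand-in for a real flow run limited to n rows.
def fake_run(n):
    return n < 433


print(first_failing_count(fake_run, 300, 500))
```

Starting from a known good count of 300 and a known bad count of 500, this needs at most 8 runs to find the boundary.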

Zanie

04/06/2021, 8:08 PM

That's perplexing. Is it dependent on the data being returned? It's possible this is a weird sqlalchemy/dask/prefect combined bug

Andor Tóth

04/06/2021, 8:10 PM

tomorrow i'm going to try it with PostgreSQL and SQLite

Andor Tóth

04/06/2021, 8:10 PM

maybe it depends on the driver

Andor Tóth

04/06/2021, 8:10 PM

once again, thanks for your support

Zanie

04/07/2021, 2:21 PM

Something like this is going to be super hard to track down. Happy to help, but my pragmatic suggestion is to just use the DaskExecutor instead of the LocalDaskExecutor -- Dask itself typically recommends using the distributed executor even for local work.
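
A minimal sketch of that swap for the flow in this thread, assuming Prefect 0.14.x with Dask's distributed scheduler installed. The helper use_dask_executor is hypothetical and the worker settings are illustrative values, not recommendations from the thread:

```python
def use_dask_executor(flow, n_workers=2):
    """Attach a DaskExecutor backed by a temporary local cluster to a flow."""
    # Imported lazily so the sketch can be read without Prefect installed.
    from prefect.executors import DaskExecutor

    # With no address given, the executor starts a temporary local cluster
    # for the run; cluster_kwargs are passed on to that cluster.
    flow.executor = DaskExecutor(
        cluster_kwargs={"n_workers": n_workers, "threads_per_worker": 1}
    )
    return flow
```

In the script from this thread, this would replace the LocalDaskExecutor assignment: call use_dask_executor(flow) and then flow.run() inside the if __name__ == "__main__": block.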
