# ask-community
t
Hey folks! I'm new to Prefect and I'm sure I'm doing something dumb. I have a very simple task (like a few simple string manipulations) that I'm mapping over about 4000 list items, and I'm noticing that Prefect takes around 5 minutes to complete the tasks, is causing about 500MB of network traffic, and postgres is writing 600MB to disk. I'll provide more info in the thread below. Any advice would be much appreciated.
More info: if I run this code outside of a Prefect flow, it finishes almost instantly. I'm running Prefect Server locally with `prefect server start`, basically so that I can access the UI (all default settings except `backend="server"`). I'm running a local agent via `prefect agent local start` (all default settings).

There is a first task that generates a list of about 4000 `subm` objects; each `subm` is just a dict with a few short strings in it. The second task, which gets mapped over them, has a definition like this (`info` is `unmapped`, and is also just a simple dict with a few strings):
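```python
from prefect import task


def generate_name(info, subm, **kwargs):
    # Names each mapped task run after its submission;
    # **kwargs absorbs any other keyword arguments Prefect passes in
    return subm['name']


@task(checkpoint=False, task_run_name=generate_name)
def mytask(info, subm):
    # very basic manipulation of strings in subm
    return subm
```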
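For comparison with the "almost instantaneous" non-Prefect run, here is a minimal stdlib-only sketch of the same map done as a plain list comprehension. The real task body isn't shown, so `manipulate`, the fake `info` dict, and the fake submissions are all hypothetical stand-ins.

```python
import time


def manipulate(info, subm):
    # Hypothetical stand-in for the "very basic manipulation of strings in subm":
    # strip and lowercase every string value, leaving other values untouched.
    return {k: v.strip().lower() if isinstance(v, str) else v for k, v in subm.items()}


def run_plain(n=4000):
    """Map `manipulate` over n fake submissions and time it (no Prefect involved)."""
    info = {"course": "demo"}  # hypothetical unmapped dict
    subms = [{"name": f"Student_{i}", "assignment": " HW1 "} for i in range(n)]
    start = time.perf_counter()
    results = [manipulate(info, s) for s in subms]
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_plain()
    print(f"{len(results)} items in {elapsed * 1000:.1f} ms")
```

The plain loop typically finishes in milliseconds. By contrast, about 5 minutes spread over roughly 4000 mapped runs works out to about 300 s / 4000 ≈ 75 ms per task run. That suggests the time goes to per-run overhead rather than to the string work itself.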
After the task is done, I see this from `docker stats`:

```
CONTAINER ID   NAME             CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS
30c00fa1d655   tmp_postgres_1   0.17%     229MiB / 31.32GiB     0.71%     128MB / 43.1MB    5.21MB / 662MB   13
5375bf303c9a   tmp_apollo_1     0.00%     127MiB / 31.32GiB     0.40%     61.2MB / 62.4MB   143kB / 16.4kB   24
768f79e8b547   tmp_towel_1      0.00%     57.19MiB / 31.32GiB   0.18%     94.3kB / 126kB    295kB / 0B       50
ba02411b5786   tmp_hasura_1     0.31%     197.6MiB / 31.32GiB   0.62%     115MB / 166MB     102kB / 0B       66
b006569137c6   tmp_ui_1         0.01%     16.72MiB / 31.32GiB   0.05%     8.51kB / 0B       12.3kB / 4.1kB   16
422e1063ab0e   tmp_graphql_1    0.18%     78.36MiB / 31.32GiB   0.24%     76.7MB / 88.7MB   0B / 0B          66
```
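To put the `docker stats` numbers in per-task terms, a small stdlib-only helper can convert the size strings to bytes and divide by the number of mapped runs. This sketch assumes that `docker stats` reports NET I/O (received / sent) and BLOCK I/O (read / written) in decimal units (kB = 1000 B), while MEM USAGE uses binary units such as MiB. It accepts both suffix styles. The divisor of 4000 is the approximate task count given above.

```python
import re

# Decimal suffixes (NET I/O and BLOCK I/O columns) and binary suffixes (MEM USAGE).
_UNITS = {
    "B": 1,
    "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40,
}
_SIZE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*$")


def to_bytes(size: str) -> float:
    """Convert a docker-stats size string such as '662MB' or '94.3kB' to bytes."""
    match = _SIZE_RE.match(size)
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized size: {size!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def split_io(column: str) -> tuple:
    """Split an 'a / b' column like '128MB / 43.1MB' into two byte counts."""
    first, second = column.split("/")
    return to_bytes(first), to_bytes(second)


def per_task(total_bytes: float, n_tasks: int = 4000) -> float:
    """Average bytes attributable to each mapped task run."""
    return total_bytes / n_tasks


if __name__ == "__main__":
    # Values copied from the tmp_postgres_1 row of the docker stats output.
    net_in, net_out = split_io("128MB / 43.1MB")
    _, block_written = split_io("5.21MB / 662MB")
    print(f"postgres net in per task: {per_task(net_in) / 1e3:.1f} kB")
    print(f"postgres disk writes per task: {per_task(block_written) / 1e3:.1f} kB")
```

For the Postgres row, the helper gives about 32 kB received and about 165.5 kB written to disk per task run.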
While the task is running, I see rapid-fire `graphql` log messages from the server console like this:

```
graphql_1   | INFO:     172.19.0.5:56136 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56144 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56150 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56158 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56168 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56194 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56196 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56212 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56204 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56234 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56232 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56240 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56252 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56246 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56264 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56272 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56278 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56286 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56294 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56308 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56310 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56312 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56318 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56332 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56344 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56330 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56346 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56352 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56370 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56358 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56386 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56410 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56420 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56426 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56432 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56438 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56446 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56456 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56464 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56470 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1   | INFO:     172.19.0.5:56484 - "POST /graphql/ HTTP/1.1" 200 OK
```
I really can't figure out what would cause such a huge amount of network traffic. I know the task graph, states, etc. are passed over the network, but it seems wild that they would add up to 500MB for just 4000 tasks...
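One way to see how the traffic scales is to count GraphQL round trips per mapped task run. The following is a minimal sketch. It assumes the server console output has been captured to a text file, and the file name and task count are placeholders. It counts `POST /graphql/` lines like the ones above, splits out 2xx responses, and divides by the number of mapped runs. The result is only an average and cannot tell which client calls produced the requests.

```python
import re

# Matches access-log lines like the ones printed by the graphql service, e.g.
# graphql_1   | INFO:     172.19.0.5:56136 - "POST /graphql/ HTTP/1.1" 200 OK
_POST_RE = re.compile(r'"POST /graphql/ HTTP/1\.1" (\d{3})')


def count_graphql_posts(lines):
    """Return (total POSTs, POSTs with a 2xx status) for an iterable of log lines."""
    total = ok = 0
    for line in lines:
        match = _POST_RE.search(line)
        if match:
            total += 1
            if match.group(1).startswith("2"):
                ok += 1
    return total, ok


def posts_per_task(lines, n_tasks=4000):
    """Average number of GraphQL POSTs per mapped task run."""
    total, _ = count_graphql_posts(lines)
    return total / n_tasks
```

Usage, with a hypothetical `server.log` saved from the console: `with open("server.log") as fh: print(posts_per_task(fh))`.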
k
Hey @Trevor Campbell, are you logging stuff inside that task?
t
@Kevin Kho I thought that might be it as well, but I'm not (explicitly) logging anything in that task
Something else I found odd: I manually dumped all of the tables in the `prefect_server` db from Postgres to CSV files, and they amounted to about 10MB. Maybe there's another DB in the `tmp_postgres_1` container I'm not aware of?
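A CSV dump only captures the current rows, so it can understate what Postgres actually holds on disk. A minimal sketch for listing the on-disk size of every database in the container uses PostgreSQL's built-in `pg_database_size()` function. It assumes a DB-API connection whose cursor works as a context manager (psycopg2's does), and the connection details are left to you. Note that `pg_database_size()` does not count the write-ahead log, which lives in a cluster-wide directory and also receives disk writes.

```python
# Standard PostgreSQL catalog query: every non-template database and its
# on-disk size in bytes (tables, indexes, TOAST; not the write-ahead log).
DATABASE_SIZE_SQL = """
SELECT datname, pg_database_size(datname) AS size_bytes
FROM pg_database
WHERE NOT datistemplate
ORDER BY size_bytes DESC;
"""


def database_sizes(conn):
    """Return [(database name, size in bytes), ...], largest first.

    `conn` is assumed to be a DB-API connection (e.g. from psycopg2.connect)
    whose cursor supports the `with` statement.
    """
    with conn.cursor() as cur:
        cur.execute(DATABASE_SIZE_SQL)
        return [(name, int(size)) for name, size in cur.fetchall()]


def human(size_bytes):
    """Format a byte count with decimal units for quick reading."""
    for unit in ("B", "kB", "MB", "GB"):
        if size_bytes < 1000:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1000
    return f"{size_bytes:.1f} TB"
```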
k
Normally, the task runs table and the logs table will be the biggest ones.
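To check which tables are largest, and how much of each is dead row versions, a sketch can query `pg_stat_user_tables` while connected to the `prefect_server` database. In PostgreSQL, every UPDATE writes a new row version and leaves the old one behind as a dead tuple until VACUUM reclaims it. A table whose rows are updated many times can therefore be much larger on disk than a CSV export of its current rows. This sketch assumes a psycopg2-style connection: its cursor supports `with`, and it uses the `%s` parameter style. The live and dead row counts are statistics-based estimates.

```python
# Per-table on-disk size (including indexes and TOAST) plus live/dead row
# estimates from PostgreSQL's statistics views, largest tables first.
TABLE_SIZE_SQL = """
SELECT relname,
       pg_total_relation_size(relid) AS total_bytes,
       n_live_tup,
       n_dead_tup
FROM pg_stat_user_tables
ORDER BY total_bytes DESC
LIMIT %s;
"""


def largest_tables(conn, limit=10):
    """Return dicts describing the `limit` largest user tables in the current database."""
    with conn.cursor() as cur:
        cur.execute(TABLE_SIZE_SQL, (limit,))
        rows = cur.fetchall()
    return [
        {
            "table": name,
            "total_bytes": int(total),
            "live_rows": int(live),
            "dead_rows": int(dead),
            # Share of row versions that are dead and waiting for VACUUM.
            "dead_fraction": dead / (live + dead) if (live + dead) else 0.0,
        }
        for name, total, live, dead in rows
    ]
```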
t
yes, I did notice that
k
Will ask someone on the team about their thoughts on this.
t
@Kevin Kho Much appreciated. Please let me know if I can send you any other information.
I don't expect you guys to dig through my code, but in case it's useful to you, the repo I'm working in is actually public (https://github.com/ubc-dsci/rudaux). In the `prefect` branch, all of the Prefect code I'm using can be found in `[repo-root]/rudaux/rudaux/`. The `flows.py` file contains the code to construct my flows; the offending flow is in the `build_autoext_flows` function. The `submissions.py` file contains the offending task, the `compute_deadline` task.