jpuris

03/27/2023, 3:21 PM
It seems the Prefect agent does not replicate symlinks when it creates a snapshot of the Python environment for local (process) flow code execution. We are running the Prefect agent on an Ubuntu EC2 instance, and while everything is working fine, we see massive jumps in disk space consumption whenever any of our flows run. So far I've narrowed it down to the Python environment being snapshotted into /tmp, where the snapshot is basically double the size of the venv it is copied from. More info in the thread.
We run our Prefect agent and some of the flow code from a single venv:
$ du -hs /opt/my_prefect_project/venv
509M	/opt/my_prefect_project/venv
But when, e.g., a hello world flow is run on it, it copies the whole venv, which makes sense! It simply does not know which libs may or may not be required... I'm OK with that. However, when the venv gets snapshotted to /tmp for the flow run, it doubles in size!
$ du -h -d5 /tmp/*prefect/ | sort -rh | head -1
1.1G	/tmp/tmpv17bxswaprefect/venv
So far I've narrowed it down to the fact that it does not preserve the symlinks in the original venv:
$ ls -la venv/
total 24
drwxrwxr-x 5 ubuntu ubuntu 4096 Feb  9 09:30 .
drwxrwxr-x 8 ubuntu ubuntu 4096 Mar 14 14:26 ..
drwxrwxr-x 3 ubuntu ubuntu 4096 Mar 14 09:22 bin
drwxrwxr-x 3 ubuntu ubuntu 4096 Feb  9 09:30 include
drwxrwxr-x 3 ubuntu ubuntu 4096 Feb  9 09:30 lib
lrwxrwxrwx 1 ubuntu ubuntu    3 Feb  9 09:30 lib64 -> lib
-rw-rw-r-- 1 ubuntu ubuntu   70 Feb  9 09:30 pyvenv.cfg
While on the snapshot side:
$ du -h -d5 /tmp/*prefect/ | sort -rh | grep lib
509M	/tmp/tmpv17bxswaprefect/venv/lib64
509M	/tmp/tmpv17bxswaprefect/venv/lib
which results in a total of 2x the venv dir size.
Does anyone know if this is intended? We're currently running on a rather small EC2 instance, so running multiple flows results in "no disk space remaining" errors 😞
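A quick way to see which directory symlinks in a venv would be duplicated by a link-following copy is to list them. This is only a minimal sketch: `find_dir_symlinks` is a hypothetical helper name, not something from Prefect, and it only reports symlinks that point at directories.

```python
import os
from pathlib import Path


def find_dir_symlinks(root):
    """Return sorted (relative_path, link_target) pairs for directory symlinks under root.

    Hypothetical helper for illustration; not part of Prefect.
    """
    root = Path(root)
    found = []
    # followlinks=False: symlinked directories are listed in dirnames
    # but os.walk does not descend into them.
    for dirpath, dirnames, _filenames in os.walk(root, followlinks=False):
        for name in dirnames:
            path = Path(dirpath) / name
            if path.is_symlink():
                found.append((str(path.relative_to(root)), os.readlink(path)))
    return sorted(found)
```

Calling `find_dir_symlinks("/opt/my_prefect_project/venv")` on a venv laid out like the one in this thread should report the `lib64` entry pointing at `lib`.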

Zanie

03/27/2023, 3:35 PM
I think we explicitly follow symlinks
Hm, actually I don’t see that call after looking at the code

jpuris

03/27/2023, 3:36 PM
I think we explicitly follow symlinks
Hi @Zanie, does that mean this is not intended behaviour? I can produce an MRE, if necessary. To my knowledge, the lib64 symlink to lib exists only on Debian-based distributions 🤷
If symlinks is true, symbolic links in the source tree are represented as symbolic links in the new tree and the metadata of the original links will be copied as far as the platform allows; if false or omitted, the contents and metadata of the linked files are copied to the new tree.
When symlinks is false, if the file pointed by the symlink doesn’t exist, an exception will be added in the list of errors raised in an Error exception at the end of the copy process. You can set the optional ignore_dangling_symlinks flag to true if you want to silence this exception. Notice that this option has no effect on platforms that don’t support os.symlink().
We’re leaving it at the default value of False
Maybe there’s logic elsewhere that’s causing the problem though
I’m not sure what our intended behavior is here. We probably need it to be configurable.
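The documentation quoted above matches Python's `shutil.copytree`, and the doubling can be reproduced outside Prefect with a small sketch. The fake venv layout, the helper names (`build_fake_venv`, `copy_both_ways`, `disk_bytes`) and the 1 KiB file are assumptions made for illustration; whether Prefect's snapshot takes exactly this code path is not confirmed in this thread.

```python
import os
import shutil
from pathlib import Path


def disk_bytes(root: Path) -> int:
    """Sum sizes of regular files under root without descending into symlinked dirs."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            total += (Path(dirpath) / name).lstat().st_size
    return total


def build_fake_venv(workdir: Path) -> Path:
    """Create venv/lib with one 1 KiB file and a lib64 -> lib symlink (illustrative layout)."""
    venv = workdir / "venv"
    (venv / "lib").mkdir(parents=True)
    (venv / "lib" / "package.bin").write_bytes(b"x" * 1024)
    os.symlink("lib", venv / "lib64", target_is_directory=True)
    return venv


def copy_both_ways(workdir: Path):
    """Copy the fake venv once with the default and once with symlinks=True."""
    venv = build_fake_venv(workdir)
    followed = workdir / "copy_followed"
    preserved = workdir / "copy_preserved"
    # Default symlinks=False: the contents behind lib64 are copied as a real directory.
    shutil.copytree(venv, followed)
    # symlinks=True: lib64 is recreated as a symlink in the new tree.
    shutil.copytree(venv, preserved, symlinks=True)
    return followed, preserved
```

With this layout, `disk_bytes` on the default copy counts the file twice (once under `lib`, once under `lib64`), while the `symlinks=True` copy keeps a single file plus the link.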

jpuris

03/27/2023, 3:40 PM
I see. I can open a GitHub issue with an MRE on this, if you'd like @Zanie

Zanie

03/27/2023, 3:41 PM
Sounds good!