# ask-community
l
Hi guys, somehow I'm running into an issue with the requirements file on my deployment. I grab the code from a private github repository and need some custom python packages. I have a requirements.txt file created via pip in the main folder on my github, my pull part looks like this:
pull:
- prefect.deployments.steps.git_clone:
    id: git_clone
    repository: https://github.com/abc/XYZ.git
    branch: main
    access_token: '{{ prefect.blocks.secret.XYZ }}'
- prefect.deployments.steps.pip_install_requirements:
    directory: '{{ git_clone.directory }}'
    requirements_file: requirements.txt
    stream_output: False
I obfuscated the repository and the access_token. Since I get "ModuleNotFoundError: No module named 'pandas'", the code is being read fine, but the requirements file isn't. I've tried directory: {{ git_clone.directory }}, but when I then run "prefect deploy", I get an error on exactly that line.
j
do you have pandas // all other requirements installed locally at deploy time?
l
@Jamie Zieziula Yes, I do have all of that installed locally at deploy time. I'm deploying via an ECS task though, via the new push method. I was under the impression that when the container starts, the requirements file would be installed and then the tasks would start.
j
hey, just to make sure I understand. When your flow run is running you're getting an error when the pull step is executing. Can you share the output? Possibly removing the stream_output: false?
l
@Jake Kaplan I don't know whether it's when the flow step is running, but I get the error that pandas is missing, and pandas is definitely in my requirements.txt. My script is really just a test script without much running in it, because reading from requirements is exactly what I wanted to test :)
Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/prefect/xyz/filename.py", line 2, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 395, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
  File "/usr/local/lib/python3.10/site-packages/prefect/client/utilities.py", line 51, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect/deployments/deployments.py", line 222, in load_flow_from_flow_run
    flow = await run_sync_in_worker_thread(load_flow_from_entrypoint, str(import_path))
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect/flows.py", line 975, in load_flow_from_entrypoint
    flow = import_object(entrypoint)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/importtools.py", line 201, in import_object
    module = load_script_as_module(script_path)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/importtools.py", line 164, in load_script_as_module
    raise ScriptError(user_exc=exc, path=path) from exc
prefect.exceptions.ScriptError: Script at 'filename.py' encountered an exception: ModuleNotFoundError("No module named 'pandas'")
This is the script itself:
from prefect import task, flow
import pandas as pd

@task(name="test_pandas", tags=["test"])
def test_pandas():
    df = pd.DataFrame({'a': [1, 2, 3]})
    return 'done'


@flow(name="Submit to xyz", log_prints=True)
def submit_and_save():
    print('start')
    test_pandas()
    print('done')
s
Is your requirements.txt simply pandas, or are there other, possibly internal Python package dependencies hosted on that private repo? If it's not simply pandas, can you try limiting it to that to see what happens?
l
@Serina It's a bunch, I deleted all of them and just used pandas and that seemed to make it work, I had some other errors afterwards, but that was about different elements. Thank you so much!
s
Sounds good! In future it may be best to remove stream_output=False like Jake mentioned so that you can see the output of the pip install 🙂
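To see that pip output outside of a flow run, the install can also be reproduced by hand. The following is a minimal sketch, not Prefect's own implementation: the helper names `pip_install_command` and `install_and_report` are hypothetical, and it assumes the step amounts to running `pip install -r` on the requirements file inside the cloned directory.

```python
import subprocess
import sys


def pip_install_command(requirements_file: str) -> list[str]:
    """Build the pip command for a requirements file (hypothetical helper)."""
    # Use the current interpreter so packages land in the environment
    # that is actually running this script.
    return [sys.executable, "-m", "pip", "install", "-r", requirements_file]


def install_and_report(requirements_file: str, directory: str = ".") -> int:
    """Run pip in `directory`, print everything it wrote, return its exit code."""
    result = subprocess.run(
        pip_install_command(requirements_file),
        cwd=directory,
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    print(result.stderr, file=sys.stderr)
    return result.returncode


# Example call (commented out because it really installs packages):
# exit_code = install_and_report("requirements.txt", directory=".")
```

A non-zero exit code together with the printed error should point at the requirement pip could not install.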
j
Hello @Serina, I have this error too 🫠 I just did pip install pandas, updated the requirements.txt file, and pushed to my repo. You can see in the logs that pandas is installed, but I still got the "ModuleNotFoundError". (I'm using Push Cloud Run)
I was able to fix this by creating a new python env and requirements file
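When the logs show a package being installed but the import still fails, one thing worth checking is whether the interpreter that runs the flow is the one pip installed into. The sketch below is only a diagnostic idea under that assumption; `describe_environment` is a hypothetical helper, and it only handles top-level module names.

```python
import importlib.util
import sys


def describe_environment(module_names: list[str]) -> dict:
    """Report the running interpreter and which top-level modules it can find."""
    return {
        # Path of the Python executable that is running this code.
        "executable": sys.executable,
        # True when the module can be located by this interpreter.
        "found": {
            name: importlib.util.find_spec(name) is not None
            for name in module_names
        },
    }


# Example: print the report from inside the flow or container.
# print(describe_environment(["pandas"]))
```

Comparing the reported executable with the environment named in the pip output can show whether the two disagree.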
s
Hi @Johan sh I’m not quite sure I understand, would you be able to file an issue?
j
Hello everyone, I am facing the same problem. My repository is private and hosted on GitLab. The error is that pandas is missing, but it is included in the requirements file. My worker is on an AWS EC2 instance, and it's a "process" type worker. What can I do? Am I doing something wrong? I think my requirements.txt file is being ignored. Here is the evidence:
j
Hey @Jorge Severino, I was able to fix this by creating a new Python environment locally, installing the modules one by one, and replacing the old requirements.txt file.
j
Thanks @Johan sh! I am also using a virtualenv. So should I delete it and recreate it? That involves reinstalling and configuring the Prefect CLI, right? I'll give it a try. It's weird anyway, because as I understand it the requirements file should be used on the worker after the pull, and that's on a remote machine (in the AWS cloud in my case).
Unfortunately it didn't work for me
I finally made it work! After deleting the local virtual environment several times and trying to update the requirements file, I think these were the steps that made it work:
1. Delete the /venv folder (from virtualenv).
2. Close and reopen VSCode.
3. Create a new virtual environment.
4. Install again all the modules that my code requires.
5. Generate the requirements.txt file again.
6. Delete the module "pywin32==306" from the file. Apparently it gets installed on Windows, but the worker (Ubuntu) cannot install it.
7. Make a new deploy.
Thanks @Johan sh 👍
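As an alternative to deleting Windows-only entries by hand in step 6, a requirements line can carry a standard environment marker so that pip skips it on other platforms. This is a minimal sketch, assuming a simple requirements file with one pinned package per line; `add_windows_marker` and the default package set are hypothetical, and the name parsing is deliberately naive.

```python
import re

# Marker from the Python packaging standard: only install on Windows.
WINDOWS_MARKER = '; sys_platform == "win32"'


def add_windows_marker(lines, windows_only=frozenset({"pywin32"})):
    """Append a Windows-only marker to the listed packages; keep other lines."""
    result = []
    for line in lines:
        stripped = line.strip()
        # Naive name extraction: everything before the first specifier character.
        name = re.split(r"[=<>!~;\[\s]", stripped, maxsplit=1)[0].lower()
        if (
            stripped
            and not stripped.startswith("#")
            and name in windows_only
            and ";" not in stripped  # leave lines that already have a marker
        ):
            result.append(stripped + WINDOWS_MARKER)
        else:
            result.append(line)
    return result


# Example with an illustrative pandas pin (not taken from the thread):
# add_windows_marker(["pandas==2.0.3", "pywin32==306"])
```

With the marker in place, the same requirements.txt can be used on the Windows development machine and on the Ubuntu worker.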