Hello! I am testing Prefect Deployments using remo...
# ask-community
Nick Kagkalos
Hello! I am testing Prefect Deployments using remote code storage (my GitHub repo), and I am getting a ModuleNotFoundError for pandas even though it is installed locally:
...
prefect.exceptions.ScriptError: Script at '02-workflow-orchestration/test_pandas.py' encountered an exception: ModuleNotFoundError("No module named 'pandas'")
 > Running git_clone step...
18:09:37.949 | ERROR   | prefect.flow_runs.runner - Process for flow run 'wild-orangutan' exited with status code: 1
Prefect version:
Version:             3.0.4
API version:         0.8.4
Python version:      3.12.4
Git commit:          c068d7e2
Built:               Tue, Oct 1, 2024 11:54 AM
OS/Arch:             linux/x86_64
Profile:             local
Server type:         server
Pydantic version:    2.9.2
Integrations:
  prefect-gcp:       0.6.1
  prefect-sqlalchemy: 0.5.1
  prefect-docker:    0.6.1
My Deployment script:
from prefect import flow


if __name__ == "__main__":
    flow.from_source(
        source="https://github.com/username/repo-name.git",
        entrypoint="my-script-directory/test_pandas.py:pandas_flow"
    ).deploy(
        name="test-pandas-deployment",
        work_pool_name="my-work-pool",
    )
My Python script fetching pandas library:
import pandas as pd 
from prefect import flow, task

@task
def create_and_print_dataframe():
    df = pd.DataFrame({
        'A': [1, 2, 3],
        'B': [4, 5, 6]
    })
    print(df)

@flow()
def pandas_flow():
    create_and_print_dataframe()
Does anyone know what the problem is and how I could tackle it? TIA!
Nate
hi @Nick Kagkalos what type of work pool are you using? i.e. where is your code running?
Nick Kagkalos
@Nate it's of Docker type (see attached). I guess I need it to be Process?
otherwise I am going to need a dockerfile, if I am guessing correctly
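A quick way to check where a flow is actually running, and whether pandas is visible there, is to report the interpreter path and probe for the module from inside the run. This is only a sketch: describe_environment is a hypothetical helper, not part of Prefect, and it uses nothing beyond the standard library.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report which interpreter is running and whether a module is importable from it."""
    return {
        "python": sys.executable,  # path of the interpreter executing this code
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    # Called locally, this describes the local install. Called from inside a flow run
    # on a Docker work pool, it describes the container, where pandas may be missing.
    print(describe_environment("pandas"))
```

With a Docker work pool the run happens inside a container, so a local pip install of pandas would not be expected to show up there.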
Nate
> I guess I need it to be Process?
that's one thing you could do, but in general it's good for deployments to be self-contained, meaning that they know what deps they need. So I'll give 2 suggestions on how to fix this, the first being fine for testing and non-ideal for prod, the latter being my recommendation:
• slap job_variables=dict(env=dict(EXTRA_PIP_PACKAGES="pandas")) in your .deploy() call, which will install pandas at runtime each time you run a flow
• you guessed it, build your own Dockerfile and specify that as your image in your .deploy() call, like this
if you're into videos for this type of thing, I actually made one very related to this topic a bit ago
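The two suggestions could be wired into the deployment script roughly as follows. This is a sketch under assumptions, not a confirmed setup: the helper names are hypothetical, the image name passed to option 2 would be whatever you built from your own Dockerfile, and build=False assumes that image was built (and pushed, if your worker needs to pull it) outside of .deploy().

```python
# Values copied from the deployment script earlier in the thread.
BASE_KWARGS = {"name": "test-pandas-deployment", "work_pool_name": "my-work-pool"}


def runtime_install_kwargs():
    """Option 1: pip-install pandas inside the container at the start of every run."""
    return {**BASE_KWARGS, "job_variables": {"env": {"EXTRA_PIP_PACKAGES": "pandas"}}}


def prebuilt_image_kwargs(image):
    """Option 2: run in an image, built from your own Dockerfile, that already has pandas."""
    # Assumption: the image already exists, so .deploy() is told not to build one.
    return {**BASE_KWARGS, "image": image, "build": False}


def deploy(kwargs):
    """Create the deployment; Prefect is imported here so the helpers above run without it."""
    from prefect import flow

    return flow.from_source(
        source="https://github.com/username/repo-name.git",
        entrypoint="my-script-directory/test_pandas.py:pandas_flow",
    ).deploy(**kwargs)
```

Calling deploy(runtime_install_kwargs()) would match the quick-test route, while deploy(prebuilt_image_kwargs("my-registry/prefect-pandas:latest")) would match the Dockerfile route; that image name is made up for illustration.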
Nick Kagkalos
Gotcha, perfect. That makes sense. Normally I'd use a Dockerfile, this was more of a quick test. One more qq. Regarding #1, if I want >1 packages, what would the syntax be?
Nate
it should be a space-delimited string (just like pip install foo bar baz), so:
job_variables=dict(env=dict(EXTRA_PIP_PACKAGES="pandas matplotlib requests"))
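If the package list lives in a Python list, the space-delimited string can be built rather than typed by hand. A minimal sketch, assuming a hypothetical helper named extra_pip_job_variables (not part of Prefect):

```python
def extra_pip_job_variables(packages):
    """Build job_variables that set EXTRA_PIP_PACKAGES to a space-delimited package string."""
    return {"env": {"EXTRA_PIP_PACKAGES": " ".join(packages)}}


# Produces the same structure as the dict(...) form written out above.
job_variables = extra_pip_job_variables(["pandas", "matplotlib", "requests"])
print(job_variables)
```

The result can be passed as job_variables=... in the .deploy() call.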
Nick Kagkalos
tysm!