# prefect-docker
d
Hello all, I'm trying to run a flow inside a Docker image on a local work pool, but I keep getting a module-not-found error. My setup: I use a normal Docker work pool, I pull my code from a GitHub repository, and I also added the
prefect.deployments.steps.pip_install_requirements
step. The error is about the package prefect_soda_core, which I have in my requirements. But when I Google this, I'm pretty sure the package is
prefect-soda-core
so I'm not sure if there is really a big difference.
pull:
- prefect.deployments.steps.git_clone:
    repository: https://github.com/datarootsio/xmas-soda-prefect-duckdb.git
    branch: david/docker_deployment
    access_token: '{{ prefect.blocks.secret.sodagithubaccesstoken }}'
- prefect.deployments.steps.pip_install_requirements:
    requirements_file: requirements.txt
    stream_output: true

deployments:
- name: soda_docker_local
  version:
  tags: []
  description:
  entrypoint: flows/schedule.py:run_soda_scan
  parameters: {}
  work_pool:
    name: soda_docker
    work_queue_name:
    job_variables: {}
  schedule:
  is_schedule_active: true
This is my
prefect.yml
file. It's probably complaining about these imports:
from prefect_soda_core.soda_configuration import SodaConfiguration
from prefect_soda_core.sodacl_check import SodaCLCheck
from prefect_soda_core.tasks import soda_scan_execute
Thanks in advance for looking into this problem. I also looked into adding this package to
EXTRA_PIP_PACKAGES
but I'm not sure where to put it for a local Docker work pool. I added my
requirements.txt
to the thread 😁
aiosqlite==0.19.0
alembic==1.13.1
annotated-types==0.6.0
antlr4-python3-runtime==4.11.1
anyio==3.7.1
apprise==1.7.1
asgi-lifespan==2.1.0
async-timeout==4.0.3
asyncpg==0.29.0
attrs==23.2.0
backoff==2.2.1
cachetools==5.3.2
certifi==2023.11.17
cffi==1.16.0
cfgv==3.4.0
charset-normalizer==3.3.2
cli-helpers==2.3.0
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
configobj==5.0.8
coolname==2.2.0
croniter==2.0.1
cryptography==41.0.7
dateparser==1.2.0
Deprecated==1.2.14
distlib==0.3.8
dnspython==2.4.2
docker==6.1.3
duckcli==0.2.1
duckdb==0.9.2
email-validator==2.1.0.post1
filelock==3.13.1
fsspec==2023.12.2
google-auth==2.26.2
googleapis-common-protos==1.62.0
graphviz==0.20.1
greenlet==3.0.3
griffe==0.39.0
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==1.0.2
httpx==0.26.0
hyperframe==6.0.1
identify==2.5.33
idna==3.6
inflect==7.0.0
Jinja2==3.1.3
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.21.0
jsonschema-specifications==2023.12.1
kubernetes==28.1.0
Mako==1.3.0
Markdown==3.5.2
markdown-it-py==3.0.0
MarkupSafe==2.1.2
mdurl==0.1.2
nodeenv==1.8.0
oauthlib==3.2.2
opentelemetry-api==1.16.0
opentelemetry-exporter-otlp-proto-http==1.16.0
opentelemetry-proto==1.16.0
opentelemetry-sdk==1.16.0
opentelemetry-semantic-conventions==0.37b0
orjson==3.9.10
packaging==23.2
pathspec==0.12.1
pendulum==2.1.2
platformdirs==4.1.0
pre-commit==3.6.0
prefect==2.14.15
prefect-docker==0.4.3
prefect-shell==0.2.2
prefect-soda-core==0.1.8
prefect_soda_core
prompt-toolkit==3.0.43
protobuf==4.25.2
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==1.10.13
pydantic_core==2.14.6
Pygments==2.17.2
python-dateutil==2.8.2
python-slugify==8.0.1
pytz==2023.3.post1
pytzdata==2020.1
PyYAML==6.0.1
readchar==4.0.5
referencing==0.32.1
regex==2023.12.25
requests==2.31.0
requests-oauthlib==1.3.1
rich==13.7.0
rpds-py==0.17.1
rsa==4.9
ruamel.yaml==0.17.40
ruamel.yaml.clib==0.2.8
six==1.16.0
sniffio==1.3.0
soda==0.0.1
soda-core==3.1.3
soda-core-duckdb==3.1.3
SQLAlchemy==2.0.25
sqlparse==0.4.4
starlette==0.32.0.post1
tabulate==0.9.0
text-unidecode==1.3
toml==0.10.2
typer==0.9.0
typing_extensions==4.9.0
tzlocal==5.2
ujson==5.9.0
urllib3==1.26.18
uvicorn==0.26.0
virtualenv==20.25.0
wcwidth==0.2.13
websocket-client==1.7.0
websockets==12.0
wrapt==1.16.0
This is my
requirements.txt
k
I don't see you specifying the image anywhere in your deployment. Which image is running when you run your deployment?
d
default prefect image
k
so, the default Prefect image isn't going to have any of the dependencies from your requirements file. You can build a Docker image on top of the base Prefect image, install your requirements in it, and then pass the name of that image to your deployment
one moment and I'll have an example!
d
but it installs them through the pull step no?
k
hmmm, yeah, if it can find your requirements file
d
yeah i can see it in the logs that they are getting installed
k
🤔 hmmm
d
i find it weird because the official package is
prefect-soda-core
but it complains about
prefect_soda_core
not sure if there is really a difference between them?
k
the one with the hyphens is the package name, but the actual imported module has underscores because hyphens aren't allowed in Python identifiers
they're both the same thing
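To illustrate the point about hyphens and underscores, here is a minimal sketch. The helper name `normalize_distribution_name` is made up for this example; the regex follows the name normalization rule from PEP 503, which is how package indexes treat `prefect_soda_core` and `prefect-soda-core` as the same distribution.

```python
import re


def normalize_distribution_name(name: str) -> str:
    """Normalize a distribution name as described in PEP 503.

    Runs of '-', '_' and '.' collapse to a single hyphen and the
    result is lowercased. (Hypothetical helper, for illustration only.)
    """
    return re.sub(r"[-_.]+", "-", name).lower()


# Both spellings refer to the same distribution once normalized.
same = normalize_distribution_name("prefect_soda_core") == normalize_distribution_name(
    "prefect-soda-core"
)
print(same)

# Only the underscore spelling is a valid Python identifier,
# which is why the import statement uses prefect_soda_core.
print("prefect_soda_core".isidentifier())
print("prefect-soda-core".isidentifier())
```

So the spelling difference by itself should not cause a module-not-found error; it is more likely that the package is missing from the environment that runs the flow.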
d
is it possible to add this package to the
job_variables
?
like how you can add
EXTRA_PIP_PACKAGES
I might have found something: I added EXTRA_PIP_PACKAGES to the environment variables block of my work pool, and it seems it can install the package from there.
So my requirements are installed, but apparently not in the Docker image.
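One way to check where the packages actually end up is to log which interpreter is running and whether it can locate the module. This is only a diagnostic sketch using the standard library; `describe_environment` is a made-up helper name, and calling it at the top of the flow file (before the `prefect_soda_core` imports) is an assumption about where it would be most useful.

```python
import importlib.util
import sys


def describe_environment(module_name: str) -> dict:
    """Report the running interpreter and whether it can find a top-level module.

    Hypothetical helper for debugging; pass a top-level module name
    such as "prefect_soda_core", not a dotted submodule path.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,  # interpreter executing this code
        "module": module_name,
        "found": spec is not None,  # True if the module is importable here
        "location": getattr(spec, "origin", None),  # file path, if known
    }


print(describe_environment("prefect_soda_core"))
```

If `found` is false inside the container while the pip step reported a successful install, the install probably went to a different interpreter or environment than the one running the flow.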
k
weird. I've always preferred building dependencies into the image since then they don't have to install every time a flow run happens, so I'm not too familiar with managing packages at execution time
d
Yeah, it's weird indeed. I'll keep looking into this, but what you would suggest is a custom Docker image hosted on Docker Hub?
k
well, if you're just running the worker locally, you don't even need to push the image anywhere. Build it locally, and the Docker worker can access it
but yeah, if image builds and flow runs are happening in different places, you can host the image somewhere, be it Docker Hub or one of the many other image registry services
d
Okay thanks for your advice and your time 😁
n
Hi David, if possible, could you please share your schedule.py file? Thanks.