I am running a test flow in docker container with image `pytorch/pytorch`. It fails because prefect...
b
I am running a test flow in a Docker container with the image `pytorch/pytorch`. It fails because Prefect is not installed:
17:43:10.947 | INFO    | prefect.infrastructure.docker-container - Pulling image 'pytorch/pytorch'...
17:43:12.610 | INFO    | prefect.infrastructure.docker-container - Creating Docker container 'prudent-goshawk'...
17:43:12.679 | INFO    | prefect.infrastructure.docker-container - Docker container 'prudent-goshawk' has status 'created'
17:43:12.934 | INFO    | prefect.agent - Completed submission of flow run '64e545de-27ff-4d51-85d6-cec4d8eb0afe'
17:43:12.949 | INFO    | prefect.infrastructure.docker-container - Docker container 'prudent-goshawk' has status 'running'
/opt/conda/bin/python: Error while finding module specification for 'prefect.engine' (ModuleNotFoundError: No module named 'prefect')
Adding prefect to the container block's extra pip packages doesn't help:
{
  "EXTRA_PIP_PACKAGES": "prefect"
}
m
Hey @Boris Tseytlin, Prefect needs to be included as part of the base image. You could either A) swap the base image for a Prefect image and include PyTorch as the extra pip package, or B) set up a custom image that contains both.
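Option A could be sketched roughly as follows. This assumes the Prefect 2 `DockerContainer` block and relies on the official Prefect images installing whatever is listed in `EXTRA_PIP_PACKAGES` at container start, which would also explain why the variable had no effect on `pytorch/pytorch`. The helper names and the `prefecthq/prefect:2-latest` tag are illustrative assumptions, not something confirmed in this thread.

```python
def build_docker_block_kwargs(base_image, extra_packages):
    """Build keyword arguments for a DockerContainer block.

    Hypothetical helper: it only assembles the image name and the
    EXTRA_PIP_PACKAGES variable as a space-separated string.
    """
    return {
        "image": base_image,
        "env": {"EXTRA_PIP_PACKAGES": " ".join(extra_packages)},
    }


def save_block(block_name, base_image, extra_packages):
    """Create or overwrite the infrastructure block (requires Prefect 2)."""
    # Imported here so the helper above can be used without Prefect installed.
    from prefect.infrastructure.docker import DockerContainer

    kwargs = build_docker_block_kwargs(base_image, extra_packages)
    DockerContainer(**kwargs).save(block_name, overwrite=True)
```

Calling `save_block("ml-docker", "prefecthq/prefect:2-latest", ["torch"])` would then point the existing block name at a Prefect base image with PyTorch installed at startup. Pinning the image tag to the same Prefect version as the agent is probably safer than a floating tag.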
b
Here's a helpful Discourse article to get started as well: https://discourse.prefect.io/t/how-can-i-run-my-flow-in-a-docker-container/64
b
@Mason Menges I tried using the prefecthq/prefect image, got this error:
/usr/local/bin/python: No module named prefect.engine.__main__; 'prefect.engine' is a package and cannot be directly executed
m
Hmm, could you provide an example of the flow you're running and how/where you're triggering it?
b
import os
from prefect import flow, get_run_logger
from prefect.deployments import Deployment
from prefect.blocks.core import Block
from prefect.infrastructure.docker import DockerContainer

docker_container_block = DockerContainer.load("ml-docker")


@flow
def train_model_flow():
    import torch

    logger = get_run_logger()

    logger.info("CUDA Available: %s", torch.cuda.is_available())


deployment = Deployment.build_from_flow(
    flow=train_model_flow,
    name="train_model_flow",
    work_queue_name="test",
    infrastructure=docker_container_block,
)

if __name__ == "__main__":
    deployment.apply()
This is the flow. I am running the deployment from the Orion UI.
m
Hey @Boris Tseytlin, since you're using Docker you'll likely want to configure remote storage so the container can retrieve the flow code; by default the flow code isn't included in the Docker container. You'll also want to make sure all relevant pip packages are set for the container in the infrastructure block. Here's an example based on the code you sent over before:
from prefect import flow, get_run_logger
from prefect.deployments import Deployment
from prefect.filesystems import S3
from prefect.infrastructure.docker import DockerContainer


@flow
def train_model_flow():
    import torch

    logger = get_run_logger()

    logger.info("CUDA Available: %s", torch.cuda.is_available())

if __name__ == "__main__":
    docker_container_block = DockerContainer.load("ml-docker")
    storage = S3.load("awspersonal")

    flow_deployment = Deployment.build_from_flow(
        flow=train_model_flow,
        name="train_model_flow",
        work_queue_name="test",
        infrastructure=docker_container_block,
        storage=storage,
        ignore_file=".prefectignore",
    )

    flow_deployment.apply()
With these packages on the ml-docker container:
{
  "EXTRA_PIP_PACKAGES": "s3fs torch"
}
s3fs is necessary for S3 storage, which is what I'm using; this could be different depending on the remote storage option you're using. All that said, particularly for production flows, it's likely going to be more efficient to build a custom image that contains all of your external dependencies rather than installing them during the run.
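Whichever image ends up being used, it may help to confirm up front that it contains everything the flow run needs, since a missing module only surfaces once the container starts. A minimal standard-library sketch; the function name `missing_modules` is hypothetical and the module list is an assumption based on this thread:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat lookup failures the same as "not installed".
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed requirements for the flow discussed above.
    print(missing_modules(["prefect", "torch", "s3fs"]))
```

Run with the Python interpreter inside the candidate image, an empty list means all three modules were found; any names printed would need to be added to the image or to `EXTRA_PIP_PACKAGES`.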