Hi, I have a flow on GitHub and a Prefect agent running...
# ask-community
a
Hi, I have a flow on GitHub and a Prefect agent running on GKE, plus a Dockerfile holding all the custom modules, which eventually goes to GCR. Things were working fine, but now I need to install PySpark in the Dockerfile. I included it the same way we do in our other Dockerfiles (we already have PySpark in another Dockerfile for this project and that one works), but when I include it in the current Dockerfile and build with Cloud Build, the build fails saying
Unable to locate package openjdk-8-jdk
Is the issue because of the base image? In the other Dockerfiles where Spark runs we have ubuntu 20.04 as the base image, but for Prefect we have the Prefect image as the base. Below is the Dockerfile:
FROM prefecthq/prefect:0.15.6-python3.8
# for spark
ENV JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
ENV SPARK_HOME="/spark/spark-3.1.2-bin-hadoop3.2/"
ENV PYTHONPATH="/spark/spark-3.1.2-bin-hadoop3.2/python:$PYTHONPATH"
ENV PYSPARK_PYTHON="python3"
ENV PATH="$PATH:/spark/spark-3.1.2-bin-hadoop3.2/bin"
ENV PATH="$PATH:$JAVA_HOME"
ENV PATH="$PATH:$JAVA_HOME/bin"
ENV PATH="$PATH:$JAVA_HOME/jre/bin"
ENV SPARK_LOCAL_IP="127.0.0.1"
WORKDIR /
COPY . /
RUN apt-get update && \
apt-get install -y  \
openjdk-8-jdk  \
python3-pip
ADD https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz spark.tgz
RUN mkdir -p spark && \
tar -zxvf spark.tgz -C spark/ && \
rm spark.tgz
# for prefect
RUN pip install feast feast-postgres sqlalchemy google-auth scikit-learn
RUN pip install feast[gcp]
RUN pip install --upgrade google-cloud
RUN pip install --upgrade google-cloud-bigquery
RUN pip install --upgrade google-cloud-storage
WORKDIR /opt/prefect
COPY flow_utilities/ /opt/prefect/flow_utilities/
COPY flow_utilities_bigQ_Datastore/ /opt/prefect/flow_utilities_bigQ_Datastore/
COPY setup.py /opt/prefect/setup.py
COPY .feastignore /opt/prefect/.feastignore
RUN pip install .
a
Correct, in this case it may be easier to use an Ubuntu base image if you really need OpenJDK there; here is an SO issue with some pointers: https://stackoverflow.com/questions/32942023/ubuntu-openjdk-8-unable-to-locate-package But I'm not sure whether you need it at all - where is your Spark cluster running? Perhaps you can start the PySpark job via an API (e.g. a Databricks cluster) or via a ShellTask submitting the job to the cluster?
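For reference, here is a minimal sketch of the Ubuntu route (Ubuntu 20.04 and the package list are assumptions based on your other working Dockerfiles):
FROM ubuntu:20.04
# openjdk-8-jdk is available in Ubuntu 20.04's default repos, while the
# Debian-based image behind prefecthq/prefect:0.15.6-python3.8 no longer
# carries it - hence the "Unable to locate package" error
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        openjdk-8-jdk python3-pip && \
    rm -rf /var/lib/apt/lists/*
The DEBIAN_FRONTEND=noninteractive part avoids the interactive tzdata prompt that the JDK's dependencies can otherwise trigger during the build.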
a
OK, so now I am sure it is because of the Prefect base image, because when I changed the base image from Prefect to Ubuntu it worked. But as far as I remember, I cannot use Ubuntu as the base image, because in the past (in a DM) I was facing issues where Prefect could not find the custom modules, and you told me I had to use Prefect as the base image, and then it worked.
@Anna Geller Inside our container, where the other custom modules are.
a
Well, it's not that you can't use a base image other than the PrefectHQ one; we just recommend those since they are configured to have everything you need.
o
@Anna Geller Can we use everything in this image to create a Prefect image with Ubuntu as the base image? https://github.com/PrefectHQ/prefect/blob/master/Dockerfile
a
sure, you definitely can do that
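A rough sketch of that idea, assuming Python 3.8 and Prefect 0.15.6 to match your current tag (the linked Dockerfile remains the authoritative reference for what the official image actually does):
FROM ubuntu:20.04
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        python3.8 python3-pip openjdk-8-jdk && \
    rm -rf /var/lib/apt/lists/*
# Prefect no longer comes with the base image, so install it explicitly
RUN pip3 install prefect==0.15.6
# mirror the official image's layout so your existing paths keep working
RUN mkdir -p /opt/prefect
WORKDIR /opt/prefect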
a
OK, when I use Ubuntu as the base image now, what would be the working directory (the same /opt/prefect?) for storing custom modules? Because as far as I remember, I had to use /opt/prefect, otherwise Prefect was not able to find the custom modules.
a
Since you are installing your custom modules as a package, I think the working directory doesn’t matter that much, but if you want, you can set it using the WORKDIR command in a Dockerfile. Prefect will be able to find your modules because of this:
COPY setup.py /opt/prefect/setup.py
RUN pip install .
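So in an Ubuntu-based image the same pattern carries over, whatever directory you pick (the paths below are just illustrative, matching your current Dockerfile):
WORKDIR /opt/prefect
COPY flow_utilities/ /opt/prefect/flow_utilities/
COPY setup.py /opt/prefect/setup.py
# pip installs the modules into site-packages, so imports resolve
# regardless of the working directory at flow runtime
RUN pip install .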