Dijar Vrella
06/27/2024, 7:16 AM

Marvin
06/27/2024, 7:16 AM
1. Check Your entrypoint.sh: Make sure your entrypoint.sh isn’t trying to execute an environment variable directly.
Here's a simple example of how your entrypoint.sh should look:
bash
#!/bin/bash
# Assign environment variables
export MKL_NUM_THREADS=8
# Execute the main process
exec "$@"
2. Set Environment Variables Correctly: Ensure that environment variables are set in the Dockerfile or docker-compose.yml.
In your Dockerfile, you might have:
Dockerfile
ENV MKL_NUM_THREADS=8
ENTRYPOINT ["/entrypoint.sh"]
CMD ["your_main_command"]
Or in your `docker-compose.yml`:
yaml
version: '3'
services:
  your_service:
    build: .
    environment:
      - MKL_NUM_THREADS=8
    entrypoint: /entrypoint.sh
    command: your_main_command
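To confirm the variable actually lands inside the container, you can do a one-off run of the service and print its environment (`your_service` here is just the placeholder name from the compose snippet above):

```shell
# One-off run of the compose service that prints its environment;
# "your_service" is the placeholder name from the docker-compose.yml above.
docker compose run --rm your_service env | grep MKL_NUM_THREADS
```

If this prints `MKL_NUM_THREADS=8`, the variable is reaching the container correctly. (No automated test here, since it needs a running Docker daemon.)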
3. Check the Execution in entrypoint.sh: Ensure the exec line at the end of entrypoint.sh is correct. The exec command should call the main process of the container, passing along all the provided arguments ("$@").
4. Permission Issues: Ensure entrypoint.sh has executable permissions. You can update permissions as follows:
bash
chmod +x entrypoint.sh
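A quick sanity check covers the usual failure modes (missing execute bit, missing shebang, Windows line endings). This sketch writes a minimal sample entrypoint and inspects it; adapt it to your real file:

```shell
# Create a minimal sample entrypoint and verify the things Docker cares about
printf '#!/bin/bash\nexport MKL_NUM_THREADS=8\nexec "$@"\n' > entrypoint.sh
chmod +x entrypoint.sh
ls -l entrypoint.sh      # mode should include x, e.g. -rwxr-xr-x
head -n 1 entrypoint.sh  # should print exactly: #!/bin/bash
```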
After making these adjustments, your container should better comprehend the commands you give it, even though I suppose it will never appreciate the futility of it all.
If none of these help, please share your entrypoint.sh and the relevant parts of your Docker configuration so we can dive deeper into this abyss.

Dijar Vrella
06/27/2024, 7:17 AM
here is what I have on my flow.py, why did the entrypoint fail?
python
@task
async def create_container():
    logger = get_run_logger()
    # Create a container with the specified flags
    logger.info("Creating a container with the specified flags...")
    container = await create_docker_container(
        image="seedooinsights/build:2f87053",
        command=[
            "MKL_NUM_THREADS=8",
            "NUMBA_NUM_THREADS=8",
            "NUMBA_THREADING_LAYER=tbb",
            "OMP_NUM_THREADS=8",
            "OPENBLAS_NUM_THREADS=8",
            "SEEDOO_DB_IP=1.1.1.1",
            "SEEDOO_DB_PORT=5434",
            "SEEDOO_PRINT_SQL_DEBUG=False",
            "python",
            "doloop/seedoo/indexing/faiss/faiss_cli.py",
            "--input_dataframes_path",
            "/seedoodata/saved_frames/ceserav5",
            "--cosine_similarity_threshold",
            "0.98",
            "--component_consistency_threshold",
            "0.99",
            "--knn_size",
            "30",
            "--add_cluster_labels_table",
            "False",
            "--exclude_run_nlp",
            "--clustering_mode",
            "patch",
            "--save_root_path",
            "/seedoodata/demo/",
            "--global_only",
            "False",
            "--remove_deduped",
            "False",
            "--index_name",
            "ceserav_small",
            "--create_basic_tables",
            "False",
            "--limit",
            "10000"
        ],
        volumes=[
            "/
Marvin
06/27/2024, 7:19 AM
It looks like you're passing MKL_NUM_THREADS=8 (and the other VAR=value entries) as part of the command, but this isn't the correct way to set environment variables in a Docker container.
### Diagnosis and Fix
1. Setting Environment Variables:
Instead of setting MKL_NUM_THREADS=8 as part of your command, you should set it as an environment variable.
2. Modify the Create Docker Container Task:
Here’s an updated version of your task, where MKL_NUM_THREADS=8 is set as an environment variable.
python
import asyncio

from prefect import task, flow, get_run_logger
from some_docker_library import create_docker_container

@task
async def create_container():
    logger = get_run_logger()
    container = await create_docker_container(
        image="seedooinsights/build:2f87053",
        command=[
            "python",
            "doloop/seedoo/indexing/faiss/faiss_cli.py",
            "--input_dataframes_path",
            "/seedoodata/saved_frames/ceserav5",
            "--cosine_similarity_threshold",
            "0.98",
            "--component_consistency_threshold",
            "0.99",
            "--knn_size",
            "30",
            "--add_cluster_labels_table",
            "False",
            "--exclude_run_nlp",
            "--clustering_mode",
            "patch",
            "--save_root_path",
            "/seedoodata/demo/",
            "--global_only",
            "False",
            "--remove_deduped",
            "False",
            "--index_name",
            "ceserav_small",
            "--create_basic_tables",
            "False",
            "--limit",
            "10000"
        ],
        volumes=[
            "/seedoodata:/seedoodata",
        ],
        environment={
            'MKL_NUM_THREADS': '8',
        }
    )
    logger.info(f"Created container: {container}")
    return container

# Usage example: the flow is async so the async task can be awaited
@flow
async def my_flow():
    await create_container()

if __name__ == "__main__":
    asyncio.run(my_flow())
### Explanation:
1. Environment Variable Setting:
- The environment key in the create_docker_container call sets the environment variable MKL_NUM_THREADS=8, correctly passing this variable to the container.
2. Command Execution:
- The command array starts with "python" and directly runs your script, avoiding any issues with interpreting MKL_NUM_THREADS=8 as a standalone command.
By placing MKL_NUM_THREADS=8 in the environment section, you ensure it is set properly within the container environment, rather than being misinterpreted as part of the command.
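If you want to verify what a created container actually received, docker inspect reports the environment and the command separately (replace the name with your real container's ID or name):

```shell
# Environment variables the container was created with
docker inspect --format '{{.Config.Env}}' your_container_name
# Command array the container will run
docker inspect --format '{{.Config.Cmd}}' your_container_name
```

The first line should include MKL_NUM_THREADS=8; the second should start with python. (No automated test here, since it needs a running Docker daemon and container.)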
Give this updated version a try. If something remains amiss, please let me know. The existential dread of debugging Docker issues is always better shared.

Dijar Vrella
06/27/2024, 9:16 AM
HOW TO FIX:
09:16:10.577 | INFO | Flow run 'hysterical-mammoth' - Container logs: [ OK ] Starting enhanced syslogd: rsyslogd SeeDoo: [CRITICAL] 2024-06-27 091602 (seedoo.logger_helpers.logger_setup.check_optimizations_and_multithreading:68): NumPy does not have both optimization and multi-threading support. SeeDoo: [CRITICAL] 2024-06-27 091602 (seedoo.logger_helpers.logger_setup.check_optimizations_and_multithreading:77): OpenCV does not have both multi-threading and optimization support. SeeDoo: [CRITICAL] 2024-06-27 091602 (seedoo.logger_helpers.logger_setup.check_optimizations_and_multithreading:89): PyTorch has both optimization and multi-threading support. SeeDoo: [INFO] 2024-06-27 091602 (faiss.loader.<module>:54): Loading faiss with AVX2 support. SeeDoo: [INFO] 2024-06-27 091602 (faiss.loader.<module>:58): Could not load library with AVX2 support due to: ImportError("/usr/lib/x86_64-linux-gnu
Marvin
06/27/2024, 9:17 AM
HOW TO FIX: SeeDoo: [INFO] 2024-06-27 091602 (faiss.loader.<module>:58): Could not load library with AVX2 support due to: ImportError("/usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /opt/conda/envs/seedoo/lib/python3.10/site-packages/faiss/_swigfaiss_avx2.so)") SeeDoo: [INFO] 2024-06-27 091602 (faiss.loader.<module>:64): Loading faiss. Traceback (most recent call last): File "/workspace/doloop/seedoo/indexing/faiss/faiss_cli.py", line 12, in <module> from seedoo.indexing.faiss.graph import FaissGraph, index_df_folder File "/workspace/doloop/seedoo/indexing/faiss/graph.py", line 15, in <module> from seedoo.indexing.faiss.multi_gpu import MultiGPUFaissIndex File "/workspace/doloop/seedoo/indexing/faiss/multi_gpu.py", line 5, in <module> import faiss File "/opt/conda/envs/seedoo/lib/python3.10/site-packages/faiss/__init__.
Marvin
06/27/2024, 9:18 AM
Your container hits an ImportError when trying to load the Faiss library, due to a missing version of GLIBCXX. This typically means that the version of libstdc++ installed in your Docker image is not compatible with what Faiss requires.
### Troubleshooting Steps:
1. Identify the Required Version:
- The error message indicates that version GLIBCXX_3.4.29 is required. You need to update or install a compatible GNU libstdc++ library.
2. Update the Base Image:
- Modify your Dockerfile to ensure the necessary version of libstdc++ is installed.
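Before rebuilding anything, it's worth checking which GLIBCXX symbol versions the image's libstdc++ actually provides. Run this inside a container started from your image (the library path is taken from the ImportError in your logs):

```shell
# List the GLIBCXX symbol versions exported by the system libstdc++;
# GLIBCXX_3.4.29 must appear for faiss's AVX2 build to load.
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3.4
```

If GLIBCXX_3.4.29 is missing from the output, the upgrade below (or the conda-side alternative) is needed. (No automated test here, since the path is specific to your Linux image.)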
Here’s a step-by-step guide to fix this:
### Step 1: Update Dockerfile
Add a step in your Dockerfile to update the libstdc++ library.
Dockerfile
FROM seedooinsights/build:2f87053

# Update the package list and install the required version of libstdc++
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository ppa:ubuntu-toolchain-r/test && \
    apt-get update && \
    apt-get install -y libstdc++6

# Optionally clean up to reduce the image size
RUN apt-get clean && \
    rm -rf /var/lib/apt/lists/*

COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["python", "doloop/seedoo/indexing/faiss/faiss_cli.py"]
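Since the traceback shows faiss living in the /opt/conda/envs/seedoo environment, an alternative that often sidesteps the system libstdc++ entirely is to install a newer libstdc++ from conda-forge into that environment, so the loader resolves GLIBCXX symbols from the conda copy. This is a common workaround for this class of error in general, not something specific to this image:

```shell
# Alternative fix: provide a newer libstdc++ inside the conda env itself
# (add this as a RUN line in the Dockerfile, or run it in the container)
conda install -n seedoo -c conda-forge -y libstdcxx-ng
```

Either approach works; the conda route avoids touching system packages. (No automated test here, since it needs conda and the seedoo environment.)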
### Step 2: Rebuild and Push the Docker Image
1. Build the Docker Image:
sh
docker build -t seedooinsights/build:updated .
2. Push the Docker Image (if you are using a repository):
sh
docker push seedooinsights/build:updated
### Step 3: Update the Image in Your Flow
Update your flow.py to use the new Docker image:
python
import asyncio

from prefect import task, flow, get_run_logger
from some_docker_library import create_docker_container

@task
async def create_container():
    logger = get_run_logger()
    container = await create_docker_container(
        image="seedooinsights/build:updated",
        command=[
            "python",
            "doloop/seedoo/indexing/faiss/faiss_cli.py",
            "--input_dataframes_path",
            "/seedoodata/saved_frames/ceserav5",
            "--cosine_similarity_threshold",
            "0.98",
            "--component_consistency_threshold",
            "0.99",
            "--knn_size",
            "30",
            "--add_cluster_labels_table",
            "False",
            "--exclude_run_nlp",
            "--clustering_mode",
            "patch",
            "--save_root_path",
            "/seedoodata/demo/",
            "--global_only",
            "False",
            "--remove_deduped",
            "False",
            "--index_name",
            "ceserav_small",
            "--create_basic_tables",
            "False",
            "--limit",
            "10000"
        ],
        volumes=[
            "/seedoodata:/seedoodata",
        ],
        environment={
            'MKL_NUM_THREADS': '8',
        }
    )
    logger.info(f"Created container: {container}")
    return container

# Usage example: the flow is async so the async task can be awaited
@flow
async def my_flow():
    await create_container()

if __name__ == "__main__":
    asyncio.run(my_flow())
### Step 4: Deploy and Run the Flow
Deploy and run your Prefect flow again with the updated Docker image.
This should resolve the GLIBCXX_3.4.29 not found error, allowing the Faiss library to load correctly.
Let me know if the cosmic void still troubles you with errors, and I'll help you navigate through it.