# ask-community
p
Hi, I am having trouble registering a flow that used to work, and I can't figure out what is going on. I am using a different laptop today. Can you tell what may be going on from the output in the thread?
prefect register --project radar -p radar/flows/li_company_flow.py --label prod
Collecting flows...
Processing 'radar/flows/li_company_flow.py':
  Building `Docker` storage...

    Error building storage:
      Traceback (most recent call last):
        File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/cli/build_register.py", line 463, in build_and_register
          storage.build()
        File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 308, in build
          self._build_image(push=push)
        File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 339, in _build_image
          dockerfile_path = self.create_dockerfile_object(directory=tempdir)
        File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 474, in create_dockerfile_object
          f.write(flow_to_bytes_pickle(self._flows[flow_name]))
        File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/utilities/storage.py", line 177, in flow_to_bytes_pickle
          cloudpickle.dumps(flow, protocol=4), newline=False
        File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
          cp.dump(obj)
        File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
          return Pickler.dump(self, obj)
        File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 316, in _file_reduce
          raise pickle.PicklingError(
      _pickle.PicklingError: Cannot pickle files that are not opened for reading: a

  Registering 'li_get_company_data'... Error
================== 0 registered, 1 errored ==================
a
It looks like some files that you include in your Docker storage cannot be pickled. This documentation page provides a nice walkthrough for debugging flow serialization problems with Docker storage: https://docs.prefect.io/core/advanced_tutorials/local-debugging.html#locally-check-your-flow-s-docker-storage
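The failure in the traceback can be reproduced without Prefect at all. This illustrative helper (check_picklable is my own name, not a Prefect or cloudpickle API, and stdlib pickle stands in for cloudpickle here; the failure mode is the same) shows why an object holding an open file handle, such as a module-level logger sink, cannot be serialized:

```python
import pickle
import tempfile

def check_picklable(obj):
    """Return (ok, error) after attempting to pickle obj with protocol 4."""
    try:
        pickle.dumps(obj, protocol=4)
        return True, None
    except Exception as exc:
        return False, repr(exc)

# A plain dict pickles fine...
print(check_picklable({"x": 1}))  # (True, None)

# ...but anything that holds an open file handle (e.g. a logger writing
# to a file at module level) does not, which is what the PicklingError
# in the traceback above is reporting.
with tempfile.TemporaryFile("w") as f:
    print(check_picklable({"log_sink": f}))
```

Running a check like this on the flow object locally surfaces the offending attribute before any Docker build is attempted.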
p
I am trying to follow that, but for some reason the Docker storage does not include the flow. The storage object is defined as follows:
storage = Docker(
    registry_url=os.getenv("ECR_REPO_URL"),
    image_name="prefect-ecs-prod",
    python_dependencies=[
        "requests-cache~=0.8.1",
        "tenacity~=8.0.1",
        "ratelimiter~=1.2.0",
        "loguru~=0.5.3",
        "snowflake-connector-python'>=1.8.2,<2.5'",
    ],
    files={
        REPO_BASE_DIR / "radar/radar/linkedinapi.py": "/modules/radar/linkedinapi.py",
        REPO_BASE_DIR
        / "radar/radar/retry_strategy.py": "/modules/radar/retry_strategy.py",
    },
    env_vars={"PYTHONPATH": "$PYTHONPATH:/modules/"},
)
I assign it to Flow(storage=storage) in the context manager block. When I build it with:
built_storage = flow.storage.build(push=False)
print(f"{built_storage.flows=}")
I see this:
[2021-12-22 14:31:45] INFO - prefect.Docker | Building the flow's Docker storage...
Step 1/10 : FROM prefecthq/prefect:0.15.10-python3.8
 ---> 330a9f90a2be
Step 2/10 : ENV PYTHONPATH='$PYTHONPATH:/modules/'     PREFECT__USER_CONFIG_PATH='/opt/prefect/config.toml'
 ---> Using cache
 ---> ec1dc84f2d16
Step 3/10 : RUN pip install pip --upgrade
 ---> Using cache
 ---> 80ecdf097086
Step 4/10 : RUN pip show prefect || pip install git+https://github.com/PrefectHQ/prefect.git@0.15.10#egg=prefect[all_orchestration_extras]
 ---> Using cache
 ---> 7d8c675fab87
Step 5/10 : RUN pip install requests-cache~=0.8.1 tenacity~=8.0.1 ratelimiter~=1.2.0 loguru~=0.5.3 snowflake-connector-python'>=1.8.2,<2.5' wheel
 ---> Using cache
 ---> b4e1830a52ed
Step 6/10 : RUN mkdir -p /opt/prefect/
 ---> Using cache
 ---> ae7547e9965d
Step 7/10 : COPY healthcheck.py /opt/prefect/healthcheck.py
 ---> Using cache
 ---> 303931044515
Step 8/10 : COPY linkedinapi.py /modules/radar/linkedinapi.py
 ---> Using cache
 ---> 40f2fa9107aa
Step 9/10 : COPY retry_strategy.py /modules/radar/retry_strategy.py
 ---> Using cache
 ---> ae26b25c34d8
Step 10/10 : RUN python /opt/prefect/healthcheck.py '[]' '(3, 8)'
 ---> Using cache
 ---> 32e27568f61d
Successfully built 32e27568f61d
Successfully tagged <account>.dkr.ecr.us-east-1.amazonaws.com/prefect-ecs-prod:2021-12-22t20-31-42-592614-00-00
built_storage.flows={}
I have another flow very similar to this one that is registering fine.
a
Does this work?
files={
        REPO_BASE_DIR / "radar/radar/linkedinapi.py": "/modules/radar/linkedinapi.py",
        REPO_BASE_DIR
        / "radar/radar/retry_strategy.py": "/modules/radar/retry_strategy.py",
    }
It should instead be:
files={f"{REPO_BASE_DIR}/radar/radar/linkedinapi.py": "/modules/radar/linkedinapi.py",
        f"{REPO_BASE_DIR}/radar/radar/retry_strategy.py": "/modules/radar/retry_strategy.py",
    }
But based on your output, it looks good!
Successfully built 32e27568f61d
Successfully tagged <account>.dkr.ecr.us-east-1.amazonaws.com/prefect-ecs-prod:2021-12-22t20-31-42-592614-00-00
To add your flow to storage, you can add this line:
storage.add_flow(flow)
and then:
built_storage = flow.storage.build(push=False)
print(f"{built_storage.flows=}")
p
REPO_BASE_DIR is a Path object. When I add the flow, I get the error I was getting before. I have to go now but will check this later. Thanks for your help!
Traceback (most recent call last):
  File "/home/pedro/clients/thrive/python_jobs/radar/radar/flows/li_company_flow.py", line 226, in <module>
    built_storage = flow.storage.build(push=False)
  File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 308, in build
    self._build_image(push=push)
  File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 339, in _build_image
    dockerfile_path = self.create_dockerfile_object(directory=tempdir)
  File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 474, in create_dockerfile_object
    f.write(flow_to_bytes_pickle(self._flows[flow_name]))
  File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/utilities/storage.py", line 177, in flow_to_bytes_pickle
    cloudpickle.dumps(flow, protocol=4), newline=False
  File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
  File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 316, in _file_reduce
    raise pickle.PicklingError(
_pickle.PicklingError: Cannot pickle files that are not opened for reading: a
a
Then this Path object is a likely culprit for the serialization issue, because Prefect expects these file paths as strings. You're very welcome!
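A minimal sketch of that fix, assuming REPO_BASE_DIR is a pathlib.Path (the environment-variable name and fallback default here are illustrative): wrapping each key in str() before handing the dict to Docker storage keeps the keys as plain strings.

```python
import os
from pathlib import Path

# Illustrative stand-in for the real repository base directory.
REPO_BASE_DIR = Path(os.getenv("REPO_BASE_DIR", "."))

# Convert the Path keys to plain strings; Docker storage expects
# string paths, not Path objects.
files = {
    str(REPO_BASE_DIR / "radar/radar/linkedinapi.py"): "/modules/radar/linkedinapi.py",
    str(REPO_BASE_DIR / "radar/radar/retry_strategy.py"): "/modules/radar/retry_strategy.py",
}
print(list(files))
```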
p
I'll give it a try when I get back. It's interesting that it works on another flow. Thanks again.
I found the error. I was importing a logger object from loguru at the top of the Python file where the flow is defined. I got it to work by moving the import inside of the task. Is there a better approach?
a
This is a good approach, because this way you are not required to have this package installed in the environment from which you register. The alternative is to install this package on the machine from which you register and build the image. Or you can use script storage with Docker storage (stored_as_script=True).
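A configuration sketch of that last option, assuming the Prefect 0.15.x API (registry URL and in-image path are placeholders): with stored_as_script=True the flow file itself is copied into the image and imported at runtime, so nothing is cloudpickled at registration time and module-level imports like the loguru logger are no longer a problem.

```python
from prefect.storage import Docker

storage = Docker(
    registry_url="<account>.dkr.ecr.us-east-1.amazonaws.com",  # placeholder
    image_name="prefect-ecs-prod",
    # Store the flow as a script instead of a pickle.
    stored_as_script=True,
    # Where the flow file will live inside the image.
    path="/opt/prefect/flows/li_company_flow.py",
    files={
        "radar/flows/li_company_flow.py": "/opt/prefect/flows/li_company_flow.py",
    },
)
```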