Pedro Machado

    9 months ago
    Hi, I am having trouble registering a flow that used to work, and I can't figure out what is going on. I am using a different laptop today. Can you tell what might be wrong from the output below?
    prefect register --project radar -p radar/flows/li_company_flow.py --label prod
    Collecting flows...
    Processing 'radar/flows/li_company_flow.py':
      Building `Docker` storage...
    
        Error building storage:
          Traceback (most recent call last):
            File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/cli/build_register.py", line 463, in build_and_register
        storage.build()
            File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 308, in build
        self._build_image(push=push)
            File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 339, in _build_image
        dockerfile_path = self.create_dockerfile_object(directory=tempdir)
            File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 474, in create_dockerfile_object
        f.write(flow_to_bytes_pickle(self._flows[flow_name]))
            File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/utilities/storage.py", line 177, in flow_to_bytes_pickle
        cloudpickle.dumps(flow, protocol=4), newline=False
            File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
        cp.dump(obj)
            File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
        return Pickler.dump(self, obj)
            File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 316, in _file_reduce
        raise pickle.PicklingError(
          _pickle.PicklingError: Cannot pickle files that are not opened for reading: a
    
      Registering 'li_get_company_data'... Error
    ================== 0 registered, 1 errored ==================
    Anna Geller

    9 months ago
    It looks like something included in your Docker storage cannot be pickled: the error points to a file object opened in append mode ("a"), and cloudpickle only serializes files opened for reading. This documentation page provides a nice walkthrough for debugging flow serialization problems with Docker storage: https://docs.prefect.io/core/advanced_tutorials/local-debugging.html#locally-check-your-flow-s-docker-storage
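The core of that local check is serializing the flow before any image is built. Prefect uses `cloudpickle` (see `flow_to_bytes_pickle` in the traceback), so calling `cloudpickle.dumps(flow)` in the registering environment is the faithful test. The stdlib-only sketch below narrows things down further by trying to pickle each attribute of an object separately. `find_unpicklable_attrs` and `Holder` are hypothetical names; `Holder` stands in for any object that keeps a log file open in append mode.

```python
import os
import pickle
import tempfile


def find_unpicklable_attrs(obj):
    """Return the names of attributes on obj that pickle refuses to serialize."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except (TypeError, pickle.PicklingError):
            bad.append(name)
    return bad


class Holder:
    """Hypothetical object that keeps a log file open in append mode."""

    def __init__(self, log_path):
        self.name = "li_get_company_data"
        self.sink = open(log_path, "a")  # append-mode handle, like a file log sink


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "app.log")
    holder = Holder(log_path)
    print(find_unpicklable_attrs(holder))  # ['sink']
    holder.sink.close()
```

On a real flow you would point the helper at objects the flow holds, such as individual tasks. Stdlib `pickle` and `cloudpickle` do not treat functions the same way, so a clean stdlib result is not a guarantee that cloudpickle will succeed.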
    Pedro Machado

    9 months ago
    I am trying to follow that, but for some reason the Docker storage does not include the flow. The storage object is defined as follows:
    storage = Docker(
        registry_url=os.getenv("ECR_REPO_URL"),
        image_name="prefect-ecs-prod",
        python_dependencies=[
            "requests-cache~=0.8.1",
            "tenacity~=8.0.1",
            "ratelimiter~=1.2.0",
            "loguru~=0.5.3",
            "snowflake-connector-python'>=1.8.2,<2.5'",
        ],
        files={
            REPO_BASE_DIR / "radar/radar/linkedinapi.py": "/modules/radar/linkedinapi.py",
            REPO_BASE_DIR
            / "radar/radar/retry_strategy.py": "/modules/radar/retry_strategy.py",
        },
        env_vars={"PYTHONPATH": "$PYTHONPATH:/modules/"},
    )
    I pass it as Flow(storage=storage) in the context manager block. When I build it with:
    built_storage = flow.storage.build(push=False)
    print(f"{built_storage.flows=}")
    I see this:
    [2021-12-22 14:31:45] INFO - prefect.Docker | Building the flow's Docker storage...
    Step 1/10 : FROM prefecthq/prefect:0.15.10-python3.8
     ---> 330a9f90a2be
    Step 2/10 : ENV PYTHONPATH='$PYTHONPATH:/modules/'     PREFECT__USER_CONFIG_PATH='/opt/prefect/config.toml'
     ---> Using cache
     ---> ec1dc84f2d16
    Step 3/10 : RUN pip install pip --upgrade
     ---> Using cache
     ---> 80ecdf097086
    Step 4/10 : RUN pip show prefect || pip install git+https://github.com/PrefectHQ/prefect.git@0.15.10#egg=prefect[all_orchestration_extras]
     ---> Using cache
     ---> 7d8c675fab87
    Step 5/10 : RUN pip install requests-cache~=0.8.1 tenacity~=8.0.1 ratelimiter~=1.2.0 loguru~=0.5.3 snowflake-connector-python'>=1.8.2,<2.5' wheel
     ---> Using cache
     ---> b4e1830a52ed
    Step 6/10 : RUN mkdir -p /opt/prefect/
     ---> Using cache
     ---> ae7547e9965d
    Step 7/10 : COPY healthcheck.py /opt/prefect/healthcheck.py
     ---> Using cache
     ---> 303931044515
    Step 8/10 : COPY linkedinapi.py /modules/radar/linkedinapi.py
     ---> Using cache
     ---> 40f2fa9107aa
    Step 9/10 : COPY retry_strategy.py /modules/radar/retry_strategy.py
     ---> Using cache
     ---> ae26b25c34d8
    Step 10/10 : RUN python /opt/prefect/healthcheck.py '[]' '(3, 8)'
     ---> Using cache
     ---> 32e27568f61d
    Successfully built 32e27568f61d
    Successfully tagged <account>.dkr.ecr.us-east-1.amazonaws.com/prefect-ecs-prod:2021-12-22t20-31-42-592614-00-00
    built_storage.flows={}
    I have another flow very similar to this one that is registering fine.
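The build log above shows how the `files` mapping appears to end up in the image: each host file is copied into the build context under its base name and then `COPY`-ed to its destination (steps 8 and 9). The sketch below is a hypothetical, simplified reconstruction of that staging step; `stage_files` is not a Prefect function. It fails early with a clear error if a source path does not exist, which is a cheap sanity check on a new laptop where the repository may live somewhere else.

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_files(files, build_dir):
    """Copy each host file into build_dir and return matching Dockerfile COPY lines.

    files maps a host path (str or Path) to a destination path inside the image.
    """
    copy_lines = []
    for src, dest in files.items():
        src = Path(src)
        if not src.is_file():
            raise FileNotFoundError(f"Missing file for Docker storage: {src}")
        # Files are staged flat, by base name, so two sources with the same
        # base name would collide in this simplified version.
        shutil.copy2(src, os.path.join(build_dir, src.name))
        copy_lines.append(f"COPY {src.name} {dest}")
    return copy_lines


if __name__ == "__main__":
    repo = Path(tempfile.mkdtemp())
    (repo / "radar" / "radar").mkdir(parents=True)
    for fname in ("linkedinapi.py", "retry_strategy.py"):
        (repo / "radar" / "radar" / fname).write_text("# placeholder\n")

    files = {
        repo / "radar/radar/linkedinapi.py": "/modules/radar/linkedinapi.py",
        repo / "radar/radar/retry_strategy.py": "/modules/radar/retry_strategy.py",
    }
    for line in stage_files(files, tempfile.mkdtemp()):
        print(line)
```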
    Anna Geller

    9 months ago
    Does this work?
    files={
        REPO_BASE_DIR / "radar/radar/linkedinapi.py": "/modules/radar/linkedinapi.py",
        REPO_BASE_DIR / "radar/radar/retry_strategy.py": "/modules/radar/retry_strategy.py",
    }
    It should rather be:
    files={
        f"{REPO_BASE_DIR}/radar/radar/linkedinapi.py": "/modules/radar/linkedinapi.py",
        f"{REPO_BASE_DIR}/radar/radar/retry_strategy.py": "/modules/radar/retry_strategy.py",
    }
    But based on your output, the image builds fine:
    Successfully built 32e27568f61d
    Successfully tagged <account>.dkr.ecr.us-east-1.amazonaws.com/prefect-ecs-prod:2021-12-22t20-31-42-592614-00-00
    To add your flow to the storage, you can add this line:
    storage.add_flow(flow)
    and then:
    built_storage = flow.storage.build(push=False)
    print(f"{built_storage.flows=}")
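A rough model of why `built_storage.flows` was `{}`: building only serializes flows that have been added to the storage. The `prefect register` traceback shows the flow being pickled, so the CLI evidently adds it before building, while a bare `storage.build()` call does not. The toy class below is hypothetical and heavily simplified, not Prefect's implementation: it uses stdlib `pickle` where Prefect uses `cloudpickle`, and the stored locations are made up. It also shows why adding the flow brings back the original error: once the flow is registered, anything unpicklable it references is hit at build time.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


class ToyStorage:
    """Hypothetical, simplified model of a storage that pickles flows at build time."""

    def __init__(self):
        self.flows = {}   # flow name -> made-up location of the serialized flow
        self._flows = {}  # flow name -> flow object waiting to be serialized

    def add_flow(self, flow):
        location = f"/flows/{flow.name}.pkl"
        self.flows[flow.name] = location
        self._flows[flow.name] = flow
        return location

    def build(self, build_dir):
        # Only flows registered with add_flow() are serialized; with none added,
        # the build still "succeeds" and self.flows stays empty.
        for name, flow in self._flows.items():
            data = pickle.dumps(flow)  # raises TypeError on e.g. an open file handle
            with open(os.path.join(build_dir, f"{name}.pkl"), "wb") as f:
                f.write(data)
        return self


if __name__ == "__main__":
    storage = ToyStorage()
    print(storage.build(tempfile.mkdtemp()).flows)  # {}

    storage.add_flow(SimpleNamespace(name="li_get_company_data"))
    print(storage.build(tempfile.mkdtemp()).flows)
    # {'li_get_company_data': '/flows/li_get_company_data.pkl'}
```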
    Pedro Machado

    9 months ago
    REPO_BASE_DIR is a Path object. When I add the flow, I get the same error I was getting before. I have to go now, but I will check this later. Thanks for your help!
    Traceback (most recent call last):
      File "/home/pedro/clients/thrive/python_jobs/radar/radar/flows/li_company_flow.py", line 226, in <module>
        built_storage = flow.storage.build(push=False)
      File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 308, in build
        self._build_image(push=push)
      File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 339, in _build_image
        dockerfile_path = self.create_dockerfile_object(directory=tempdir)
      File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/storage/docker.py", line 474, in create_dockerfile_object
        f.write(flow_to_bytes_pickle(self._flows[flow_name]))
      File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/prefect/utilities/storage.py", line 177, in flow_to_bytes_pickle
        cloudpickle.dumps(flow, protocol=4), newline=False
      File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
        cp.dump(obj)
      File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
        return Pickler.dump(self, obj)
      File "/home/pedro/.pyenv/versions/3.8.10/envs/radar/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 316, in _file_reduce
        raise pickle.PicklingError(
    _pickle.PicklingError: Cannot pickle files that are not opened for reading: a
    Anna Geller

    9 months ago
    Then this Path object is a likely culprit for the serialization issue, because Prefect expects these paths as strings. You’re very welcome!
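To rule the `Path` keys out, a minimal sketch is to normalize every key to a string before passing the mapping to `Docker(files=...)`. The build log earlier in the thread shows the image building fine with `Path` keys, so this may not be the cause, but the conversion costs little. `REPO_BASE_DIR` below is a placeholder value and `as_str_keys` is a hypothetical helper; on POSIX systems, `str(REPO_BASE_DIR / rel)` yields the same string as the f-string version suggested earlier.

```python
from pathlib import Path

# Placeholder; in the flow script this is the real repository root.
REPO_BASE_DIR = Path("/path/to/repo")


def as_str_keys(files):
    """Return a copy of a files mapping with every host path converted to str."""
    return {str(src): dest for src, dest in files.items()}


files = as_str_keys({
    REPO_BASE_DIR / "radar/radar/linkedinapi.py": "/modules/radar/linkedinapi.py",
    REPO_BASE_DIR / "radar/radar/retry_strategy.py": "/modules/radar/retry_strategy.py",
})
print(files)
```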
    Pedro Machado

    9 months ago
    I'll give it a try when I get back. It's interesting that it works for another flow. Thanks again.
    I found the error. I was importing the logger object from loguru at the top of the Python file where the flow is defined. I got it to work by moving the import inside the task. Is there a better approach?
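Moving the import works because of *when* the logger is created. The traceback shows cloudpickle reaching a file object opened in append mode, presumably through the module-level `logger` that the task functions reference. With the import inside the task, that object only comes into existence when the task runs, so it is never part of the pickled flow. The stdlib-only sketch below contrasts the two patterns with hypothetical `EagerTask` and `DeferredTask` classes. It uses classes rather than plain functions because stdlib `pickle` stores module-level functions by reference, whereas cloudpickle generally serializes functions defined in a script by value, together with the globals they use.

```python
import os
import pickle
import tempfile


class EagerTask:
    """Hypothetical task that creates its log sink when the flow is defined."""

    def __init__(self, log_path):
        # The open append-mode handle becomes part of the pickled state.
        self.sink = open(log_path, "a")

    def run(self):
        self.sink.write("running\n")


class DeferredTask:
    """Hypothetical task that only opens its log sink while it runs."""

    def __init__(self, log_path):
        self.log_path = log_path  # a plain string pickles fine

    def run(self):
        # Created at run time (inside the container), so it is never pickled.
        with open(self.log_path, "a") as sink:
            sink.write("running\n")


def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "task.log")
    eager = EagerTask(log_path)
    print(is_picklable(eager))                   # False
    print(is_picklable(DeferredTask(log_path)))  # True
    eager.sink.close()
```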
    Anna Geller

    9 months ago
    This is a good approach, because this way you don't need to have this package installed in the environment from which you register. The alternative is to install the package on the machine from which you register and build the image. Or you can use script-based storage with Docker storage (stored_as_script=True).
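Roughly, the difference with `stored_as_script=True` is that the flow file itself is shipped and re-executed when the flow is loaded, instead of a pickled flow object, so module-level objects such as the logger are recreated in the container rather than serialized. In Prefect 1.x this is typically combined with a `files` entry that copies the script into the image and a `path` argument pointing at that copy; check the Docker storage API reference for the exact parameters. The sketch below only illustrates the idea with `exec` and is an analogy, not Prefect's loader; the script contents and `load_flow_from_script` are hypothetical.

```python
import os
import tempfile

# Hypothetical flow script. It opens an append-mode log file at module level,
# the kind of object that breaks pickle-based storage but is harmless here.
FLOW_SCRIPT = '''
import os

LOG_FILE = open(os.path.join(os.path.dirname(__file__), "flow.log"), "a")


def get_company_data():
    LOG_FILE.write("fetching")
    LOG_FILE.flush()
    return "ok"


flow = {"name": "li_get_company_data", "task": get_company_data}
'''


def load_flow_from_script(path, var_name="flow"):
    """Execute a flow script and return the object bound to var_name.

    Nothing is pickled or unpickled: every module-level object in the script,
    including open file handles, is created fresh each time it is loaded.
    """
    with open(path) as f:
        source = f.read()
    namespace = {"__name__": "flow_script", "__file__": path}
    exec(compile(source, path, "exec"), namespace)
    return namespace[var_name]


if __name__ == "__main__":
    script_path = os.path.join(tempfile.mkdtemp(), "li_company_flow.py")
    with open(script_path, "w") as f:
        f.write(FLOW_SCRIPT)
    flow = load_flow_from_script(script_path)
    print(flow["name"], flow["task"]())  # li_get_company_data ok
```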