< Marvin> I have a custom python wheel called `mywheel` that Prefect Community #ask-marvin

Oliver Tedder

12/05/2024, 3:00 PM

@Marvin I have a custom python wheel called `mywheel` that contains the code i want to run. The wheel has a module `mymod` in it that i want to run. The wheel is published to artifactory/jfrog. I have a prefect worker on a computer that has pip set up so that it is able to pip install the package from the artifactory.
• for example, i can do `pip install mywheel` in a terminal on that computer and it can successfully retrieve and install the wheel
• I can then run the code with `python -m mywheel.mymod`

How can i make a prefect deployment that will:
• run on that worker
• pull the wheel from artifactory and run the `mymod` module

Marvin

12/05/2024, 3:01 PM

To create a Prefect deployment that runs a custom Python wheel from Artifactory on a specified worker, you need to set up a workflow that handles the pip installation and execution steps. Based on your description, I assume you are familiar with Prefect 2.x or 3.x, so let me provide a general approach.

First, I'll verify the steps using relevant resources to ensure that they align with the latest practices in Prefect development. I'm going to search the Prefect 2.x and 3.x documentation to see the most effective way to handle external package installations and deployments.

From the Prefect 2.x and 3.x documentation, here's a simplified approach you can take to set up a deployment that will pull your custom Python wheel from Artifactory and run it on a specified worker:

1. Create a Prefect Flow: First, you'll need a Prefect flow that wraps your task logic. Define a Python function that executes your `mymod` module after ensuring the wheel is installed.
2. Write the Logic to Install and Run the Module: In your flow definition, include logic to invoke the `pip install` command against Artifactory. Here's a pseudocode example to get you started:

python
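   import subprocess
   import sys

   from prefect import flow

   @flow
   def run_my_wheel():
       # Install the wheel from Artifactory into the worker's own interpreter;
       # check=True fails the flow run if pip exits with an error
       subprocess.run([sys.executable, "-m", "pip", "install", "mywheel"], check=True)

       # Run the module with the same interpreter that pip installed into
       subprocess.run([sys.executable, "-m", "mywheel.mymod"], check=True)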

3. Deploy the Flow: Create a deployment that targets your specific worker by setting its configuration appropriately in the deployment script or UI. Specify the work pool that the deployment should use. This allows the worker to pull the flow and execute it.
4. Set up Work Pools and Workers: Make sure the worker polling your work pool runs on the machine where `pip install` successfully works, as you have described.
5. Run the Deployment: Use the Prefect CLI or Prefect UI to start your deployment. Once started, it will trigger the worker, which will then execute the flow, installing the wheel and running the module.

Here are a couple of useful links to Prefect's documentation that might help you with additional setup details:
- Prefect 2.x Work Pools: Work Pools Documentation
- Prefect 3.x Installation and Setup: Installation Documentation

If you find this setup a bit mundane, do not worry; it's a great way to automate your deployments, so it's worth the effort! If you need any further assistance or a debugging sidekick, I'm here to help.
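
A minimal sketch of the install-then-run step described above, under some assumptions: the distribution is named `mywheel`, the worker's pip is already configured for Artifactory, and the helper names `is_installed`, `build_commands`, and `install_and_run` are hypothetical (they are not part of Prefect). It skips the pip call when the wheel is already present and uses `sys.executable` so both commands run in the worker's own interpreter.

```python
import importlib.metadata
import subprocess
import sys


def is_installed(distribution: str) -> bool:
    """Return True if a distribution with this name is installed in this environment."""
    try:
        importlib.metadata.version(distribution)
        return True
    except importlib.metadata.PackageNotFoundError:
        return False


def build_commands(distribution: str, module: str) -> list:
    """Build the commands to run, leaving out pip when the wheel is already present."""
    commands = []
    if not is_installed(distribution):
        commands.append([sys.executable, "-m", "pip", "install", distribution])
    commands.append([sys.executable, "-m", module])
    return commands


def install_and_run(distribution: str, module: str) -> None:
    """Run each command in order and raise if any of them exits with an error."""
    for command in build_commands(distribution, module):
        subprocess.run(command, check=True)
```

Calling `install_and_run("mywheel", "mywheel.mymod")` from inside the flow would stand in for the two `subprocess.run` calls. Because the check is by name only, an older installed version is never upgraded, so drop the check if the worker should always pick up the newest wheel.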

Oliver Tedder

12/05/2024, 3:10 PM

`mywheel.mymod` includes prefect flows and tasks, will these be correctly run if I run the process externally like that?

Oliver Tedder

12/05/2024, 4:21 PM

@Marvin I am deploying using a gitlab repo instead. I am running this python deployment script

from prefect import flow
from prefect.runner.storage import GitRepository

# Clone the flow code from GitLab at deployment time
gitlab_repo = GitRepository(
    url="https://gitlab.com/mycompany/myrepo.git",
    branch="mybranch",
    include_submodules=True,
)
flow.from_source(
    source=gitlab_repo,
    entrypoint="myrepo/myfun.py:myfunc",
).deploy(
    name="mydeployment",
    work_pool_name="myworkpool",
)

However i get this error:

(venv) D:\Git\myrepo>python prefect_deploy.py
Traceback (most recent call last):
  File "D:\Git\myrepo\prefect_deploy.py", line 20, in <module>        
    flow.from_source(
  File "D:\Git\myrepo\venv\Lib\site-packages\prefect\utilities\asyncutils.py", line 399, in coroutine_wrapper
    return run_coro_as_sync(ctx_call())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Git\myrepo\venv\Lib\site-packages\prefect\utilities\asyncutils.py", line 243, in run_coro_as_sync
    return call.result()
           ^^^^^^^^^^^^^
  File "D:\Git\myrepo\venv\Lib\site-packages\prefect\_internal\concurrency\calls.py", line 312, in result
    return self.future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Git\myrepo\venv\Lib\site-packages\prefect\_internal\concurrency\calls.py", line 182, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "D:\Git\myrepo\venv\Lib\site-packages\prefect\_internal\concurrency\calls.py", line 383, in _run_async
    result = await coro
             ^^^^^^^^^^
  File "D:\Git\myrepo\venv\Lib\site-packages\prefect\utilities\asyncutils.py", line 225, in coroutine_wrapper
    return await task
           ^^^^^^^^^^
  File "D:\Git\myrepo\venv\Lib\site-packages\prefect\utilities\asyncutils.py", line 389, in ctx_call
    result = await async_fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Git\myrepo\venv\Lib\site-packages\prefect\flows.py", line 1035, in from_source
    with tempfile.TemporaryDirectory() as tmpdir:
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\tempfile.py", line 943, in __exit__
    self.cleanup()
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\tempfile.py", line 947, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\tempfile.py", line 929, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\shutil.py", line 787, in rmtree
    return _rmtree_unsafe(path, onerror)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\shutil.py", line 629, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\shutil.py", line 629, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\shutil.py", line 629, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  [Previous line repeated 2 more times]
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\shutil.py", line 638, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\shutil.py", line 636, in _rmtree_unsafe
    os.rmdir(path)
OSError: [WinError 145] The directory is not empty: 'C:\\Users\\OLIVER~1.TED\\AppData\\Local\\Temp\\tmp8ocl5mq4\\mybranch\\.git\\logs\\refs\\heads'

what is going on here?

Marvin

12/05/2024, 4:21 PM

Your message was too long by 891 tokens - please shorten it and try again.

Marvin

12/05/2024, 4:22 PM

Your message was too long by 157 tokens - please shorten it and try again.

Oliver Tedder

12/05/2024, 4:23 PM

Oliver Tedder [4:22 PM] @Marvin I am deploying using a gitlab repo instead. I am running this python deployment script

gitlab_repo = GitRepository(
    url="https://gitlab.com/mycompany/myrepo.git",
    branch="mybranch",
    include_submodules=True,
)
flow.from_source(
    source=gitlab_repo,
    entrypoint="myrepo/myfun.py:myfunc",
).deploy(
    name="mydeployment",
    work_pool_name="myworkpool",
)

However i get this error:

...
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\shutil.py", line 638, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "C:\Users\Oliver.Tedder\.pyenv\pyenv-win\versions\3.11.9\Lib\shutil.py", line 636, in _rmtree_unsafe
    os.rmdir(path)
OSError: [WinError 145] The directory is not empty: 'C:\\Users\\OLIVER~1.TED\\AppData\\Local\\Temp\\tmp8ocl5mq4\\mybranch\\.git\\logs\\refs\\heads'

what is going on here?

Marvin

12/05/2024, 4:23 PM

The error you're encountering indicates a problem with the cleanup process during the deployment, specifically when attempting to remove a temporary Git directory. This typically happens on Windows due to files being locked or permissions issues. Here are some steps you can take to resolve this:

1. Antivirus or System Protection: Ensure that any antivirus software, system protection, or similar applications aren't locking files at this path. These applications can sometimes interfere with file operations.
2. File Locking Issues: On Windows, files might remain locked from previous operations. Restarting your computer might help to release any locks.
3. Permissions Issue: Check that your user has the necessary permissions to delete files and directories in the specified path. Running the script or Prefect worker with administrative privileges might help.
4. Temporary Directory Usage: If this is a recurring issue, consider specifying a different temporary directory for Git operations that might not have these restrictions or cleanup issues.
5. Update Git and Python: Ensure you're using the latest version of Git, Python, and Prefect. Sometimes these tools include bug fixes for such issues.
6. Git Configuration: Check if there's any specific Git configuration that might be preventing directory deletion. Note that `.gitignore` rules only control what Git tracks; they do not stop the operating system from deleting files.

If none of these solutions work, you might want to search for similar issues on GitHub to see if others have experienced the same problem, or open a detailed issue there for further help. Let me know if you want me to search for any known issues related to your situation!
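
For the permissions point above, a common pattern is to give `shutil.rmtree` an error handler that clears the read-only flag and retries, since files under `.git` are often marked read-only on Windows. This is only a sketch: `force_rmtree` and its handler are hypothetical names, it addresses the read-only case alone, and it would not help with other causes of a failed delete.

```python
import os
import shutil
import stat
import sys
import tempfile


def _make_writable_and_retry(func, path, _exc):
    """rmtree error handler: clear the read-only bit, then retry the failed call."""
    os.chmod(path, stat.S_IWRITE)
    func(path)


def force_rmtree(path: str) -> None:
    """Delete a directory tree, retrying entries that fail because they are read-only."""
    if sys.version_info >= (3, 12):
        # Python 3.12 added onexc, which receives the exception instance
        shutil.rmtree(path, onexc=_make_writable_and_retry)
    else:
        # Earlier versions pass an exc_info tuple to onerror instead
        shutil.rmtree(path, onerror=_make_writable_and_retry)


# Demo on a throwaway directory that contains one read-only file
demo_dir = tempfile.mkdtemp()
locked_file = os.path.join(demo_dir, "locked.txt")
with open(locked_file, "w") as handle:
    handle.write("read-only content")
os.chmod(locked_file, stat.S_IREAD)

force_rmtree(demo_dir)
print(f"still exists after cleanup: {os.path.exists(demo_dir)}")
```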

Oliver Tedder

12/05/2024, 4:25 PM

How do i do step 4?

Oliver Tedder

12/05/2024, 4:39 PM

@Marvin how can i do step 4 - specifying a different temporary directory?

Marvin

12/05/2024, 4:39 PM

To specify a different temporary directory for operations such as Git cloning during deployment, you'll usually need to modify the environment variables or configuration settings used by your Python environment or Git command. Here's how you can approach this:

1. Set the `TMPDIR` Environment Variable: Python's `tempfile` module checks `TMPDIR` first (then `TEMP` and `TMP`) when choosing where to store temporary files. You can set it to a different path where you have full permissions.
- On Windows: You can set it temporarily within your script or terminal session:

python
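     import os
     import tempfile

     # Choose a custom temporary directory path (placeholder) and make sure it exists
     custom_temp_dir = r"C:\path_to_custom_temp"
     os.makedirs(custom_temp_dir, exist_ok=True)

     # Set both before calling flow.from_source(...), which creates its
     # working directory through tempfile.TemporaryDirectory()
     os.environ["TMPDIR"] = custom_temp_dir
     tempfile.tempdir = custom_temp_dir  # Also set it for the tempfile module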

- Alternatively, set it globally via the System Properties:
  1. Search for "Environment Variables" in the Windows Start menu.
  2. In the "System Properties" window, click on "Environment Variables...".
  3. Create a new variable named `TMPDIR` and set it to your desired path.

2. Git Configuration for Temporary Directory: You might also consider configuring Git to handle temporary files differently, although changing `TMPDIR` will usually cover this.

Remember to ensure that the directory you're setting as the new temporary directory has appropriate write permissions for your working user. This change should redirect temporary file handling to the new path you've specified, potentially resolving issues related to file locks or permission errors. If things still go awry, let me know, and we can continue delving into this mystery together!
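
To confirm the redirect actually took effect, it can help to ask `tempfile` where it now resolves. The sketch below uses a throwaway directory as a stand-in for the custom location (an assumption for illustration; a real script would use a fixed path). `tempfile.gettempdir()` caches its answer in `tempfile.tempdir`, so the cache is reset after changing `TMPDIR`.

```python
import os
import tempfile

# Stand-in for the custom location; a real script would use a fixed path instead
custom_temp_dir = tempfile.mkdtemp(prefix="custom_tmp_")

os.environ["TMPDIR"] = custom_temp_dir
# gettempdir() caches its result in tempfile.tempdir; reset it so TMPDIR is re-read
tempfile.tempdir = None

resolved = tempfile.gettempdir()
print(f"tempfile now resolves to: {resolved}")

# Anything created through tempfile should now land under the new location
with tempfile.TemporaryDirectory() as scratch:
    inside_custom = os.path.dirname(scratch) == resolved
    print(f"scratch dir {scratch} is inside the custom dir: {inside_custom}")
```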

Oliver Tedder

12/05/2024, 4:41 PM

@Marvin that worked, it uses a different temporary directory. However the original error still occurs. Looking at the temporary directory it made, it still contains all of the repo that it checked out. It seems there is a "delete everything in the temp folder" step that is not being run?

Marvin

12/05/2024, 4:42 PM

It seems there was a hiccup with accessing the GitHub API to check for known issues, so let's think through other potential options and solutions you can try:

1. Manual Cleanup: As a workaround, you might implement a manual cleanup step within your script to ensure the temporary directory is cleared after deployment. Remember to handle any exceptions where the directory might not be empty due to locks.

python
   import shutil

   def cleanup_temp_directory(path):
       try:
           shutil.rmtree(path)
       except Exception as e:
           print(f"Failed to delete directory {path}. Reason: {e}")

   # Usage
   cleanup_temp_directory(custom_temp_dir)

2. Investigate Underlying Git Behavior: Ensure that there's no process hanging onto files in that directory after the deployment step is completed. Tools like Sysinternals' Process Explorer can help identify any processes holding file locks.
3. Check Prefect Configuration: Double-check Prefect's configurations or any custom scripts you employ to see if there's supposed to be a step that cleans up these directories that isn't executing correctly due to either missed configuration or error handling.
4. Error Handling Improvements: Add error handling around sections that could result in this `OSError` to potentially log more detailed information about what's preventing the deletion.
5. Scheduled Task for Cleanup: As a last resort, you might consider setting up a scheduled cleanup task using a script or task scheduler to periodically clear these directories until a more permanent solution is found.

If the issue persists, another step might be reaching out for support on Prefect's forums or considering creating a GitHub issue to highlight and resolve this behavior. Feel free to reach out again if you have more questions or need help on another aspect of this issue!
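
The error-handling suggestion above could look roughly like the following sketch, which catches the `OSError` from a failed delete and records which path failed and how long it is. The name `try_rmtree` and the shape of the returned dictionary are assumptions for illustration; path length is included because very long paths are one possible cause of failed deletes on Windows.

```python
import os
import shutil
import tempfile
import uuid
from typing import Optional


def try_rmtree(path: str) -> Optional[dict]:
    """Delete a tree; return None on success or a dict describing the failure."""
    try:
        shutil.rmtree(path)
        return None
    except OSError as exc:
        failing = str(exc.filename) if exc.filename is not None else path
        return {
            "errno": exc.errno,
            # winerror is only populated on Windows (145 in the traceback above)
            "winerror": getattr(exc, "winerror", None),
            "filename": failing,
            "path_length": len(failing),
            "message": exc.strerror,
        }


# Demo: one directory that deletes cleanly, then a path that does not exist
ok_result = try_rmtree(tempfile.mkdtemp())
missing_path = os.path.join(tempfile.gettempdir(), f"missing_{uuid.uuid4().hex}")
missing_result = try_rmtree(missing_path)
print(ok_result)
print(missing_result)
```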

Oliver Tedder

12/05/2024, 5:34 PM

@Marvin silly problem, it was because the branch name i was checking out was too long which resulted in a file having a very long name which os.unlink wasn't able to remove, hence everything failed. My next error is this:

prefect.exceptions.PrefectHTTPStatusError: Client error '422 Unprocessable Entity' for url 'http://192.168.10.8:4200/api/flows/'
Response: {'exception_message': 'Invalid request received.', 'exception_detail': [{'type': 'extra_forbidden', 'loc': ['body', 'labels'], 'msg': 'Extra inputs are not permitted', 'input': {}}], 'request_body': {'name': 'run-prefect', 'tags': [], 'labels': {}}}
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/422

what does that mean?

Marvin

12/05/2024, 5:34 PM

This error message indicates that there is an issue with the request body being sent to the Prefect API when you're trying to deploy your flow. Specifically, it highlights that "extra inputs are not permitted" and points to the `labels` field in the request body as causing the issue. Here's a breakdown of what might be going wrong and how you could address it:

1. Labels and Inputs: The error suggests that there is an unexpected "labels" attribute in your deployment configuration. In Prefect, labels may have been deprecated, replaced by tags or other methods, depending on the version you're using. Since it indicates "extra inputs," Prefect is rejecting the inclusion of the labels field.
2. Check Deployment Configuration: Review your deployment script or settings to make sure you're not including fields, like "labels," that are no longer supported due to version updates or changes. Prefect's API might now expect tags or other configuration options instead of labels.
3. Modify or Update Code: Attempt to remove or modify the line that involves `labels`. Ensure your deployment configuration uses elements compatible with Prefect's accepted parameters:

python
   # Example, remove 'labels' and keep 'tags' if needed
   .deploy(
       name="mydeployment",
       work_pool_name="myworkpool",
       tags=[],
       # Remove or update labels if necessary
   )

4. Documentation and Release Notes: Check the Prefect documentation and latest release notes for any changes in deployment configuration that might have caused this problem.

Solving this error involves aligning the deployment parameters with Prefect's current expectations for allowed and disallowed inputs. Should you need more information about modifications in Prefect's configuration fields, consulting the documentation or release notes is an excellent next step. If you have questions or need more help, do let me know!

Oliver Tedder

12/05/2024, 5:37 PM

@Marvin my deployment is this

gitlab_repo = GitRepository(
    url="https://gitlab.com/mycompany/myrepo.git",
    branch="mybranch",
    include_submodules=True,
)
flow.from_source(
    source=gitlab_repo,
    entrypoint="myrepo/myfun.py:myfunc",
).deploy(
    name="mydeployment",
    work_pool_name="myworkpool",
    tags=[],
)

no mention of labels. I added the `tags=[]` param just in case but still the same error

Marvin

12/05/2024, 5:38 PM

Given that your deployment script does not explicitly specify `labels`, yet you're encountering an error related to `labels`, it might be one of the following:

1. Implicit Configuration: There might be an implicit configuration or default settings somewhere in your environment, Prefect server configuration, or within the API client that's adding labels or expecting certain formats.
2. Prefect Version Mismatch: Ensure that the version of your Prefect library is compatible with the server you are trying to communicate with. There may have been changes in how deployments are expected to be configured that could cause such errors.
3. Server-Side Configuration: Check if there are any server-side configurations or webhooks that might interact with the deployment payload, inserting unnecessary fields.
4. Environment Variables: Sometimes environment variables can set default values that might not be visible in your code.
5. Custom Modifications: If there are any custom scripts or middleware between your deployment script and Prefect API that could modify the request body, check and ensure they are configured properly.

Here's a plan to diagnose and potentially resolve the issue:
- Verify Versions: Ensure both the Prefect library and server are up-to-date and compatible.
- Check for Implicit Defaults: Check if there are implicit default configurations either on the client-side or server-side that automatically include `labels`.
- Server Logs and Configuration: Look into Prefect server logs for any insights into why this error is occurring. If you control the server, see if there are any configurations that might add or expect certain fields.

If none of these steps work, you might consider escalating the issue on Prefect's forums, where more technical insights specific to their system can be offered. Or, contact the support team if you're using a managed Prefect service for guidance tailored to your setup. Keep up the resilience; debugging is part art, part science!
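
The "Verify Versions" step above can be sketched as a small comparison between the installed client version and the version the server reports. Treat this as an illustration under assumptions: the helper names are hypothetical, and `fetch_server_version` assumes the server exposes its version as a JSON string at `<api_url>/admin/version`, which should be checked against the server in use. A client newer than the server could plausibly send fields the server does not yet accept.

```python
import json
import re
import urllib.request


def parse_version(version: str) -> tuple:
    """Turn '3.1.5' (or '3.1.5+12.gabc') into (3, 1, 5); any suffix is ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def client_is_newer(client_version: str, server_version: str) -> bool:
    """True when the client library is ahead of the server it talks to."""
    return parse_version(client_version) > parse_version(server_version)


def fetch_server_version(api_url: str) -> str:
    """Assumption: the server returns its version as a JSON string at /admin/version."""
    with urllib.request.urlopen(f"{api_url}/admin/version", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the error message, this might be called as `client_is_newer(prefect.__version__, fetch_server_version("http://192.168.10.8:4200/api"))`. Running `prefect version` on the client machine prints the installed client version as well.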

Oliver Tedder

12/06/2024, 11:42 AM

@Marvin my original problem involving prefect not being able to remove files was resolved by enabling long path support on windows https://docs.python.org/3/using/windows.html#removing-the-max-path-limitation

Marvin

12/06/2024, 11:42 AM

Ah, enabling long path support on Windows can indeed solve such pesky file operation issues! Windows has traditionally had a 260-character path limit, which can cause trouble when dealing with deep directory structures or long filenames. By enabling long path support, you've sidestepped this limitation. If only solving all problems was just about flipping a switch, right? Thanks for sharing the solution you found! This could be quite helpful for others facing similar issues. If you have more questions or run into other roadblocks, feel free to ask.
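
For anyone hitting the same limit, a quick way to spot the offending entries is to walk the checkout and list paths at or beyond the 260-character limit mentioned above. This is a sketch with a hypothetical helper name, `find_long_paths`; the demo uses a small artificial limit on a throwaway directory so it can run anywhere.

```python
import os
import tempfile

WINDOWS_MAX_PATH = 260  # the traditional Windows path limit


def find_long_paths(root: str, limit: int = WINDOWS_MAX_PATH) -> list:
    """Return every file or directory path under root whose length reaches the limit."""
    too_long = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            if len(full_path) >= limit:
                too_long.append(full_path)
    return sorted(too_long)


# Demo with an artificially small limit on a throwaway directory
demo_root = tempfile.mkdtemp()
long_dir = os.path.join(demo_root, "a_rather_long_branch_name")
short_dir = os.path.join(demo_root, "x")
os.makedirs(long_dir)
os.makedirs(short_dir)

hits = find_long_paths(demo_root, limit=len(long_dir))
print(hits)
```

Running `find_long_paths` with the default limit against a local clone of the branch would show whether the branch name pushes any path under `.git` past the limit.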
