Prefect Community #ask-community

Alexis Lucido

02/14/2022, 10:58 AM

Hi all! I am running a Shell Task to kill zombie processes generated by one of my other tasks. The flow was functioning properly until 2 weeks ago, but I cannot figure out why it stopped working. Here is the associated flow:

kill_geckodriver_task = ShellTask(
    log_stderr=True, return_all=True, stream_output=True)
with Flow('kill_geckodriver', schedule=schedules_prefect[
        'kill_geckodriver']) as kill_geckodriver:
    kill_geckodriver_task(command='source {}'.format(os.path.join(
        os.environ.get('BASH_SCRIPTS_FOLDER'), 'kill_geckodriver.sh')))

The bash script is below:

pkill geckodriver
pkill firefox

I can run the flow when the bash script only echoes a string, so the bug is not due to the flow or the path passed with an environment variable. I guess the problem lies in the sudo rights needed to run the "pkill" command. I have been trying to replace the current script with the following lines (replacing password with the user password), but with no success so far:

# export HISTIGNORE='*sudo -S*' # to be added in production to avoid logging passwords
echo "<password>" | sudo -S pkill geckodriver
echo "<password>" | sudo -S pkill firefox

Unfortunately, the flow still raises an error, and I cannot figure out why. I have been trying to log it with the "log_stderr=True, return_all=True, stream_output=True" kwargs of the ShellTask, but the only logs I have are attached as a screenshot. Any thoughts about it? The problem is probably password-linked, but I cannot seem to find an appropriate solution. Thanks a lot in advance!

Kevin Kho

02/14/2022, 2:53 PM

Hi @Alexis Lucido, I am not sure, but looking at the flow, the 'source {}'.format used to make the command is a bit concerning because this gets evaluated during Flow registration, not during runtime. I don’t know if that’s what you intended, but it would be better if you had a task that returned this instead to defer the execution to runtime. You are right that exit code 1 sounds like a permission issue. Your attempts to get more logs also look right. I don’t know why you aren’t getting more. There is a way to get the whole traceback to show in the logs. I can give an example snippet, but in this case I don’t think it’ll generate anything.

Kevin Kho

02/14/2022, 2:53 PM

import traceback
from functools import partial, wraps

from prefect import task


def custom_task(func=None, **task_init_kwargs):
    # Support both @custom_task and @custom_task(**kwargs) usage
    if func is None:
        return partial(custom_task, **task_init_kwargs)

    @wraps(func)
    def safe_func(**kwargs):
        try:
            return func(**kwargs)
        except Exception as e:
            # Print the full traceback so it shows up in the run output
            print(f"Full Traceback: {traceback.format_exc()}")
            raise RuntimeError(type(e)) from None  # from None is necessary to not log the stacktrace

    safe_func.__name__ = func.__name__
    return task(safe_func, **task_init_kwargs)

@custom_task
def abc(x):
    return x

Kevin Kho

02/14/2022, 2:54 PM

You can modify the ShellTask maybe to add more logging like this

Alexis Lucido

02/14/2022, 5:46 PM

You're right, maybe the environment variable is not accessed at run time (though it is available at registration time). I'll pass that as a task first thing in the morning and check the result. If this still fails, I'll look into how to get more logs with your snippet, though I have yet to understand why the logging operations did not work in the first place. Thanks a lot!
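A minimal sketch of the deferral idea discussed above, using only the standard library. The function name build_source_command and its default argument are hypothetical and not from the thread; in Prefect 1.x such a function would presumably be wrapped with the task decorator so that it runs during the flow run rather than at registration.

```python
import os


def build_source_command(script_name="kill_geckodriver.sh"):
    """Build the shell command when called, i.e. at run time.

    Hypothetical helper: reads BASH_SCRIPTS_FOLDER only when invoked,
    not when the module defining the flow is imported.
    """
    folder = os.environ.get("BASH_SCRIPTS_FOLDER")
    if folder is None:
        # os.path.join(None, ...) would raise a TypeError; fail with a
        # clearer message instead.
        raise RuntimeError("BASH_SCRIPTS_FOLDER is not set in this environment")
    return "source {}".format(os.path.join(folder, script_name))
```

Because the lookup happens inside the function body, a missing variable surfaces as an explicit error at call time instead of a None being joined to a string.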

Alexis Lucido

02/15/2022, 11:03 AM

Ok so there was indeed an error when trying to access the path of the bash file. The environment variable was not read at run time. Thus, the os.path.join tried to join a None and a string. That gave an error directly in the flow, hence no logs for the bash execution as the script did not get that far. Now I can try to execute the bash file with the attached Flow, but I got another error ("Syntax error near unexpected token..."). I also attached a screenshot of the bug. Here is the content of my .sh file, which I have slightly modified:

#!/bin/bash
pkill geckodriver
pkill firefox

Alexis Lucido

02/15/2022, 11:03 AM

Any idea about this? Thanks a lot!

Alexis Lucido

02/15/2022, 11:05 AM

PS when my script only contains a single line such as "pkill geckodriver" I get the same error.

Kevin Kho

02/15/2022, 3:08 PM

The latest screenshot still has the workflows_compute()… call. This is computed during build time, not run time, so you might not have it during the Flow run. You need it in a task to defer execution.

Alexis Lucido

02/15/2022, 4:04 PM

Yep, workflows_compute.path_geckodriver() is a task. Sorry if this was not clear to begin with

Kevin Kho

02/15/2022, 4:07 PM

Can I see the definition of that? You added the task decorator to a method of a class?

Kevin Kho

02/15/2022, 4:07 PM

I thought this didn't work

Alexis Lucido

02/15/2022, 5:18 PM

Sure. This task is now only a wrapper around a function that looks up the environment variable for the bash scripts folder, and the task is contained in another module, workflows_compute.py. I attached the function as well, which is what was previously available in the flow. We then have Flow in maintenance.py -> Task in workflows_compute.py -> Function in another misc module. I wanted to separate my flows and my tasks. And, having switched from Airflow to Prefect, I recognized the importance of having interfaces between my workflows and my basic functions, which should work no matter the workflow manager.

Kevin Kho

02/15/2022, 6:03 PM

Ohh I see. workflows_compute is a module. I understand. Will take a look again at the code.

Kevin Kho

02/15/2022, 6:04 PM

The 'source {}'.format is still a bit concerning in the Flow because I think that is evaluated during build time.

Alexis Lucido

02/16/2022, 8:17 AM

Ooooh maybe that's the reason I get this weird unexpected token "newline" error... Lemme check with the full path with no {}.format

Alexis Lucido

02/16/2022, 9:25 AM

Ok so the .format expression was indeed a problem. I replaced it with the full path. Not as clean but not that big of a deal either. I have a new error when executing the code. The bash script only contains pkill geckodriver. Here the flow fails. So I replaced the line with: echo "<my_password>" | sudo -S pkill geckodriver. And it fails again. I can run the bash script manually, without going through Prefect. Any idea?

Kevin Kho

02/16/2022, 2:13 PM

You can use the format in the Flow if it’s in a task to defer execution. Could you show me the code and traceback? Wondering what the actual error is.

Alexis Lucido

03/03/2022, 10:59 AM

Hey Kevin, sorry for my late answer. I have been dealing with other matters. I have reduced the code to its simplest expression (not sourcing an external script), and this is the flow:

kill_geckodriver_task = ShellTask(
    log_stderr=True, return_all=True, stream_output=True)
with Flow('kill_geckodriver', schedule=schedules_prefect[
        'kill_geckodriver']) as kill_geckodriver:
    kill_geckodriver_task(command='pkill geckodriver; pkill firefox')

Here is the traceback:

Looking up flow metadata... Done
Creating run for flow 'kill_geckodriver'... Done
└── Name: gabby-skua
└── UUID: 54bd621e-a8c0-409f-a8f6-f6d19c34890b
└── Labels: ['agentless-run-13ef9b7f']
└── Parameters: {}
└── Context: {}
└── URL: <http://localhost:8080/default/flow-run/54bd621e-a8c0-409f-a8f6-f6d19c34890b>
Executing flow run...
└── 11:55:03 | INFO    | Creating subprocess to execute flow run...
└── 11:55:03 | INFO    | Beginning Flow run for 'kill_geckodriver'
└── 11:55:03 | INFO    | Task 'ShellTask': Starting task run...
└── 11:55:03 | ERROR   | Command failed with exit code 1
└── 11:55:03 | INFO    | FAIL signal raised: FAIL('Command failed with exit code 1')
└── 11:55:04 | INFO    | Task 'ShellTask': Finished task run for task with final state: 'Failed'
└── 11:55:04 | INFO    | Flow run FAILED: some reference tasks failed.
Flow run failed!

These commands raise an exception only when run through Prefect. I can run dummy commands like "echo 1" with a ShellTask, though.
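A small diagnostic sketch, assuming nothing beyond the Python standard library, for checking what a command actually returns outside Prefect. The helper name run_and_report is hypothetical. It illustrates that a command can exit with a nonzero status while writing nothing to stdout or stderr, which would be consistent with the sparse log above.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run a command and return (exit code, stdout, stderr) for inspection."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Stand-in command that exits with status 1 and prints nothing.
    # Replace it with the real command (for example ["pkill", "geckodriver"])
    # to inspect its exit status.
    code, out, err = run_and_report(
        [sys.executable, "-c", "import sys; sys.exit(1)"])
    print(f"exit code: {code}, stdout: {out!r}, stderr: {err!r}")
```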

Kevin Kho

03/03/2022, 2:24 PM

It looks like exit code 1 with pkill means no process matched, based on this?

Alexis Lucido

03/04/2022, 9:53 AM

YES! That was it: no process was found and a 1 exit code was returned, though we wanted to set that case as a success. Here is our new workflow. First, the flow:

kill_process_task = ShellTask(
    log_stderr=True, return_all=True, stream_output=True)
with Flow('kill_geckodriver', schedule=schedules_prefect[
        'kill_geckodriver']) as kill_geckodriver:
    kill_process_task(command='source bash/kill_geckodriver.sh')
    kill_process_task(command='source bash/kill_firefox.sh')

Then, the scripts (I only show one of the two; they are identical except for the process killed):

#!/bin/bash

pkill geckodriver
pkillexitstatus=$?

if [ "$pkillexitstatus" -eq "0" ]; then
    echo "One or more processes matched the criteria and have been killed. Operation successful."
    return 0
elif [ "$pkillexitstatus" -eq "1" ]; then
    echo "No processes matched. Operation successful."
    return 0
elif [ "$pkillexitstatus" -eq "2" ]; then
    echo "Syntax error in the command line. Failure."
    return 1
elif [ "$pkillexitstatus" -eq "3" ]; then
    echo "Fatal error. Failure."
    return 1
else
    echo "UNEXPECTED. Failure."
    return 1
fi

In the end, that was a Linux issue... We had trouble with geckodriver processes not being correctly killed by our Python tasks, so we added this maintenance flow, and we'd rather keep it as a safeguard. Thanks a lot Kevin!
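The same exit-status policy could also be sketched in Python rather than bash. This is only an illustration under stated assumptions: the names PKILL_OK_STATUSES, pkill_succeeded and kill_processes are hypothetical, and the status meanings (0 matched, 1 no match, 2 syntax error, 3 fatal error) are taken from the script above.

```python
import subprocess

# Statuses treated as success, following the script above:
# 0 = processes matched and were signalled, 1 = no process matched.
PKILL_OK_STATUSES = {0, 1}


def pkill_succeeded(returncode):
    """Return True when a pkill exit status should count as success."""
    return returncode in PKILL_OK_STATUSES


def kill_processes(pattern):
    """Run pkill on a pattern and raise only on statuses outside 0 and 1."""
    result = subprocess.run(["pkill", pattern], capture_output=True, text=True)
    if not pkill_succeeded(result.returncode):
        raise RuntimeError(
            f"pkill {pattern!r} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}")
    return result.returncode
```

Calling kill_processes("geckodriver") would then return 0 or 1 without raising, whether or not a matching process was running.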

Kevin Kho

03/04/2022, 2:37 PM

Nice!
