Prefect Community #prefect-server


Georg Zangl

10/18/2020, 9:04 AM

Dear Prefect Team, first of all, congratulations and many thanks for the hard work you've put into Prefect. I have managed to set up some flows and am able to run them when running single flows with single agents. However, I ran into the following problem when trying to run three different flows in three different directories using supervisord. If I use one agent as a program like:


[program:fhn]
command=sudo prefect agent start local -p /usr/dcc/fhn/data_import -p /usr/dcc/fhn/calc_td -p /usr/dcc/fhn/vol_back_all -f

I can run only two flows; the third one fails with the message "Failed to load and execute Flow's environment: ModuleNotFoundError". It doesn't matter which flow is third: it is always the last one that fails. So I have created two separate programs:


[program:fhn1]
command=sudo prefect agent start local -p /usr/dcc/fhn/data_import -p /usr/dcc/fhn/calc_td -f -l Import
[program:fhn_2]
command=prefect agent start local -p /usr/dcc/fhn/vol_back_all  -f -l Vol

to handle all three flows. But now, the success of the flows is unstable. Sometimes they fail with the "Failed to load.." message, sometimes they succeed. Even if I trigger them manually, they sometimes fail and sometimes succeed. More than 50% of flow runs fail. Here is a log from supervisord:


2020-10-18 08:55:05,888 DEBG 'fhn1' stdout output:
[2020-10-18 08:55:05] INFO - prefect.CloudFlowRunner | Beginning Flow run for 'Calculate Equations FHN'

2020-10-18 08:55:05,903 DEBG 'fhn1' stdout output:
[2020-10-18 08:55:05] ERROR - prefect.Local | Failed to load Flow from /usr/dcc/fhn/flows/back-allocation-fhn.prefect
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/prefect/environments/storage/local.py", line 103, in get_flow
    return prefect.core.flow.Flow.load(flow_location)
  File "/usr/local/lib/python3.7/dist-packages/prefect/core/flow.py", line 1495, in load
    return cloudpickle.load(f)
  File "/usr/local/lib/python3.7/dist-packages/cloudpickle/cloudpickle.py", line 562, in subimport
    __import__(name)
ModuleNotFoundError: No module named 'read_wt'

2020-10-18 08:55:05,954 DEBG 'fhn1' stdout output:
No module named 'read_wt'

The two agents are running fine. All code and agents are local; everything runs on one machine. I am running looped tasks in each of the flows. Here is the code for one of the flows:


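from prefect import Flow
from prefect.environments.storage import Local

# loop_conns, db_conn and looplist are defined elsewhere (not shown in the post).
# Imperative API: this flow object is replaced by the `with` block below.
flow = Flow("Data Import FHN")
flow.set_dependencies(loop_conns, keyword_tasks={"iloop": looplist}, mapped=True)

# Functional API: map loop_conns over every entry in looplist.
with Flow("Data Import FHN") as flow:
    connect = db_conn()
    mapped_result = loop_conns.map(iloop=looplist)

# Store the pickled flow on local disk, then register it with the "FHN" project.
flow.storage = Local(directory="/usr/dcc/fhn/flows")
flow.storage.build()
flow.register(project_name="FHN")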

Any help in understanding this unstable behavior would be appreciated.

Spencer

10/18/2020, 1:00 PM

Make sure the dependencies needed for each flow are available to the flow runner. It's trying to load the flow, but the package read_wt is missing. Verify that the necessary packages are installed for the Python that each agent is using.
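
One way to check this point is to ask Python whether a module can be located in a given set of directories, without actually importing it. The sketch below uses the standard library's `importlib.machinery.PathFinder`; the helper name `module_visible` is hypothetical, and the directory in the usage comment is simply one of the paths from the supervisord config above, not a confirmed location of `read_wt`.

```python
from importlib.machinery import PathFinder


def module_visible(name, import_paths):
    """Return True if top-level module `name` can be located in one of `import_paths`."""
    # PathFinder searches only the directories it is given, not the global sys.path.
    return PathFinder.find_spec(name, list(import_paths)) is not None


# Hypothetical usage: is read_wt reachable from the first import path?
# module_visible("read_wt", ["/usr/dcc/fhn/data_import"])
```

Running this with the same interpreter and user that the agent uses would show whether the module is reachable from each `-p` directory.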

Georg Zangl

10/18/2020, 2:11 PM

Spencer, thanks a lot for your comment. All flows do run some of the time. I have scheduled all of them, and each succeeds about 40% of the time on average. When I trigger them manually, they fail one to five times. Then, without changing anything, just clicking the "Quick Run" button again, they succeed.

Spencer

10/18/2020, 3:54 PM

The error is that the packages are missing. It's an issue with the python environment.

Georg Zangl

10/18/2020, 4:09 PM

The package is not found by the environment, I understand that. But why is it found sometimes and not at other times? The flows run on a schedule, and the success rate also depends on the schedule frequency, as you can see in the screenshot. Earlier, the schedule frequency was 10 minutes (red rectangle); now it is 30 minutes (purple rectangle). Each of the flows runs successfully from time to time.

nicholas

10/19/2020, 3:08 PM

Hi @Georg Zangl - my guess is that the agents you want to pick up specific flows aren't always picking up the flows you're intending. So when an agent that has a specific import path picks up the wrong flow, it fails to load the required modules. You can fix this by either 1) making sure all modules are available to a single agent, which would allow it to pick up and load modules from any of your flows or 2) if you want to use separate agents, use flow and agent labels to ensure the correct agent picks up the correct flows. You can read more about this concept (we call it flow affinity) in the docs here.
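
The label matching described here can be modeled as a subset check: as documented for Prefect agents of this era, an agent picks up a flow run only when every label on the flow is also present on the agent. The sketch below is plain Python for illustration, not Prefect's implementation; `agent_can_run` and the label sets are hypothetical names modeled on the supervisord programs earlier in the thread.

```python
def agent_can_run(flow_labels, agent_labels):
    """Return True when every flow label is also present on the agent."""
    return set(flow_labels) <= set(agent_labels)


# Hypothetical label sets for the two agents started with -l Import and -l Vol.
import_agent = {"Import", "ip-10-0-9-4"}
vol_agent = {"Vol", "ip-10-0-9-4"}

# A flow labeled "Vol" is eligible only for the agent that carries "Vol".
print(agent_can_run({"Vol"}, vol_agent))     # True
print(agent_can_run({"Vol"}, import_agent))  # False

# A flow with no labels of its own matches either agent, so an agent with
# the wrong import paths may pick it up.
print(agent_can_run(set(), import_agent))    # True
```

Under this model, labeling only the agents is not enough; each flow needs a label that exactly one agent carries.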

Georg Zangl

10/19/2020, 7:48 PM

Dear @nicholas , you are spot on! I am using labels now and so far I have a success rate of 100%, excellent, thanks a lot!

🚀 1

nicholas

10/19/2020, 7:48 PM

Fantastic! 😄

Georg Zangl

10/20/2020, 6:35 AM

Good morning @nicholas, the flows were all successful. Now I'm back to the first problem I described, which is the number of directories I can register for one agent. It seems that it accepts only two "-p" options; the third one is ignored. I have tested this many times and it happens consistently.

nicholas

10/20/2020, 1:40 PM

Hi @Georg Zangl - looking at the code for the CLI (here) and the code for the Local Agent (here) I'm not seeing any artificial constraints on that flag; what does the Agent output look like when you start it? There should be a message about import paths.

Georg Zangl

10/20/2020, 2:16 PM

Hi @nicholas, when I enter the following line in the supervisord.conf file:


command=sudo prefect agent start local -p /usr/dcc/fhn/data_import -p /usr/dcc/fhn/vol_back_all -p /usr/dcc/fhn/calc_td -f -l FHN

only the first two flows run. If I split it into two agents (another example):


[program:platform_1] 
command=sudo prefect agent start local -p /usr/dcc/platform/data_import -p /usr/dcc/platform/calc_td  -f -l Platform1
[program:platform_2] 
command=sudo prefect agent start local -p /usr/dcc/platform/vol_back_all -f -l Platform2

only two flows run. I can't get the third flow running. If I switch directories, the behavior is the same: the flow in the second directory (with two agents) or the third directory (with one agent) always fails. Now I'm running one directory/flow per agent, and this works fine.

Georg Zangl

10/20/2020, 2:25 PM


 ____            __           _        _                    _
|  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
| |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
|  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
|_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                           |___/

[2020-10-20 06:56:26,133] INFO - agent | Starting LocalAgent with labels ['FHN1', 'ip-10-0-9-4', 'azure-flow-storage', 'gcs-flow-storage', 's3-flow-storage', 'github-flow-storage', 'webhook-flow-storage', 'gitlab-flow-storage']
[2020-10-20 06:56:26,133] INFO - agent | Agent documentation can be found at <https://docs.prefect.io/orchestration/>

2020-10-20 06:56:26,134 DEBG 'fhn_1' stdout output:
[2020-10-20 06:56:26,133] INFO - agent | Agent connecting to the Prefect API at <http://localhost:4200>

This is the agent's output when it starts; there is no mention of the import paths.
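
Since the startup output does not list the import paths, another way to see what a flow run actually receives is to compare `sys.path` inside the running process with the directories passed via `-p`. The sketch below assumes that `-p` entries end up on the flow run's module search path; `missing_import_paths` is a hypothetical helper, and the directory list is copied from the supervisord command above.

```python
import os
import sys


def missing_import_paths(expected, search_path=None):
    """Return the entries of `expected` that are absent from the module search path."""
    search_path = sys.path if search_path is None else search_path
    # Normalize so that "/a/" and "/a" compare as the same directory.
    present = {os.path.normpath(p) for p in search_path if p}
    return [p for p in expected if os.path.normpath(p) not in present]


expected = [
    "/usr/dcc/fhn/data_import",
    "/usr/dcc/fhn/vol_back_all",
    "/usr/dcc/fhn/calc_td",
]
# Called from inside a task, this lists any -p directory that did not arrive.
print(missing_import_paths(expected))
```

Logging this from a task in the failing flow would show whether the third directory is really being dropped or whether the failure lies elsewhere.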

nicholas

10/20/2020, 2:34 PM

Hm ok. My guess is that something is up with your third flow, but if you could open a ticket on the Core repo describing the behavior you're seeing, the Core team will be able to triage this better, whether there's a bug with Agent import paths or not. Also, as a side note, I'd generally discourage starting your agent with elevated permissions unless absolutely necessary; as with any elevated process, it can have unintended consequences.

Georg Zangl

10/20/2020, 4:27 PM

great, thanks a lot for your comments. I will look a bit further into it.

👍 1
