Hagai Arad
10/09/2020, 10:31 AMFlowRunTask
and run them concurrently using local dask executor. (The flow code is in the first comment).
My question is: is there another way to make the FlowRunTasks run concurrently without using Dask? Thanks!Elliot Oram
10/09/2020, 3:35 PMwith Flow("Flow name") as flow:
flow.add_task(abc)
flow.register(project_name="Test")
In the latest version I find that this isn't possible anymore as I get a TypeError: can't pickle generator objects
My solution has been to replace this by defining the flow object directly and ditching the context manager, e.g.
flow = Flow("Flow name")
flow.add_task(abc)
flow.register(project_name="Test")
This works absolutely fine and I'm happy to continue with it. I just wanted to check if:
1. This is the expected / best way to resolve this problem
2. [Some of the documentation](https://docs.prefect.io/core/concepts/tasks.html#overview) still refers to the use of the context-manager style. Is this okay to do simply because these examples do not register a flow?
Any advice is welcome 🙂 Thanks!Michael J Hall
10/09/2020, 7:22 PMJulian
10/12/2020, 9:26 AMstored_as_script = True
which I tracked down to the implementation of the S3 build() function,
which returns the storage object before uploading it to S3 when this variable is set to True.
def build(self) -> "Storage":
"""
Build the S3 storage object by uploading Flows to an S3 bucket. This will upload
all of the flows found in `storage.flows`. If there is an issue uploading to the
S3 bucket an error will be logged.
Returns:
- Storage: an S3 object that contains information about how and where
each flow is stored
Raises:
- botocore.ClientError: if there is an issue uploading a Flow to S3
"""
self.run_basic_healthchecks()
if self.stored_as_script:
if not self.key:
raise ValueError(
"A `key` must be provided to show where flow `.py` file is stored in S3."
)
return self
..
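In other words, with stored_as_script=True the build step uploads nothing, so the script already has to exist at the given key. A minimal sketch of registering under that assumption (bucket, key and project name are placeholders; the script is assumed to have been uploaded separately, e.g. with the AWS CLI):

from prefect.environments.storage import S3

# the flow script must already sit at s3://my-bucket/flows/my_flow.py,
# since build() returns early and does not upload in this mode
flow.storage = S3(
    bucket="my-bucket",
    key="flows/my_flow.py",
    stored_as_script=True,
)
flow.register(project_name="my-project")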
Also, I can register the flow with S3 storage and stored_as_script = False, but even though it appears in the UI and has a corresponding S3 object, flow runs are not executed.Jasono
10/12/2020, 10:54 PMprefect server start
the first time, received this error. Any idea how to troubleshoot this?
PS C:\Users\puruz\Documents\prefect> prefect server start
Exception caught; killing services (press ctrl-C to force)
Traceback (most recent call last):
File "C:\Users\puruz\AppData\Roaming\Python\Python39\site-packages\prefect\cli\server.py", line 331, in start
subprocess.check_call(
File "c:\program files\python39\lib\subprocess.py", line 368, in check_call
retcode = call(*popenargs, **kwargs)
File "c:\program files\python39\lib\subprocess.py", line 349, in call
with Popen(*popenargs, **kwargs) as p:
File "c:\program files\python39\lib\subprocess.py", line 947, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "c:\program files\python39\lib\subprocess.py", line 1416, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
Craig
10/13/2020, 4:21 AMmogui mogui
10/13/2020, 9:35 AMmogui mogui
10/13/2020, 9:37 AMKeat
10/14/2020, 4:37 AMgraphql_url
pointing to
backend = "server"
graphql_url = "<http://localhost:4200/graphql>"
[server]
[server.ui]
graphql_url = "<http://localhost:4200/graphql>"
When I SSH into the EC2 instance I have two port forwards, 4200 and 8080, and my local config.toml
has the same config as the server's config.toml.
A local agent is running on the EC2 instance.
The issue I am facing is:
When running the flow after registering it (either locally or on the EC2 instance), the shell tasks run into
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=4200): Read timed out. (read timeout=30)
I reckon it is my config.toml, but I'm not sure where to start. Or is there something that I am missing?Chris Goddard
10/14/2020, 12:38 PMSiddharth Singh
10/15/2020, 6:01 AM.map
flows work in parallel.
We're using Prefect locally to test this before we decide on buying Cloud, but locally I see my functions are evaluated serially.
# Generate Dates
date_generator = generate_keys_for_nppes_urls()
# Download ZIP files. Instead of downloading files in parallel, this executes October first, then September, then August...
zip_files = download_nppes_data_for_time.map(date_generator)
# Launch NPPES Spark Job
spark_res = launch_nppes_spark_job.map(zip_files)
# Launch NPPES Configuration Job
config_job = nppes_configuration_job()
config_job.set_upstream(spark_res)
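For reference: the default LocalExecutor runs mapped children one at a time, so to get parallelism locally you can hand a Dask-based executor to flow.run. A minimal sketch, assuming Prefect 0.13.x (where executors live under prefect.engine.executors) and that the tasks above sit inside a with Flow(...) as flow: block:

from prefect.engine.executors import LocalDaskExecutor

# threaded local Dask scheduler; mapped tasks such as
# download_nppes_data_for_time should now run concurrently
flow.run(executor=LocalDaskExecutor(scheduler="threads"))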
ale
10/15/2020, 8:47 AMAlberto de Santos
10/15/2020, 4:17 PMMariusz Olszewski
10/16/2020, 4:20 PMMariusz Olszewski
10/16/2020, 4:39 PMMariusz Olszewski
10/16/2020, 4:39 PMVipul
10/17/2020, 3:24 PMif self.import_paths:
python_path += self.import_paths
current_env["PYTHONPATH"] = ":".join(python_path)
As I am working on Windows, I realise that the colon used to separate the Python path entries might not work; it should have been a semicolon on Windows.Georg Zangl
10/18/2020, 9:04 AM[program:fhn]
command=sudo prefect agent start local -p /usr/dcc/fhn/data_import -p /usr/dcc/fhn/calc_td -p /usr/dcc/fhn/vol_back_all -f
I can run only two flows; the third one fails with the message "Failed to load and execute Flow's environment: ModuleNotFoundError".
It doesn't matter which flow is third; it is always the last one that fails.
So I have created two separate programs:
[program:fhn1]
command=sudo prefect agent start local -p /usr/dcc/fhn/data_import -p /usr/dcc/fhn/calc_td -f -l Import
[program:fhn_2]
command=prefect agent start local -p /usr/dcc/fhn/vol_back_all -f -l Vol
to handle all three flows.
But now, the success of the flows is unstable. Sometimes they fail with the "Failed to load.." message, sometimes they succeed. Even if I trigger them manually, they sometimes fail and sometimes succeed. More than 50% of flow runs fail.
Here is a log from supervisord:
2020-10-18 08:55:05,888 DEBG 'fhn1' stdout output:
[2020-10-18 08:55:05] INFO - prefect.CloudFlowRunner | Beginning Flow run for 'Calculate Equations FHN'
2020-10-18 08:55:05,903 DEBG 'fhn1' stdout output:
[2020-10-18 08:55:05] ERROR - prefect.Local | Failed to load Flow from /usr/dcc/fhn/flows/back-allocation-fhn.prefect
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/prefect/environments/storage/local.py", line 103, in get_flow
return prefect.core.flow.Flow.load(flow_location)
File "/usr/local/lib/python3.7/dist-packages/prefect/core/flow.py", line 1495, in load
return cloudpickle.load(f)
File "/usr/local/lib/python3.7/dist-packages/cloudpickle/cloudpickle.py", line 562, in subimport
__import__(name)
ModuleNotFoundError: No module named 'read_wt'
2020-10-18 08:55:05,954 DEBG 'fhn1' stdout output:
No module named 'read_wt'
The two agents are running fine. All code and agents are local; everything runs on one machine. I am running looped tasks in each of the flows.
Here is the code for one of the flows:
flow = Flow("Data Import FHN")
flow.set_dependencies(loop_conns, keyword_tasks={"iloop": looplist}, mapped=True)
with Flow("Data Import FHN") as flow:
connect = db_conn()
mapped_result = loop_conns.map(iloop=looplist)
flow.storage = Local(directory="/usr/dcc/fhn/flows")
flow.storage.build()
flow.register(project_name="FHN")
Any help in understanding the unstable behaviour would be appreciated.Bruce Haggerty
10/18/2020, 5:21 PMJasono
10/19/2020, 6:40 AM--run-name
Jasono
10/19/2020, 6:42 AMJasono
10/19/2020, 6:45 AMLukas N.
10/19/2020, 3:33 PMDave
10/20/2020, 3:11 PM<http://localhost:4200/graphql>
The specific setup where I'm currently testing this is running the following:
"platform": "Windows-10-10.0.19041-SP0",
"prefect_backend": "server",
"prefect_version": "0.13.11",
"python_version": "3.8.3"
I followed the guide for UI configuration and added the following to my config (~/.prefect/config.toml):
backend = "server"
[server]
[server.ui]
apollo_url = "http://<url>:4200/graphql"
It works just fine, if you check `prefect diagnostics`:
{
"config_overrides": {
"backend": true,
"server": {
"ui": {
"apollo_url": true
}
}
},
"env_vars": [],
"system_information": {
"platform": "Windows-10-10.0.19041-SP0",
"prefect_backend": "server",
"prefect_version": "0.13.11",
"python_version": "3.8.3"
}
}
I can also see that the UI indeed understands that the configuration has been added to config.toml, since it picks up the specific URL at http://<url>:8080/settings.json:
{
"server_url": "http://<url>:4200/graphql"
}
When I then dig a little deeper into the network requests happening in relation to the UI, I can see that everything is still happening at: <http://localhost:4200/graphql>
(Please see attachment)
This must be related to the fact that it always inserts <http://localhost:4200/graphql>
into localStorage automatically. (Please see attachment)
Why isn't it inserting the right URL into localStorage?
It only uses the right URL if I manually add it to localStorage.EmGarr
10/20/2020, 4:01 PMheaders = {'Authorization': "Basic {}".format(XXXXX)}
client = prefect.Client()
client.attach_headers(headers)
However, when the agent deploys a flow we hit a small issue:
1. The Docker container is run with the following command: prefect execute flow-run
2. Inside the container, this leads to prefect.cli.execute with _execute_flow_run:
client = Client()
result = client.graphql(query) # Authorization error
Is there a simple way to avoid this issue and still be able to add BasicAuth to the client?
We could maybe add an env variable which would let the client select the proper auth method (example):
https://github.com/PrefectHQ/prefect/blob/bd6e47379594d4e26e6810380482320eeee714ae/src/prefect/client/client.py#L385
def select_auth_method(token):
    if config.AUTH.BASIC:
        return "Basic {}".format(token)
    # else OAuth
    return "Bearer {}".format(token)
if token:
headers["Authorization"] = "Bearer {}".format(token)
Ajay
10/21/2020, 12:18 PMdef task_state_transition_handler(flow_info:Task, old_state: State, new_state: State):
if new_state.is_queued():
aquire_lock(path=new_state.context['flow_name'] + "/" +
new_state.context['today']+"/",
data=new_state.context['flow_run_id'])
elif new_state.is_finished():
release_lock(path=new_state.context['flow_name'] + "/" +
new_state.context['today']+"/",
data=new_state.context['flow_run_id'])
return
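Presumably this is attached as a flow-level state handler, given that it keys off flow_name and flow_run_id in the state context; a minimal sketch of the wiring under that assumption (the flow name is hypothetical, and aquire_lock/release_lock are the user's own helpers from the snippet above):

from prefect import Flow

with Flow(
    "locked-flow",  # hypothetical name
    state_handlers=[task_state_transition_handler],
) as flow:
    ...  # tasks go here

It could equally be passed to an individual task via task(state_handlers=[...]).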
Satyam Tandon
10/22/2020, 4:22 PM[2020-10-22 16:18:25,588] INFO - agent | Agent documentation can be found at <https://docs.prefect.io/orchestration/>
[2020-10-22 16:18:25,588] INFO - agent | Agent connecting to the Prefect API at <http://localhost:4200>
[2020-10-22 16:18:25,597] INFO - agent | Waiting for flow runs...
[2020-10-22 16:18:41,285] INFO - agent | Found 1 flow run(s) to submit for execution.
[2020-10-22 16:18:41,328] INFO - agent | Deploying flow run b95dfeb9-7912-4526-bf23-168d84cb70b2
[2020-10-22 16:19:25,600] INFO - agent | Process PID 33157 returned non-zero exit code
This is what I see on the UI
Last State Message
[22 Oct 2020 9:18am]: Failed to load and execute Flow's environment: AttributeError("Can't get attribute 'NoDefault' on <module 'prefect.core.task' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/prefect/core/task.py'>")
I have been unable to fix the issue. Any help would be appreciated.
Thank you 🙂Keat
10/23/2020, 7:59 AM0.13.10
prefect server
, and I started to see this error for scheduled tasks and manually run tasks
[{'message': 'Foreign key violation.', 'locations': [{'line': 2, 'column': 5}], 'path': ['set_flow_run_states'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Foreign key violation.'}}}]
It seems like it comes from the GraphQL side of things, but I couldn't figure out why it randomly fails with that error. After a couple of manual restarts it sometimes works. I tried to debug, but since it happens at the very beginning of the flow, there's not much I can find out.
When I register the flow I register it with the same core version, or at least I try to.Jesper van Dijke
10/26/2020, 1:05 AMKyle
10/26/2020, 11:47 AMWARNING: The PREFECT_SERVER_DB_CMD variable is not set. Defaulting to a blank string.
WARNING: The DB_CONNECTION_URL variable is not set. Defaulting to a blank string.
WARNING: The POSTGRES_DB variable is not set. Defaulting to a blank string.
WARNING: The POSTGRES_PASSWORD variable is not set. Defaulting to a blank string.
WARNING: The POSTGRES_USER variable is not set. Defaulting to a blank string.
Creating network "prefect-server" with the default driver
Creating tmp_postgres_1 ... done
Creating tmp_hasura_1 ... done
Creating tmp_graphql_1 ... done
Creating tmp_towel_1 ... done
Creating tmp_apollo_1 ... done
Creating tmp_ui_1 ... done
Attaching to tmp_postgres_1, tmp_hasura_1, tmp_graphql_1, tmp_towel_1, tmp_apollo_1, tmp_ui_1
graphql_1 | bash: -c: line 0: syntax error near unexpected token `&&'
graphql_1 | bash: -c: line 0: ` && python src/prefect_server/services/graphql/server.py'
hasura_1 | {"type":"pg-client","timestamp":"2020-10-26T11:45:30.504+0000","level":"warn","detail":{"message":"postgres connection failed, retrying(0)."}}
hasura_1 | {"type":"pg-client","timestamp":"2020-10-26T11:45:30.504+0000","level":"warn","detail":{"message":"postgres connection failed, retrying(1)."}}
hasura_1 | {"type":"startup","timestamp":"2020-10-26T11:45:30.504+0000","level":"error","detail":{"kind":"db_migrate","info":{"internal":"could not connect to server: No such file or directory\n\tIs the server running locally and accepting\n\tconnections on Unix domain socket \"/var/run/postgresql/.s.PGSQL.5432\"?\n","path":"$","error":"connection error","code":"postgres-error"}}}
ui_1 | 👾👾👾 UI running at localhost:8080 👾👾👾
postgres_1 | Error: Database is uninitialized and superuser password is not specified.
postgres_1 | You must specify POSTGRES_PASSWORD to a non-empty value for the
postgres_1 | superuser. For example, "-e POSTGRES_PASSWORD=password" on "docker run".
postgres_1 |
postgres_1 | You may also use "POSTGRES_HOST_AUTH_METHOD=trust" to allow all
postgres_1 | connections without a password. This is *not* recommended.
postgres_1 |
postgres_1 | See PostgreSQL documentation about "trust":
postgres_1 | <https://www.postgresql.org/docs/current/auth-trust.html>
tmp_hasura_1 exited with code 1
tmp_graphql_1 exited with code 1
tmp_postgres_1 exited with code 1
Billy McMonagle
10/26/2020, 1:16 PMKyle
10/26/2020, 1:23 PM(venv) shark@shark-H310M-S2-2-0:~/Documents/prefect$ prefect config
{'debug': False, 'home_dir': '/home/shark/.prefect', 'backend': 'cloud', 'server': {'host': '<http://localhost>', 'port': 4200, 'host_port': 4200, 'endpoint': '<http://localhost:4200>', 'database': {'host': 'localhost', 'port': 5432, 'host_port': 5432, 'name': 'prefect_server', 'username': 'postgres', 'password': 'postgres', 'connection_url': '<postgresql://postgres:postgres@localhost:5432/prefect_server>', 'volume_path': '/home/shark/.prefect/pg_data/'}, 'graphql': {'host': '0.0.0.0', 'port': 4201, 'host_port': 4201, 'debug': False, 'path': '/graphql/'}, 'hasura': {'host': 'localhost', 'port': 3000, 'host_port': 3000, 'admin_secret': '', 'claims_namespace': 'hasura-claims', 'graphql_url': '<http://localhost:3000/v1alpha1/graphql>', 'ws_url': '<ws://localhost:3000/v1alpha1/graphql>', 'execute_retry_seconds': 10}, 'ui': {'host': '<http://localhost>', 'port': 8080, 'host_port': 8080, 'endpoint': '<http://localhost:8080>', 'apollo_url': '<http://localhost:4200/graphql>'}, 'telemetry': {'enabled': True}}, 'cloud': {'api': '<https://api.prefect.io>', 'endpoint': '<https://api.prefect.io>', 'graphql': '<https://api.prefect.io/graphql>', 'use_local_secrets': True, 'heartbeat_interval': 30.0, 'check_cancellation_interval': 15.0, 'diagnostics': False, 'logging_heartbeat': 5, 'queue_interval': 30.0, 'agent': {'name': 'agent', 'labels': [], 'level': 'INFO', 'auth_token': '', 'agent_address': '', 'resource_manager': {'loop_interval': 60}}}, 'logging': {'level': 'INFO', 'format': '[%(asctime)s] %(levelname)s - %(name)s | %(message)s', 'log_attributes': [], 'datefmt': '%Y-%m-%d %H:%M:%S', 'log_to_cloud': False, 'extra_loggers': []}, 'flows': {'eager_edge_validation': False, 'run_on_schedule': True, 'checkpointing': False, 'defaults': {'storage': {'add_default_labels': True, 'default_class': 'prefect.environments.storage.Local'}}}, 'tasks': {'defaults': {'max_retries': 0, 'retry_delay': None, 'timeout': None}}, 'engine': {'executor': {'default_class': 'prefect.engine.executors.LocalExecutor', 'dask': {'address': '', 'cluster_class': 'distributed.deploy.local.LocalCluster'}}, 'flow_runner': {'default_class': 'prefect.engine.flow_runner.FlowRunner'}, 'task_runner': {'default_class': 'prefect.engine.task_runner.TaskRunner'}}}
prefect backend server
) and start the Prefect server by running this command: prefect server start
Billy McMonagle
10/26/2020, 1:59 PMprefect server start
should mean that you're using the default docker-compose.yml
, which lives here: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/cli/docker-compose.yml
Perhaps you have a ~/.prefect/config.toml
file that doesn't have a postgres password set?Kyle
10/26/2020, 4:27 PM[server]
[server.database]
username = "postgres"
password = "postgres"
volume_path = "~/.prefect/pg_data/"
Billy McMonagle
10/26/2020, 5:03 PMname = "your_db_name"
Kyle
10/26/2020, 5:05 PMBilly McMonagle
10/26/2020, 5:08 PMparameter :postgres_user, type: :String, default: "prefect"
parameter :postgres_password, type: :String, default: "test-password"
parameter :postgres_db, type: :String, default: "prefect_server"
Kyle
10/26/2020, 5:10 PMBilly McMonagle
10/26/2020, 5:11 PMprefect
CLI command.Kyle
10/26/2020, 5:12 PMBilly McMonagle
10/26/2020, 5:13 PMKyle
10/26/2020, 5:14 PMBilly McMonagle
10/26/2020, 5:21 PMKyle
10/26/2020, 5:23 PM[server]
[server.database]
name = "prefect_server"
username = "postgres"
password = "postgres"
volume_path = "~/.prefect/pg_data/"
Billy McMonagle
10/26/2020, 5:25 PMdocker system prune
to clean up your local system.
docker run --rm --name pg-docker -e POSTGRES_PASSWORD=password -p 5432:5432 postgres
Kyle
10/26/2020, 5:31 PMdocker system prune -a
) and run it again, although I already did it before adding the name
property to the config.toml file.Billy McMonagle
10/26/2020, 5:32 PMKyle
10/26/2020, 5:34 PMJesper van Dijke
10/27/2020, 12:19 AMlocal/bin/prefect server start
not directly in your home; it will then use the toml.
Search and read my comments here from the last few days; I encountered the same problem...
docker-compose will probably need a .env
fileKyle
10/27/2020, 12:24 AMbin
directory within .local.
(venv) shark@shark-H310M-S2-2-0:~/Documents/prefect$ ~/.local/
lib/ share/