volkerjaenisch
12/01/2021, 10:54 PMhttps://upload.wikimedia.org/wikipedia/commons/8/89/Kozloduy_Nuclear_Power_Plant_-_Control_Room_of_Unit_5.jpg▾
Kevin Kho
volkerjaenisch
12/02/2021, 2:05 AMZanie
Zanie
result()
method to retrieve their underlying data
from prefect import flow
@flow
def child(y):
return y
@flow
def parent(x):
state = child(x)
print(state.result())
parent(1)
Zanie
Zanie
Zanie
Zanie
Zanie
Zanie
Zanie
Zanie
volkerjaenisch
12/02/2021, 9:11 PMvolkerjaenisch
12/03/2021, 8:01 PMKevin Kho
volkerjaenisch
12/03/2021, 10:37 PMZanie
volkerjaenisch
12/03/2021, 10:40 PMvolkerjaenisch
12/03/2021, 10:42 PMZanie
volkerjaenisch
12/03/2021, 10:43 PMvolkerjaenisch
12/03/2021, 10:47 PMvolkerjaenisch
12/03/2021, 10:51 PMZanie
Zanie
Sandra Rum
12/06/2021, 10:25 AM@task()
def generate_input():
x = np.arange(100000)
return x
@flow
def parent():
input = generate_input()
parent()
Sandra Rum
12/06/2021, 10:25 AM[11:17:31] sandra_rum: 11:17:01.911 | Flow run 'pastoral-mole' encountered exception:
Traceback (most recent call last):
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/engine.py", line 377, in orchestrate_flow_run
result = await run_sync_in_worker_thread(flow_call)
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/utilities/asyncio.py", line 48, in run_sync_in_worker_thread
return await anyio.to_thread.run_sync(context.run, call, cancellable=True)
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/anyio/to_thread.py", line 28, in run_sync
return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 754, in run
result = context.run(func, *args)
File "/home/sandra/workspace/orion_eva/orion/test.py", line 14, in parent
input = generate_input()
[11:17:31] sandra_rum: File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/tasks.py", line 271, in __call__
return enter_task_run_engine(
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/engine.py", line 450, in enter_task_run_engine
return run_async_from_worker_thread(begin_run)
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/utilities/asyncio.py", line 59, in run_async_from_worker_thread
return anyio.from_thread.run(call)
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/anyio/from_thread.py", line 35, in run
return asynclib.run_async_from_thread(func, *args)
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 847, in run_async_from_thread
return f.result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 440, in result
return self.__get_result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/engine.py", line 510, in create_and_submit_task_run
future = await flow_run_context.executor.submit(
[11:17:31] sandra_rum: File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/executors.py", line 180, in submit
self._results[task_run.id] = await run_fn(**run_kwargs)
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/client.py", line 59, in wrapper
return await fn(*args, **kwargs)
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/engine.py", line 620, in orchestrate_task_run
terminal_state = await user_return_value_to_state(
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/engine.py", line 734, in user_return_value_to_state
if is_state(result) or is_state_iterable(result):
File "/home/sandra/workspace/venvs/orion-VYmp5l7m-py3.9/lib/python3.9/site-packages/prefect/orion/states.py", line 17, in is_state_iterable
if isinstance(obj, IterableABC) and obj:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
11:17:01.912 | Shutting down executor `SequentialExecutor`...
11:17:01.937 | Flow run 'pastoral-mole' finished in state Failed(message='Flow run encountered an exception.', type=FAILED)
Process finished with exit code 0
Sandra Rum
12/06/2021, 10:26 AMKevin Kho
volkerjaenisch
12/06/2021, 4:50 PMKevin Kho
export PREFECT_ORION_DATABASE_CONNECTION_URL=sqlite+aiosqlite:///orion.db
volkerjaenisch
12/06/2021, 4:55 PMKevin Kho
volkerjaenisch
12/06/2021, 4:56 PMvolkerjaenisch
12/06/2021, 5:03 PMvolkerjaenisch
12/06/2021, 5:08 PMsqlite+aiosqlite:///file::memory:?cache=shared&uri=true&check_same_thread=false does also not work.
=> Table not found error.
/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/bin/python /home/volker/workspace/ORION/orion/orion/flows.py
Traceback (most recent call last):
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 719, in do_execute
cursor.execute(statement, parameters)
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 100, in execute
self._adapt_connection._handle_exception(error)
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 229, in _handle_exception
raise error
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 82, in execute
self.await_(_cursor.execute(operation, parameters))
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 76, in await_only
return current.driver.switch(awaitable)
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 129, in greenlet_spawn
value = await result
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/aiosqlite/cursor.py", line 37, in execute
await self._execute(self._cursor.execute, sql, parameters)
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/aiosqlite/cursor.py", line 31, in _execute
return await self._conn._execute(fn, *args, **kwargs)
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/aiosqlite/core.py", line 129, in _execute
return await future
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/aiosqlite/core.py", line 102, in run
result = function()
sqlite3.OperationalError: no such table: flow
volkerjaenisch
12/06/2021, 5:43 PMvolkerjaenisch
12/06/2021, 5:48 PM/net/zolxq00618/home/ref-data/conda_environments/vtt-experiments/bin/python /home/xyvjaeni/workspace/vtt-experiments/scripts/vtt_tasks.py
Traceback (most recent call last):
File "_/home/xyvjaeni/_.local/lib/python3.7/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 397, in _prepare_and_execute
operation, self._invalidate_schema_cache_asof
File "_/home/xyvjaeni/_.local/lib/python3.7/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 638, in _prepare
prepared_stmt = await self._connection.prepare(operation)
File "_/home/xyvjaeni/_.local/lib/python3.7/site-packages/asyncpg/connection.py", line 571, in prepare
record_class=record_class,
File "_/home/xyvjaeni/_.local/lib/python3.7/site-packages/asyncpg/connection.py", line 589, in _prepare
record_class=record_class,
File "_/home/xyvjaeni/_.local/lib/python3.7/site-packages/asyncpg/connection.py", line 403, in _get_statement
ignore_custom_codec=ignore_custom_codec,
File "asyncpg/protocol/protocol.pyx", line 168, in prepare
asyncpg.exceptions.UndefinedTableError: relation "flow" does not exist
volkerjaenisch
12/06/2021, 6:08 PMDustin Ngo
12/06/2021, 8:01 PMvolkerjaenisch
12/06/2021, 10:02 PMfrom prefect import flow
@flow
def my_flow():
return 1
print(my_flow())
And it is called the following way:
export PREFECT_ORION_DATABASE_CONNECTION_URL="sqlite+aiosqlite:///file::memory:?cache=shared&uri=true&check_same_thread=false"; python simpley.py
volkerjaenisch
12/06/2021, 10:07 PMvolkerjaenisch
12/06/2021, 10:20 PMvolkerjaenisch
12/06/2021, 10:21 PMvolkerjaenisch
12/06/2021, 10:22 PMvolkerjaenisch
12/06/2021, 10:26 PM(orion-2cnkrIXd-py3.9) volker@runner:~/workspace/ORION/orion/orion$ export PREFECT_ORION_DATABASE_CONNECTION_URL="<postgresql+asyncpg://root:PW@127.0.0.1/orion>"; prefect orion start
Starting Orion API server...
INFO: Started server process [188892]
23:24:37.817 | Started server process [188892]
INFO: Waiting for application startup.
23:24:37.817 | Waiting for application startup.
23:24:37.817 | Scheduler service scheduled to start in-app
23:24:37.817 | MarkLateRuns service scheduled to start in-app
INFO: Application startup complete.
23:24:37.857 | Application startup complete.
INFO: Uvicorn running on <http://127.0.0.1:4200> (Press CTRL+C to quit)
23:24:37.858 | Uvicorn running on <http://127.0.0.1:4200> (Press CTRL+C to quit)
23:24:37.953 | Unexpected error in: ProgrammingError("(sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.UndefinedTableError'>: Relation »deployment« existiert nicht")
Traceback (most recent call last):
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 396, in _prepare_and_execute
prepared_stmt, attributes = await adapt_connection._prepare(
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 638, in _prepare
prepared_stmt = await self._connection.prepare(operation)
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/asyncpg/connection.py", line 566, in prepare
return await self._prepare(
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/asyncpg/connection.py", line 584, in _prepare
stmt = await self._get_statement(
File "/home/volker/workspace/venvs/orion-2cnkrIXd-py3.9/lib/python3.9/site-packages/asyncpg/connection.py", line 398, in _get_statement
statement = await self._protocol.prepare(
File "asyncpg/protocol/protocol.pyx", line 168, in prepare
asyncpg.exceptions.UndefinedTableError: Relation »deployment« existiert nicht
Vipul
12/06/2021, 10:38 PMvolkerjaenisch
12/06/2021, 10:44 PMvolkerjaenisch
12/06/2021, 10:54 PMvolkerjaenisch
12/06/2021, 11:50 PMvolkerjaenisch
12/06/2021, 11:51 PMDustin Ngo
12/07/2021, 12:12 AMvolkerjaenisch
12/07/2021, 12:20 AMvolkerjaenisch
12/07/2021, 12:32 AMvolkerjaenisch
12/07/2021, 12:47 AMvolkerjaenisch
12/07/2021, 12:49 AMvolkerjaenisch
12/07/2021, 1:02 AMKevin Kho
Zanie
OrionClient.persist_object
.volkerjaenisch
12/07/2021, 11:36 PM2021-12-07 23:04:20.544 CET [70132] root@orion LOG: Anweisung: BEGIN ISOLATION LEVEL READ COMMITTED;
2021-12-07 23:04:20.544 CET [70132] root@orion LOG: Dauer: 0.061 ms
2021-12-07 23:04:20.544 CET [70132] root@orion LOG: Dauer: 0.114 ms
2021-12-07 23:04:20.545 CET [70132] root@orion LOG: Dauer: 0.401 ms
2021-12-07 23:04:20.545 CET [70132] root@orion LOG: Ausführen __asyncpg_stmt_f__: INSERT INTO task_run (id, created, updated, name, run_count, total_run_time, task_key, dyn
amic_key, cache_key, cache_expiration, task_version, empirical_policy, task_inputs, tags, flow_run_id) VALUES ($1::uuid, $2::timestamp with time zone, $3::timestamp with tim
e zone, $4::varchar, $5::integer, $6::interval, $7::varchar, $8::varchar, $9::varchar, $10::timestamp with time zone, $11::varchar, $12::jsonb, $13::jsonb, $14::jsonb, $15::
uuid) ON CONFLICT (flow_run_id, task_key, dynamic_key) DO NOTHING
2021-12-07 23:04:20.545 CET [70132] root@orion DETAIL: Parameter: $1 = 'ef312c3e-da62-4619-8089-a592fcce2d87', $2 = '2021-12-07 23:04:20.541589+01', $3 = '2021-12-07 23:04:
20.541608+01', $4 = 'my_task-ec9685da-0', $5 = '0', $6 = '00:00:00', $7 = 'ec9685da70232eae82ef1a236e37c301', $8 = '0', $9 = NULL, $10 = NULL, $11 = NULL, $12 = '{"max_retri
es": 0, "retry_delay_seconds": 0.0}', $13 = '{}', $14 = '[]', $15 = '8d007cd6-b64d-48e5-8723-0c10a26988b8'
2021-12-07 23:04:20.548 CET [70132] root@orion LOG: Dauer: 2.728 ms
2021-12-07 23:04:20.624 CET [70132] root@orion LOG: Dauer: 0.181 ms
2021-12-07 23:04:20.626 CET [70132] root@orion LOG: Dauer: 1.557 ms
2021-12-07 23:04:20.626 CET [70132] root@orion LOG: Ausführen __asyncpg_stmt_10__: SELECT task_run.id, task_run.created, task_run.updated, task_run.name, task_run.state_typ
e, task_run.run_count, task_run.expected_start_time, task_run.next_scheduled_start_time, task_run.start_time, task_run.end_time, task_run.total_run_time, task_run.task_key,
task_run.dynamic_key, task_run.cache_key, task_run.cache_expiration, task_run.task_version, task_run.empirical_policy, task_run.task_inputs, task_run.tags, task_run.flow_run
_id, task_run.state_id, task_run_state_1.id AS id_1, task_run_state_1.created AS created_1, task_run_state_1.updated AS updated_1, task_run_state_1.type, task_run_state_1.ti
mestamp, task_run_state_1.name AS name_1, task_run_state_1.message, task_run_state_1.state_details, task_run_state_1.data, task_run_state_1.task_run_id
FROM task_run LEFT OUTER JOIN task_run_state AS task_run_state_1 ON task_run_state_1.id = task_run.state_id
WHERE task_run.flow_run_id = $1::uuid AND task_run.task_key = $2::varchar AND task_run.dynamic_key = $3::varchar
LIMIT $4::integer
2021-12-07 23:04:20.626 CET [70132] root@orion DETAIL: Parameter: $1 = '8d007cd6-b64d-48e5-8723-0c10a26988b8', $2 = 'ec9685da70232eae82ef1a236e37c301', $3 = '0', $4 = '1'
2021-12-07 23:04:20.626 CET [70132] root@orion LOG: Dauer: 0.045 ms
2021-12-07 23:04:20.632 CET [70132] root@orion LOG: Dauer: 0.065 ms
2021-12-07 23:04:20.632 CET [70132] root@orion LOG: Dauer: 0.098 ms
2021-12-07 23:04:20.632 CET [70132] root@orion LOG: Ausführen __asyncpg_stmt_11__: UPDATE task_run SET state_type=$1, expected_start_time=$2::timestamp with time zone WHERE
task_run.id = $3::uuid
2021-12-07 23:04:20.632 CET [70132] root@orion DETAIL: Parameter: $1 = 'PENDING', $2 = '2021-12-07 23:04:20.538541+01', $3 = 'ef312c3e-da62-4619-8089-a592fcce2d87'
2021-12-07 23:04:20.633 CET [70132] root@orion LOG: Dauer: 0.226 ms
2021-12-07 23:04:20.634 CET [70132] root@orion LOG: Dauer: 0.175 ms
2021-12-07 23:04:20.634 CET [70132] root@orion LOG: Ausführen __asyncpg_stmt_d__: SELECT flow_run.id AS flow_run_id, flow_run.created AS flow_run_created, flow_run.updated
AS flow_run_updated, flow_run.name AS flow_run_name, flow_run.state_type AS flow_run_state_type, flow_run.run_count AS flow_run_run_count, flow_run.expected_start_time AS fl
ow_run_expected_start_time, flow_run.next_scheduled_start_time AS flow_run_next_scheduled_start_time, flow_run.start_time AS flow_run_start_time, flow_run.end_time AS flow_r
un_end_time, flow_run.total_run_time AS flow_run_total_run_time, flow_run.flow_version AS flow_run_flow_version, flow_run.parameters AS flow_run_parameters, flow_run.idempot
ency_key AS flow_run_idempotency_key, flow_run.context AS flow_run_context, flow_run.empirical_policy AS flow_run_empirical_policy, flow_run.empirical_config AS flow_run_emp
irical_config, flow_run.tags AS flow_run_tags, flow_run.auto_scheduled AS flow_run_auto_scheduled, flow_run.flow_id AS flow_run_flow_id, flow_run.deployment_id AS flow_run_d
eployment_id, flow_run.parent_task_run_id AS flow_run_parent_task_run_id, flow_run.state_id AS flow_run_state_id, flow_run_state_1.id AS flow_run_state_1_id, flow_run_state_
1.created AS flow_run_state_1_created, flow_run_state_1.updated AS flow_run_state_1_updated, flow_run_state_1.type AS flow_run_state_1_type, flow_run_state_1.timestamp AS fl
ow_run_state_1_timestamp, flow_run_state_1.name AS flow_run_state_1_name, flow_run_state_1.message AS flow_run_state_1_message, flow_run_state_1.state_details AS flow_run_st
ate_1_state_details, flow_run_state_1.data AS flow_run_state_1_data, flow_run_state_1.flow_run_id AS flow_run_state_1_flow_run_id
FROM flow_run LEFT OUTER JOIN flow_run_state AS flow_run_state_1 ON flow_run_state_1.id = flow_run.state_id
This is due to the ORM that likes to know the actual values of e.g. primary keys generated by the DB in processing the INSERT/UDATE. This would be standard operation of SA before PG8.2. But after PG8.3 the INSERT and UPDATE statements got a new parameter called RETURNING which SQLalchemy supports. With this parameter SQLalchemy does only need do a single INSERT ... RETURNING ... or UPDATE ... RETURNING statement to get the PG created values. With other words the SELECTs are superfluous.
But although SQLalchemy can use RETURNING with the PG dialect, it is not used for the Orion code - doubling the number of statements. I am not sure if this is due to a bug in SQLalchemy OR due to the questionable choice of your developers to use UUIDs as primary keys - which may hinders SA OR PG to use RETURNING (the SA docs are not so clear about this, only talking about integer primary keys).
I dug so deep at this point since for our simulation example for Orion with 21 Tasks and 6 flows with 5 seconds runtime the number of INSERTs is 6800 and then number of SELECTs is 17500. Since we have 66 seconds runtime (wall time) this derives to 100 INSERTS per second in the PG DB. Not bad for PG. But this looks a bit like an overkill to me. No wonder that the overhead utilizing Orion is so dramatically.
By fixing the RETURNING problem mentioned above the number of SELECTS would be dramatically reduced since it is in the order of magnitude of the number of INSERT and UPDATE statements. In other words there are roughly 14000 SELECTS not necessary.
The problem with the number of statements is not only the runtime of the individual statements but also the (Net/IO) latency involved with each statement.
I also got the feeling the the UPDATE statement is comletely redundant. It happens in any case in the same transaction as the INSERT and covers the same table. A part of the cause for this is in orm_models.py lines 63ff:
# onupdate is only called when statements are actually issued
# against the database. until COMMIT is issued, this column
# will not be updated
updated = sa.Column(
Timestamp(),
nullable=False,
index=True,
server_default=now(),
default=lambda: pendulum.now("UTC"),
onupdate=now(),
)
Commenting out the onupdate handler removes the "updated" column from the UPDATE statement. But the Update of the "state_type" still remains. I did not found the source for the delayed "state_type", so I leave this for you as homework 🙂
These findings were consistent on Ubuntu 18/PG10 and Debian 12/PG14.
Cheers,
VolkerZanie
volkerjaenisch
12/10/2021, 3:30 AMvolkerjaenisch
12/10/2021, 3:58 PMfrom prefect import flow, task
@task
def my_task():
return 1
@flow
def my_flow():
return my_task()
print(my_flow())
Resulting in 100 Database operations. I would think that one INSERT per Flow/Subflow/Task is reasonable. Then 9 UPDATEs for the state chances on the three nodes. Also 3 SELECTS to gain insight in what to do next. This would result in essentially 15 DB operations. So Orion wastes nearly an order of magnitude more DB operations than needed.Zanie
RETURNING
/ SELECT
optimization across different database backends.volkerjaenisch
12/10/2021, 4:52 PMZanie
volkerjaenisch
12/12/2021, 6:11 AMvolkerjaenisch
12/16/2021, 6:09 AMZanie
RETURNING
issue which it turns out is a SQLAlchemy bug https://github.com/sqlalchemy/sqlalchemy/issues/7438 which we’ve contributed a documentation fix for until the bug is resolved. We’ve also been exploring memory performance and resolved some issues with unused database session factories.Zanie
volkerjaenisch
12/18/2021, 3:31 PMvolkerjaenisch
12/19/2021, 11:16 PMimport uuid
from timeit import timeit
from prefect.orion.utilities.database import GenerateUUID, UUID
from sqlalchemy import create_engine, Integer, Column
from sqlalchemy.orm import declarative_base, sessionmaker
engine = create_engine('<postgresql+psycopg2://root>:@localhost/test')
session_maker = sessionmaker()
session_maker.configure(bind=engine)
Base = declarative_base()
class UserUUID(Base):
__tablename__ = 'user_uuid'
id = Column( 'id',
UUID(),
primary_key=True,
server_default=GenerateUUID(),
default=uuid.uuid4,
)
other_id = Column('other_id',
Integer,
)
class UserID(Base):
__tablename__ = 'user_id'
other_id = Column('other_id',
UUID(),
server_default=GenerateUUID(),
default=uuid.uuid4,
)
id = Column(Integer, primary_key=True)
Base.metadata.create_all(engine)
session = session_maker()
COUNT = 10000
def run_id():
for i in range(COUNT):
ui = UserID()
session.add(ui)
session.commit()
def run_uuid():
for i in range(COUNT):
ui = UserUUID()
session.add(ui)
session.commit()
print('id: {}'.format(timeit(stmt=run_id, number=3)))
print('uuid: {}'.format(timeit(stmt=run_uuid, number=3)))
Count ID UUID
10.000 3.9 4.0
100.000 41.8 46.4
1.000.000 424.3 454.2
I assume that the overhead of creating the UUIDs is much larger than the index calculation for the inserts.
Why do you need UUIDs in the first hand? They are just IDs, also they are generated by the DB itself. So global uniqueness cannot be the reason.
I will look into the select and joins tomorrow.volkerjaenisch
12/19/2021, 11:50 PMdef run_id():
for i in range(COUNT):
ui = UserID()
session.add(ui)
session.commit()
volkerjaenisch
12/19/2021, 11:56 PMimport uuid
from functools import partial
from timeit import timeit
from prefect.orion.utilities.database import GenerateUUID, UUID
from sqlalchemy import create_engine, Integer, Column
from sqlalchemy.orm import declarative_base, sessionmaker
engine = create_engine('<postgresql+psycopg2://root@localhost/test>')
session_maker = sessionmaker()
session_maker.configure(bind=engine)
Base = declarative_base()
class UserUUID(Base):
__tablename__ = 'user_uuid'
id = Column( 'id',
UUID(),
primary_key=True,
server_default=GenerateUUID(),
default=uuid.uuid4,
)
other_id = Column('other_id',
Integer,
)
class UserID(Base):
__tablename__ = 'user_id'
other_id = Column('other_id',
UUID(),
server_default=GenerateUUID(),
default=uuid.uuid4,
)
id = Column(Integer, primary_key=True)
Base.metadata.create_all(engine)
session = session_maker()
COUNTS = [10,100,1000,10000]
def run_id(count):
for i in range(count):
ui = UserID()
session.add(ui)
session.commit()
def run_uuid(count):
for i in range(count):
ui = UserUUID()
session.add(ui)
session.commit()
for count in COUNTS:
print('count: {}'.format(count))
print('id: {}'.format(timeit(stmt=partial(run_id, count), number=3)))
print('uuid: {}'.format(timeit(stmt=partial(run_uuid, count), number=3)))
volkerjaenisch
12/20/2021, 6:40 AMZanie
Zanie
volkerjaenisch
12/20/2021, 5:24 PM