Deceivious
06/08/2023, 8:22 AM
Deceivious
06/08/2023, 8:25 AM
block_document table is growing at an alarming rate: ~1-1.2 GB per day.
Deceivious
06/08/2023, 8:26 AM
select
    bt."name" as block_type,
    extract(month from bd.created) as month,
    extract(day from bd.created) as day,
    count(1)
from block_document bd, block_type bt
where bd.block_type_id = bt.id
group by
    bt."name",
    extract(month from bd.created),
    extract(day from bd.created)
order by block_type, month desc, day desc
Result (block_type, month, day, count):
Azure 6 8 102171
Azure 6 7 252417
Azure 6 6 247796
Azure 6 5 251646
Azure 6 4 252944
Azure 6 3 252332
Azure 6 2 252440
Azure 6 1 253898
Azure 5 31 254765
Azure 5 30 254290
Azure 5 29 254048
Azure 5 28 254570
Azure 5 27 254046
Azure 5 26 254614
Azure 5 25 254965
Azure 5 24 253799
Azure 5 23 248840
Azure 5 22 247909
Azure 5 21 254712
Azure 5 20 255658
Azure 5 19 246496
Azure 5 18 252260
Azure 5 17 252821
Azure 5 16 252160
Azure 5 15 251961
Azure 5 14 252362
Azure 5 13 253300
Azure 5 12 252055
Azure 5 11 245653
Azure 5 10 251813
Azure 5 9 251273
Azure 5 8 231937
Azure 5 7 226121
Azure 5 6 221572
Azure 5 5 59424
Date Time 5 12 1
Date Time 5 11 1
Kubernetes Job 6 6 14
Kubernetes Job 6 5 14
Kubernetes Job 6 1 12
Kubernetes Job 5 31 1
Kubernetes Job 5 30 1
Kubernetes Job 5 26 1
Kubernetes Job 5 25 3
Kubernetes Job 5 24 11
Kubernetes Job 5 23 22
Kubernetes Job 5 19 23
Kubernetes Job 5 16 1
Kubernetes Job 5 15 1
Kubernetes Job 5 10 1
Kubernetes Job 5 8 14
Kubernetes Job 5 5 11
Local File System 6 8 942
Local File System 6 7 2176
Local File System 6 6 2183
Local File System 6 5 2198
Local File System 6 4 2198
Local File System 6 3 2191
Local File System 6 2 2214
Local File System 6 1 2064
Local File System 5 31 1912
Local File System 5 30 1901
Local File System 5 29 1898
Local File System 5 28 1893
Local File System 5 27 1909
Local File System 5 26 1911
Local File System 5 25 1899
Local File System 5 24 1895
Local File System 5 23 1900
Local File System 5 22 1905
Local File System 5 21 1903
Local File System 5 20 1904
Local File System 5 19 1191
Local File System 5 18 304
Local File System 5 17 304
Local File System 5 16 300
Local File System 5 15 301
Local File System 5 14 302
Local File System 5 13 304
Local File System 5 12 399
Local File System 5 11 294
Local File System 5 10 292
Local File System 5 9 297
Local File System 5 8 279
Local File System 5 7 253
Local File System 5 6 255
Local File System 5 5 79
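As a rough sanity check on the numbers above, the reported table growth can be divided by the daily Azure row count to estimate the average size of one block document. This is only a back-of-the-envelope sketch: the 250,000 rows/day figure is rounded from the results above, and treating 1 GB as 10^9 bytes and attributing all growth to the Azure rows are assumptions.

```python
# Rough estimate of storage consumed per block_document row.
# Assumptions (not stated in the thread): 1 GB = 10**9 bytes, and all
# table growth is attributed to the Azure rows, which dominate the counts.

ROWS_PER_DAY = 250_000          # rounded from the daily Azure counts above
GROWTH_GB_PER_DAY = (1.0, 1.2)  # growth range reported in the thread


def bytes_per_row(growth_gb: float, rows: int) -> float:
    """Average bytes per row for a given daily growth and daily row count."""
    return growth_gb * 10**9 / rows


low, high = (bytes_per_row(gb, ROWS_PER_DAY) for gb in GROWTH_GB_PER_DAY)
print(f"~{low / 1000:.1f}-{high / 1000:.1f} kB per block document")
```

Under these assumptions each block document averages roughly 4 to 5 kB.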
Deceivious
06/08/2023, 8:28 AM
Deceivious
06/08/2023, 8:31 AM
Deceivious
06/08/2023, 8:31 AM
Deceivious
06/08/2023, 8:32 AM
Deceivious
06/08/2023, 8:33 AM
Deceivious
06/08/2023, 8:34 AM
Nate
06/08/2023, 1:58 PM
save on it
Deceivious
06/08/2023, 1:59 PM
.load() method.
Nate
06/08/2023, 1:59 PM
Deceivious
06/08/2023, 2:00 PM
Deceivious
06/08/2023, 2:02 PM
@task(.....result_storage=Azure.load(<NAMEHERE>).....)
Deceivious
06/08/2023, 2:03 PM
Deceivious
06/08/2023, 2:12 PM
Deceivious
06/08/2023, 2:13 PM
cached_task as decorator - just tasks that are primed with standard parameters.
alex
06/08/2023, 2:17 PM
is_anonymous set to True? When Prefect creates block documents on the user’s behalf, we set that equal to True, so if you have a large number of anonymous blocks, something within Prefect is likely causing your high number of block documents.
Deceivious
06/08/2023, 2:19 PM
Deceivious
06/08/2023, 2:20 PM
Deceivious
06/08/2023, 2:20 PM
Deceivious
06/08/2023, 2:22 PM
Nate
06/08/2023, 2:23 PM
Deceivious
06/08/2023, 2:28 PM
Deceivious
06/08/2023, 2:32 PM
Deceivious
06/08/2023, 2:33 PM
Deceivious
06/08/2023, 2:37 PM
with_options. Would using Task.with_options cause this issue?
alex
06/08/2023, 2:49 PM
If result_storage is causing the overzealous block saving (which is my hunch), then we’ll want to check if the block that you’re passing has _block_document_id set. If it doesn’t, then that would prompt the results functionality to save a new block document.
Deceivious
06/08/2023, 2:52 PM
Azure.load("name") into the result_storage parameter. Would there be any cases where Azure.load would return an object with no _block_document_id attribute?
Deceivious
06/08/2023, 2:52 PM
https://prefect-community.slack.com/files/U03RN8W7DPU/F05B6J0QEB1/image.png
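The check alex suggests above can be sketched without a running Prefect server. The helper below only inspects the private _block_document_id attribute named in this thread; has_block_document_id and FakeBlock are hypothetical names used for illustration, and the way Prefect itself acts on the attribute is taken from the description in this conversation, not verified against the library source.

```python
import uuid
from typing import Any, Optional


def has_block_document_id(block: Any) -> bool:
    """Return True if the object carries a non-None _block_document_id."""
    return getattr(block, "_block_document_id", None) is not None


class FakeBlock:
    """Hypothetical stand-in for a block object (not a Prefect class)."""

    def __init__(self, block_document_id: Optional[uuid.UUID] = None) -> None:
        self._block_document_id = block_document_id


saved = FakeBlock(uuid.uuid4())  # mimics a block tied to a saved document
unsaved = FakeBlock()            # mimics a block with no saved document

print(has_block_document_id(saved))    # True
print(has_block_document_id(unsaved))  # False
```

With Prefect installed, the same helper could be called on the object returned by Azure.load("name") just before it is passed as result_storage.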
Deceivious
06/08/2023, 2:53 PM
alex
06/08/2023, 2:54 PM
.load should always set _block_document_id on the returned object, but it might be getting taken off somewhere. This is worth checking because the presence of _block_document_id determines whether or not a new block document is created.
Deceivious
06/09/2023, 9:23 AM
Deceivious
06/09/2023, 11:03 AM
import datetime
from datetime import timedelta

import prefect
from prefect import flow, task
from prefect.filesystems import Azure
from prefect.serializers import JSONSerializer
from prefect.tasks import task_input_hash


def get_cache_parameters():
    # Standard caching options applied to each task through with_options.
    cache_params = {
        "cache_key_fn": task_input_hash,
        "persist_result": True,
        # Loads the pre-created Azure block on every call.
        "result_storage": Azure.load("test-block"),
        "result_serializer": JSONSerializer(jsonlib="json"),
        "cache_expiration": timedelta(minutes=1),
    }
    return cache_params


@task
def mock_test(i: int, now):
    prefect.get_run_logger().info(f"{i}{now}")


@flow
def repro_flow():
    now = datetime.datetime.now()
    for i in range(10):
        mock_test.with_options(name=f"Name_{i}", **get_cache_parameters())(i, now)


repro_flow()
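One way to tell whether runs like the one above produce anonymous block documents, as alex asked earlier, is to split the block count by is_anonymous. The sketch below runs against an in-memory SQLite table with invented rows, purely to show the query shape. The SQLite stand-in, the sample data, and the assumption that is_anonymous is a column on block_document are all illustrative; the real table lives in the Prefect database.

```python
import sqlite3

# In-memory stand-in for the Prefect database; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table block_type (id integer primary key, name text);
    create table block_document (
        id integer primary key,
        block_type_id integer references block_type(id),
        is_anonymous integer  -- 1 = created on the user's behalf
    );
    insert into block_type values (1, 'Azure'), (2, 'Local File System');
    insert into block_document (block_type_id, is_anonymous) values
        (1, 1), (1, 1), (1, 1), (1, 0), (2, 1), (2, 0);
    """
)

# Count block documents per block type, split by the is_anonymous flag.
rows = conn.execute(
    """
    select bt.name, bd.is_anonymous, count(1)
    from block_document bd
    join block_type bt on bd.block_type_id = bt.id
    group by bt.name, bd.is_anonymous
    order by bt.name, bd.is_anonymous desc
    """
).fetchall()

for name, is_anonymous, n in rows:
    print(f"{name:20} anonymous={bool(is_anonymous)} count={n}")
```

If the anonymous rows dominate the real counts, that would point to Prefect creating the documents rather than explicit save calls.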
Deceivious
06/09/2023, 11:04 AM
The test-block Azure block must be pre-created. Every time the code is run, a new block gets created in the block_document table.
Deceivious
06/09/2023, 11:04 AM
Deceivious
06/14/2023, 8:39 AM
Deceivious
06/14/2023, 8:42 AM
_block_document_id is present.
alex
06/14/2023, 1:44 PM