
Deceivious

08/02/2023, 9:32 AM
Hi Prefect devs. Continuing the cleanup of our Prefect OSS database, I have more questions, this time about the block_document table. Question and details in thread. TL;DR: Why are there so many Local File System records in the block_document table when I am not using Local File System directly in my code? Is it safe to just delete them?
My Prefect production database has a large number of records in the block_document table. I did a quick analysis:
select bt.name,count(1) from block_document bd , block_type bt 
where bt.id=bd.block_type_id 
and bd.updated >= now() - interval '1 week'
group by bt.name;
Results are as follows:
"name","count"
Kubernetes Job,19
Azure,1
Date Time,11
Local File System,17072
NOTE: This is data for only 1 week. I decrypted the data column with the Fernet key. The decrypted values are all the same and refer to the PREFECT_DIRECTORY base path. This is painful: the table is estimated at 50+ GB for us, and most of the values are just encrypted duplicates of the same value.
So I ran a sample flow against a fresh Prefect database:
import prefect
from prefect import flow, task


@task
def ta():
    prefect.get_run_logger().info("This is from task")


@flow
def sub_flow():
    prefect.get_run_logger().info("This is sub flow")
    ta()


@flow
def main_flow():
    prefect.get_run_logger().info("This is main flow")
    sub_flow()
    ta()


if __name__ == "__main__":
    main_flow()
This created a single record in block_document. I took a plain-text SQL dump of the entire database and searched for the ID of that block_document record; no other table references it.
I am unsure why a record is created in block_document with every flow run. I would like to know whether it is safe to run the following SQL:
select bd.* from block_document bd 
where bd.updated < now()-interval '1 week'
and is_anonymous = True
and exists (
	select 1 from block_type bt where bt.id=bd.block_type_id 
	and bt.name='Local File System'
)
* To be run as a delete rather than a select.
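A dry run before the real delete is one way to sanity-check a cleanup like this. The following is only a minimal sketch, not a tested procedure for a Prefect database: it uses Python's built-in sqlite3 module as a stand-in for Postgres, assumes `updated` is stored as an epoch timestamp, and the function name `purge_stale_anonymous_blocks` and its parameters are hypothetical.

```python
import sqlite3
import time

WEEK_SECONDS = 7 * 24 * 3600

# Same filter as the query above. The cutoff is passed in as a parameter
# because SQLite has no "now() - interval" syntax (assumption: "updated"
# holds an epoch timestamp in this stand-in schema).
STALE_FILTER = """
    updated < ?
    AND is_anonymous = 1
    AND EXISTS (
        SELECT 1 FROM block_type bt
        WHERE bt.id = block_document.block_type_id
          AND bt.name = ?
    )
"""


def purge_stale_anonymous_blocks(conn, type_name="Local File System",
                                 max_age_seconds=WEEK_SECONDS, dry_run=True):
    """Count matching rows; delete them only when dry_run is False."""
    cutoff = time.time() - max_age_seconds
    params = (cutoff, type_name)
    matched = conn.execute(
        "SELECT count(*) FROM block_document WHERE " + STALE_FILTER, params
    ).fetchone()[0]
    if not dry_run:
        conn.execute("DELETE FROM block_document WHERE " + STALE_FILTER, params)
        conn.commit()
    return matched
```

Calling it with the default `dry_run=True` only reports how many rows match, so the count can be compared against expectations before anything is removed.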
Tagging @Jake Kaplan because he's the man 😄

Jake Kaplan

08/02/2023, 1:40 PM
I don't think it's dangerous to delete. Would you mind filing an issue for this? I'm pretty sure this is a bug. I don't see why we need a different anonymous block every time. I believe it's for the default result storage path, which I would think could get or create the same block.
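The "get or create" idea can be illustrated with a small sketch. This is not Prefect's implementation; it is a hypothetical in-memory stand-in (`AnonymousBlockStore`, the `basepath` field, and the example path are all assumptions) that keys anonymous blocks by a hash of their type and contents, so identical configurations reuse one record instead of adding a new one per run.

```python
import hashlib
import json


class AnonymousBlockStore:
    """Hypothetical in-memory stand-in for the block_document table."""

    def __init__(self):
        self._by_key = {}

    def get_or_create(self, block_type, data):
        # Identical type + data always hash to the same key, so a repeat
        # call returns the existing record instead of creating a new one.
        payload = json.dumps({"type": block_type, "data": data}, sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in self._by_key:
            self._by_key[key] = {
                "id": len(self._by_key) + 1,
                "block_type": block_type,
                "data": data,
            }
        return self._by_key[key]

    def __len__(self):
        return len(self._by_key)


store = AnonymousBlockStore()
# The base path below is a made-up example value.
first = store.get_or_create("Local File System", {"basepath": "/tmp/storage"})
second = store.get_or_create("Local File System", {"basepath": "/tmp/storage"})
```

Here `first` and `second` are the same record, so two runs with the same storage path leave a single entry in the store.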

Deceivious

08/02/2023, 1:41 PM
I think I already filed a similar bug ages ago.
Let me try to find it.
This is not the same but similar