Scott Moreland (11/23/2020, 12:58 PM): […] sql_context.read.table. Are there any references for this?

Kyle Moon-Wright (11/23/2020, 4:47 PM): [message missing]

Scott Moreland (11/23/2020, 5:01 PM): [message missing]

Kyle Moon-Wright (11/23/2020, 6:12 PM): [message missing]

Scott Moreland (11/23/2020, 6:22 PM): [message missing]

Kyle Moon-Wright (11/23/2020, 6:52 PM): […] target and location as both being part of the write side of the Result, not the read (target only checks for existence). You may need to customize a task/logic to check for your table's existence before doing your Result write - I can't think of a way to do this cleanly otherwise, but I will continue to think about it.
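A minimal sketch of the pre-write existence check Kyle suggests. Both `table_exists` and `write_if_missing` are hypothetical helpers, not Prefect API; with a real PySpark SQLContext the catalog lookup could use `sql_context.tableNames()` instead of the plain list shown here.

```python
def table_exists(known_tables, name):
    """Return True if `name` is already present in the catalog listing.

    `known_tables` stands in for something like sql_context.tableNames().
    """
    return name in set(known_tables)


def write_if_missing(result, sdf, known_tables, name):
    """Only call the Result's write when the target table is absent.

    `result` is any object with a write() method (e.g. the thread's
    HiveResult); returns True if a write happened, False if skipped.
    """
    if table_exists(known_tables, name):
        return False  # target already exists; skip the write
    result.write(sdf)
    return True
```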
Scott Moreland (11/23/2020, 7:06 PM):

    sql_context = create_sql_context()
    db_result = HiveResult(sql_context, location='task_output_table_name')

    @task(target='task_output_table_name', result=db_result)
    def create_table(sql_context):
        """Transform a table"""
        sdf = sql_context.read.table('database.src_table_name')
        sdf = sdf.groupby('col1').agg(sf.sum('col2').alias('sum'))
        db_result.write(sdf)
        return sdf

Dealing with something like this and trying to avoid duplicating the persistent table name in both location and target. Also wondering if I really need to manually specify db_result.write(sdf), as it sounds like this should happen automatically when the target doesn't exist and needs to be rebuilt.
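On avoiding the duplicated table name: one approach is to bind the name once and pass the same constant to both `location` and `target`, so the two cannot drift apart. A sketch, with `StubResult` standing in for the thread's HiveResult (which is not shown); it is also worth checking whether your Prefect version reuses `target` as the result location when checkpointing, which could make the explicit `location=` redundant (an assumption to verify, not confirmed here).

```python
# Sketch: define the persistent table name once and reuse it for both
# the Result's `location` and the value passed to @task(target=...).
TABLE_NAME = "task_output_table_name"


class StubResult:
    """Stand-in for the thread's HiveResult (only records its location)."""

    def __init__(self, location):
        self.location = location


db_result = StubResult(location=TABLE_NAME)
task_target = TABLE_NAME  # pass this same constant to @task(target=...)

# Single source of truth: both names come from TABLE_NAME.
assert db_result.location == task_target
```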
Kyle Moon-Wright (11/23/2020, 7:18 PM): […] write method on your HiveResult/db_result.
Scott Moreland (11/23/2020, 7:20 PM): [message missing]

Kyle Moon-Wright (11/23/2020, 7:28 PM): […] create_table task runs and sees that the target exists?
Scott Moreland (11/23/2020, 9:35 PM): […] prefect.config.checkpointing to True via the associated environment variable. Thanks again for the help!

Kyle Moon-Wright (11/23/2020, 9:40 PM): [message missing]
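For anyone landing here later: in Prefect 0.x, config keys map to environment variables as PREFECT__<SECTION>__<KEY>, and checkpointing sits under the [flows] section of config.toml, so the variable Scott mentions is likely the following (an assumption to verify against your Prefect version):

```shell
# Assumed Prefect 0.x mapping: [flows] checkpointing = true in
# config.toml becomes this env var. Set it before starting the flow run.
export PREFECT__FLOWS__CHECKPOINTING=true
```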