Brad
04/23/2020, 11:12 PM
Dylan
04/23/2020, 11:12 PM
Brad
04/23/2020, 11:13 PM
`Result` like table for example, and rerun only those tasks/flows would be very powerful
Dylan
04/23/2020, 11:14 PM
Brad
04/23/2020, 11:14 PM
Dylan
04/23/2020, 11:15 PM
Brad
04/23/2020, 11:16 PM
Dylan
04/23/2020, 11:16 PM
Brad
04/23/2020, 11:16 PM
Dylan
04/23/2020, 11:16 PM
Brad
04/23/2020, 11:17 PM
Dylan
04/23/2020, 11:17 PM
`Result` interface, which you can read about here: https://docs.prefect.io/core/PINs/PIN-16-Results-and-Targets.html#pin-16-results-and-targets
`Exception` would be an interesting result type
David Ojeda
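04/24/2020, 3:04 PM
from prefect.engine.signals import PrefectStateSignal
from prefect.engine.state import Success

# Custom "soft fail" state: a Success subclass with its own UI color
class GracefulFail(Success):
    color = '#dd1c77'

# Signal that puts the task in the GracefulFail state when raised
class GRACEFULFAIL(PrefectStateSignal):
    _state_cls = GracefulFail

and we have our task do `raise GRACEFULFAIL('what happened')`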
Brad
04/24/2020, 11:46 PM
David Ojeda
04/25/2020, 10:16 AM
`prefect.Task` that wraps the `run` method so we can raise expected exceptions (for soft fails) or unexpected ones (for hard fails). This custom task catches the expected exceptions and wraps them in a state and an `ENDRUN` signal.
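The wrapping idea above can be sketched without Prefect at all. This is a minimal, library-free illustration, not the team's actual task: `SoftFail`, `HardFail`, `Outcome`, `soft_fail_aware`, and `parse_positive` are all hypothetical names, and `Outcome` only stands in for a real Prefect state object.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable


class SoftFail(Exception):
    """Expected problem; carries a placeholder result for downstream tasks."""

    def __init__(self, message: str, result: Any = None):
        super().__init__(message)
        self.result = result


class HardFail(Exception):
    """Problem that should stop everything downstream."""


@dataclass
class Outcome:
    """Hypothetical stand-in for a Prefect state object."""
    kind: str          # "success" or "soft_fail"
    result: Any
    message: str = ""


def soft_fail_aware(run: Callable[..., Any]) -> Callable[..., Outcome]:
    """Wrap a run function so that SoftFail becomes an Outcome, not a crash."""

    @functools.wraps(run)
    def wrapper(*args: Any, **kwargs: Any) -> Outcome:
        try:
            return Outcome("success", run(*args, **kwargs))
        except SoftFail as exc:
            # Expected failure: keep the placeholder result and the reason.
            return Outcome("soft_fail", exc.result, str(exc))
        # HardFail and any other exception propagate unchanged.

    return wrapper


@soft_fail_aware
def parse_positive(value: int) -> int:
    """Toy task body used only to exercise the wrapper."""
    if value == 0:
        raise SoftFail("nothing to do", result=0)
    if value < 0:
        raise HardFail("negative input")
    return value * 2
```

Only the exception type the wrapper knows about is converted; everything else still surfaces as a normal failure, which matches the soft/hard split described in the message.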
In our team, we have a case similar to yours. It is a flow that looks like this:
from prefect import Flow

with Flow('example') as flow:
    files = query()
    clean_files = clean.map(file=files)
    features = process.map(file=clean_files)
    summary = gather(features)
The `clean` task takes a file, extracts a signal, and does some filtering. Sometimes the signal is saturated. When more than 10% of the samples are saturated, we soft fail: we raise a soft fail with an “empty” result, usually an empty dataframe. The `process` and `gather` tasks are coded to handle dataframes as inputs; given an empty input, they either soft fail in turn or just generate an empty output, because they are designed to work with empty inputs as well.
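As a rough sketch of that saturation rule, under several assumptions: plain lists stand in for dataframes, a sample counts as saturated when its absolute value reaches a hypothetical `max_value`, and `SoftFail`, `saturated_fraction`, and the feature names are invented for illustration. Only the 10% threshold and the "empty result" idea come from the message.

```python
from typing import Dict, List

SATURATION_LIMIT = 0.10  # soft fail when more than 10% of samples are saturated


class SoftFail(Exception):
    """Hypothetical soft-fail exception carrying an empty placeholder result."""

    def __init__(self, message: str, result: list):
        super().__init__(message)
        self.result = result


def saturated_fraction(samples: List[float], max_value: float) -> float:
    """Fraction of samples at or beyond the (assumed) saturation level."""
    if not samples:
        return 0.0
    saturated = sum(1 for s in samples if abs(s) >= max_value)
    return saturated / len(samples)


def clean(samples: List[float], max_value: float = 1.0) -> List[float]:
    """Drop saturated samples, or soft fail with an empty result."""
    fraction = saturated_fraction(samples, max_value)
    if fraction > SATURATION_LIMIT:
        raise SoftFail(f"{fraction:.0%} of samples saturated", result=[])
    return [s for s in samples if abs(s) < max_value]


def process(clean_samples: List[float]) -> Dict[str, float]:
    """Compute features; an empty input yields an empty output."""
    if not clean_samples:
        return {}
    return {"mean": sum(clean_samples) / len(clean_samples),
            "n": len(clean_samples)}


def gather(features: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Summarize features, skipping the empty ones from soft-failed files."""
    return [f for f in features if f]
```

Because `process` and `gather` accept empty inputs, a soft-failed file simply drops out of the summary instead of breaking the downstream steps.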
On the other hand, when `clean` encounters a corrupted file, it will hard fail. This stops the flow execution after the `clean` DAG node and results in a very visible status in the UI or in our logs. This is what we want: we will either remove the file from our data source or exclude it from the query.
In some cases, `clean` will encounter an unexpected exception caused by human error when coding that function. We also want this to be very visible so we can fix it as soon as possible.
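The three cases can be put side by side in a small sketch of a mapped stage. This is a simplification with hypothetical names (`run_mapped`, `SoftFail`, `CorruptedFile`, and the file suffixes are invented): Prefect tracks these outcomes through task states, whereas this loop simply re-raises at the first hard or unexpected failure.

```python
from typing import Any, Callable, List, Tuple


class SoftFail(Exception):
    """Hypothetical soft fail carrying a placeholder result."""

    def __init__(self, message: str, result: Any):
        super().__init__(message)
        self.result = result


class CorruptedFile(Exception):
    """Hypothetical hard fail for a file that cannot be read."""


def run_mapped(task: Callable[[Any], Any],
               inputs: List[Any]) -> Tuple[List[Any], List[Tuple[Any, str]]]:
    """Apply task to each input and return (results, soft_failures).

    A SoftFail is recorded and replaced by its placeholder result.
    Any other exception propagates, so later stages never run.
    """
    results, soft_failures = [], []
    for item in inputs:
        try:
            results.append(task(item))
        except SoftFail as exc:
            soft_failures.append((item, str(exc)))
            results.append(exc.result)
    return results, soft_failures


def clean(name: str) -> List[str]:
    """Toy clean step keyed on made-up file suffixes."""
    if name.endswith(".sat"):
        raise SoftFail("signal saturated", result=[])
    if name.endswith(".bad"):
        raise CorruptedFile(name)
    return [name.upper()]  # pretend this is the cleaned data
```

Here a saturated file leaves a recorded reason and an empty result, a corrupted file raises `CorruptedFile`, and a coding mistake (for example passing `None` instead of a file name) raises whatever exception Python produces, so both of the latter stay loud.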