Our team is on GCP, but there are a lot of parallels to a Flow we have running right now. Our steps are almost identical to yours, except we gather the source data, save it directly to GCS, and do validation in a downstream task. We made this decision because the source we're extracting from is flaky at best in terms of reliability, so we wanted as few failure points as possible between the API response and downstream processing. In other words, we deliberately made our processing slower, but it works for us. I don't see why reversing the order (validating after getting the API response) would be bad practice.
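To make the ordering concrete, here's a rough sketch of the pattern (not our actual code — it writes to a local file instead of GCS, and the payload/validation rule are made up):

```python
import json
from pathlib import Path

def fetch_source():
    # Stand-in for the flaky upstream API call (hypothetical payload).
    return [{"id": 1, "value": 42}, {"id": 2, "value": None}]

def land_raw(records, dest: Path) -> Path:
    # Persist the response untouched first, so a bug in validation
    # never costs us a hard-won response from the flaky source.
    dest.write_text(json.dumps(records))
    return dest

def validate(dest: Path):
    # Downstream task: validate from the landed copy, not the live response.
    records = json.loads(dest.read_text())
    return [r for r in records if r["value"] is not None]

raw_path = land_raw(fetch_source(), Path("raw_extract.json"))
good = validate(raw_path)
```

The point is just that the raw landing happens before any logic of ours can fail.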
We’re using Snowflake, and we have steps 4 and 5 bundled into one task. The task creates a staging table (unique to that task run), copies the data in, merges it into the actual table, then cleans up. We did this because we wanted the entire task to be idempotent, but looking back, I think that mentality came from our team being so heavily influenced by Airflow.
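In rough outline, the task builds and runs something like the statements below (table names, key columns, and the stage URI are all hypothetical — yours will differ):

```python
import uuid

def build_merge_statements(target: str, stage_uri: str) -> list:
    # One run-scoped staging table so the whole task is idempotent
    # and safe to retry: a re-run just rebuilds its own staging table.
    staging = f"{target}_staging_{uuid.uuid4().hex}"
    return [
        f"CREATE TEMPORARY TABLE {staging} LIKE {target}",
        f"COPY INTO {staging} FROM '{stage_uri}'",
        f"MERGE INTO {target} t USING {staging} s ON t.id = s.id "
        f"WHEN MATCHED THEN UPDATE SET t.value = s.value "
        f"WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)",
        f"DROP TABLE IF EXISTS {staging}",
    ]

stmts = build_merge_statements("orders", "@my_stage/orders/")
```

Because the MERGE keys on the id, replaying the same file just updates rows in place instead of duplicating them.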
Not Prefect-specific, but one thing I would note: if you aren’t using a task-specific staging table (marked by date, UUID, or something else), concurrent flow runs can hit some weird race conditions, with both flows writing to the same staging table.
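The fix is cheap: derive the staging name per run instead of hardcoding it. A toy illustration (names are made up):

```python
import uuid

def staging_table_name(base: str, per_run: bool) -> str:
    # A fixed name means two concurrent flow runs COPY into and truncate
    # the same table; a per-run suffix isolates them completely.
    if not per_run:
        return f"{base}_staging"
    return f"{base}_staging_{uuid.uuid4().hex[:8]}"

run_a = staging_table_name("orders", per_run=True)
run_b = staging_table_name("orders", per_run=True)
```

With the fixed variant, `run_a` and `run_b` would be the same table and the two runs would clobber each other's rows mid-merge.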