Hi. I have a flow that does the following:
• query Snowflake for a list of URLs (there may be up to 100k URLs)
• create a list of lists (chunks of 120 URLs in each sublist)
• a mapped task processes each chunk/batch of URLs and calls an API for each URL. An S3Result is configured for this task so the output of the API lands in S3 in JSON format.
• A pipe in Snowflake will detect and load these files
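The chunking step above can be sketched as a small helper (the function name and signature are illustrative, not from the original flow):

```python
def chunk_urls(urls, size=120):
    """Split a flat list of URLs into sublists of at most `size` items.

    With ~100k URLs and a chunk size of 120, this yields roughly 834
    sublists, each of which would feed one mapped-task run.
    """
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

Each sublist then becomes the input to one run of the mapped task.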
My question: is relying on Prefect results an acceptable way to save data to S3, or should I have an explicit task that handles writing to S3? With this approach (S3Result), will the data stay in memory until the end of the process, or will the memory be released once the data is written to S3?
I would add an explicit task to load to S3 so that you have more control over what the data looks like, ensuring it's in the right format to be consumed by Snowpipe.
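One way to structure such an explicit task is to serialize each batch as newline-delimited JSON (a layout Snowpipe can ingest) and upload it with a boto3-style client. This is a minimal sketch; the function names, bucket, and key pattern are assumptions, not part of the original flow:

```python
import json


def to_ndjson(records):
    """Serialize a batch of dicts as newline-delimited JSON:
    one compact JSON object per line, which Snowpipe can load."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)


def write_batch_to_s3(s3_client, bucket, key, records):
    """Explicitly upload one batch to S3.

    Because the upload happens inside the task rather than via an
    S3Result, the serialized body is eligible for garbage collection
    as soon as the task returns, instead of being retained as a
    Prefect result for the life of the flow run.
    """
    body = to_ndjson(records).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

An explicit task also lets you control the file naming and size, both of which affect how efficiently Snowpipe picks the files up.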