Hi, I have a flow like this: A -> List -> Dict -> B. The result of A is persisted in S3. Whenever B fails and I try to restart the flow, the Dict task supplies empty data to B, since its result is not persisted. Dict is an internal task that we have no control over. How can we overcome this?
Sylvain Hazard
03/09/2022, 8:08 AM
Hello !
Collection tasks are used implicitly but they still exist as usual tasks if you need to use them with specific parameters. Check out this doc.
Suresh R
03/09/2022, 8:12 AM
It won’t be counted as a task for billing if I use it explicitly, right?
Sylvain Hazard
03/09/2022, 8:13 AM
Not a Cloud user myself, you might have to wait for a Prefect team member to answer that one, sorry.
Suresh R
03/09/2022, 8:14 AM
Ok
Anna Geller
03/09/2022, 9:30 AM
@Sylvain Hazard is 100% correct that those tasks are added implicitly. You can avoid those in many cases if you rewrite your tasks a bit. Here is an example: https://discourse.prefect.io/t/how-to-avoid-tasks-such-as-list-tuple-or-dict-in-a-flow-structure/318
Regarding billing: those tasks should not count toward billing, since they take less than a second to run. See the example in the image below, which uses this sample flow:
import random
import time

from prefect import Flow, task


@task
def a_number():
    time.sleep(2)
    return random.randint(0, 100)


@task
def get_sum(x):
    time.sleep(2)
    return sum(x)


with Flow("Using_Collections") as flow:
    a = a_number()
    b = a_number()
    # Passing a list literal makes Prefect add an implicit List task here
    s = get_sum([a, b])
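Coming back to the original restart problem, here is a minimal sketch of the "rewrite your tasks a bit" idea: build the dict inside a task of your own, so its result is checkpointed like A's instead of passing through the implicit Dict task. It assumes the Prefect 1.x API (`Flow`, `task`, `S3Result`) and that checkpointing is enabled for the run. The bucket name, the task names, and the shape of the payload are all hypothetical, so adapt them to your flow.

```python
from typing import Any, Dict, List


def build_payload(a_result: List[int]) -> Dict[str, Any]:
    """Build the dict that B consumes (hypothetical shape, for illustration)."""
    return {"items": list(a_result), "count": len(a_result)}


def build_flow(bucket: str = "my-results-bucket"):
    """Wire A -> payload -> B with A and the payload persisted (Prefect 1.x)."""
    # Imported here so build_payload can be used and tested without Prefect.
    from prefect import Flow, task
    from prefect.engine.results import S3Result

    s3 = S3Result(bucket=bucket)  # placeholder bucket name

    @task(checkpoint=True, result=s3)
    def a() -> List[int]:
        return [1, 2, 3]

    # The dict is now the return value of a user-defined task, so it can be
    # checkpointed to S3 instead of being produced by the implicit Dict task.
    payload = task(build_payload, checkpoint=True, result=s3)

    @task
    def b(data: Dict[str, Any]) -> int:
        return data["count"]

    with Flow("A_payload_B") as flow:
        b(payload(a()))
    return flow
```

With this layout, B's direct upstream is a task whose result is persisted, so a restart after B fails should be able to load the payload from S3 rather than receive empty data.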