# ask-community
s
Hi! Is there an option in Prefect similar to smart sensors in Airflow?
a
Happy New Year! Yes, there is, although we don’t call it that. You can raise a RETRY signal until the file arrives, effectively polling every X seconds/minutes. Here is an example polling for a file in S3:
import pendulum
import awswrangler as wr
from prefect import task
from prefect.engine.signals import RETRY


def check_if_file_arrived_in_s3():
    return wr.s3.does_object_exist("s3://bucket/example_file.csv")


@task
def s3_sensor(**kwargs):
    bool_s3_object_arrived = check_if_file_arrived_in_s3()
    if bool_s3_object_arrived is False:
        # RETRY reschedules this task run instead of failing it;
        # start_time controls when the next poll happens
        raise RETRY(
            "File not available yet, retrying in 20 seconds.",
            start_time=pendulum.now().add(seconds=20),
        )
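And a minimal usage sketch to wire the sensor into a flow (the flow name and the downstream task are hypothetical placeholders):

from prefect import Flow

with Flow("s3-sensor-example") as flow:
    file_ready = s3_sensor()
    # hypothetical downstream task, gated on the sensor:
    # process_file(upstream_tasks=[file_ready])

Downstream tasks only start once the sensor task finishes successfully, i.e. once the file exists in S3.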
c
Interesting. Does every retry count as a successful task run (towards the successful task run credits in Cloud)?
a
Nope, you can see that on the pricing page - retries don’t create new task runs; they only add new state history to the existing task run.
c
brilliant, thanks!
s
@Anna Geller We are using k8s workers. Will the worker always run for this task, or will it go down and start again after the retry interval?
a
You can imagine this as one very long flow run, because that’s what “sensors” are - long-running, mostly idle processes that do nothing but check for some condition and eventually do something once that condition is satisfied.
So yes, your pod will need to be up the whole time until the flow run is finished.
s
ok
@Anna Geller We have a task which triggers a long-running job, and we don’t want our flow run pod to run the whole time until that job completes. We want to periodically spin up a pod and check the status - is there a way?
a
What problem are you trying to solve with sensors? Are you waiting for some file to arrive somewhere? Perhaps you can do it event-based? What infrastructure do you run on - AWS/GCP/Azure/on-prem?
s
We run on AWS. We trigger Spark jobs using Prefect tasks, and we want to get the status of the Spark jobs using a sensor.
a
Gotcha. In that case I think a sensor is not really suitable for that. You could instead, e.g., create a flow run from the Spark job itself - this would be way more efficient and without idle waiting/polling. For instance, you can have one flow triggering the Spark job, and another flow sending you a notification when this flow run is successful (or performing some other action).
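As a rough sketch (Prefect 1.x, matching the RETRY signal above; the flow ID and run name are hypothetical placeholders), the last step of the Spark job could create the downstream flow run through the Prefect Client:

from prefect import Client

client = Client()  # picks up the API key from the environment, e.g. PREFECT__CLOUD__API_KEY
client.create_flow_run(
    flow_id="your-downstream-flow-id",  # hypothetical: ID of the notification flow
    run_name="triggered-by-spark-job",
)

The notification on success could then be handled by, for instance, a Cloud Automation or a flow-level state handler, so nothing sits idle polling for state.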
Just yesterday we had a discussion about Spark on AWS - sharing in case it may be useful: https://prefect-community.slack.com/archives/CL09KU1K7/p1641166768242700?thread_ts=1641151695.238700&cid=CL09KU1K7