Federico Zambelli

04/15/2023, 2:13 PM
Hey folks, is there a way to force a task to run in the main thread? I'm using PMAW's `PushshiftAPI` to get some historical Reddit data, but when I decorate my function with `@task`, I get the following error:
ValueError: signal only works in main thread of the main interpreter
From what I understand, this is because pmaw registers a signal handler so it can catch a SIGINT (Ctrl+C) from the CLI and shut down gracefully, and Python only allows `signal.signal()` to be called from the main thread, which apparently isn't the case when the function runs as a task in a separate thread.
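The error comes from Python's `signal` module itself rather than from Prefect or pmaw: `signal.signal()` raises `ValueError` whenever it is called from a thread other than the main one. A minimal, self-contained sketch that reproduces this without Prefect or pmaw (the handler and helper names are illustrative, not pmaw's actual code):

```python
import signal
import threading


def install_sigint_handler():
    # Stand-in for a library that installs a SIGINT handler for graceful
    # shutdown (illustrative only, not pmaw's real code).
    signal.signal(signal.SIGINT, lambda signum, frame: None)


errors = []


def worker():
    # Roughly what happens when the same code runs in a worker thread
    # instead of the main thread.
    try:
        install_sigint_handler()
    except ValueError as exc:
        errors.append(str(exc))


previous = signal.getsignal(signal.SIGINT)

install_sigint_handler()  # works: this call happens in the main thread

thread = threading.Thread(target=worker)
thread.start()
thread.join()
print(errors)  # the ValueError message raised inside the worker thread

# Restore whatever SIGINT handler was active before the demo.
if previous is not None:
    signal.signal(signal.SIGINT, previous)
```

On Python 3.8+ the captured message is the same one shown in the traceback above.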
@task
decorator from that function, then I can execute it normally inside a flow, but I obviously lose all the advantages of tasks such as caching and whatnot. For the record, here's the function I'm talking about:
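```python
from datetime import datetime

from pmaw import PushshiftAPI

# get_reddit_client() and SUBMISSION are defined elsewhere (not shown)


def get_submission_ids(start_date: str, end_date: str, subreddit: str):
    reddit = get_reddit_client()  # Reddit client from `praw`
    api_praw = PushshiftAPI(praw=reddit)

    # Convert the "YYYY-MM-DD" bounds to Unix timestamps for Pushshift
    start_datetime = datetime.strptime(start_date, "%Y-%m-%d")
    start_date_ts = int(start_datetime.timestamp())
    end_date_ts = int(datetime.strptime(end_date, "%Y-%m-%d").timestamp())
    search_window_days = (datetime.today() - start_datetime).days

    # pmaw's search_submissions() is synchronous, so no `await` here
    # (`await` inside a plain `def` is a SyntaxError anyway)
    submissions = api_praw.search_submissions(
        subreddit=subreddit,
        after=start_date_ts,
        until=end_date_ts,
        search_window=search_window_days,
    )

    return [f'{SUBMISSION}_{sub["id"]}' for sub in submissions]
```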
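One possible workaround, sketched here as an assumption rather than a Prefect or pmaw feature: temporarily turn `signal.signal()` into a no-op when it is called off the main thread, and wrap the pmaw calls in that context. This only helps if pmaw calls `signal.signal(...)` through the `signal` module (not via `from signal import signal`), and skipping the registration means pmaw's graceful Ctrl+C handling is lost inside the task. The helper name `ignore_signal_registration_off_main_thread` is hypothetical.

```python
import signal
import threading
from contextlib import contextmanager


@contextmanager
def ignore_signal_registration_off_main_thread():
    """Hypothetical helper: make signal.signal() a no-op outside the main thread.

    The patch is process-wide while the block is active, so any thread that
    calls signal.signal() in that window gets the guarded version.
    """
    original_signal = signal.signal

    def guarded_signal(signum, handler):
        if threading.current_thread() is threading.main_thread():
            return original_signal(signum, handler)
        # Off the main thread: skip registration and return the current
        # handler, mirroring signal.signal() returning the previous one.
        return signal.getsignal(signum)

    signal.signal = guarded_signal
    try:
        yield
    finally:
        signal.signal = original_signal


# Demo: the call that raised ValueError in a worker thread now succeeds.
results = []


def worker():
    with ignore_signal_registration_off_main_thread():
        signal.signal(signal.SIGINT, lambda signum, frame: None)
    results.append("ok")


thread = threading.Thread(target=worker)
thread.start()
thread.join()
print(results)
```

Inside the task, both the `PushshiftAPI(praw=reddit)` constructor and the `search_submissions(...)` call would go inside the `with` block, since it is an assumption which of them registers the handler. Another option is to move the pmaw call into a separate process (e.g. with `multiprocessing`), where it runs in that process's main thread.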