• Dmitry Dorofeev

    3 years ago
    Hi all, suppose I have a config file with an unknown number of jobs defined, and each job should run in parallel. I can easily define a Prefect Flow for each job, but how can I launch all of the Flows from one Python process? And what if I want to run each Flow periodically on a cron schedule? Currently I fork() a separate Python process for each Flow; is that the right approach?
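    A minimal sketch of one way to do this with the Prefect 1.x local API, assuming each entry in the config becomes its own Flow: give every Flow a cron schedule and start each flow.run() in its own process, which mirrors the fork() approach above. The job_configs list and build_flow helper are hypothetical placeholders.
    ```python
    # Hedged sketch (Prefect 1.x): one Flow per job, each run periodically in its own process.
    from multiprocessing import get_context

    from prefect import Flow, task
    from prefect.schedules import Schedule
    from prefect.schedules.clocks import CronClock


    def build_flow(job_name: str) -> Flow:
        # One Flow per job from the config; the task body is a stand-in.
        @task(name=f"run-{job_name}")
        def run_job():
            print(f"running job {job_name}")

        schedule = Schedule(clocks=[CronClock("*/5 * * * *")])  # every 5 minutes
        with Flow(job_name, schedule=schedule) as flow:
            run_job()
        return flow


    if __name__ == "__main__":
        job_configs = ["job_a", "job_b", "job_c"]  # stand-in for the parsed config file
        ctx = get_context("fork")  # locally built flows aren't picklable for spawn
        procs = [ctx.Process(target=build_flow(name).run) for name in job_configs]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
    ```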
  • Jonah Benton

    3 years ago
    Hi folks, some dumb questions I'm having a hard time finding answers to: What happens in a Dask cluster if/when your dask-scheduler process dies? Do tasks continue to run to completion on workers while nothing new gets scheduled? Can you just restart the scheduler and work picks up? Is workflow state lost if the scheduler dies, and if not, where does the scheduler keep it? Happy to be pointed to documentation on these questions; I just don't see lower-level operational details discussed in the Dask site docs.
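    Not a full answer on the failure semantics, but for poking at what the scheduler process itself keeps track of, the standard dask.distributed client exposes the scheduler's in-memory view. A small sketch; the scheduler address is a placeholder.
    ```python
    from dask.distributed import Client

    # Connect to an existing dask-scheduler; the address is a placeholder.
    client = Client("tcp://scheduler-host:8786")

    # Cluster metadata (workers, services, versions) as the scheduler currently sees it.
    info = client.scheduler_info()
    print(list(info["workers"]))

    # Which task keys each worker is currently processing, according to the scheduler.
    print(client.processing())
    ```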
  • Feliks Krawczyk

    3 years ago
    Just a quick question: with Prefect, I assume you are developing the code in such a way that things like this don't happen? https://issues.apache.org/jira/browse/AIRFLOW-5240
  • Feliks Krawczyk

    3 years ago
    I mean, it's #python and external open-source projects being bad at updating things... but wow, it's so frustrating when you run into these types of issues 😆
  • Mikhail Akimov

    3 years ago
    Where do task state_handlers get called? In the main flow (scheduler) process, or on the workers (dask, subprocess, etc.)? I'm passing an object to the state_handlers through a closure; with the sequential/threads LocalDaskExecutor they work OK, but with processes it's clear that the passed object isn't the same instance for all task runs. I was under the impression that the scheduler catches state changes and calls all the callbacks, while the executor is only responsible for, well, the execution of work.
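    For reference, a minimal sketch of the pattern being described, using the Prefect 1.x state_handler signature (task, old_state, new_state); RunTracker and the task body are hypothetical. Task-level handlers are invoked by the TaskRunner, which the executor runs alongside the task, so with a process-based executor the closed-over object is pickled into each worker process, consistent with the behaviour observed above.
    ```python
    # Sketch of the closure pattern above, with the Prefect 1.x state_handler signature.
    # RunTracker is a hypothetical stand-in for the shared object.
    from prefect import Flow, task
    from prefect.executors import LocalDaskExecutor


    class RunTracker:
        def __init__(self):
            self.events = []


    def make_handler(tracker: RunTracker):
        def handler(task, old_state, new_state):  # must return the (possibly new) state
            tracker.events.append((task.name, type(new_state).__name__))
            return new_state

        return handler


    tracker = RunTracker()


    @task(state_handlers=[make_handler(tracker)])
    def double(x):
        return x * 2


    with Flow("state-handler-demo") as flow:
        double.map([1, 2, 3])

    # With scheduler="threads" every handler call sees the same `tracker`; with
    # scheduler="processes" the closure (tracker included) is pickled into each
    # worker process, so every process mutates its own copy.
    flow.run(executor=LocalDaskExecutor(scheduler="processes"))
    ```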
  • emre

    3 years ago
    I have a task that maps over a list of table_names and loads some files into an S3 bucket, under prefix=table_name. I wanted to add a state_handler that deletes any leftover files from the failed task, should my task go into a Failed state. The problem is that the files I want to delete are identifiable by a task input. I haven't found a way to access input parameters from the state_handler callback. I thought task.inputs() would get me what I wanted, but that only had type information. Any suggestions?
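    One possible workaround, sketched with hypothetical names (BUCKET, upload_files, delete_prefix): since the mapped task receives table_name as an input, the cleanup can happen inside the task body before re-raising, so the run still ends up Failed without the state_handler needing access to inputs.
    ```python
    # Workaround sketch: clean up inside the mapped task itself, where table_name
    # is in scope, then re-raise so the task run still ends up Failed.
    import boto3
    from prefect import Flow, task

    BUCKET = "my-bucket"  # hypothetical


    def upload_files(table_name: str) -> None:
        """Hypothetical stand-in for the actual load into s3://BUCKET/<table_name>/."""
        ...


    def delete_prefix(bucket: str, prefix: str) -> None:
        # Remove any leftover objects under the table's prefix.
        boto3.resource("s3").Bucket(bucket).objects.filter(Prefix=prefix).delete()


    @task
    def load_table(table_name: str):
        try:
            upload_files(table_name)
        except Exception:
            delete_prefix(BUCKET, table_name)
            raise  # keep the Failed state


    with Flow("load-tables") as flow:
        load_table.map(["orders", "customers", "events"])
    ```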
  • Jan Therhaag

    3 years ago
    Hi - I have a question regarding the usage of Dask collections in Prefect flows. Basically what I'm trying to accomplish is:
    - read a bunch of parquet files from disk repeatedly (say once a day)
    - combine them into a Dask dataframe and do several transformations
    - write out the dataframe to a Kartothek dataset (basically also a collection of parquet files with metadata, if you don't know the package)
  • Jan Therhaag

    3 years ago
    Basically I wonder what happens when I create a Dask dataframe within a task - can I still take advantage of parallelization over partitions of the collection when transforming it?
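    A sketch of the pattern in question, assuming the flow runs with a DaskExecutor against a cluster; the paths and column names are placeholders, and the final write stands in for the Kartothek update. Inside the task, dask.distributed's worker_client() hands the collection work back to the surrounding cluster, so the transformations stay parallelized across partitions.
    ```python
    # Sketch, assuming the flow runs with a DaskExecutor against a Dask cluster.
    import dask.dataframe as dd
    from dask.distributed import worker_client
    from prefect import Flow, task


    @task
    def transform_and_write(input_glob: str, output_path: str):
        # worker_client() lets the task use the surrounding cluster for the
        # dataframe's partition-level work instead of a local scheduler.
        with worker_client():
            df = dd.read_parquet(input_glob)          # roughly one partition per file
            df = df[df["amount"] > 0]                 # example partition-wise transform
            daily = df.groupby("day")["amount"].sum()
            daily.to_frame().to_parquet(output_path)  # Kartothek write would go here instead


    with Flow("parquet-pipeline") as flow:
        transform_and_write("/data/raw/*.parquet", "/data/out/daily")
    ```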
  • Aakarsh Nadella

    3 years ago
    Hello everyone, I have two questions. 1. Can we have more than one DAG running concurrently in the same environment? 2. Can we have different instances of the same DAG running concurrently, with each instance having different parameters?
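    On the second point, a sketch with the Prefect 1.x local API: the same flow can be run concurrently with different parameters by launching each flow.run() in its own process (with a backend such as Prefect Cloud/Server you would instead create multiple flow runs of one registered flow). The flow below is a toy example.
    ```python
    # Sketch (Prefect 1.x): two concurrent runs of the same flow with different parameters.
    from multiprocessing import get_context

    from prefect import Flow, Parameter, task


    @task
    def greet(name: str):
        print(f"hello, {name}")


    with Flow("parameterized-demo") as flow:
        name = Parameter("name", default="world")
        greet(name)


    if __name__ == "__main__":
        ctx = get_context("fork")  # fork avoids having to pickle the flow for a spawned process
        runs = [
            ctx.Process(target=flow.run, kwargs={"parameters": {"name": n}})
            for n in ("alpha", "beta")
        ]
        for p in runs:
            p.start()
        for p in runs:
            p.join()
    ```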