• dh

    dh

    2 years ago
    Hi all, thanks for making this amazing project for the world. A quick intro: I work at a computer vision company that works with gigapixel images, and I'm doing a preliminary investigation of tools that can manage our machine learning workflows. Prefect looks very interesting. A question: is there a web page where I can better understand the current level of adoption of Prefect "in production systems"? I read a couple of blog posts from early users and (presumably) customers. That said, I wonder if there's something that would give a more holistic, up-to-date view. Thank you.
    Zachary Hughes
    15 replies
  • a

    Avi A

    2 years ago
    I'm using GCSResult in my flow and getting errors from GCS, caused by the GCSResult object trying to store task results:
    Unexpected error: SSLError(MaxRetryError("HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /upload/storage/v1/b/prefect-bucket/o?uploadType=multipart (Caused by SSLError(OSError(24, 'Too many open files')))"))
    The reason is probably that I have many mapped tasks running in parallel. It's a 32-core machine, but I can't really figure out how Prefect/Dask decides how many tasks run in parallel; sometimes it's more than 32. The bigger problem is that I think this caused the flow to get stuck, so I didn't even have an indication of the failure and couldn't go restart it. Anyhow, any suggestions on how to overcome this, or at the very least how to make my flow fail on such errors so that I can restart it? Does it have anything to do with the fact that these tasks have a retry option?
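    [Aside: OSError(24, 'Too many open files') means the process hit its file-descriptor limit. Capping Dask parallelism (e.g. via the executor's worker/thread settings) is the real fix, but one stopgap is to raise the soft descriptor limit at the start of the run. A minimal sketch using the stdlib resource module (Unix only; the 4096 target is an arbitrary example):]

    ```python
    import resource

    # Read the current soft/hard file-descriptor limits for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Raise the soft limit toward the hard cap (cannot exceed `hard`
    # without privileges). 4096 is an arbitrary example target.
    target = min(4096, hard) if hard != resource.RLIM_INFINITY else 4096
    resource.setrlimit(resource.RLIMIT_NOFILE, (max(soft, target), hard))

    new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    ```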
    Jim Crist-Harif
    7 replies
  • z

    Zach Angell

    2 years ago
    Is there a recommended way to create custom Slack notifications? (For example, crafting a message using data from a task.) I have Slack notifications working for task states; I'm just wondering how best to customize them.
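    [Aside: one common shape for this (a sketch, not the only way) is to build your own Slack webhook payload in a state handler, pulling whatever task data you want into the text. The helper name below is made up, and the actual POST to the webhook URL is deliberately left out:]

    ```python
    import json

    def format_slack_payload(task_name, state_name, message, result=None):
        """Build a Slack incoming-webhook payload, folding task data in."""
        text = f"Task *{task_name}* entered state *{state_name}*: {message}"
        if result is not None:
            text += f"\nResult: {result!r}"
        return {"text": text}

    payload = format_slack_payload("etl_load", "Success", "all rows written",
                                   result={"rows": 1234})
    body = json.dumps(payload)  # this is what you'd POST to the webhook URL
    ```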
    Zachary Hughes
    6 replies
  • p

    Philip MacMenamin

    2 years ago
    Where are the docs for how the DAG is generated? E.g., if I have:
    a = task_a()
    b = task_b(a)
    task_c(upstream_tasks=[task_b])
    This produces a strange-looking DAG. I would expect it to produce a DAG like task_a ---> task_b ---> task_c.
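    [Aside: one thing worth checking is that upstream_tasks receives the bound result b rather than the task function task_b. A toy stand-in for the dependency bookkeeping (not Prefect's actual implementation) shows that passing results gives the expected chain:]

    ```python
    # Toy dependency tracker: each "task" records an edge from every input.
    edges = set()

    def make_task(name):
        def run(*inputs, upstream_tasks=()):
            for dep in list(inputs) + list(upstream_tasks):
                edges.add((dep, name))
            return name  # the "result" carries the task's identity
        return run

    task_a = make_task("task_a")
    task_b = make_task("task_b")
    task_c = make_task("task_c")

    a = task_a()
    b = task_b(a)
    task_c(upstream_tasks=[b])  # pass the bound result b, not task_b itself
    ```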
    50 replies
  • b

    Benjamin

    2 years ago
    Hello everyone. I'm trying to run a simple task using Prefect Server and I'm having some trouble with it. The idea is to read a parquet dataset, standardize the feature columns, and write the result to a CSV file. I'm using dask and dask-ml to process the data, and the code looks like:
    with Flow("standardization") as flow:
            df = read_feats(bucket, file_pattern) # a task that returns a dask dataframe using dd.read_parquet(...)
            scaled_df = scale(df) # prefect task that does the standardization using dask-ml StandardScaler
            write_csv_to_s3(scaled_df, bucket, output_file) # this will write the dask dataframe as a csv to an s3 bucket
    This is a very simple POC just to get things running and see Prefect in action. I'm using version 0.12. Everything runs smoothly if I run the flow locally using flow.run with a remote DaskExecutor on a FargateCluster (using dask-cloudprovider):
    executor = DaskExecutor(...) # cluster parameters setting fargatecluster from cloudprovider and cluster_kwargs
    flow.run(executor=executor)
    I start having problems if I try to run it using a local Prefect agent connected to a local Prefect server after registering the flow:
    executor = DaskExecutor(...) # same parameters as before
    flow.environment = LocalEnvironment(executor=executor)
    flow.register()
    The agent will deploy the flow, it will create the FargateCluster normally, and we can see the tasks registered in the Dask UI task stream, but no processing actually happens. It deserializes the first task and does nothing, then does the same with the second and third tasks. Any idea what I'm doing wrong here?
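    [Aside: one guess worth checking (an assumption, not a diagnosis): the tasks return lazy dask collections, so only the task graph moves between Prefect tasks and nothing ever forces computation at the end. A tiny stand-in for a lazy object shows the behavior:]

    ```python
    # Minimal stand-in for a lazy collection (e.g. a dask DataFrame):
    class Lazy:
        def __init__(self, fn):
            self.fn = fn

        def compute(self):
            return self.fn()

    executed = []

    def scale():  # stand-in for the dask-ml scaling work
        executed.append("scaled")
        return "scaled-df"

    lazy_df = Lazy(scale)
    # Passing lazy_df between tasks only moves the recipe; no work happens:
    nothing_ran = (executed == [])
    # The final task must force computation (e.g. compute before writing):
    result = lazy_df.compute()
    ```

    [If that is the cause, forcing the write (or an explicit .compute()) inside the final task is one way to make the work actually land on the Fargate cluster.]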
    Jim Crist-Harif
    41 replies
  • l

    Luis Muniz

    2 years ago
    Hi, I apologize if this is not the appropriate channel, but I'm trying to get a paid plan running on prefect.io and it's proving unexpectedly difficult for a cloud service.
    Chris White
    11 replies
  • Alfie

    Alfie

    2 years ago
    Hi folks, I see the comparison between Prefect and Airflow, but is there any comparison between Prefect and DolphinScheduler? Thanks.
    4 replies
  • Vikram Iyer

    Vikram Iyer

    2 years ago
    Is there a doc link that has a list of all the environment variables?
    1 reply
  • m

    Matthias

    2 years ago
    Hi! What is the proper way to kick off a flow run while executing something else? E.g., someone clicks a button on a website and I would like to asynchronously kick off the flow run on my Dask environment. Just calling
    flow.run()
    is running synchronously here, right?
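    [Aside: yes, flow.run() blocks. For a registered flow, asking the Prefect backend to start the run returns immediately; failing that, the generic fire-and-forget shape is to hand the blocking call to a thread pool. A sketch with a made-up stand-in for the blocking trigger call:]

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def trigger_flow_run():
        """Stand-in for the blocking call (e.g. flow.run(), or a client
        call that asks the backend to start a registered flow)."""
        return "flow-run-id-123"  # hypothetical return value

    pool = ThreadPoolExecutor(max_workers=1)

    # The button handler submits the work and returns immediately:
    future = pool.submit(trigger_flow_run)

    # Later (or never), collect the result without having blocked:
    run_id = future.result()
    ```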
    2 replies
  • j

    Jacques

    2 years ago
    Is there a good way to have a task generate two lists (e.g. error_items and processed_items) as output, and map each list to a different downstream task?
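    [Aside: one shape that works (a sketch; all helper names are made up) is to have the task return a tuple, unpack it into two names, and map each name separately. Plain-Python stand-ins for the mapping step:]

    ```python
    def split_items(items):
        """One task returning two lists: (error_items, processed_items)."""
        error_items = [i for i in items if i < 0]
        processed_items = [i for i in items if i >= 0]
        return error_items, processed_items

    def handle_error(item):   # stand-in for the error-handling mapped task
        return ("error", item)

    def publish(item):        # stand-in for the happy-path mapped task
        return ("published", item)

    error_items, processed_items = split_items([3, -1, 7, -2])
    handled = [handle_error(i) for i in error_items]        # like .map()
    published = [publish(i) for i in processed_items]       # like .map()
    ```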
    Jeremiah
    7 replies