• Bogdan Serban

    2 months ago
    Hello everyone! I am building a flow where inference with a PyTorch model is mapped over a series of images read from disk. Currently, I am loading the model within a dedicated task and passing it as an argument to the downstream map call, wrapped in unmapped. I am receiving a warning about the size of the ML model being shared in the task graph (more details in the thread). Is it a problem that I am receiving that warning? And is there a better way to share the ML model across the tasks?
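    A minimal sketch of the pattern described above (Prefect 1.x-style mapping; the task names, model path, and image glob are assumptions, not the poster's code):
    import glob

    import torch
    from prefect import Flow, task, unmapped

    @task
    def load_model():
        # Load the PyTorch model once per flow run (path is an assumption)
        return torch.load("model.pt")

    @task
    def list_images():
        # Read the image paths from disk (location is an assumption)
        return sorted(glob.glob("images/*.png"))

    @task
    def infer(image_path, model):
        # Run inference on a single image; details elided
        ...

    with Flow("batch-inference") as flow:
        model = load_model()
        infer.map(list_images(), model=unmapped(model))
    unmapped keeps the model from being iterated over, but the model object itself still travels through the task graph, which is what the size warning refers to; a common alternative is to pass only a path string and load (and cache) the model inside the mapped task.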
    Bogdan Serban
    Anna Geller
    12 replies
  • Mathieu Cayssol

    2 months ago
    Hey guys, I'm totally new to Prefect. I'm trying to use it, but just after pip install prefect in my conda environment, I get a module import error.
    Mathieu Cayssol
    Anna Geller
    2 replies
  • Jan

    2 months ago
    Hi community, I seem to be completely misunderstanding the principle of result within flows. My Python code below does not print "just a string" to my prompt. If I check the vars, I'm able to dissect it down to mijn_flow.result, which gives me an un-iterable object.
    from prefect import flow

    @flow
    def mijn_flow():
        return "just a string"

    print(mijn_flow())
    The output is:
    13:54:51.255 | INFO | prefect.engine - Created flow run 'brainy-parakeet' for flow 'mijn-flow'
    13:54:51.271 | INFO | Flow run 'brainy-parakeet' - Using task runner 'ConcurrentTaskRunner'
    13:54:51.302 | WARNING | Flow run 'brainy-parakeet' - No default storage is configured on the server. Results from this flow run will be stored in a temporary directory in its runtime environment.
    13:54:51.443 | INFO | Flow run 'brainy-parakeet' - Finished in state Completed()
    Completed()
    Can you point me in the right direction? I'm trying to access the return object of my flow (which is a string in this case).
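    A minimal sketch of one way to get at the value, assuming the Prefect 2 beta shown in the logs (where calling a flow returns its final State rather than the raw return value):
    state = mijn_flow()    # the flow call returns a State in the 2.0 betas
    print(state.result())  # -> "just a string"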
    Jan
    Anna Geller
    8 replies
  • Alexandru Anghel

    2 months ago
    Hello, I am trying to run flows on a private Kubernetes cluster using the Dask executor and GCS storage. I am coming from a GKE test cluster where I was able to run flows using this approach. The catch is that the private cluster runs behind a corporate proxy. I've set the HTTPS_PROXY env variable in the KubernetesRun job template, and Prefect is able to download the flow metadata from GCS. The problem is that the same pod then creates the Dask cluster, and that fails with this error:
    RuntimeError(f"Cluster failed to start: {e}") from e RuntimeError: Cluster failed to start: 503, message='Service Unavailable', url=URL('<http://proxy-ip-here>:proxy-port-here')
    Any ideas on how to fix this? I've tried adding a NO_PROXY env variable alongside HTTPS_PROXY, but it doesn't work. I am using Prefect 1.2. Thanks!
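    A hedged sketch of one thing to try (Prefect 1.x): set both proxy variables on the run config so that cluster-internal Dask traffic bypasses the corporate proxy. The NO_PROXY entries below are assumptions about typical in-cluster hostnames and would need adjusting:
    from prefect.run_configs import KubernetesRun

    run_config = KubernetesRun(
        env={
            "HTTPS_PROXY": "http://proxy-ip-here:proxy-port-here",
            # Keep the Dask scheduler/workers and the k8s API off the proxy
            "NO_PROXY": "localhost,127.0.0.1,.svc,.cluster.local,kubernetes.default.svc",
        }
    )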
    Alexandru Anghel
    Kevin Kho
    3 replies
  • Black Spy

    2 months ago
    with Flow("hello-world"): task1 = hello_task() task2 = test(task1) with Flow("Multiple Flow"): db_table=create_table() raw=get_complete_date() parsed=parse_complaint_data(raw) populated_table= store_complaints(parsed) populated_table.set_upstream(db_table) from prefect import Flow from prefect.tasks.prefect import StartFlowRun data_engineering_flow = StartFlowRun(flow_name="hello-world", project_name='ETL - Prefect', wait=True) data_science_flow = StartFlowRun(flow_name="Multiple Flow", project_name='ETL - Prefect', wait=True) with Flow("main-flow") as flow: result = data_science_flow(upstream_tasks=[data_engineering_flow]) flow.run() flow.register(project_name="ETL - Prefect") I am facing some issues in flow after executing the code we can able to see flow in UI but We couldn't see task results and also its taking lots of time for processing.. can anyone help me out this
    Black Spy
    Kevin Kho
    5 replies
  • Binoy Shah

    2 months ago
    Background / existing infrastructure:
    • We have full-fledged Kubernetes clusters running on top of AWS/EKS.
    • We have stable Jenkins CI/CD pipelines to build/deploy Docker images and Helm charts.
    • We support multiple environments, separated at the namespace level.
    • Our observability is via NewRelic.
    • Our credential stores are wired into the Helm charts via custom charts + annotations.
    • Our data warehouse is Snowflake, and the data sources are a plethora of DBs and API services.
    • ELT is carried out by Meltano + DBT, or Python + Celery.
    • Everything runs on Kubernetes.
    I am a future user of a workflow engine and am asking around for evaluations; this is what I have constructed so far. I have to confess I spent more time in the #dagster-* channels than in the other communities, and it shows in the recommendations chart below, in light of our existing infra setup. But I wanted to add more "fairness" to the evaluation ratings, and I'd highly appreciate some constructive feedback from the community on where I can improve this rating.
    Binoy Shah
    Kevin Kho
    +1
    11 replies
  • Florian Guily

    2 months ago
    Hey, I have a flow that gets re-registered every time, even when there is no change to the code. Any idea why this is happening?
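    For reference, Prefect 1.x bumps the flow version on every register call by default. A minimal sketch of the usual way to skip re-registration when the flow is unchanged (the project name is a placeholder):
    flow.register(
        project_name="my-project",
        # Identical serialized flows produce the same key, so registration is skipped
        idempotency_key=flow.serialized_hash(),
    )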
    Florian Guily
    Kevin Kho
    4 replies
  • Joshua Greenhalgh

    2 months ago
    @Kevin Kho Thanks for all your help! Good luck with whatever you are doing next!
    Joshua Greenhalgh
    1 reply
  • Octopus

    2 months ago
    [v1.2.1] Hi, I would like to run multiple subflows from a task (e.g. for each ref I would like to run each action: "read", "write", "delete"). I have a parent flow that holds the basic data and a child flow that executes an action on a ref. With my code I can only trigger 3 of the 9 subflows. I think it's because I use StartFlowRun (I get the same behavior with create_flow_run), because if I call a plain task instead of StartFlowRun, all 9 of my subflows execute.
    from prefect import Flow, task
    from prefect.executors import LocalDaskExecutor
    from prefect.tasks.prefect import StartFlowRun, wait_for_flow_run
    
    # @task
    # def wait_and_succeed(ref, action_id):
    #     time.sleep(10)
    
    #     print(f"children task success for ref {ref} and action {action_id}")
    
    #     if action_id == "write":
    #         print(f"[SUCCESS] {ref} Second level reached !!!")
    #     if action_id == "delete":
    #         print(f"[SUCCESS] {ref} Third level reached !!!")
    
    @task
    def call_children_flow(ref):
        print(f"{ref} ref")
    
        actions_id = ["read","write","delete"]
    
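        # Start one child flow per action and block until each run finishes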
        for action_id in actions_id:
            start_flow_run = StartFlowRun(flow_name="Generic Children flow")
            print(f"start_flow_run {start_flow_run}")
    
            child_id = start_flow_run.run(parameters={
                    "reference": ref,
                    "action_id": action_id,
            }, project_name="Playground")
    
            wait_for_flow_run.run(child_id)
    
    
    @task
    def run_action(action_id, ref):
        start_flow_run = StartFlowRun(flow_name="Generic Children flow")
    
        print(f"start_flow_run {start_flow_run}")
    
        child_id = start_flow_run.run(parameters={
                "reference": ref,
                "action_id": action_id,
        }, project_name="Playground")
    
        return child_id
    
    
    with Flow("Generic Parent flow") as parent_flow:
        fake_refs = ["ref1", "ref2", "ref3"]
        call_children_flow.map(fake_refs)
    
    if __name__ == "__main__":
        # Set the executor before registering so it is captured when the flow is stored
        parent_flow.executor = LocalDaskExecutor(num_workers=20)

        parent_flow.register(project_name="Playground")

        parent_flow.run()
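    A hedged alternative sketch (Prefect 1.x): build one parameter dict per (ref, action) pair and map create_flow_run / wait_for_flow_run at the flow level, instead of calling StartFlowRun.run() inside a task, so each of the nine subflow runs gets its own task run:
    from prefect import Flow, unmapped
    from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

    refs = ["ref1", "ref2", "ref3"]
    actions = ["read", "write", "delete"]
    params = [{"reference": r, "action_id": a} for r in refs for a in actions]

    with Flow("Generic Parent flow (mapped)") as parent_flow_v2:
        # One child flow run per parameter dict, started concurrently
        run_ids = create_flow_run.map(
            parameters=params,
            flow_name=unmapped("Generic Children flow"),
            project_name=unmapped("Playground"),
        )
        # Wait for every child run to reach a terminal state
        wait_for_flow_run.map(run_ids)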
    Octopus
    Kevin Kho
    14 replies
  • Ievgenii Martynenko

    2 months ago
    Has anyone noticed how many DB sessions Prefect 1.0 holds at a point in time? I see it creating a session for every request, leading to enormous numbers: 100-200+ active sessions.
    Ievgenii Martynenko
    Kevin Kho
    +1
    4 replies