  • itay livni
    2 years ago
    Hi - I am encountering a botocore
    NoCredentialsError('Unable to locate credentials')
    error when using Docker storage (local image), Prefect Cloud, and the
    S3ResultHandler
    when the flow is run locally. There are no credential issues when initializing the handler:
    s3_handler = S3ResultHandler(bucket='some-bucket')
    To resolve the issue I created a Prefect Secret called
    "AWS_CREDENTIALS"
    and tried:
    s = Secret("AWS_CREDENTIALS")
    aws_scrts = s.get()
    s3_handler = S3ResultHandler(bucket='some-bucket', aws_credentials_secret=aws_scrts)
    What is the best way to resolve the AWS credentials error? Thanks
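    For reference, a hedged sketch of the usual fix: in Prefect 0.x the aws_credentials_secret argument expects the name of the Secret rather than its resolved value, so the handler can look the credentials up at runtime (check the S3ResultHandler docs for your version):
    from prefect.engine.result_handlers import S3ResultHandler

    # pass the Secret's name, not the dict returned by Secret(...).get()
    s3_handler = S3ResultHandler(bucket='some-bucket', aws_credentials_secret="AWS_CREDENTIALS")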
    20 replies
  • David Ojeda
    2 years ago
    Hi, I am refactoring my flow parameters and I encountered a slight problem when I compose two flows that each have a parameter with the same name:
    ValueError: A task with the slug "limit" already exists in this flow.
    I came up with a plumbing hack like this:
    flow = Flow(name)
    local_flow = build_local_flow()      # a function that returns a flow
    quetzal_flow = build_quetzal_flow()  # likewise

    # plumbing: both the local and quetzal flows have a "limit" parameter
    limit_parameter = local_flow.get_tasks(name='limit')[0]
    other_parameter = quetzal_flow.get_tasks(name='limit')[0]

    # rewire every edge that depends on the duplicate parameter ...
    for edge in quetzal_flow.edges:
        if edge.upstream_task == other_parameter:
            edge.upstream_task = limit_parameter
    # ... and drop the duplicate from the task set
    quetzal_flow.tasks = set(t for t in quetzal_flow.tasks if t != other_parameter)

    flow.update(local_flow)
    flow.update(quetzal_flow)
    ...
    which works, but it seems very hackish and far from elegant. Is there a cleaner alternative to this? (other than renaming the parameter, of course)
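    One cleaner pattern, as a hedged sketch: flow.update skips tasks that are already present, so the two builders could share a single Parameter object instead of each creating their own (this assumes build_local_flow and build_quetzal_flow can accept the parameter as an argument, which is not shown above):
    from prefect import Flow, Parameter

    # create the parameter once and hand the same object to both builders
    limit = Parameter('limit')
    local_flow = build_local_flow(limit=limit)
    quetzal_flow = build_quetzal_flow(limit=limit)

    flow = Flow(name)
    flow.update(local_flow)
    flow.update(quetzal_flow)  # no slug clash: 'limit' is one shared task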
    6 replies
  • John Ramirez
    2 years ago
    Hey everyone - I have reached out a few times over the last few days looking for suggestions on how to run up to 3000 distinct runs in parallel in the most efficient way. I see there is a new
    Dask Cloud Provider Environment
    and want to know if this environment would be the best way to accomplish that goal.
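    For context, a hedged sketch of what wiring that environment up might look like (the provider class and forwarded kwargs follow the Dask Cloud Provider pattern; the specific values here are made up):
    from dask_cloudprovider import FargateCluster
    from prefect.environments import DaskCloudProviderEnvironment

    # keyword arguments beyond the Prefect-specific ones are forwarded to the provider class
    environment = DaskCloudProviderEnvironment(
        provider_class=FargateCluster,
        n_workers=50,              # hypothetical: sized for many parallel runs
        image="my-prefect-image",  # hypothetical worker image
    )
    flow.environment = environment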
    19 replies
  • Jie Lou
    2 years ago
    Hi everyone. I noticed an issue with the scheduling behavior when registering a flow, and wondered if anyone has run into it as well. I used
    CronClock("00 16 * * *", parameter_defaults=MY_PARAMETER_1)
    to schedule one flow run. I also have another run with a different batch of parameters to be scheduled at the same time,
    CronClock("00 16 * * *", parameter_defaults=MY_PARAMETER_2)
    . I then set
    flow.schedule = Schedule(clocks=[clock1, clock2])
    and registered the flow. In the Cloud UI, I see only one run scheduled instead of two. If I tweak the time a bit, i.e., set
    CronClock("05 16 * * *", parameter_defaults=MY_PARAMETER_2)
    , then two runs are scheduled as expected. It seems that if two runs are scheduled at the same time, only one will be picked. It would be better if multiple runs could be scheduled at the same time.
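    A minimal repro sketch of the setup described above (the parameter dicts are hypothetical stand-ins):
    from prefect.schedules import Schedule
    from prefect.schedules.clocks import CronClock

    clock1 = CronClock("00 16 * * *", parameter_defaults={"batch": 1})
    clock2 = CronClock("00 16 * * *", parameter_defaults={"batch": 2})  # same fire time
    flow.schedule = Schedule(clocks=[clock1, clock2])
    flow.register(project_name="my-project")  # only one run appears in the UI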
    9 replies
  • Jim Crist-Harif
    2 years ago
    @Joe Schmid (and anyone else with opinions), I've been thinking about the current state of creating a new
    Environment
    class per dask-cluster class, and it seems a bit untenable. I'm considering a generic dask environment that takes the cluster-manager class and kwargs and uses those to create a dask cluster. Since dask already has a spec'd interface for this, it seems significantly simpler than having a mirror of each of these in prefect. Something like (class name not decided):
    environment = DaskClusterEnvironment(
      cls=dask_yarn.YarnCluster,
      kwargs={
        "queue": "engineering",
        "environment": "hdfs://..."
      }
    )
    4 replies
  • David Ojeda
    2 years ago
    So what’s the latest word on result_handlers? I have this warning when doing flow.register:
    [2020-05-06 18:24:40] WARNING - py.warnings | /home/david/.virtualenvs/iguazu-env/lib/python3.8/site-packages/prefect/client/client.py:576: UserWarning: No result handler was specified on your Flow. Cloud features such as input caching and resuming task runs from failure may not work properly.
    but the flow constructor docstring says:
    - result_handler (ResultHandler, optional, DEPRECATED): the handler to use for
                retrieving and storing state results during execution
    Should I add a no-op result handler to quiet that warning, or just ignore it?
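    As a hedged sketch, silencing the warning could be as simple as attaching any concrete handler at construction time (LocalResultHandler is a real Prefect 0.x handler; the flow name is hypothetical):
    from prefect import Flow
    from prefect.engine.result_handlers import LocalResultHandler

    # stores task results on the local filesystem, which also satisfies the warning
    with Flow("iguazu-flow", result_handler=LocalResultHandler()) as flow:
        ...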
    2 replies
  • Dan DiPasquo
    2 years ago
    We'd like to automate flow registration via CI/CD. Is there an existing pattern for doing so with CircleCI? What is the most appropriate token type for this kind of integration?
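    A minimal sketch of what the CI registration step might run (the module and project name are hypothetical; the job would first authenticate with a Cloud API token, e.g. prefect auth login -t $PREFECT_TOKEN):
    # register_flows.py, executed by the CI job after authenticating
    from my_flows import flow  # hypothetical module that builds the flow

    flow.register(project_name="production")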
    3 replies
  • Matthias
    2 years ago
    I’d like to have Prefect Server running locally in an existing Docker environment. Is there any way I could spin up the containers created by
    prefect server start
    manually? Or is there a way to run
    prefect server start
    inside another Docker container? I tried adding the Docker socket as a volume, but that did not work.
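    As a hedged pointer (based on how Prefect 0.x packages the server; the exact path may differ between versions): prefect server start shells out to docker-compose using a compose file shipped inside the Python package, so the containers can be started manually once that file is located:
    # locate the docker-compose.yml that `prefect server start` drives
    import pathlib
    import prefect

    compose_file = pathlib.Path(prefect.__file__).parent / "cli" / "docker-compose.yml"
    print(compose_file)  # then run: docker-compose -f <that path> up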
    9 replies
  • Manuel Mourato
    2 years ago
    Hello guys, I am trying to execute flow.visualize() in the PyCharm IDE, but nothing shows up. The code doesn't raise any errors and executes to the end of the flow with Success, but the visualization never appears.
    f_run = test_flow1.run()
    test_flow1.visualize(flow_state=f_run)
    Has anyone had this issue before?
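    As a hedged workaround when no viewer window opens: Flow.visualize in Prefect 0.x accepts filename and format arguments, which write the graph to disk instead of relying on an interactive viewer (the filename here is hypothetical):
    # render the run's state graph to flow_graph.png instead of opening a viewer
    f_run = test_flow1.run()
    test_flow1.visualize(flow_state=f_run, filename="flow_graph", format="png")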
    6 replies
  • Chris Vrooman
    2 years ago
    I have a question about executing the same function multiple times within a flow. Is there a recommended way to configure upstream dependencies so that tasks execute in the right order? There is no data dependency in my use case, and I was hoping to avoid redefining the function under a different name. Basic example:
    from prefect import Flow, task

    @task
    def my_function(x, y):
        print(x + y)

    with Flow(name="my_flow") as flow:
        # Run 1st
        my_function(1, 2)
        # Run 2nd
        my_function(3, 4)
        # Run 3rd
        my_function(5, 6)
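    As a hedged sketch of one common answer: in Prefect 0.x a task call inside a flow block accepts an upstream_tasks argument, which imposes ordering without a data dependency or a renamed function:
    with Flow(name="my_flow") as flow:
        first = my_function(1, 2)
        second = my_function(3, 4, upstream_tasks=[first])   # runs after first
        third = my_function(5, 6, upstream_tasks=[second])   # runs after second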
    3 replies