• Joyce Xu

    1 year ago
Hi all. My team is using the `StartFlowRun` task to build "flow-within-a-flow" pipelines, i.e. Flow A includes the `StartFlowRun` task X, and task X starts Flow B. We would like to use the GraphQL API to track metadata on our flows. Is there a way to query flows such that, for Flow B, we can see that it corresponds to task X, or at least that it originated from Flow A?
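(For context, a minimal sketch of the flow-within-a-flow pattern described above, assuming Prefect 1.x; the flow and project names are illustrative, not from this thread.)
from prefect import Flow
from prefect.tasks.prefect import StartFlowRun

# Task X: kicks off Flow B from inside Flow A
start_flow_b = StartFlowRun(flow_name="flow-b", project_name="my-project", wait=True)

with Flow("flow-a") as flow_a:
    task_x = start_flow_b()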
    Chris White
    5 replies
• Nabeel

    1 year ago
Hi all. My team has been trying to execute a flow stored in Azure storage. The flow registers to Prefect Server and can push to Azure Blob Storage. When we start a local agent, it picks up the flow and says it is deploying. However, it then crashes with the error `Failed to load and execute Flow's environment: AttributeError("'NoneType' object has no attribute 'rstrip'")`. The azure-storage version is within the range specified by setup.py. If anyone has any idea what the issue could be, it would be a huge help 🙌 Thanks so much! 😃
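(A hedged sketch of the Azure storage setup, assuming Prefect ~0.14; the container name is illustrative. If I recall correctly, the storage class reads the connection string from the AZURE_STORAGE_CONNECTION_STRING environment variable, and the agent's environment needs it too, not just the machine where the flow is registered.)
from prefect import Flow
from prefect.storage import Azure

with Flow("azure-example") as flow:
    ...

# "flows" is an illustrative container name; AZURE_STORAGE_CONNECTION_STRING
# must be set both where the flow is registered and where the agent runs.
flow.storage = Azure(container="flows")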
    Aiden Price
    3 replies
• Vitaly Shulgin

    1 year ago
Hello Team, I created a flow which schedules other flows using `StartFlowRun`. When I run it locally, everything works fine, but when it is executed in a k8s container, running on a schedule, it fails.
    10 replies
• Josh Greenhalgh

    1 year ago
Does anyone know how to specify the Prefect extras I need in Docker storage? I am getting the following build error:
    System Version check: OK
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/prefect/tasks/gcp/__init__.py", line 14, in <module>
        from prefect.tasks.gcp.bigquery import (
      File "/usr/local/lib/python3.7/site-packages/prefect/tasks/gcp/bigquery.py", line 4, in <module>
        from google.cloud import bigquery
    ImportError: cannot import name 'bigquery' from 'google.cloud' (unknown location)
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/opt/prefect/healthcheck.py", line 151, in <module>
        flows = cloudpickle_deserialization_check(flow_file_paths)
      File "/opt/prefect/healthcheck.py", line 44, in cloudpickle_deserialization_check
        flows.append(cloudpickle.loads(flow_bytes))
      File "/usr/local/lib/python3.7/site-packages/prefect/tasks/gcp/__init__.py", line 24, in <module>
        ) from err
    ImportError: Using `prefect.tasks.gcp` requires Prefect to be installed with the "gcp" extra.
I tried:
storage = Docker(
    registry_url="gcr.io/blah/",
    image_name=name,
    image_tag="latest",
    build_kwargs={"buildargs": {"EXTRAS": "kubernetes,gcp"}},
)
    With no luck 😞
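(A hedged alternative sketch, assuming Prefect 1.x Docker storage: install the extras as a pip dependency of the image rather than via build args. The registry and image name are the same placeholders as above.)
from prefect.storage import Docker

storage = Docker(
    registry_url="gcr.io/blah/",
    image_name=name,
    image_tag="latest",
    # install the gcp/kubernetes extras into the built image
    python_dependencies=["prefect[kubernetes,gcp]"],
)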
    Josh Greenhalgh
    Jan Marais
    21 replies
• Matthew Blau

    1 year ago
Hello all, I am looking for some guidance/examples on running flows in Docker containers. Right now my company has software that runs on schedules in its own Docker containers. My current understanding of how to best utilize Prefect is to rebuild those containers to have `FROM prefecthq/prefect:0.7.1-python3.6` or similar in the Dockerfile, which I have done. From there I need to create a flow that takes this Dockerfile and builds the container, and that is where my understanding is weaker and less clear. I have @task decorations added to various bits of the integration that I am currently attempting to convert over to having Prefect handle the execution of. I am stuck on how to write the flow so that it works with this Docker container. Do I need a separate flow.py that takes the container's Dockerfile, builds it, and runs the tasks denoted by @task within the integration, in order for this to be orchestrated by Prefect? If so, how would I write the flow, as an example? I feel like my understanding is flawed and would appreciate some help with this. For reference I am running 0.14.1. Thank you all in advance!
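(A hedged sketch of one way this could look, assuming Prefect ~0.14; the image and registry names are hypothetical. The idea is that the flow's Docker storage is built on top of the existing container image, and the @task-decorated functions are called inside the Flow block.)
from prefect import Flow, task
from prefect.storage import Docker

@task
def run_integration_step():
    ...  # existing integration code, already decorated with @task

with Flow("integration-flow") as flow:
    run_integration_step()

# Build the flow's image on top of the existing container image
flow.storage = Docker(
    base_image="mycompany/integration:latest",       # hypothetical existing image
    registry_url="registry.example.com/mycompany",   # hypothetical registry
)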
    Sean Talia
    13 replies
• jcozar

    1 year ago
Hello all, I am running a Prefect ECS agent as a service in AWS ECS. Now I am trying to register a flow in Prefect Cloud using Docker storage. I created an `ECSRun` run configuration for the `run_config` argument of the Flow. If I use the `env` argument to provide `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, it works! However, I don't want to put my credentials in the source code. I am trying to use the `task_definition_arn` argument, but I am not sure it is the correct way, because the image of the task should be the Flow's Docker image. Can you give me any tips or advice? Thank you very much!
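(A hedged sketch of one alternative, assuming Prefect ~0.14: instead of passing keys via env, attach an IAM role to the ECS task so the credentials come from the role. The ARN and label are placeholders.)
from prefect.run_configs import ECSRun

flow.run_config = ECSRun(
    task_role_arn="arn:aws:iam::123456789012:role/my-flow-task-role",  # placeholder ARN
    labels=["ecs"],
)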
    7 replies
• Josh Greenhalgh

    1 year ago
Do mapped tasks only behave in a parallel fashion on Dask? I am running on k8s using a job per flow, and all my mapped tasks log in order, sequentially; there is also a huge gap between two mapped tasks. I was under the impression that if you have:
p1_res = extract_raw_from_api.map(
    asset_location=asset_locations,
    extract_date=unmapped(extract_date),
    extract_period_days=unmapped(extract_period_days),
)
p2_res = process_2.map(p1_res)
p3_res = process_3.map(p2_res)
then as soon as task 1 from p1_res is done, the corresponding task in process_2 should start? As it stands, no process_2 tasks start until many of the extract_raw_from_api tasks are already complete...
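(For reference, a minimal sketch of attaching a Dask-based executor, assuming Prefect ~0.14; without one, the default LocalExecutor runs mapped children sequentially. The worker count is illustrative.)
from prefect.executors import LocalDaskExecutor

# run mapped children in parallel threads instead of sequentially
flow.executor = LocalDaskExecutor(scheduler="threads", num_workers=4)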
    Josh Greenhalgh
    19 replies
• Felipe Saldana

    1 year ago
Hello all, I was wondering if someone can point me in the right direction. (Please let me know if I need to clarify 🙂) I have this working as a POC in Dagster and was wondering how this example would work in Prefect. This would be testing a custom task class ... multiple instances of a smaller task ... configurable params/secrets. I have a legacy module that accepts 20+ parameters. Some of these parameters are required (name), some need to be secrets (tokens), and some vary (description). Depending on the combination of these parameters, the module will do specific things: query a database and dump to a bucket, grab data from a bucket and load it into the db, and other tasks. Instead of accepting command-line args, I want to "prefect-ify" that module and supply some of the configuration directly to it. Would this be a combination of parameters and secrets? Can these be loaded from a toml file? Next, I want to wrap that base module into a smaller/specific task, for example query_db_and_dump_to_bucket(). This smaller task will have required values (db host, db username, db pass, table_name). Just to point out, these values are not required in the base module. In my flow, I would want to call query_db_and_dump_to_bucket() again and again using different table names.
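(A hedged sketch of how this pattern might look in Prefect 1.x; every class, secret, and parameter name below is illustrative, not from the thread.)
from prefect import Flow, Parameter, Task
from prefect.tasks.secrets import PrefectSecret

class QueryDbAndDumpToBucket(Task):
    """Illustrative wrapper around the legacy module for one specific job."""

    def __init__(self, db_host, **kwargs):
        self.db_host = db_host  # fixed per task instance
        super().__init__(**kwargs)

    def run(self, table_name, db_user, db_password):
        # call the legacy module here with the combined configuration
        ...

dump_task = QueryDbAndDumpToBucket(db_host="db.example.com")  # placeholder host

with Flow("dump-tables") as flow:
    db_user = PrefectSecret("DB_USER")          # illustrative secret names
    db_password = PrefectSecret("DB_PASSWORD")
    table = Parameter("table_name", default="events")
    # the same task instance can be called repeatedly with different tables
    dump_task(table_name=table, db_user=db_user, db_password=db_password)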
    emre
    22 replies
• Sean Talia

    1 year ago
I have a (I think simple) question around exposing more complete output to the Prefect logger. I'm deliberately crashing a task that uses a `DbtShellTask` to execute a dbt project under the hood, and am not supplying the necessary ENV vars that dbt needs to run. If I have Prefect run my flow that calls this shell task, the only failure message I see is the very last line of what dbt spits out, i.e.
    [2021-01-27 16:34:13+0000] ERROR - prefect.run_dbt | Command failed with exit code 2
    [2021-01-27 16:34:13+0000] ERROR - prefect.run_dbt |   Could not run dbt
This is obviously not terribly informative, and I'd like Prefect to forward me the entirety of the error message that dbt generates, which is:
    Encountered an error while reading profiles:
      ERROR Runtime Error
      Compilation Error
        Could not render {{ env_var('DB_USERNAME') }}: Env var required but not provided: 'DB_USERNAME'
    Defined profiles:
     - default
    
    For more information on configuring profiles, please consult the dbt docs:
    
    <https://docs.getdbt.com/docs/configure-your-profile>
    
    Encountered an error:
    Runtime Error
      Could not run dbt
What additional steps need to be taken for Prefect to forward all the lines of output rather than just the final one? It's not clear to me where or how I would need to adjust the default Prefect logger to set this up.
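(A hedged sketch, assuming Prefect ~0.14: DbtShellTask inherits from ShellTask, which only logs the last line by default; the flags below exist on ShellTask in recent releases and are worth checking against your version. Task name and command are illustrative.)
from prefect.tasks.dbt import DbtShellTask

run_dbt = DbtShellTask(
    name="run_dbt",
    command="dbt run",
    return_all=True,   # keep every line of output, not just the last
    log_stderr=True,   # forward stderr to the Prefect logger on failure
)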
    Sean Talia
    2 replies
• Nikul

    1 year ago
Hi all, I'm running a flow on Kubernetes using a DaskExecutor with 4 workers, but it doesn't create 4 worker pods. Ideally, I'd like to be able to scale to more than one node (without using an existing Dask cluster). (I tried using dask_kubernetes.KubeCluster, but I get an error: AttributeError("'NoneType' object has no attribute 'metadata'"). I'm not sure if this would solve my problem anyway.) Is there a way I can run a flow such that I can scale up to multiple nodes? Thank you
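(A hedged sketch, assuming Prefect ~0.14 with dask-kubernetes installed: an ephemeral KubeCluster lets mapped work spread across multiple pods and nodes. KubeCluster generally needs an explicit pod spec, which may be related to the metadata AttributeError; the image and worker counts are placeholders.)
from prefect.executors import DaskExecutor
from dask_kubernetes import KubeCluster, make_pod_spec

pod_spec = make_pod_spec(image="prefecthq/prefect:0.14.1")  # placeholder image/tag

flow.executor = DaskExecutor(
    cluster_class=lambda: KubeCluster(pod_spec),
    adapt_kwargs={"minimum": 1, "maximum": 4},  # scale worker pods up to 4
)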
    Nikul
1 reply