  • itay livni

    2 years ago
    Hi there again. Running into an issue where I pass two pandas dataframes. However the merge fails with a pandas error: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Without the merge command both return valid dataframes. Any suggestions?
    Chris White
    11 replies
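
    A minimal sketch of what usually triggers this pandas error: it is raised whenever a DataFrame is used where a single boolean is expected (an if statement, and/or, etc.), which can happen in a task before the merge ever runs. The frames below are placeholders, not the actual flow code.

        import pandas as pd

        left = pd.DataFrame({"key": [1, 2], "a": [10, 20]})
        right = pd.DataFrame({"key": [1, 2], "b": [30, 40]})

        # A bare truthiness check is what raises:
        # ValueError: The truth value of a DataFrame is ambiguous. ...
        # if left and right:
        #     ...

        # Be explicit about what "truthy" means for a DataFrame instead:
        if not left.empty and not right.empty:
            merged = left.merge(right, on="key")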
  • Mark McDonald

    2 years ago
    Hi - I'm looking for some feedback on how we're deploying our flows. I'm trying to dynamically deploy our flows using Docker and a custom deployment script. Within my project, I have a module that contains my flows (each flow in a separate file). Each of these flows has various dependencies that get imported (pandas, scipy, numpy, etc.). We pin our dependencies in a requirements.txt file. My deployment script does the following, in this order:
    1. Grabs all of my flows' dependencies from the requirements.txt file and adds them to a list called dependencies
    2. Installs these dependencies on my docker image
    3. Builds the prefect docker storage object: storage = Docker(registry_url=os.getenv('PREFECT_ECR_REGISTRY'), image_name=os.getenv('PREFECT_ECR_IMAGE_NAME'), image_tag=package_version, python_dependencies=dependencies)
    4. Imports each of my flow files and locates the flow objects within them; these flows get added to the storage object
    5. Calls storage.build
    6. Iterates through the flows and calls flow.deploy() on each flow object with build set to False
    As it stands the deployment takes ~5 minutes. Any areas where I might be able to improve this?
    Mark McDonald
    Chris White
    2 replies
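
    A rough sketch of the six steps described above, assuming the Docker storage API of the time (add_flow, build) and flow.deploy(..., build=False); the module names, project name, and package_version value are placeholders:

        import importlib
        import os

        from prefect.environments.storage import Docker

        package_version = "1.2.3"  # placeholder image tag

        # 1. collect the pinned dependencies
        with open("requirements.txt") as f:
            dependencies = [
                line.strip() for line in f if line.strip() and not line.startswith("#")
            ]

        # 2-3. one Docker storage object that installs them into the image
        storage = Docker(
            registry_url=os.getenv("PREFECT_ECR_REGISTRY"),
            image_name=os.getenv("PREFECT_ECR_IMAGE_NAME"),
            image_tag=package_version,
            python_dependencies=dependencies,
        )

        # 4. import each flow module and add its flow to the storage object
        flows = []
        for module_name in ["myproject.flows.flow_a", "myproject.flows.flow_b"]:
            module = importlib.import_module(module_name)
            storage.add_flow(module.flow)
            flows.append(module.flow)

        # 5. build the image once for all flows
        storage.build()

        # 6. register each flow without rebuilding the image
        for flow in flows:
            flow.storage = storage
            flow.deploy(project_name="my-project", build=False)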
  • itay livni

    2 years ago
    How do you add a custom filepath to visualize? Using the script from the visualize docs I added filename as an arg. But I do not see the write to s3. I am assuming that s3fs is available for writing this file (if that is the right terminology?)
    Chris White
    8 replies
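
    For what it's worth, flow.visualize(filename=...) hands the name to graphviz, which renders to the local filesystem, so an s3:// path will not be written directly. A sketch of one workaround - render locally, then copy the file up with s3fs (the bucket and key are placeholders, and the rendered extension may differ by graphviz setup):

        import s3fs
        from prefect import task, Flow

        @task
        def say_hello():
            print("hello")

        with Flow("visualize-example") as flow:
            say_hello()

        # render the DAG locally; graphviz writes the output file next to the script
        flow.visualize(filename="my_flow_dag")

        # then copy the rendered file to S3
        fs = s3fs.S3FileSystem()
        fs.put("my_flow_dag.pdf", "my-bucket/diagrams/my_flow_dag.pdf")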
  • Daniel Veenstra

    2 years ago
    Hey all, I'm working on starting what's essentially a data warehouse at my company, and thinking of using prefect to schedule and orchestrate all the ETL. We're going to have a number of 3rd party data sources to pull data from on various schedules, and then we'll likely want to schedule some transformations after certain combinations of tables are finished loading each day. I'm trying to get the project architecture off on the right foot and have been trying out prefect for a couple of days, and I'm wondering how I should think about organizing my Flows: should I have one flow per data source, or one flow for the whole pipeline? My dilemma is that each data source is going to have its own schedule, which leads me to have one Flow per source, but if I want to trigger transformations based on the completion of table loads, it feels like the flows are going to have dependencies on each other's completions and would be better off as one flow. Thoughts? Any examples out there of similar projects?
    Jeremiah
    +2
    16 replies
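
    A hedged sketch of the "one flow per source" half of the trade-off: each source gets its own flow with its own schedule, and a downstream transform flow can then be scheduled after the loads it depends on normally land. The flow names, task bodies, and cron strings below are made up.

        from prefect import task, Flow
        from prefect.schedules import CronSchedule

        @task
        def load_source_a():
            ...

        @task
        def load_source_b():
            ...

        # one flow per data source, each on its own schedule
        with Flow("load-source-a", schedule=CronSchedule("0 5 * * *")) as flow_a:
            load_source_a()

        with Flow("load-source-b", schedule=CronSchedule("30 5 * * *")) as flow_b:
            load_source_b()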
  • Egbert Ypma

    2 years ago
    Hi folks, I'm pretty new to prefect and exploring its possibilities at the moment. I am trying to upload a file to AWS but I am struggling with all the security details. Where can I find an example that uses the S3Upload task as a step in a workflow?
    Chris White
    2 replies
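
    A minimal sketch of S3Upload as a step in a flow, assuming credentials are supplied the default way (a Prefect secret or ambient AWS credentials that boto3 can pick up); the bucket name is a placeholder and the exact run() signature (e.g. whether a key can be passed) may vary by Prefect version:

        from prefect import task, Flow
        from prefect.tasks.aws.s3 import S3Upload

        upload_to_s3 = S3Upload(bucket="my-example-bucket")  # placeholder bucket

        @task
        def build_payload() -> str:
            return "hello from prefect"

        with Flow("s3-upload-example") as flow:
            data = build_payload()
            upload_to_s3(data=data)

        flow.run()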
  • Brian Mesick

    2 years ago
    Hi folks. I'm trying out Prefect and running some pretty basic flows to get a feel for how development works. I'm curious if anyone is actually using the Snowflake task as it seems to have a bug that I would expect to render it unusable.
    Chris White
    +2
    8 replies
  • itay livni

    2 years ago
    Hi - I am trying to check if a task has successfully run in a flow and then, based on whether that task completed successfully or failed, do something. Something like TaskRunner state.is_successful(), but in a flow.
    Jeremiah
    +1
    9 replies
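
    Two hedged sketches of how this is usually handled: branch inside the flow with triggers, or inspect individual task states after flow.run(). The task names below are placeholders.

        from prefect import task, Flow
        from prefect.triggers import all_successful, any_failed

        @task
        def risky_step():
            return 42

        @task(trigger=all_successful)
        def on_success():
            print("upstream succeeded")

        @task(trigger=any_failed)
        def on_failure():
            print("upstream failed")

        with Flow("branch-on-task-state") as flow:
            r = risky_step()
            on_success(upstream_tasks=[r])
            on_failure(upstream_tasks=[r])

        # let only the risky step decide whether the flow run counts as successful
        flow.set_reference_tasks([r])

        # alternatively, inspect a single task's state after a local run
        state = flow.run()
        print(state.result[r].is_successful())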
  • Matias

    2 years ago
    I’m wondering if and how prefect could be used to transfer large amounts of data between different servers/clouds. Basically, I’d need to move 10-100 gigabyte CSV/JSON files from an SFTP server to ADLS, and later on between other sources and sinks. Moving this amount of data as one gigantic in-memory string between tasks does not seem like a very sound approach for many reasons. So how would you actually do that?
    Jeremiah
    +3
    18 replies
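
    A minimal sketch of the pattern that usually comes up here: pass lightweight references (paths/URIs) between tasks and stream the bytes inside a task, rather than handing a multi-gigabyte string from task to task. This assumes fsspec with the sftp (paramiko) and abfs (adlfs) backends installed; all URLs are placeholders.

        import fsspec
        from prefect import task, Flow

        CHUNK = 16 * 1024 * 1024  # copy in 16 MB chunks

        @task
        def list_source_files():
            # return references to the files, not their contents
            return ["sftp://user@sftp.example.com/data/big_file.csv"]

        @task
        def copy_to_adls(src_url: str) -> str:
            dst_url = "abfs://container/raw/" + src_url.rsplit("/", 1)[-1]
            # stream chunk by chunk so the whole file never sits in memory
            with fsspec.open(src_url, "rb") as src, fsspec.open(dst_url, "wb") as dst:
                while True:
                    chunk = src.read(CHUNK)
                    if not chunk:
                        break
                    dst.write(chunk)
            return dst_url  # downstream tasks receive the ADLS path, not the data

        with Flow("sftp-to-adls") as flow:
            files = list_source_files()
            copy_to_adls.map(files)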