• Sen

    6 months ago
    I have tried the suggestion from @Abhishek by following the GitHub project at https://github.com/kvnkho/demos/tree/main/prefect/docker_with_local_storage, but it still doesn't work as expected. I believe I am missing some configuration.
    19 replies
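    A hedged sketch of the general pattern the linked repo demonstrates, as I understand it: the flow file is baked into the Docker image, Local storage points at its path inside the container, and a DockerRun run config supplies the image. The image name and path below are hypothetical.
    from prefect import Flow
    from prefect.storage import Local
    from prefect.run_configs import DockerRun

    with Flow("docker-local-storage") as flow:
        ...  # tasks go here

    # The flow script is copied into the image at build time (e.g. via a
    # Dockerfile COPY), so Local storage only needs the in-container path.
    flow.storage = Local(
        path="/opt/prefect/flows/my_flow.py",  # hypothetical path inside the image
        stored_as_script=True,
    )
    flow.run_config = DockerRun(image="my-registry/my-flow-image:latest")  # hypothetical image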
  • Nico Neumann

    6 months ago
    Hi! I have a question about running Prefect on AWS ECS. I am currently using Fargate to launch my flows. I have a fairly big Docker image (~2-3 GB uncompressed) which adds some dependencies (not only Python) on top of the Prefect Docker image. The problem is that for every flow, Fargate pulls the image from AWS ECR (in the same VPC), which adds several minutes of startup time. Most of the runs are small, so I wait a couple of minutes for them to start and then they finish within a few seconds. Assuming I start 100 flows a day, that works out to 200-300 GB of pulling the same image.

    My first idea was to split the image into multiple images and use subflows; every subflow could then specify which image and dependencies it needs. Or I could try to reduce the image size somehow. But in both cases, even at 0.5 GB per image, it would still mean pulling 50 GB a day. I found this AWS issue about caching the image: https://github.com/aws/containers-roadmap/issues/696 Unfortunately, caching is currently only supported for EC2, not for Fargate.

    So my second idea was to use EC2 instead? But I am not sure how well it scales. It would mean starting and stopping EC2 instances depending on how many flows are running, so it might just shift the startup problem, as flows might need to wait for another EC2 instance to start. I used this tutorial to set everything up for Fargate: https://towardsdatascience.com/how-to-cut-your-aws-ecs-costs-with-fargate-spot-and-prefect-1a1ba5d2e2df (thanks to @Anna Geller, this works great!) But I could not figure out how to do it properly for EC2. If I understand correctly, EC2 has one IP per instance while Fargate has one IP per flow, so the setup would be a little different.

    My main problem is the long startup time of multiple minutes, and I am not sure what the best way to deal with it is. Maybe someone has experienced the same problem and found a better solution?
    7 replies
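    For the EC2 idea above, a minimal sketch, assuming Prefect 1.x with ECSRun and an ECS cluster that already has EC2 container instances registered (pulled image layers can then be cached on those instances, unlike Fargate). The cluster name and image are hypothetical, and the run_task_kwargs keys follow boto3's ecs.run_task API:
    from prefect.run_configs import ECSRun

    run_config = ECSRun(
        image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-flow-image:latest",  # hypothetical ECR image
        run_task_kwargs={
            "cluster": "prefect-ec2-cluster",  # hypothetical EC2-backed cluster
            "launchType": "EC2",               # instead of the default FARGATE
        },
    )
    Depending on how the ECS agent was started, its own launch-type setting may also need to match; treat this as a starting point rather than a drop-in config.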
  • Yas Mah

    6 months ago
    Hello 🙂 Which possibilities are there to access the result of a task and use it in another operation within the flow that is not a task:
    import pathlib
    from pathlib import Path
    
    from prefect import task, Flow, Parameter
    
    @task
    def get_access_paths(base_path: Path):
        return base_path
    
    with Flow("flow") as flow:
        base_path = Parameter("base_path", default=pathlib.Path(__file__).parent.parent.resolve())
        data_access = get_access_paths(base_path)
        files = [str(Path.joinpath(data_access, x)) for x in data_access.glob('*') if x.is_file()]
    
        input = Parameter("input", default=files)
    2 replies
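    A sketch of the usual pattern, assuming the goal is to build the file list at flow run time: inside the with Flow(...) block, data_access is only a placeholder, so any logic that needs the actual value (like the glob above) has to live in a task itself. Names below mirror the snippet above; the list_files task is hypothetical.
    from pathlib import Path
    from prefect import task, Flow, Parameter

    @task
    def get_access_paths(base_path) -> Path:
        return Path(base_path)

    @task
    def list_files(data_access: Path) -> list:
        # Runs at flow run time, when data_access is a real Path
        return [str(x) for x in data_access.glob("*") if x.is_file()]

    with Flow("flow") as flow:
        base_path = Parameter("base_path", default=str(Path(__file__).parent.parent.resolve()))
        data_access = get_access_paths(base_path)
        files = list_files(data_access)
        # pass `files` to downstream tasks instead of a Parameter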
  • Jake

    6 months ago
    We have a parameter that gets passed to some of our tasks (like which DB endpoint to point to); when this value changes, re-registration won't happen, since the change doesn't count as a change to the flow's metadata. How can I make it so that this change does trigger a re-registration?
    48 replies
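    One hedged option, assuming registration is being skipped because flow.serialized_hash() is used as the idempotency key: fold the changing value into that key so a new value produces a new key and registration is no longer skipped. The project name, task, and endpoint below are hypothetical.
    from prefect import task, Flow

    # Hypothetical value baked into the flow that does not show up in the registered metadata
    DB_ENDPOINT = "postgresql://db.internal:5432/prod"

    @task
    def query(endpoint: str):
        ...

    with Flow("my-flow") as flow:
        query(DB_ENDPOINT)

    flow.register(
        project_name="my-project",  # hypothetical project
        idempotency_key=flow.serialized_hash() + DB_ENDPOINT,
    )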
  • Keith Veleba

    6 months ago
    hello, I'm running flows in AWS ECS and I'm using the "prefecthq/prefect:latest-python3.8" image in my flow's ECSRun run_config. I just added a PostgresFetch task and now I'm getting an error that the extras are not installed in the Prefect image I'm using. Is there an alternative base image I should be using? TIA
    44 replies
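    A hedged sketch of one way around this, assuming the missing extra is Prefect's "postgres" extra (which pulls in the psycopg2 driver that PostgresFetch needs): build a small image derived from the stock one and point the run config at it. The registry and image name are hypothetical.
    from prefect.run_configs import ECSRun

    # The derived image could be built from a Dockerfile along the lines of:
    #   FROM prefecthq/prefect:latest-python3.8
    #   RUN pip install "prefect[postgres]"
    run_config = ECSRun(
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/prefect-postgres:latest",  # hypothetical
    )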
  • Xavier Babu

    6 months ago
    Hi Prefect Community, please provide a tutorial or doc link for integrating Prefect 2.0 with a Tomcat Web/App Server.
    9 replies
  • Dominick Olivito

    6 months ago
    hello, i'm trying to run on GCP GKE with a custom image. the pod is dying immediately with these errors in the logs:
    /home/flex/.local/bin/prefect: line 3: import: command not found
    /home/flex/.local/bin/prefect: line 4: import: command not found
    /home/flex/.local/bin/prefect: line 5: from: command not found
    /home/flex/.local/bin/prefect: prefect: line 7: syntax error near unexpected token `('
    /home/flex/.local/bin/prefect: prefect: line 7: `    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])'
    It looks like it's finding and parsing the prefect executable file but not running it with python. When I run a local container using the image, I'm able to successfully call the prefect command and run a flow. prefect is on the PATH of the active user (flex). I'm also able to run basic flows successfully on GKE using Prefect's base image, so the issue is specific to our custom image. Do you have any suggestions on what we can check in our custom image?
    5 replies
  • Vadym Dytyniak

    6 months ago
    Hi. Could you please provide minimum IAM permissions required for Prefect ECS agent?
    3 replies
  • Christian Nuss

    6 months ago
    anyone have a lil cheatsheet snippet for a KubernetesRun defining the job_template as a dict?
    15 replies
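    A minimal sketch, assuming Prefect 1.x: KubernetesRun can take the job template as a plain dict following the Kubernetes batch/v1 Job schema, and the agent overlays the Prefect-specific fields (image, env, labels) on top of it. The resource values, container layout, and image below are placeholders to adapt.
    from prefect.run_configs import KubernetesRun

    run_config = KubernetesRun(
        image="prefecthq/prefect:latest-python3.8",
        job_template={
            "apiVersion": "batch/v1",
            "kind": "Job",
            "spec": {
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "name": "flow",  # container that runs the flow
                                "resources": {
                                    "requests": {"cpu": "500m", "memory": "512Mi"},
                                    "limits": {"cpu": "1", "memory": "1Gi"},
                                },
                            }
                        ]
                    }
                }
            },
        },
    )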
  • Prasanth Kothuri

    6 months ago
    Hi All, I would like to write a pandas DataFrame as CSV to S3 in Prefect. Shouldn't this work?
    import os
    
    from prefect.tasks.aws import S3Upload
    
    # upload to s3
    write_to_s3 = S3Upload(
        bucket=s3_bucket,
        boto_kwargs=dict(
            endpoint_url=os.getenv("s3_endpoint"),
            aws_access_key_id=os.getenv("s3_access_key"),
            aws_secret_access_key=os.getenv("s3_secret_key")
        )
    )
    
    output = write_to_s3(results.to_csv(index=False), key=file_name)
    3 replies
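    If results is the output of an upstream task rather than an in-memory DataFrame, one adjustment that is usually needed: inside the with Flow(...) block a task's result is only a placeholder, so the DataFrame-to-CSV conversion has to happen in a task of its own. A hedged sketch (the bucket, key, and upstream task are hypothetical):
    import os
    import pandas as pd
    from prefect import task, Flow
    from prefect.tasks.aws import S3Upload

    @task
    def load_results() -> pd.DataFrame:
        # Hypothetical upstream task producing a DataFrame
        return pd.DataFrame({"a": [1, 2, 3]})

    @task
    def df_to_csv(df: pd.DataFrame) -> str:
        # Runs at flow run time, when df is a real DataFrame
        return df.to_csv(index=False)

    write_to_s3 = S3Upload(
        bucket="my-bucket",  # hypothetical bucket
        boto_kwargs=dict(
            endpoint_url=os.getenv("s3_endpoint"),
            aws_access_key_id=os.getenv("s3_access_key"),
            aws_secret_access_key=os.getenv("s3_secret_key"),
        ),
    )

    with Flow("upload-to-s3") as flow:
        results = load_results()
        output = write_to_s3(df_to_csv(results), key="output.csv")  # hypothetical key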