  • Sridhar

    10 months ago
    Hi, I'm using a Docker image pushed to the container registry as storage. I have a lot of custom dependency .py files needed to run my flow. The directory looks something like this (image below), and my Dockerfile content is:
    # specify a base image
    FROM python:3.8-slim
    # copy all folder contents to the image
    COPY . .
    # install all dependencies
    RUN apt-get update && apt-get -y install libpq-dev gcc && pip install psycopg2
    RUN pip install -r requirements.txt
    My understanding is that
    COPY . .
    should copy all the files required to run the flow into the image, but I'm getting an error saying no module found (image attached). Here are my STORAGE and RUN_CONFIG:
    STORAGE = Docker(registry_url='aws_id.dkr.ecr.region.amazonaws.com/',
                     image_name='name',
                     image_tag='tag',
                     dockerfile='Dockerfile')
    RUN_CONFIG = ECSRun(run_task_kwargs={'cluster': 'cluster-name'},
                        execution_role_arn='arn:aws:iam::aws_id:role/role',
                        labels=['dev-modelling', 'flow-test'])
    Am I missing something?? Really appreciate the help. Thanks in advance!!
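    A hedged sketch of one possible fix, not a confirmed answer: with COPY . . the dependency files land in the image's working directory, which may differ from the directory Prefect executes the flow from, so the modules never end up on sys.path. Prefect 1.x's Docker storage accepts env_vars, so extending PYTHONPATH is one way to make the copied packages importable (the /modules path below is a hypothetical stand-in for wherever COPY actually places them):
    from prefect.storage import Docker

    STORAGE = Docker(
        registry_url='aws_id.dkr.ecr.region.amazonaws.com/',
        image_name='name',
        image_tag='tag',
        dockerfile='Dockerfile',
        # hypothetical path: point PYTHONPATH at the directory holding the custom modules
        env_vars={"PYTHONPATH": "$PYTHONPATH:/modules"},
    )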
    Kevin Kho +1
    5 replies
  • Florian Boucault

    10 months ago
    Hi everyone: in Orion would it be possible to call a task from within a task?
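    A minimal sketch of one pattern that works, assuming early Orion behavior: a task cannot be invoked as a task from inside another task, but its underlying function is reachable via .fn(), which runs the code inline without creating a separate task run:
    from prefect import flow, task

    @task
    def add_one(x):
        return x + 1

    @task
    def double_then_add(x):
        # call the wrapped function directly; this executes inline
        # and does not create a nested task run
        return add_one.fn(x * 2)

    @flow
    def my_flow():
        return double_then_add(3)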
    Florian Boucault, Anna Geller
    2 replies
  • Asmita

    10 months ago
    Hi everyone: We are having an issue running a mocked flow. We have a flow, and to test it we mock a few of its tasks and create a new test flow. The test flow is mocked correctly, but when it runs it uses the registration_settings of the original flow rather than those of the new test flow. Would you be able to help? @Milly gupta
    Anna Geller +2
    53 replies
  • Fina Silva-Santisteban

    10 months ago
    Hi community, I have an imperative API question:
    flow.set_dependencies(
        upstream_tasks=[A],
        task=B,
        keyword_tasks=dict(item=A)
    )
    Do I need to explicitly set a Task as an upstream_task when I already have it as a keyword_task? I’ve read that one is to set state dependencies and the other is to set data dependencies. Since a data dependency is implicitly a state dependency, having task A as a keyword_task should be enough, right?
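    A sketch of that reading, assuming Prefect 1.x's imperative API (the task classes below are hypothetical): passing A through keyword_tasks already registers it upstream, so repeating it in upstream_tasks would be redundant:
    from prefect import Flow, Task

    class A(Task):
        def run(self):
            return 1

    class B(Task):
        def run(self, item):
            return item + 1

    a, b = A(), B()
    flow = Flow("imperative-example")
    # the data dependency also creates the state dependency,
    # so upstream_tasks=[a] is not needed
    flow.set_dependencies(task=b, keyword_tasks=dict(item=a))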
    Fina Silva-Santisteban, Anna Geller
    3 replies
  • Andreas Tsangarides

    10 months ago
    hey all, is there a way to run only specific tasks from a flow, selected by tag? Imagine our flow splits after a specific task, treating two different datasets:
    task_1 -> task_2 -> [task_a, task_b] # i.e. task_a and task_b are independent of each other but share the same dependencies
    if we tag the tasks like this:
    task_1: [a, b]
    task_2: [a, b]
    task_a: [a]
    task_b: [b]
    Is there anything like:
    prefect run --tags=a
    I am really asking for functionality I was using with kedro here 😛
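    For context, a hedged workaround sketch, since I can't confirm any task-level tag filter in Prefect 1.x's prefect run: the real Parameter and case constructs can gate each branch, and the parameters can be flipped per run (parameter names are hypothetical, and exact CLI flags depend on your version, e.g. prefect run ... --param run_b=false):
    from prefect import Flow, Parameter, case, task

    @task
    def task_1():
        return "shared input"

    @task
    def task_a(data):
        ...

    @task
    def task_b(data):
        ...

    with Flow("split-by-parameter") as flow:
        run_a = Parameter("run_a", default=True)
        run_b = Parameter("run_b", default=True)
        data = task_1()
        with case(run_a, True):
            task_a(data)
        with case(run_b, True):
            task_b(data)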
    Andreas Tsangarides, Kevin Kho
    2 replies
  • Vamsi Reddy

    10 months ago
    Hello all, we are currently using Lambda to trigger our flows, and we are setting up CI for each of these flow projects. We use the flow id to trigger flows through Lambda, but the problem is that every time we register a new version of a flow, its flow id changes. We noticed the flow group id remains the same. Is there a way to always trigger the latest version of a flow without having to worry about changing flow ids?
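    A sketch of one approach, assuming Prefect 1.x's Python Client: create_flow_run accepts a version_group_id, which stays stable across registrations and resolves to the latest version of the flow:
    from prefect import Client

    client = Client()
    # the version group id does not change when a new version is
    # registered, so the Lambda can keep a fixed identifier
    flow_run_id = client.create_flow_run(
        version_group_id="your-version-group-id",  # hypothetical value
    )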
    Vamsi Reddy, Anna Geller +2
    13 replies
  • Leon Kozlowski

    10 months ago
    Not sure if this is the appropriate place for this question, but - is it feasible to include a service account annotation in a k8s job template for a particular flow?
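    A sketch of one way this is often handled in Prefect 1.x (all names below are hypothetical): KubernetesRun accepts a service_account_name and a custom job template, so a per-flow service account or an annotated pod spec can ride along on the run config:
    from prefect.run_configs import KubernetesRun

    RUN_CONFIG = KubernetesRun(
        service_account_name="my-flow-sa",  # hypothetical service account
        # hypothetical template carrying the required annotations
        job_template_path="s3://bucket/job_template.yaml",
    )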
    Leon Kozlowski, Kevin Kho
    4 replies
  • Ryan Brideau

    10 months ago
    Hey all, I’m looking for a way to run multiple flows with different schedules. When I try to do this and execute each flow’s .run() method in a loop, though, only the first one runs, and it seems to block before the others start. Is there a way around this?
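    A sketch of one workaround, assuming local scheduled runs (flow_one and flow_two are hypothetical stand-ins for your flows): a scheduled flow.run() blocks its process indefinitely, so each flow needs its own process; registering the flows and letting an agent pick them up would be the more idiomatic route:
    from multiprocessing import Process

    # one process per flow, so each blocking run() gets its own loop
    processes = [Process(target=f.run) for f in (flow_one, flow_two)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()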
    Kevin Kho
    3 replies
  • Jacob Warwick

    10 months ago
    Hey folks. Is there a way to have Prefect treat a file on disk that was created during a task’s execution as a Result, that can be persisted using S3Result (for example) without loading that file into memory / returning it from the task function? I am trying to see if my organization can use Prefect, but our core need is to run 3rd party programs that produce large output files that may not fit in memory. Thanks and I apologize if this is already in the docs.
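    One pattern that sidesteps Results entirely, sketched under the assumption that boto3 is available in the run environment (bucket and path below are hypothetical): return only the file path from the task and stream the file to S3 with upload_file, which never loads the whole file into memory:
    import boto3
    from prefect import task

    @task
    def run_external_program() -> str:
        # the 3rd-party program writes its large output to disk
        output_path = "/data/large_output.bin"  # hypothetical path
        ...
        return output_path

    @task
    def persist_file(path: str) -> str:
        s3 = boto3.client("s3")
        key = path.lstrip("/")
        # upload_file streams from disk (multipart under the hood)
        s3.upload_file(path, "my-results-bucket", key)
        return f"s3://my-results-bucket/{key}"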
    Jacob Warwick, Nate +1
    10 replies
  • Frank Oplinger

    10 months ago
    Hello, I am currently trying to use a DaskExecutor in an ECSRun to parallelize a flow. I’m following the documentation to create a temporary cluster with a specified worker image. My flow currently looks something like this:
    def fargate_cluster(n_workers=4):
        """Start a fargate cluster using the same image as the flow run"""
        return FargateCluster(n_workers=n_workers, image=prefect.context.image)
    
    class LeoFlow(PrefectFlow):
    
        def generate_flow(self):
            with Flow(name=self.name, storage=S3(bucket="raptormaps-prefect-flows")) as flow:
                ...
            flow.executor = DaskExecutor(
                cluster_class=fargate_cluster,
                cluster_kwargs={"n_workers": 4}
            )
            return flow
    In the dockerfile for the image that I’m specifying in the ECSRun, I have included the following line to install dask-cloudprovider:
    RUN pip install dask-cloudprovider[aws]
    However, when I execute the flow, I am hitting the following error:
    Unexpected error: AttributeError("module 'aiobotocore' has no attribute 'get_session'",)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/prefect/engine/runner.py", line 48, in inner
        new_state = method(self, state, *args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/prefect/engine/flow_runner.py", line 442, in get_flow_run_state
        with self.check_for_cancellation(), executor.start():
      File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
        return next(self.gen)
      File "/usr/local/lib/python3.6/site-packages/prefect/executors/dask.py", line 238, in start
        with self.cluster_class(**self.cluster_kwargs) as cluster:
      File "/rprefect/leo_flow.py", line 58, in fargate_cluster
        return FargateCluster(n_workers=n_workers, image=prefect.context.image)
      File "/usr/local/lib/python3.6/site-packages/dask_cloudprovider/aws/ecs.py", line 1361, in __init__
        super().__init__(fargate_scheduler=True, fargate_workers=True, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/dask_cloudprovider/aws/ecs.py", line 726, in __init__
        self.session = aiobotocore.get_session()
    AttributeError: module 'aiobotocore' has no attribute 'get_session'
    Is there a specific version of dask_cloudprovider that Prefect requires?
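    For what it’s worth, the traceback matches a known breaking change rather than a Prefect pin: aiobotocore 2.0 moved the session factory out of the package root, so dask-cloudprovider releases that call aiobotocore.get_session() fail against it. A minimal illustration of the two import paths (pinning aiobotocore<2.0 or upgrading dask-cloudprovider are the usual ways out):
    try:
        # aiobotocore >= 2.0
        from aiobotocore.session import get_session
    except ImportError:
        # aiobotocore < 2.0 exposed it at the package root
        from aiobotocore import get_session

    session = get_session()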
    Kevin Kho +1
    21 replies