prefect-community #prefect-community

Hi, I am trying to test out Orion and am getting the following error:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: flow
[SQL: INSERT INTO flow (id, created, updated, name, tags) VALUES (?, ?, ?, ?, ?) ON CONFLICT (name) DO NOTHING]
[parameters: ('3120dd0f-fbcc-481e-8597-3b522b697520', '2021-11-25 03:07:12.340375', '2021-11-25 03:07:12.340409', 'main-flow', '[]')]

Did I miss any step in the installation? Thanks

夏文思

11/25/2021, 6:44 AM

💃 Just joined!

👋 4

Yong Tian

11/25/2021, 8:03 AM

🎉Glad to Join! Good Job! Prefect Inc.

👋 3

André Petersen

11/25/2021, 10:08 AM

Hello again! I want to create a very rough price estimate based on the Prefect setup described here (https://towardsdatascience.com/how-to-cut-your-aws-ecs-costs-with-fargate-spot-and-prefect-1a1ba5d2e2df) by @Anna Geller. I don't have much experience with AWS yet, which makes it difficult to estimate the price using the price estimator. We will only have a couple of batch jobs running a couple of times a day, and the processing will take place exclusively on Snowflake, so the Prefect flows will only delegate work. Also, we would probably use Prefect Cloud. I would guess that in the price calculator here https://calculator.aws/#/createCalculator/Fargate we could go with Linux OS, 2 tasks/pods running 2 hours on average with 2 vCPU allocated, 4 GB of memory (minimum) and 20 GB of storage (minimum). 1. Do you agree? If not, why? I cannot really believe that this would only cost $14 per month. 2. Do I need to calculate additional costs for the ECS tasks? If I understand correctly, these are included in the Fargate price estimation. I know that we could save money by using spot instances. My target is to make a pessimistic/conservative price estimation. Thanks in advance! Really appreciate the help in this channel!
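A rough sketch of the arithmetic behind an estimate like this, for sanity-checking the calculator. The per-hour rates below are assumptions (illustrative on-demand Linux rates; check the current AWS Fargate pricing page for your region), "2 hours on average" is read as 2 hours per task per day, and storage is left out entirely.

```python
# Assumed on-demand rates in USD. Illustrative only: verify against the
# AWS Fargate pricing page for your region before relying on them.
VCPU_HOUR_RATE = 0.04048   # per vCPU per hour (assumption)
GB_HOUR_RATE = 0.004445    # per GB of memory per hour (assumption)


def monthly_fargate_cost(tasks, hours_per_day, vcpus, memory_gb, days=30):
    """Estimate monthly compute cost; storage and data transfer are ignored."""
    task_hours = tasks * hours_per_day * days
    hourly_cost_per_task = vcpus * VCPU_HOUR_RATE + memory_gb * GB_HOUR_RATE
    return task_hours * hourly_cost_per_task


# 2 tasks, each assumed to run 2 hours per day, with 2 vCPU and 4 GB memory.
estimate = monthly_fargate_cost(tasks=2, hours_per_day=2, vcpus=2, memory_gb=4)
print(f"Estimated monthly compute cost: ${estimate:.2f}")
```

For a pessimistic figure, raising `hours_per_day` or `tasks` shows how quickly the total scales, since the cost is linear in task-hours.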

Daniil Ponizov

11/25/2021, 10:39 AM

Hi! Is it possible to pass some information to the flow run in the UI? Often the problem is that a run depends on the current time: for example, it downloads data with a timestamp equal to the previous day, and if this run fails and you only notice after that day, you have to run the script manually with the appropriate arguments.
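One way to picture this is an optional date override that falls back to "yesterday" when nothing is passed. The helper below is a hypothetical sketch in plain Python, not a Prefect API; in Prefect 1.x the override would typically arrive through a `Parameter` with a default of `None`.

```python
from datetime import date, datetime, timedelta
from typing import Optional


def resolve_target_date(override: Optional[str] = None,
                        today: Optional[date] = None) -> date:
    """Return the date to process (hypothetical helper).

    If `override` is given as YYYY-MM-DD, use it; otherwise use the day
    before `today` (which defaults to the current date).
    """
    if override:
        return datetime.strptime(override, "%Y-%m-%d").date()
    today = today or date.today()
    return today - timedelta(days=1)


# Scheduled run: no override, so the previous day is processed.
print(resolve_target_date(today=date(2021, 11, 25)))
# Manual re-run for a day that failed earlier.
print(resolve_target_date("2021-11-20"))
```

With this shape, a failed day can be re-run later by supplying the date explicitly instead of editing the script.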

haf

11/25/2021, 1:28 PM

KeyError: 'data' - have you seen this error message before?

haf

11/25/2021, 1:30 PM

What might be causing Heartbeat failures from Prefect / Dask? How would one go about debugging this?

Vince Bob

11/25/2021, 2:17 PM

Hello, I am struggling with a great_expectations integration problem. I use the RunGreatExpectationsValidation task on a checkpoint I created in GE with:

validation_task(
    context_root_dir=root_dir,
    checkpoint_name=expectation_checkpoint_name
)

When I run the command in GE (great_expectations --v3-api checkpoint run my_checkpoint), it works, but in the Prefect task I get an exception. With the GE V3 API:

.....
for batch in ge_checkpoint["batches"]:
TypeError: 'Checkpoint' object is not subscriptable

The same with the GE V2 API:

...
for batch in ge_checkpoint["batches"]:
TypeError: 'LegacyCheckpoint' object is not subscriptable

great_expectations=0.13.43 (also tried with version 0.12.10), prefect=0.15.9. Has anyone experienced this problem?

Elijah Roussos

11/25/2021, 3:38 PM

Hi all! I have a quick question, if anyone can answer. We’ve got a couple of flows running on ECS with a JSON secret for Postgres DB access. I want to be able to test flows locally without deploying to Prefect Cloud, but obviously that means setting the secret in ~/.prefect/config.toml locally. From the docs it seems like you can only set strings in the TOML, but I need JSON. I’ve tried setting it as a JSON string and also in TOML syntax, to no avail. Is there any way to set a local JSON secret?

Adam Everington

11/25/2021, 3:48 PM

.map()... is there an upper limit? And I'm sure I've asked before, but is there no way of batching it so that, say, 5 run at a time?
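As a sketch of the batching idea only: the inputs can be split into fixed-size chunks before they are handed to whatever does the mapping. `chunked` is a hypothetical helper in plain Python; it groups the inputs and does not by itself limit how many mapped tasks Prefect runs concurrently.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 12 inputs split into batches of 5: two full batches and one remainder.
batches = list(chunked(list(range(12)), 5))
for batch in batches:
    print(batch)
```

Each batch could then be processed by one task, so that the number of batches, rather than the number of items, determines how many task runs are created.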

Anh Nguyen

11/26/2021, 2:36 AM

Hi all. I cannot run my flow. How can I fix that?

Bruno Murino

11/26/2021, 11:17 AM

Hello everyone, I’m using Prefect Cloud and I’m trying to get a count of tasks run this month, but the Account -> Usage view is completely empty for me, saying 0 tasks have been run today. We’ve been running tasks in Prefect Cloud for about a month now, so this seems like a bug? Or is there a better way?

John Shearer

11/26/2021, 12:42 PM

Hi. I'd like to check the meaning of date in the Prefect context. The docs say "an actual datetime object representing the current time". The datetime value appears to be the same across all tasks within a flow, so I assume this is actually the start time of the flow? This behaviour is what I want, but I want to confirm my assumption.

Giovanni Giacco

11/26/2021, 1:41 PM

Hello. From the Prefect Cloud GUI I can change the memory/CPU request for a KubernetesRun. How can I do that when I start a flow from Python? And is there a way to change the executor too? I’d like to change the memory/CPU request for the Dask worker pods depending on the effort of the computation requested.

Aleksandr Liadov

11/26/2021, 2:44 PM

Hello, could I change the task name at runtime? (The main problem is that I need to cast a dict to a BaseModel, but I need to keep the parameter name.) I provide a minimal example with a flow in a comment.

Prasanth Kothuri

11/26/2021, 3:41 PM

Hi all, I want to schedule a Prefect flow every minute and, within the flow, check whether a file in S3 has changed. If the file has changed, a bunch of tasks are executed; otherwise the flow exits. For this I need to maintain state across flow runs. How can I do that? Thanks a ton.
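A minimal sketch of the "remember the last seen version" idea, assuming the file's version is identified by its S3 ETag (as returned by boto3's `head_object`). The state here is kept in a local JSON file purely for illustration; `has_changed` and the state-file layout are hypothetical, and a real deployment would need a store that persists and is shared across flow runs.

```python
import json
from pathlib import Path


def has_changed(current_etag: str, state_file: Path) -> bool:
    """Compare the current ETag with the one stored by the previous run.

    Returns True (and records the new ETag) when the value differs or when
    no previous value exists; returns False when nothing changed.
    """
    previous = None
    if state_file.exists():
        previous = json.loads(state_file.read_text()).get("etag")
    if previous == current_etag:
        return False
    state_file.write_text(json.dumps({"etag": current_etag}))
    return True
```

The first run always reports a change because no previous value exists; later runs only report a change when the ETag differs from the stored one.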

Jinho Chung

11/26/2021, 4:46 PM

I'm new to Prefect, and have been experimenting with running flows on my local machine with a local server as well. So far things have been very intuitive! However I have another nodejs app running locally, and want to try to have that app call the local Prefect server to start a parameterized flow. I can start flows through the UI at localhost:8080 and I've been going through the documentation but things just haven't clicked and I'm not sure how to do this. (Additional disclaimer - I have virtually no experience with GraphQL which probably isn't helping). Thanks for the help and this amazing product!
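A hedged sketch of what the request body for starting a parameterized flow run might look like. It assumes the Prefect Server GraphQL API exposes a `create_flow_run` mutation that accepts `flow_id` and `parameters`, and that the input type is named `create_flow_run_input`; both should be checked in the Interactive API tab of the UI. The endpoint `http://localhost:4200/graphql`, the flow id and the parameter names are placeholders. The sketch only builds the JSON body, which could then be POSTed from Node.js with any HTTP client.

```python
import json

# Assumed default GraphQL endpoint of a local Prefect Server (placeholder).
GRAPHQL_URL = "http://localhost:4200/graphql"

# Mutation text; the input type name is an assumption to verify in the UI.
CREATE_FLOW_RUN = """
mutation($input: create_flow_run_input!) {
  create_flow_run(input: $input) {
    id
  }
}
"""


def build_create_flow_run_payload(flow_id: str, parameters: dict) -> str:
    """Return the JSON body for a create_flow_run request (nothing is sent)."""
    body = {
        "query": CREATE_FLOW_RUN,
        "variables": {"input": {"flow_id": flow_id, "parameters": parameters}},
    }
    return json.dumps(body)


# Placeholder flow id and parameter, for illustration only.
payload = build_create_flow_run_payload("<flow id>", {"x": 1})
print(payload)
```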

Erick House

11/26/2021, 11:48 PM

Hi all, where do I post questions about potential bugs? There is some issue with SQLite creating tables as I go through the basic Orion tutorial.

itay livni

11/27/2021, 3:48 AM

Hi - I have a couple of questions about Orion: 1. How do you use Radar? 2. Can a task be a class? Thanks

haf

11/27/2021, 12:05 PM

I've managed to get around most of the problems I had with retries and stability on Dask, but this one eludes me. I'm getting the KilledWorker error, which seemingly fails the whole flow. Despite this, the workers are alive and fine (more in thread)

Jake Watson

11/27/2021, 3:52 PM

Hi all, I'm excited for the additional features Orion / 2.0 will bring, though is there a list of 1.0 features that won't be included in 2.0? The feature we currently use most in 1.0 that doesn't seem to be implemented in 2.0 yet is flow and task state handlers via the state_handler argument (apologies if it is!).

Erick House

11/27/2021, 4:29 PM

from prefect import flow


@flow
def my_favorite_function():
    print("This function doesn't do much")
    return 42


print(my_favorite_function())

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: flow
[SQL: INSERT INTO flow (id, created, updated, name, tags) VALUES (?, ?, ?, ?, ?) ON CONFLICT (name) DO NOTHING]
[parameters: ('f4971b70-0675-41c8-af7b-efcf8e3c2254', '2021-11-27 16:24:21.653684', '2021-11-27 16:24:21.653699', 'my-favorite-function', '[]')]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

❯ prefect version
2.0a5
❯ sqlite3 version
SQLite version 3.36.0 2021-06-18 18:58:49

Lana Dann

11/27/2021, 4:34 PM

Hi! https://docs.prefect.io/api/latest/run_configs.html#ecsrun Is there any way we can configure ECSRun to take the most recent (or the only) revision of a task_definition_arn? Otherwise we’d have to update and deploy the flow every time we update a task definition, which is not ideal.

itay livni

11/27/2021, 9:37 PM

Hi - I have one task in orion I am trying to run in a flow that returns a pandas dataframe. The function runs correctly without the task decorator. But the function returns a disambiguation error in a flow. Code in thread

Luis Jaramillo

11/28/2021, 3:31 PM

👋

👋 6

Sridhar

11/29/2021, 12:38 AM

Hi, I have a function that runs in parallel to fetch data from an external API. The get_data_asynchronous() function below creates 10 threads and calls the API concurrently. I am using this function in run_factset_api(). As standalone code it works fine locally. But when I schedule a run on Prefect, the run_factset_api() function exits before execution and returns a coroutine object (although locally it returns the desired value). Is there something I should do to facilitate parallel runs on Prefect?

import asyncio
from concurrent.futures import ThreadPoolExecutor

import boto3
import pandas as pd
import requests
from prefect import Flow, task

# `company`, `companies_to_fetch`, `company_info`, `formulas`, `bucket`,
# `csv_buffer`, STORAGE and RUN_CONFIG are defined elsewhere (not shown here).


async def get_data_asynchronous():
    master = pd.DataFrame()  # accumulator; must exist before the first append
    with ThreadPoolExecutor(max_workers=10) as executor:
        with requests.Session() as session:
            # Set any session parameters here before calling `fetch`
            loop = asyncio.get_event_loop()
            tasks = [
                loop.run_in_executor(
                    executor,
                    company.get_company_records,
                    *(session, [companies], {**company_info, **formulas})
                    # Allows us to pass in multiple arguments to `fetch`
                )
                for companies in companies_to_fetch
            ]
            for response in await asyncio.gather(*tasks):
                master = master.append(response, ignore_index=True)
    return master


@task
def run_factset_api():
    # Drive the coroutine to completion so the task returns its result
    loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(get_data_asynchronous())
    master = loop.run_until_complete(future)
    return master


@task
def save_data_to_s3(emmi_reduction):
    s3_resource = boto3.resource('s3')
    s3_resource.Object(bucket, 'factset_output_data.csv').put(Body=csv_buffer.getvalue())


with Flow('api-flow', storage=STORAGE, run_config=RUN_CONFIG) as flow:
    response = run_factset_api()
    # NOTE: this `if` runs once while the flow is being built, not at flow run time
    if response:
        save_data_to_s3(response)

flow.register('pipeline')

Priyab Dash

11/29/2021, 9:08 AM

We have a function defined as a task as below

@task(log_stdout=True, state_handlers=[notify_run_failure])
def submit_job_run_to_tmc(job_run):

but this is being called twice when we run a flow

Gabriel Milan

11/29/2021, 11:45 AM

Hi all, I was wondering if there's a way of setting environment variables for my jobs on the Kubernetes Agent. I'm trying to do that using the Helm chart, and the agent section of my values.yaml file looks like this:

agent:
  enabled: true
  prefectLabels:
    - mylabel
  ...
  job:
    ...
    envFrom:
      - secretRef:
          name: gcp-credentials

The secret gcp-credentials exists and is correct. Unfortunately, this doesn't seem to work.

Zohaa Qamar

11/29/2021, 2:07 PM

Hi all, I have a Python task in my flow that calls sys.exit(), meaning I want that task to stop and not proceed further if some condition is met. But in this case my task keeps on running and does nothing. Any help?
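Some background that may be relevant: `sys.exit()` works by raising `SystemExit`, which derives from `BaseException` rather than `Exception`, so code that only handles `Exception` does not treat it as an ordinary failure. The toy runner below is a sketch in plain Python and is not Prefect's engine; it only illustrates the difference between exiting and raising a regular exception. In Prefect 1.x, raising a signal from `prefect.engine.signals` (for example `FAIL` or `SKIP`) is one way to end a task early.

```python
import sys


def run_task(fn):
    """Toy runner (not Prefect's engine) that reports how a callable ended."""
    try:
        fn()
    except Exception as exc:
        # Ordinary errors are caught here and reported as a failure.
        return f"failed: {exc}"
    except SystemExit:
        # SystemExit is not an Exception subclass, so it skips the branch above.
        return "SystemExit escaped normal error handling"
    return "success"


def task_that_exits():
    sys.exit(1)


def task_that_raises():
    raise ValueError("condition met, stopping")


print(run_task(task_that_raises))
print(run_task(task_that_exits))
```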