
Alexandru Anghel

06/06/2022, 3:08 PM
Hi guys, I am trying to submit a batch job to Prefect 1.2.2 using Kubernetes but the cron schedule does nothing. The flow runs only at submission time and it's not starting again. What could be the cause? More info inside the thread.
The flow is scheduled to run every 5 minutes
from main.batch import job_template, create_bigquery_table, load_to_bq, KubernetesRun, Parameter, Flow, GCS, Schedule, CronClock


schedule = Schedule(clocks=[CronClock("*/5 * * * *")])  # every 5 minutes

with Flow(name='alerts', schedule=schedule, storage=GCS(bucket="aino-prefect-batch"), run_config=KubernetesRun(labels=["dev"], job_template=job_template, image="docker.io/alexanghel23/prefect-v0.2", image_pull_policy="Always")) as flow:
    
    gcp_project = Parameter(name = 'gcp_project', default = 'myproject')
    bigquery_dataset =  Parameter(name = 'bigquery_dataset', default = 'batch_landing')
    bigquery_table = Parameter(name = 'bigquery_table')
    infer_schema = Parameter(name = 'infer_schema', default = True)
    bigquery_fields = Parameter(name = 'bigquery_fields')
    
    prometheus_query = Parameter(name = 'prometheus_query')
    prometheus_start_date = Parameter(name = 'prometheus_start_date')
    prometheus_end_date = Parameter(name = 'prometheus_end_date')
    prometheus_step = Parameter(name = 'prometheus_step')

    a = create_bigquery_table(gcp_project, bigquery_dataset, bigquery_table, infer_schema, prometheus_query, bigquery_fields)

    load_to_bq(a, prometheus_query, prometheus_start_date, prometheus_end_date, prometheus_step, upstream_tasks=[a])
I am also using a job template, and I tried changing it to a CronJob, but that didn't work:
job_template = {
    "spec": {
        "template": {
            "spec": {
                "volumes": [
                    {
                        "name": "google-cloud-keys",
                        "secret": {"secretName": "gcp-cred"},
                    }
                ],
                "containers": [
                    {
                        "name": "flow",
                        "env": [
                            {
                                "name": "GOOGLE_APPLICATION_CREDENTIALS",
                                "value": "/var/secrets/google/key.json",
                            }
                        ],
                        "volumeMounts": [
                            {
                                "name": "google-cloud-keys",
                                "mountPath": "/var/secrets/google",
                            }
                        ],
                    }
                ],
            }
        }
    }
}

Kevin Kho

06/06/2022, 4:08 PM
Can you go to the settings tab of the flow and see what the schedule looks like there?

Alexandru Anghel

06/07/2022, 6:40 AM
Hi @Kevin Kho, this is what it looks like. Thanks!

Kevin Kho

06/07/2022, 1:56 PM
Your flow schedule is not on? Is that intentional?

Alexandru Anghel

06/07/2022, 2:31 PM
Hi @Kevin Kho, I noticed that, but when I try to enable it from the UI I get an error:
Error: GraphQL error: Can not schedule a flow that has required parameters.
Still, I don't want to enable the schedule manually each time. Isn't the schedule supposed to be enabled as soon as you pass the schedule argument to the flow?
schedule = Schedule(clocks=[CronClock("*/5 * * * *")])  # every 5 minutes

with Flow(name='alerts', schedule=schedule ...
Thanks!

Kevin Kho

06/07/2022, 2:46 PM
I think you can add required=False to your parameters.

Alexandru Anghel

06/07/2022, 3:49 PM
Yes, but I need the parameters to be required. My question is whether there is a way to turn the schedule on when I submit the flow in the first place, without having to interact with the UI. Thanks!

Kevin Kho

06/07/2022, 4:14 PM
If you need them to be required, then you need defaults. The issue here is that you can’t toggle a schedule on when the required parameters have no defaults. By default, registering the flow turns the schedule on. In this case, that step runs into the error, so the schedule ends up off.
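To make that rule concrete, here is a minimal sketch that lists which of the parameters from the flow above would block the schedule. It is a simplified stand-in written with the standard library only, not Prefect's own check; `ParamSpec` and `blocking_parameters` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Any, List

# Sentinel meaning "no default was given" (None could be a legitimate default).
_NO_DEFAULT = object()


@dataclass
class ParamSpec:
    """Hypothetical stand-in for a flow parameter: a name plus an optional default."""
    name: str
    default: Any = _NO_DEFAULT

    @property
    def has_default(self) -> bool:
        return self.default is not _NO_DEFAULT


def blocking_parameters(params: List[ParamSpec]) -> List[str]:
    """Return the names of parameters that have no default value."""
    return [p.name for p in params if not p.has_default]


# The parameters as declared in the flow at the top of the thread.
flow_params = [
    ParamSpec("gcp_project", "myproject"),
    ParamSpec("bigquery_dataset", "batch_landing"),
    ParamSpec("bigquery_table"),
    ParamSpec("infer_schema", True),
    ParamSpec("bigquery_fields"),
    ParamSpec("prometheus_query"),
    ParamSpec("prometheus_start_date"),
    ParamSpec("prometheus_end_date"),
    ParamSpec("prometheus_step"),
]

blocked = blocking_parameters(flow_params)
```

Any name that ends up in `blocked` is a parameter a scheduled run would have no value for, which is why the schedule cannot be switched on.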

Alexandru Anghel

06/07/2022, 4:59 PM
OK, I've added some dummy defaults and the schedule is now turned on automatically. However, at runtime I provide a JSON file containing all the required parameters. When I first submit the flow, it reads the parameters from the JSON file, but the next scheduled run uses the dummy default values that I specified. It's worth mentioning that my setup is to provide only the parameter JSON file and run the prefect register and prefect run commands inside a CI/CD pipeline.
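One possible way around this, sketched under assumptions: Prefect 1.x clocks accept a `parameter_defaults` argument, so the values from the JSON file could be attached to the clock at registration time instead of being passed only to the first run. The helper names and the `REQUIRED_PARAMETERS` tuple below are hypothetical, and the Prefect import is kept inside `build_schedule` so the loader can be used on its own.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable

# Hypothetical list: the parameters from the flow above that have no real default.
REQUIRED_PARAMETERS = (
    "bigquery_table",
    "bigquery_fields",
    "prometheus_query",
    "prometheus_start_date",
    "prometheus_end_date",
    "prometheus_step",
)


def load_parameters(path: str, required: Iterable[str] = REQUIRED_PARAMETERS) -> Dict[str, Any]:
    """Read the parameter JSON file and fail early if a required key is absent."""
    params = json.loads(Path(path).read_text())
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError("parameter file is missing: " + ", ".join(missing))
    return params


def build_schedule(params: Dict[str, Any]):
    """Build the 5-minute schedule with the JSON values attached to the clock.

    Assumes Prefect 1.x is installed; imported here so load_parameters
    works without it.
    """
    from prefect.schedules import Schedule
    from prefect.schedules.clocks import CronClock

    return Schedule(clocks=[CronClock("*/5 * * * *", parameter_defaults=params)])
```

In the CI/CD pipeline this would look something like `schedule = build_schedule(load_parameters("params.json"))` before the `with Flow(...)` block, where `params.json` is a placeholder file name. The clock's values are stored when the flow is registered, so a changed JSON file would need a new `prefect register` to take effect for scheduled runs.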