Greg Roche
06/09/2021, 2:46 PM
Michael Wedekindt
06/09/2021, 4:26 PM
with Flow(name="dbt_flow") as f:
    task = DbtShellTask(
        log_stderr=True,
        log_stdout=True,
        return_all=True,
        stream_output=True,
        profiles_dir=profile_path,
    )(command='dbt run --models data_hub')

out = f.run()
I got this error message back:
[2021-06-09 16:06:15+0200] INFO - prefect.FlowRunner | Beginning Flow run for 'dbt_flow'
[2021-06-09 16:06:15+0200] INFO - prefect.TaskRunner | Task 'DbtShellTask': Starting task run...
[2021-06-09 16:06:15+0200] INFO - prefect.DbtShellTask | /bin/bash: C:\Users\M7856~1.\AppData\Local\Temp\prefect-5yoqhqg_: No such file or directory
[2021-06-09 16:06:15+0200] ERROR - prefect.DbtShellTask | Command failed with exit code 127
[2021-06-09 16:06:15+0200] INFO - prefect.TaskRunner | FAIL signal raised: FAIL('Command failed with exit code 127')
[2021-06-09 16:06:15+0200] INFO - prefect.TaskRunner | Task 'DbtShellTask': Finished task run for task with final state: 'Failed'
[2021-06-09 16:06:15+0200] INFO - prefect.FlowRunner | Flow run FAILED: some reference tasks failed.
When I try this with the plain ShellTask I get the same error.
Here are some background facts:
- I already added bash to PATH and I can open a bash console
- I use Visual Studio Code for development
- My Python version is 3.9.5
- My Prefect version is 0.14.21
- Usually I run dbt locally in the Ubuntu subsystem on Windows (WSL), but I can run dbt in the Windows cmd as well
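Exit code 127 here is bash itself failing to open the temp file, not dbt failing. A stdlib-only sketch of the likely mechanism (assuming, as the 0.14.x behavior suggests, that ShellTask writes the command to a temporary file and asks the shell to execute it):

```python
import os
import tempfile

# Sketch of how ShellTask assembles its subprocess call (an assumption
# about the 0.14.x internals): the command is written to a temp file and
# the configured shell is asked to execute that file. On Windows, tempfile
# yields a Windows-style path (C:\Users\...\Temp\prefect-...), which a
# WSL or git-bash `bash` cannot resolve, hence "No such file or directory"
# and exit code 127 before dbt ever runs.
def build_shell_invocation(command: str, shell: str = "bash") -> list:
    tmp = tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".sh")
    tmp.write(command)
    tmp.close()
    return [shell, tmp.name]

argv = build_shell_invocation("dbt run --models data_hub")
```

The practical takeaway: run the flow from an environment whose bash and temp paths agree, for example entirely inside WSL where both prefect and dbt are installed, rather than driving a Unix bash from a Windows Python.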
Thanks in advance for your help and best regards, Michael
Andrew Hannigan
06/09/2021, 6:11 PM
ClientException('An error occurred (ClientException) when calling the RegisterTaskDefinition operation: Too many concurrent attempts to create a new revision of the specified family.')
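This ClientException is ECS throttling concurrent RegisterTaskDefinition calls for the same task-definition family, typically many flow runs registering at once. Besides reducing concurrency, the usual client-side mitigation is retry with exponential backoff and jitter; a hedged, stdlib-only sketch (flaky_register is a stand-in for the real AWS call, not an actual API):

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `call` with exponential backoff plus jitter.
    `call` is any zero-argument function that may raise when throttled."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # back off 0.5s, 1s, 2s, ... with up to 50% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay * (1 + random.random() * 0.5))

# stand-in for a throttled RegisterTaskDefinition call: fails twice, then succeeds
attempts = {"n": 0}
def flaky_register():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("Too many concurrent attempts")
    return {"revision": attempts["n"]}

result = retry_with_backoff(flaky_register, sleep=lambda _: None)
```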
Fina Silva-Santisteban
06/09/2021, 6:38 PM
flow.register()
method to register flows, I also set the flows’ run config and storage along with that:
flow.run_config = (some run config)
flow.storage = (some storage config)
flow.register(project_name=(project name))
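For reference on the CLI route: prefect register imports the flow file, so storage and run config assigned at module level (as in the snippet above) are picked up automatically, with no extra flags. A sketch of the 0.14.x invocation (the project name here is hypothetical):

```
# `prefect register` imports flow.py, executing the module-level
# flow.run_config / flow.storage assignments before registering
prefect register --project "my-project" -p flow.py
```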
I’d like to switch to using the prefect cli’s register functionality. How can I point it to the run config and storage?
Peter Roelants
06/09/2021, 7:14 PM
error: Signature of "run" incompatible with supertype "Task"
For example, for the following Task:
class RestTaskTest(prefect.Task):
    def run(self, string_val: str, int_val: int) -> TaskTestResult:
        return TaskTestResult(string_val=string_val, int_val=int_val)
This makes sense, since the definition of Task.run is def run(self) -> None.
Is there any way to make mypy work together with overriding of Task.run in custom tasks? Or is the only option to silence these with # type: ignore?
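One targeted option, since the signature change is deliberate: a narrow # type: ignore[override] on the run definition, which silences only that specific check on that line, unlike a bare # type: ignore. A runnable sketch with a stand-in base class in place of prefect.Task:

```python
class Task:
    """Stand-in for prefect.Task, whose run() takes no arguments."""
    def run(self) -> None:
        return None

class RestTaskTest(Task):
    # mypy flags the changed signature with:
    #   error: Signature of "run" incompatible with supertype "Task"
    # The error-code-scoped ignore below suppresses only that check.
    def run(self, string_val: str, int_val: int) -> dict:  # type: ignore[override]
        return {"string_val": string_val, "int_val": int_val}

result = RestTaskTest().run("a", 1)
```

At runtime Python does not enforce override signatures, so the class behaves normally; only the static check is silenced.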
Zach Schumacher
06/09/2021, 7:40 PM
Jonathan Chu
06/09/2021, 7:57 PM
Flow Concurrency and Task Concurrency?
I have a flow running right now, and I can see it shows as 1 running flow (100%)
but then 0 running tasks (0%)
Leon Kozlowski
06/09/2021, 9:20 PM
Felipe Saldana
06/09/2021, 10:14 PM
if __name__ == "__main__":
Brett Naul
06/09/2021, 10:29 PM
idempotency_key
<24 hours apart (say 30 mins). Is the idempotency key queryable from GraphQL so we could try to debug what's going on?
Ben Collier
06/10/2021, 5:17 AM
client.get_flow_run_info(flow_run_id)
with a flow run id in the format “c2686800-3c0f-4b67-83b2-f85fa6fd6773”
I’m getting the following:
'message': 'invalid input syntax for type uuid: "c2686800-3c0f-4b67-83b2-f85fa6"'
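Worth noticing in the error text itself: the rejected value c2686800-3c0f-4b67-83b2-f85fa6 has lost the last six characters (fd6773) of the id quoted above, so something between the call site and the API appears to be truncating the string (an accidental slice, for example). A quick stdlib check showing the full id parses while the reported one does not:

```python
import uuid

full_id = "c2686800-3c0f-4b67-83b2-f85fa6fd6773"
truncated = "c2686800-3c0f-4b67-83b2-f85fa6"   # what the error reports

parsed = uuid.UUID(full_id)   # a valid 32-hex-digit UUID parses fine

rejected = False
try:
    uuid.UUID(truncated)      # only 26 hex digits: rejected, like in the error
except ValueError:
    rejected = True
```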
I assume I’m doing something fairly stupid - could someone explain why the UUID is in the wrong format?
Talha
06/10/2021, 8:27 AM
Noah Holm
06/10/2021, 9:22 AM
Thomas Nyegaard-Signori
06/10/2021, 11:05 AM
prefect agent kubernetes install --rbac ...
command, so RBAC is functioning on the agent. When starting a very simple flow that consists of a single RunNamespacedJob task running the custom Docker image, the job pod starting the flow runs into RBAC issues, but the RunNamespacedJob task pod runs fine. My question is: how do I handle job pods that are going to spawn several jobs on Kubernetes, and the RBAC issues that arise on these pods? Am I thinking about this incorrectly? The error, for reference:
HTTP response headers: HTTPHeaderDict({'Audit-Id': '67b79e3e-ab13-45ee-8ad5-2ae1769c6a7f', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Thu, 10 Jun 2021 09:21:02 GMT', 'Content-Length': '372'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"cmems-historical\" is forbidden: User \"system:serviceaccount:prefect-zone:default\" cannot get resource \"jobs/status\" in API group \"batch\" in the namespace \"prefect-zone\"","reason":"Forbidden","details":{"name":"cmems-historical","group":"batch","kind":"jobs"},"code":403}
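The 403 says the default service account in the prefect-zone namespace lacks get on the jobs/status subresource in the batch API group. A hedged sketch of a Role/RoleBinding that would grant the flow's job pod what a RunNamespacedJob task typically needs (names are illustrative; the verb list may need adjusting to the task's options):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-flow-jobs        # illustrative name
  namespace: prefect-zone
rules:
  - apiGroups: ["batch"]
    resources: ["jobs", "jobs/status"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-flow-jobs
  namespace: prefect-zone
subjects:
  - kind: ServiceAccount
    name: default                # the SA named in the error message
    namespace: prefect-zone
roleRef:
  kind: Role
  name: prefect-flow-jobs
  apiGroup: rbac.authorization.k8s.io
```

Alternatively, run the flow's job pods under a dedicated service account that already has these permissions, rather than the namespace default.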
Dmytro Kostochko
06/10/2021, 1:34 PM
detect_new_documents
), load data that should be extracted (get_skill_extractor). Also, this data should be periodically reloaded from the database if a new version was added. Save results (save_skills).
I use such flow:
with Flow("Analyze new documents") as flow:
    documents = detect_new_documents()
    extracted_skills = extract_skills.map(documents, unmapped(get_skill_extractor()))
    save_skills.map(documents, extracted_skills)
And it looks like I need some stateful worker to keep an instance of skill_extractor in memory and share it across all tasks, because right now it is passed with the tasks to the Dask workers, and serializing/deserializing takes more time than the processing.
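One common Dask-side pattern, independent of Prefect: don't pass the heavy extractor as a task argument, since it then gets serialized per task. Instead, have each mapped task load it through a process-level cache keyed by version, so every worker process builds it once and a new version key triggers a reload. A stdlib sketch with a stand-in loader:

```python
import functools

load_count = {"n": 0}

@functools.lru_cache(maxsize=2)
def get_skill_extractor_cached(version: str):
    """Build the extractor once per worker process per version.
    In the real flow this would load the model from the database;
    a small dict stands in for the heavy object here."""
    load_count["n"] += 1
    return {"version": version, "model": "heavy state"}

def extract_skills(document: str, version: str) -> tuple:
    # called inside the mapped task body on the worker, so only the
    # small `version` string travels over the wire, not the extractor
    extractor = get_skill_extractor_cached(version)
    return (document, extractor["version"])

results = [extract_skills(doc, "v1") for doc in ["doc1", "doc2", "doc3"]]
```

Calling extract_skills three times with the same version loads the extractor only once; passing a new version string would rebuild it, which covers the periodic-reload requirement.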
Maybe someone can give some advice on how this can be implemented with Prefect?
Devin McCabe
06/10/2021, 1:50 PM
Christian Eik
06/10/2021, 2:29 PM
@task(state_handlers=[post_task_fail])
def add_headers(headers: list, data: tuple) -> List:
    logger = prefect.context.get("logger")
    logger.info(prefect.config.flows.checkpointing)
    logger.info(prefect.config.tasks.checkpointing)
    return headers + list(data)
*edit: but -> and config values ...
John Ramirez
06/10/2021, 7:47 PM
Ben Muller
06/10/2021, 8:50 PM
prefect agent
in an ECS Fargate cluster.
I am feeding in all my secrets via the CDK, namely RUNNER_TOKEN_FARGATE_AGENT, AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID.
I am getting some type of auth error when I run my image command:
CMD [ "prefect", "agent", "ecs", "start", "--token", "$RUNNER_TOKEN_FARGATE_AGENT", \
"--task-role-arn=arn:aws:iam::********:role/ECSTaskS3Role", \
"--log-level", "INFO", "--label", "fargate-dev", "--label", "s3-flow-storage", \
"--name", "fargate-demo" ]
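One thing worth checking before anything IAM-related: the CMD above uses Docker's exec form (the JSON-array form), which runs without a shell, so "$RUNNER_TOKEN_FARGATE_AGENT" is never expanded and the literal string is passed as the token. Shell form (or an entrypoint script) does expand it; a sketch keeping the same arguments:

```
# shell form runs under /bin/sh -c, so the env var is expanded at runtime
CMD prefect agent ecs start --token "$RUNNER_TOKEN_FARGATE_AGENT" \
    --task-role-arn=arn:aws:iam::********:role/ECSTaskS3Role \
    --log-level INFO --label fargate-dev --label s3-flow-storage \
    --name fargate-demo
```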
The trace in 🧵
Any ideas would be great!
I assumed I don't have to register the agent with Prefect Cloud, as I did this with EC2 and it didn't have to be done.
Thanks in advance for any help on this...
Michael Law
06/10/2021, 8:55 PM
Tim Enders
06/10/2021, 9:52 PM
joshua mclellan
06/10/2021, 10:09 PM
Robert Bastian
06/10/2021, 10:13 PM
Ben Muller
06/11/2021, 4:10 AM
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): AccessDeniedException: User: arn:aws:sts::*****:assumed-rol...
Both my execution role and task role have admin access (while I am debugging), so I don't think it is a role-based issue.
Wondering if anyone has come across something like this before and can help out.
Oh, and this is running on ECS with Fargate; I have attached my flow code in the 🧵
Shivam Shrey
06/11/2021, 7:09 AM
Emma Rizzi
06/11/2021, 12:59 PM
Pedro Machado
06/11/2021, 1:36 PM
PrefectResult
in a flow that has a resource manager. The idea is to easily allow flow restarts without having to set up a different result backend.
However, I am running into an issue with the resource manager tasks. The setup method of the resource manager returns an object that is not JSON-serializable, which causes the task to fail. Do you have any suggestions for a workaround? Is the only solution to use a different result backend for this task? Thanks!
Marie
06/11/2021, 2:04 PM
bind
a few different ways without being able to separate the outputs of the first task (see simple example below). Does anyone know how to do this? Thanks!
flow = prefect.Flow("example-flow")
flow.add_task(task_a) # Task A returns 2 outputs
flow.add_task(task_b) # Task B uses the first output of task A
flow.add_task(task_c) # Task C uses the first output of task A
task_b.bind(job=task_a, flow=flow)
task_c.bind(report=task_a, flow=flow)
Talha
06/11/2021, 2:13 PM
Nathan Atkins
06/11/2021, 4:13 PM
Nathan Atkins
06/11/2021, 4:13 PM
Kevin Kho
06/11/2021, 4:32 PM
Nathan Atkins
06/11/2021, 4:35 PM