Philipp Eisen
01/19/2022, 12:02 AM
Consider scattering large objects ahead of time
with client.scatter to reduce scheduler burden and
keep data on workers
future = client.submit(func, big_data) # bad
big_future = client.scatter(big_data) # good
future = client.submit(func, big_future) # good
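For reference, here is the "good" pattern from the warning as a minimal plain-Dask sketch. It assumes `dask.distributed` is installed. It starts a throwaway in-process cluster instead of connecting to an existing scheduler, and it uses a bytes object as a hypothetical stand-in for `big_data`. It does not show how Prefect's Dask executor would expose the client, which is the open question here.

```python
from dask.distributed import Client


def payload_size(blob: bytes) -> int:
    # Runs on a worker and receives the already-scattered object.
    return len(blob)


# Throwaway in-process cluster (threads only), so no external scheduler is needed.
client = Client(processes=False)

# ~50 MB stand-in for the large object (hypothetical data).
big_data = b"\x00" * 50_000_000

# scatter() treats lists, tuples, sets and dicts as collections and returns one
# future per element, so a single bytes object is used to get exactly one future.
big_future = client.scatter(big_data)

# Pass the future, not the raw data: the task graph carries only a reference.
future = client.submit(payload_size, big_future)
result = future.result()
print(result)

client.close()
```

Against a real cluster, the same calls apply to a `Client` connected to the scheduler address. `scatter(..., broadcast=True)` additionally copies the object to every worker.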
I was wondering: what would be the Prefect pattern for scattering the object ahead of time?
Jacob Blanco
01/19/2022, 1:52 AM
Akharin Sukcharoen
01/19/2022, 7:46 AM
Thomas Pedersen
01/19/2022, 8:14 AM
Samay Kapadia
01/19/2022, 8:59 AM
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "jobs.batch \"dummy\" is forbidden: User \"system:serviceaccount:default:default\" cannot get resource \"jobs/status\" in API group \"batch\" in the namespace \"default\"",
"reason": "Forbidden",
"details": {
"name": "dummy",
"group": "batch",
"kind": "jobs"
},
"code": 403
}
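The 403 says the default service account in the default namespace may not get the jobs/status subresource. In Kubernetes RBAC, a subresource must be listed explicitly: a rule for "jobs" alone does not cover "jobs/status". One thing to try is a Role that names it. The sketch below builds such a manifest with the standard library and emits JSON, which kubectl apply -f also accepts. The role name is hypothetical. It has not been confirmed whether the --rbac YAML already covers this subresource.

```python
import json

NAMESPACE = "default"            # namespace from the 403 message
SERVICE_ACCOUNT = "default"      # from system:serviceaccount:default:default
ROLE_NAME = "job-status-reader"  # hypothetical name

role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": ROLE_NAME, "namespace": NAMESPACE},
    "rules": [
        {
            "apiGroups": ["batch"],
            # Subresources are written "<resource>/<subresource>"; granting
            # "jobs" does not implicitly grant "jobs/status".
            "resources": ["jobs", "jobs/status"],
            "verbs": ["get", "list", "watch"],
        }
    ],
}

binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": ROLE_NAME, "namespace": NAMESPACE},
    "subjects": [
        {"kind": "ServiceAccount", "name": SERVICE_ACCOUNT, "namespace": NAMESPACE}
    ],
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "Role",
        "name": ROLE_NAME,
    },
}

# A v1 List lets both objects go through a single kubectl apply.
manifest = {"apiVersion": "v1", "kind": "List", "items": [role, binding]}

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Usage: python job_status_rbac.py > rbac.json, then kubectl apply -f rbac.json (file names are hypothetical).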
For context, I’ve applied the YAML from prefect agent kubernetes install --rbac, so in theory all the permissions should be in place. I'm stuck on what could be wrong.
Muddassir Shaikh
01/19/2022, 9:23 AM
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'Tuple': Finished task run for task with final state: 'Success'
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'List': Starting task run...
[2022-01-19 14:45:49+0530] INFO - prefect.TaskRunner | Task 'List': Starting task run...
[2022-01-19 14:45:50+0530] INFO - prefect.TaskRunner | Task 'List': Starting task run...
[2022-01-19 14:45:50+0530] INFO - prefect.TaskRunner | Task 'List': Starting task run...
Muddassir Shaikh
01/19/2022, 9:25 AM
Tony Waddle
01/19/2022, 10:14 AM
Yueh Han Huang
01/19/2022, 10:16 AM
Muddassir Shaikh
01/19/2022, 12:01 PM
File "/home/infra/prefect_server/lib/python3.8/site-packages/prefect/client/client.py", line 603, in _send_request
response = session.post(
File "/home/infra/prefect_server/lib/python3.8/site-packages/requests/sessions.py", line 590, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/home/infra/prefect_server/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/home/infra/prefect_server/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/home/infra/prefect_server/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=4200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3be9b9e700>: Failed to establish a new connection: [Errno 111] Connection refused'))
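The traceback shows the client posting to localhost:4200, so the process on Machine B is looking for the server on itself. The stdlib check below can confirm whether Machine A's API port is reachable from Machine B before you point the agent at it. It assumes Prefect 1.x, where the server endpoint can be overridden with the PREFECT__SERVER__ENDPOINT environment variable (or endpoint under [server] in ~/.prefect/config.toml). The hostname machine-a is hypothetical.

```python
import os
import socket
from urllib.parse import urlparse


def server_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint's host:port succeeds."""
    parsed = urlparse(endpoint)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL doesn't specify one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused ([Errno 111]), timeouts, DNS errors
        return False


if __name__ == "__main__":
    # On Machine B this should name Machine A, e.g. http://machine-a:4200
    # ("machine-a" is hypothetical). The default mirrors the traceback above.
    endpoint = os.environ.get("PREFECT__SERVER__ENDPOINT", "http://localhost:4200")
    status = "reachable" if server_reachable(endpoint) else "NOT reachable"
    print(f"{endpoint}: {status}")
```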
Example: my GUI server is hosted on Machine A and an agent runs on Machine B. The code on Machine B should be registered and run on Machine B, but its tasks should show up in the GUI on Machine A.
Samay Kapadia
01/19/2022, 1:13 PM
dummy runs on the aks-spot node, but prefect-job runs on the aks-system node (and I don’t want it running on the system node pool). Is there a way to configure tolerations and affinities for the prefect-job pod?
Thomas Opsomer
01/19/2022, 2:44 PM
Luis Aguirre
01/19/2022, 3:09 PM
Michail Melonas
01/19/2022, 3:10 PM
Tomek Florek
01/19/2022, 3:41 PM
Tom Shaffner
01/19/2022, 4:23 PM
Jake
01/19/2022, 4:42 PM
Muddassir Shaikh
01/19/2022, 5:35 PM
Suresh R
01/19/2022, 6:41 PM
brian
01/19/2022, 7:03 PM
brian
01/19/2022, 7:57 PM
Trigger was "all_successful" but some of the upstream tasks failed.
However, all the upstream tasks were successful. Am I missing something?
Martim Lobao
01/19/2022, 11:33 PM
Tony Yun
01/19/2022, 11:51 PM
Varuna Bamunusinghe
01/20/2022, 6:06 AM
Suresh R
01/20/2022, 7:01 AM
Anurag Bajpai
01/20/2022, 9:15 AM
The client.get(f"repositories/{self.workspace}/{self.repo}/refs/branches") call that fetches the list of branches actually returns a paginated list, so the method cannot find the hash for a branch that is not on the first page. Additionally, the error raised when the branch is not found is not formatted properly (it's a plain string instead of an f-string).
Stefan Rasmussen
01/20/2022, 9:37 AM
Philipp Eisen
01/20/2022, 2:32 PM
No heartbeat detected from the remote task; marking the run as failed.
Are there any obvious things to look for?
Thomas Opsomer
01/20/2022, 2:48 PM
No heartbeat detected...
Usually it happens in two situations:
• the pod running the tasks gets evicted or OOM-killed
• the pod is running on a preemptible node that gets removed and replaced.
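Both situations produce the same message because of how a heartbeat check works in general. The running process reports in periodically, and a monitor flags any run whose last beat is older than a timeout. A pod that is killed or loses its node simply stops beating. The sketch below is illustrative only and is not Prefect's implementation. The class name, the 60-second timeout, and the injectable clock are assumptions.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Generic heartbeat-timeout detector (illustrative, not Prefect's code)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_beat: Dict[str, float] = {}

    def beat(self, run_id: str) -> None:
        # Called periodically by the process executing the run.
        self.last_beat[run_id] = self.clock()

    def stale_runs(self) -> List[str]:
        # A run whose process was OOM-killed, evicted, or lost with its node
        # stops calling beat(); once timeout_s passes, it is reported here.
        now = self.clock()
        return [rid for rid, t in self.last_beat.items() if now - t > self.timeout_s]


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = HeartbeatMonitor(timeout_s=60, clock=lambda: fake_now[0])
    monitor.beat("healthy-run")
    monitor.beat("evicted-run")
    fake_now[0] = 50
    monitor.beat("healthy-run")  # the evicted pod no longer beats
    fake_now[0] = 90
    print(monitor.stale_runs())
```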
Is there anything to configure on the k8s agent, in the k8s job specification, or elsewhere so that k8s can reschedule the job and Prefect is aware of it, allowing the flow to continue?
Florian Kühnlenz
01/20/2022, 4:40 PM