Arun Giridharan
09/07/2022, 6:10 PM
Chris Gunderson
09/07/2022, 7:01 PM
from prefect.blocks.system import *
sraDatabaseSecretName = String.load("sra-database") #This is the name of the secret
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/client.py", line 1268, in read_block_document_by_name
    response = await self._client.get(
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/httpx/_client.py", line 1751, in get
    return await self.request(
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/httpx/_client.py", line 1527, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/client.py", line 279, in send
    response.raise_for_status()
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/client.py", line 225, in raise_for_status
    raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
prefect.exceptions.PrefectHTTPStatusError: Client error '404 Not Found' for url 'http://ephemeral-orion/api/block_types/slug/string/block_documents/name/sra-database?include_secrets=true'
Response: {'detail': 'Block document not found'}
For more information check: https://httpstatuses.com/404
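The 404 above means no block document named "sra-database" exists under the string block type slug. A minimal sketch, assuming the block was actually created as a Secret block with that name, would load it through the matching block class instead:

from prefect.blocks.system import Secret

# Assumption: "sra-database" was registered as a Secret block rather than a String block.
# Loading it with the class it was created under avoids the 404 on the string slug.
sra_database = Secret.load("sra-database")
database_secret_value = sra_database.get()  # returns the stored secret value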
Mike Vanbuskirk
09/07/2022, 7:14 PM
Joshua Massover
09/07/2022, 7:55 PM
[2022-09-07 19:14:16,051] DEBUG - agent | Deleting job prefect-job-c10e0512
• the killing event in my k8s cluster
apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2022-09-07T19:13:23Z"
involvedObject:
  apiVersion: v1
  ...
  kind: Pod
  name: prefect-job-c10e0512-9wm4j
  ...
kind: Event
lastTimestamp: "2022-09-07T19:13:23Z"
message: Stopping container prefect-container-prepare
...
reason: Killing
....
type: Normal
• i can see via metrics that i am not oom'ing or doing anything that seems like it should trigger the job being killed
• a single flow is running on its own node controlled via the kubernetes cluster autoscaler
• i don't see any reason why the cluster autoscaler would be killing this node, and safe-to-evict is set to false.
• my application logs always just end, there's nothing suspicious in the logs
• there aren't obvious patterns to me. it's not the same job, it's not happening after x amount of minutes.
• i've switched to threaded heartbeats, and then most recently turned off heartbeats entirely, and it hasn't fixed it
1. there's a chicken/egg question i'm not sure about: in the agent log, is the agent issuing a request to the k8s cluster to kill the job, or is it logging the deletion after kubernetes has already killed the job for some reason?
2. Any suggestions for how to debug a killed flow in a kubernetes cluster using cluster autoscaling? I can see from the event that it's being killed, but it's a herculean task to figure out why.
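For question 1, the Killing event itself records which component reported it, which helps separate an agent-initiated delete from a kubelet- or autoscaler-initiated one. A minimal sketch, assuming the kubernetes Python client is available and using the pod name and a hypothetical "default" namespace from the event above:

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

# Pull every event that references the killed pod and print which component reported it.
events = v1.list_namespaced_event(
    namespace="default",  # assumption: the job runs in the default namespace
    field_selector="involvedObject.name=prefect-job-c10e0512-9wm4j",
)
for ev in events.items:
    reporter = ev.source.component if ev.source else ""
    print(ev.last_timestamp, ev.reason, reporter, ev.message)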
John Mizerany
09/07/2022, 10:03 PM
We used sys.path.append to include the module in our PYTHONPATH, but that did not work. We are using Git Remote storage, but it seems the agent we are using is not able to pick up on the custom files/modules we wrote in the subdirectory (we are still using Prefect Cloud 1.0, and when we create a run the UI gives us Failed to load and execute flow run: ModuleNotFoundError).
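One thing worth checking when sys.path.append doesn't help: a relative path is resolved against the agent's working directory at run time, not against the cloned repo, so anchoring the path on the flow file itself is safer. A small sketch, assuming a hypothetical helpers/ subdirectory sitting next to the flow file in the repo:

import os
import sys

# Hypothetical layout: flow.py and helpers/ live in the same repo directory.
# Anchoring on __file__ keeps the path valid regardless of the agent's working directory.
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), "helpers"))

from my_module import my_task  # hypothetical module inside helpers/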
Ankur Sheth
09/07/2022, 10:21 PM
Emerson Franks
09/07/2022, 11:40 PM
Mark Li
09/08/2022, 1:52 AM
Young Ho Shin
09/08/2022, 4:28 AM
I'm seeing sqlalchemy errors when running a test flow with many tasks (>10000) locally. Here's the code I'm running:
https://gist.github.com/yhshin11/1832bc945446a62c5c6152abb9c1a0a5
It seems like the problem has to do with the fact that there are too many tasks trying to write to the Orion database at the same time. I tried switching to a Postgres database as described in the [docs](https://docs.prefect.io/concepts/database/), and also adding a concurrency limit of 10. Neither seems to fix the issue. Any ideas about how to fix this?
Here's an example of the kind of errors I'm getting:
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
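Since the error points at connection-pool exhaustion rather than at the tasks themselves, one workaround is to avoid having all 10000 task runs in flight at once. A rough sketch of batching submissions, with a trivial task used purely for illustration:

from prefect import flow, task

@task
def do_work(n: int) -> int:
    return n * n

@flow
def batched_flow(total: int = 10_000, batch_size: int = 500):
    # Submit one batch, wait for it to resolve, then move on, so far fewer
    # task runs are writing to the Orion database at the same time.
    results = []
    for start in range(0, total, batch_size):
        futures = [do_work.submit(i) for i in range(start, min(start + batch_size, total))]
        results.extend(f.result() for f in futures)
    return results

if __name__ == "__main__":
    batched_flow()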
Zac Hooper
09/08/2022, 4:31 AM
Andreas Nord
09/08/2022, 9:06 AM
Brad
09/08/2022, 9:08 AM
David Peláez
09/08/2022, 9:58 AM
Vadym Dytyniak
09/08/2022, 10:47 AM
Bradley Collins
09/08/2022, 12:30 PM
Anirudh MV
09/08/2022, 12:38 PM
No such command 'backend'.
Please help me fix this. Could this be because my prefect client and prefect server might be running different versions?
Eli Treuherz
09/08/2022, 2:04 PM
Raghuram M
09/08/2022, 2:11 PM
Felipe Fernandez
09/08/2022, 2:54 PM
Dat Tran
09/08/2022, 3:07 PM
Igor Morgunov
09/08/2022, 3:28 PM
Mapped Child X? I know I can get it from logs, but would be cool if I could see it in this screen
Slackbot
09/08/2022, 3:35 PM
Jeffery Newburn
09/08/2022, 4:56 PM
Venkat Ramakrishnan
09/08/2022, 5:07 PM
Roger Webb
09/08/2022, 5:19 PM
Mark Li
09/08/2022, 5:59 PM
Error: No such command 'orion'.
I’m assuming this failure is coming from when it’s calling ‘prefect orion start’
Does anyone know what could be causing prefect to not recognize the orion command?
Venkat Ramakrishnan
09/08/2022, 6:02 PM
Andrew Pruchinski
09/08/2022, 6:17 PM
We're having an issue with % and pandas.read_sql. There seems to be an issue during compilation during a prefect flow run where it's complaining about the wildcard. We can run the queries outside of a prefect run no problem. When executing in prefect, we are getting formatting errors. Errors listed in thread.
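The usual culprit is that a literal % in the SQL string gets treated as a printf-style placeholder by the DBAPI driver once parameters are involved; escaping it as %% or binding the pattern as a parameter avoids the formatting error. A small sketch with a hypothetical table and connection string:

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:password@host/db")  # hypothetical DSN

# Binding the LIKE pattern as a parameter keeps the raw % out of the statement text,
# so nothing downstream tries to %-format it.
query = text("SELECT * FROM events WHERE name LIKE :pattern")
df = pd.read_sql(query, engine, params={"pattern": "prefect%"})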
Marc Lipoff
09/08/2022, 6:30 PM
Marc Lipoff
09/08/2022, 6:30 PM
last_modified_begin, last_modified_end, partition = make_chunks(
    target_start_time,
    to_timedelta(poll_freq),
    to_timedelta(total_duration_to_run),
)
# lists the files in the chunk (based on modified timestamp)
files = S3ListGracefulAndWait(bucket=s3_bucket_name.run()).map(
    prefix=unmapped("fargate"),
    last_modified_begin=last_modified_begin,
    last_modified_end=last_modified_end,
)
df = read_files.map(files)
### and then take the dataframe and push to database, ...
class S3ListGracefulAndWait(S3List):
    def run(
        self,
        prefix: str,
        partition: str,
        last_modified_begin: datetime.datetime,
        last_modified_end: datetime.datetime,
    ) -> list[str]:
        # using the partitions is important because of the number of files. without at least pre-filtering on dt, the s3list takes way too long
        prefix += "/dt=" + partition
        if last_modified_end < now_():  # has passed
            try:
                log().info("Starting to list s3 files...")
                res = super().run(
                    prefix=prefix,
                    last_modified_begin=datetime_to_string(last_modified_begin),
                    last_modified_end=datetime_to_string(last_modified_end),
                )
                log().info(
                    f"S3ListGracefulAndWait run prefix={prefix} last_modified_begin={last_modified_begin} last_modified_end={last_modified_end}. Result={res}"
                )
                if len(res) == 0:
                    raise signals.SKIP(
                        f"No files available for dt={partition} {last_modified_begin} to {last_modified_end}"
                    )
                return [f"s3://{self.bucket}/{x}" for x in res]
            except Exception as e:
                log().error(e, exc_info=True)
                raise signals.SKIP(
                    f"Failed to get s3 files for dt={partition} {last_modified_begin} to {last_modified_end}. {e}"
                )
        else:
            raise signals.RETRY(
                message=f"Going to retry at {last_modified_end}",
                start_time=last_modified_end,
            )
Anna Geller
09/09/2022, 1:43 AM