# prefect-community
a
Hello friends, happy Monday. I’m getting a strange error in Prefect:
Failed to set task state with error: HTTPError('413 Client Error: Request Entity Too Large for url: https://api.prefect.io/graphql')
What’s the best way to debug this?
It occurs between two tasks, where the first task is passing a list of filenames to the second. I suspect that list of filenames results in a query that is too large? But surely that is not sent to the API?
c
Hi @Adam - what type of Result configuration are you using for this task / for your flow?
a
Hi @Chris White, we’re not using any
c
What is your Flow’s storage configuration?
a
Docker. I recently updated the agent to 0.13.18 but the flow might still be on 0.13.8. Could that be an issue?
c
hmmm no, I don’t think so; without a result configuration I’m really surprised your state object was large. If you can reproduce this, perhaps do the following:
def size_handler(task, old, new):
    prefect.context.logger.info(f"State {new} has serialized representation: {new.serialize()}")
and then add this as a state handler to your affected task:
@task(state_handlers=[size_handler])
...

# or

my_task(state_handlers=[size_handler])
and you’ll probably want to look at the logs in STDOUT, because there’s a chance the API will reject this log as well if it is large
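If the serialized state turns out to be huge, logging the whole thing can hit the same size limit. A minimal sketch of a variant that logs only the byte size, assuming `new.serialize()` returns a JSON-serializable dict (as the output later in this thread suggests). The `serialized_size` helper and the `state_size` logger name are hypothetical, and the standard `logging` module is used here so the sketch runs without Prefect; inside a flow you would likely swap in the Prefect logger.

```python
import json
import logging

logger = logging.getLogger("state_size")  # hypothetical logger name


def serialized_size(payload: dict) -> int:
    """Return the size in bytes of a dict once encoded as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def size_handler(task, old, new):
    # Assumption: new.serialize() returns a JSON-serializable dict.
    size = serialized_size(new.serialize())
    # Log only the size, not the payload, so this line stays small.
    logger.info("State %s serializes to %d bytes", type(new).__name__, size)
    # Hand the new state back so the transition proceeds unchanged.
    return new
```

A size in the megabytes would point at the state itself; a few hundred bytes would suggest the 413 comes from somewhere else.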
a
Thanks @Chris White, I’ll try the above
Hi @Chris White, here is the output of the state handler:
State <Running: "Starting task run."> has serialized representation: {'message': 'Starting task run.', 'context': {'tags': []}, 'cached_inputs': {}, '_result': {'__version__': '0.13.18', 'type': 'NoResultType'}, '__version__': '0.13.18', 'type': 'Running'}
Although from looking at the stdout logs, it appears that my code is somehow logging the contents of a file to stdout. As I am using log_stdout=True for this task, it seems that’s the culprit. Very strange though that it’s outputting the contents of a file. This is one of the methods that gets called from within the task:
from os import path  # needed for path.basename below


def clean_file(filepath: str):
    print(f"Cleaning file {filepath}")
    filename = path.basename(filepath)
    with open(filepath, "r") as fin:
        data = fin.read().splitlines(True)
        for index, value in enumerate(data):
            data[index] = "|".join([filename, value])
    with open(filepath, "w") as fout:
        # skip line 1 (header) and skip last line (footer)
        fout.writelines(data[1 : len(data) - 1])
    return filepath
It appears fout.writelines is coming up on stdout rather than into the file. Or am I mistaken?
I’ve removed all log_stdout and explicitly use the prefect logger, but I’m still seeing the actual file written to stdout. It used to work though, so very confused 😕
Locally when running flow.run() I don’t have this issue though
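One way to check whether clean_file itself ever writes file contents to stdout is to capture stdout while it runs on a small temporary file. This is only a sketch: `run_and_capture` and the sample file are hypothetical, and the function body is a lightly condensed copy of the one posted above. If only the "Cleaning file" line is captured, the file contents are probably being printed by some other code path.

```python
import contextlib
import io
import os
import tempfile
from os import path


def clean_file(filepath: str) -> str:
    # Condensed copy of the function posted in this thread.
    print(f"Cleaning file {filepath}")
    filename = path.basename(filepath)
    with open(filepath, "r") as fin:
        data = fin.read().splitlines(True)
    data = ["|".join([filename, value]) for value in data]
    with open(filepath, "w") as fout:
        # Skip the first line (header) and the last line (footer).
        fout.writelines(data[1:-1])
    return filepath


def run_and_capture(filepath: str) -> str:
    """Run clean_file and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        clean_file(filepath)
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sample.txt")  # hypothetical sample file
        with open(target, "w") as f:
            f.write("header\nrow1\nrow2\nfooter\n")
        print(repr(run_and_capture(target)))
```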
c
So the original traceback was for a state. Logs are batched up in the background and wouldn’t cause a state update failure. You only posted the serialized form for the Running state; what did the final state look like?
a
Hi @Chris White So I’m no longer seeing the “set task state” error, but I am seeing this error:
Failed to write log with error: 413 Client Error: Request Entity Too Large for url: https://api.prefect.io/graphql
It never proceeds further than a few of those errors as eventually it says
No heartbeat detected from the remote task; marking the run as failed.
That being said, when I look at the actual logs on the container, I see a file being printed which shouldn’t be (it seems this printed file is sent as a log). I have disabled all log_stdout properties on the task though. Any ideas?
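As a stopgap while the source of the oversized output is tracked down, a logging filter can truncate very long messages before a handler ships them. This is a rough sketch using only the standard `logging` module. `TruncateFilter` and `MAX_LOG_CHARS` are hypothetical names, the cap is an arbitrary value rather than a documented API limit, and whether Prefect's own log handler can be given a filter like this is an assumption that would need checking.

```python
import logging

MAX_LOG_CHARS = 10_000  # hypothetical cap, not a documented API limit


class TruncateFilter(logging.Filter):
    """Shorten any log message longer than max_chars."""

    def __init__(self, max_chars: int = MAX_LOG_CHARS):
        super().__init__()
        self.max_chars = max_chars

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        if len(message) > self.max_chars:
            dropped = len(message) - self.max_chars
            # Replace the message with its truncated, already formatted form.
            record.msg = f"{message[:self.max_chars]}... [{dropped} chars truncated]"
            record.args = None
        # Always keep the record; only its length changes.
        return True
```

Attaching it with `handler.addFilter(TruncateFilter())` applies the cap to every record that handler emits.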
Hey @Chris White, good morning 🙂 Any ideas? Do you have any idea why writing to a file is logged to stdout when running in Kubernetes, but not locally?
It’s causing one of our main ETL jobs to fail :(
c
Hey @Adam - I think this will be better as a GitHub issue w/ some code snippets - I don’t have enough information to debug here; the symptoms you’re describing appear contradictory to me; the failure to write a log has no bearing on the final state of your task runs, so there’s something missing that we’ll need to identify
a
Sure, will post as an issue with the code snippets
Thanks!
c
👍 👍