Thread in #prefect-community
    Adam

    1 year ago
    Hello friends, happy Monday. I’m getting a strange error in Prefect:
    Failed to set task state with error: HTTPError('413 Client Error: Request Entity Too Large for url: https://api.prefect.io/graphql')
    What’s the best way to debug this?
    It occurs between two tasks, where the first task passes a list of filenames to the second. I suspect that list of filenames produces a request that is too large? But surely the list itself isn’t sent to the API?
    Chris White

    1 year ago
    Hi @Adam - what type of Result configuration are you using for this task / for your flow?
    Adam

    1 year ago
    Hi @Chris White, we’re not using any
    Chris White

    1 year ago
    What is your Flow’s storage configuration?
    Adam

    1 year ago
    Docker. I recently updated the agent to 0.13.18, but the flow might still be on 0.13.8. Could that be an issue?
    Chris White

    1 year ago
    hmmm no, I don’t think so; without a result configuration I’m really surprised your state object was large. If you can reproduce this, perhaps do the following:
    import prefect

    def size_handler(task, old, new):  # Prefect state-handler signature: (task, old_state, new_state)
        prefect.context.logger.info(f"State {new} has serialized representation: {new.serialize()}")
    and then add this as a state handler to your affected task:
    @task(state_handlers=[size_handler])
    ...
    
    # or
    
    my_task = MyTask(state_handlers=[size_handler])  # MyTask: placeholder for a Task subclass
    And you’ll probably want to check the logs in STDOUT, since there’s a chance the API will reject this log as well if it’s large.
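If the full serialized state is too large to log, a variant of the handler can log only its encoded size. This is a sketch, not part of Prefect: `size_only_handler`, `serialized_size`, and `FakeState` are hypothetical names; it assumes `new.serialize()` returns a JSON-serializable dict (as the Running-state output posted in this thread suggests); and it uses the standard `logging` module instead of `prefect.context` so it can be tried outside a flow run.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("size_handler")


def serialized_size(state) -> int:
    """Return the number of UTF-8 bytes in the JSON encoding of state.serialize()."""
    return len(json.dumps(state.serialize()).encode("utf-8"))


def size_only_handler(task, old, new):
    # Same (task, old_state, new_state) signature as size_handler,
    # but it logs a short size summary instead of the whole payload.
    logger.info("State %s serializes to %d bytes", new, serialized_size(new))
    return new


class FakeState:
    """Hypothetical stand-in for a Prefect State, for trying the handler offline."""

    def __init__(self, payload):
        self._payload = payload

    def serialize(self):
        return self._payload

    def __repr__(self):
        return f"<{self._payload['type']}: {self._payload['message']!r}>"


# The serialized Running state posted in this thread.
running = FakeState({
    "message": "Starting task run.",
    "context": {"tags": []},
    "cached_inputs": {},
    "_result": {"__version__": "0.13.18", "type": "NoResultType"},
    "__version__": "0.13.18",
    "type": "Running",
})

size_only_handler(None, None, running)
```

A Running state like this one comes to a couple of hundred bytes, so a state that triggers a 413 would have to carry something much larger than this.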
    Adam

    1 year ago
    Thanks @Chris White, I’ll try the above
    Hi @Chris White, here is the output of the state handler:
    State <Running: "Starting task run."> has serialized representation: {'message': 'Starting task run.', 'context': {'tags': []}, 'cached_inputs': {}, '_result': {'__version__': '0.13.18', 'type': 'NoResultType'}, '__version__': '0.13.18', 'type': 'Running'}
    However, looking at the stdout logs, it appears my code is somehow logging the contents of a file to stdout. Since I’m using `log_stdout=True` for this task, that seems to be the culprit. It’s very strange that it’s outputting the contents of a file, though. This is one of the functions called from within the task:
    from os import path

    def clean_file(filepath: str):
        print(f"Cleaning file {filepath}")
        filename = path.basename(filepath)
        with open(filepath, "r") as fin:
            data = fin.read().splitlines(True)  # keep line endings
            # prefix every line with the source filename, pipe-delimited
            for index, value in enumerate(data):
                data[index] = "|".join([filename, value])
        with open(filepath, "w") as fout:
            # skip line 1 (header) and skip last line (footer)
            fout.writelines(data[1:-1])
        return filepath
    It appears `fout.writelines` is writing to stdout rather than into the file. Or am I mistaken?
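One way to check this outside Prefect is to run `clean_file` on a temporary file while capturing stdout with `contextlib.redirect_stdout`. The sketch below copies `clean_file` from above (behavior unchanged); the sample file contents are made up for illustration.

```python
import contextlib
import io
import os
import tempfile
from os import path


def clean_file(filepath: str):
    # Copy of the function above: prefix each line with the filename,
    # then drop the header (first line) and footer (last line).
    print(f"Cleaning file {filepath}")
    filename = path.basename(filepath)
    with open(filepath, "r") as fin:
        data = fin.read().splitlines(True)
        for index, value in enumerate(data):
            data[index] = "|".join([filename, value])
    with open(filepath, "w") as fout:
        fout.writelines(data[1:-1])
    return filepath


# Hypothetical sample file: a header, two data rows, and a footer.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("HEADER\nrow1\nrow2\nFOOTER\n")
    sample_path = tmp.name

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    clean_file(sample_path)

with open(sample_path) as f:
    cleaned = f.read()
os.remove(sample_path)

name = path.basename(sample_path)
print(repr(captured.getvalue()))
print(repr(cleaned))
```

Only the `Cleaning file ...` line reaches stdout; `fout.writelines` writes to the file object `fout`. File contents showing up on stdout would therefore point to some other source.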
    I’ve removed all `log_stdout` settings and now explicitly use the Prefect logger, but I’m still seeing the file’s contents written to stdout. It used to work, so I’m very confused 😕
    I don’t have this issue locally when running `flow.run()`, though.
    Chris White

    1 year ago
    So the original traceback was for a state update; logs are batched up in the background and wouldn’t cause a state update failure. You only posted the serialized form of the `Running` state. What did the final state look like?
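A toy model of the distinction described above: log lines are buffered and sent in batches separately from the task’s own result, so a rejected log batch is recorded as an error without changing the task’s final state. This is not Prefect’s implementation; `LogBatcher`, `make_sender`, `PayloadTooLarge`, and the 1,000-byte limit are all hypothetical, and the background sender is replaced by an explicit `flush()` call to keep the example deterministic.

```python
import json
from typing import Callable, List


class PayloadTooLarge(Exception):
    """Hypothetical stand-in for an HTTP 413 Request Entity Too Large response."""


def make_sender(max_bytes: int) -> Callable[[List[str]], None]:
    """Return a fake API call that rejects request bodies over max_bytes."""
    def send_batch(batch: List[str]) -> None:
        size = len(json.dumps(batch).encode("utf-8"))
        if size > max_bytes:
            raise PayloadTooLarge(f"{size} bytes exceeds {max_bytes}")
    return send_batch


class LogBatcher:
    """Buffers log lines and sends them in batches, isolating send failures."""

    def __init__(self, send: Callable[[List[str]], None]):
        self._send = send
        self._buffer: List[str] = []
        self.errors: List[str] = []

    def log(self, message: str) -> None:
        self._buffer.append(message)  # cheap: nothing is sent yet

    def flush(self) -> None:
        batch, self._buffer = self._buffer, []
        if not batch:
            return
        try:
            self._send(batch)
        except PayloadTooLarge as exc:
            # Record the failure instead of raising it into the task.
            self.errors.append(f"Failed to write log with error: {exc}")


def run_task(batcher: LogBatcher) -> str:
    batcher.log("Cleaning file a.txt")
    batcher.log("x" * 10_000)  # an accidentally huge log line
    return "Success"


batcher = LogBatcher(make_sender(max_bytes=1_000))
final_state = run_task(batcher)
batcher.flush()  # stands in for the background sender in this toy model
print(final_state, batcher.errors)
```

The task’s result is decided independently of the log upload, which is why a log-write 413 on its own shouldn’t explain a failed task run.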
    Adam

    1 year ago
    Hi @Chris White, I’m no longer seeing the “set task state” error, but I am seeing this one:
    Failed to write log with error: 413 Client Error: Request Entity Too Large for url: https://api.prefect.io/graphql
    It never gets past a few of those errors; eventually it reports
    No heartbeat detected from the remote task; marking the run as failed.
    That said, when I look at the actual logs on the container, I see a file’s contents being printed that shouldn’t be (and it seems this printed file is being sent as a log). I have disabled all `log_stdout` properties on the task, though. Any ideas?
    Hey @Chris White, good morning 🙂 Do you have any idea why writing to a file gets logged to stdout when running in Kubernetes, but not locally?
    It’s causing one of our main ETL jobs to fail 😦
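While the source of the printed file is tracked down, one defensive option is to cap the length of each log message so a single oversized record can’t produce a huge request. A sketch using only the standard `logging` module: `TruncatingFilter`, the `etl` logger name, and the 50-character cap are hypothetical choices. Attaching such a filter to Prefect’s own log handlers assumes they are standard `logging` handlers, which is worth verifying for your Prefect version.

```python
import io
import logging


class TruncatingFilter(logging.Filter):
    """Shorten any log message longer than max_chars before it is emitted."""

    def __init__(self, max_chars: int = 2000):
        super().__init__()
        self.max_chars = max_chars

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg with its %-style args
        if len(message) > self.max_chars:
            dropped = len(message) - self.max_chars
            record.msg = f"{message[: self.max_chars]}... [truncated {dropped} chars]"
            record.args = None  # args are already merged into msg
        return True  # never drop the record, only shorten it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(TruncatingFilter(max_chars=50))  # handler filters see every record it emits

logger = logging.getLogger("etl")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("Cleaning file %s", "data.csv")
logger.info("row|" * 100)  # 400 characters, e.g. a file's contents
print(stream.getvalue())
```

This only limits the damage; it doesn’t explain why the file’s contents reach the log in the first place.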
    Chris White

    1 year ago
    Hey @Adam, I think this would work better as a GitHub issue with some code snippets; I don’t have enough information to debug it here. The symptoms you’re describing seem contradictory to me: a failure to write a log has no bearing on the final state of your task runs, so there’s something missing that we’ll need to identify.
    Adam

    1 year ago
    Sure, will post as an issue with the code snippets
    Thanks!
    Chris White

    1 year ago
    👍 👍