@Marvin My flow is failing with 'Value error,...
# ask-marvin
j
@Marvin My flow is failing with 'Value error, Flow run parameters must be less than 512KB when serialized.', I have lots of questions - 1. Is that limit configurable at all? 2. How do I get the actual size of my serialized parameters for a given flow run? 3. On prefect UI, it just says "flow run failed" for my subflow and I had to instrument my worker to get the above error. Are the detailed errors supposed to be logged somewhere? If so, where do I look?
m
thought for 86.5 seconds
You can use `prefect flow-run logs <flow-run-id>` to see detailed logs for any flow run, including subflows.

b) In Your Code: To ensure errors are properly logged, use Prefect's built-in logging:
```python
from prefect import flow, get_run_logger

@flow
def your_flow():
    logger = get_run_logger()
    try:
        ...  # your flow code
    except Exception as e:
        logger.error(f"Flow failed with error: {str(e)}")
        raise
```
c) Worker Logs: Since you mentioned instrumenting your worker, you're on the right track. Worker logs can be valuable for debugging deployment-related issues. Make sure your worker's log level is set appropriately (INFO, or DEBUG for more detail).

Recommendations for your case:
1. Consider breaking large parameters into smaller chunks, or use alternative methods such as:
   - Storing large data in a block and passing the block ID instead
   - Using intermediate storage (S3, local filesystem, etc.) and passing references
   - Using Prefect's artifact system for large data transfers between flows
2. For better error visibility, make sure you're using proper logging in your flows and subflows via `get_run_logger()`.
3. If you need to debug specific flow runs, use `prefect flow-run logs <flow-run-id>` to get detailed logs.

Would you like me to elaborate on any of these points or provide more specific examples for your use case?
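On question 2, a rough way to estimate the serialized size yourself, assuming the parameters are JSON-serializable (Prefect's own serialization may not match this byte-for-byte, but it gives the order of magnitude relative to the 512KB limit):

```python
import json

# Example parameters; substitute the real arguments you pass to the flow.
params = {"customer_id": 123, "rows": list(range(10_000))}

# Rough estimate of the serialized size in bytes.
size_bytes = len(json.dumps(params, default=str).encode("utf-8"))
print(f"~{size_bytes} bytes ({size_bytes / 1024:.1f} KB)")
```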
j
@Marvin The logger also failed: ValueError: Log of size 3497160 is greater than the max size of 1000000
m
thought for 33.3 seconds
k
Worker logs are reachable in the UI too! go to your work pool, then workers, then click on the worker name to see its logs
on cloud anyway
m
Based on the research, I can provide a comprehensive answer about handling both your parameter and log size limits:

1. Log Size Limit (1MB)
The 1MB cap you hit is the client-side limit on a single log record sent to the API. You can raise it with an environment variable (Prefect Cloud may still enforce its own server-side limits):
```bash
# Raise the maximum size of a single log sent to the API to 5MB (5,000,000 bytes)
export PREFECT_LOGGING_TO_API_MAX_LOG_SIZE=5000000
```
2. Alternative Logging Strategies
When dealing with large logs, consider these approaches:
- Break your logging into smaller chunks
- Use log rotation
- Log only critical information to Prefect and send verbose output to an external logging system
- Use Prefect's `get_run_logger()` with appropriate log levels:
```python
from prefect import flow, get_run_logger

@flow
def your_flow():
    logger = get_run_logger()
    # Use appropriate log levels
    logger.debug("Detailed debug info")          # For detailed debugging
    logger.info("Important but not huge info")   # For general information
    logger.error("Critical errors only")         # For errors
```
3. Handling Large Parameters (512KB limit)
Since you're hitting both parameter and log size limits, here are some recommended approaches:

a) Use Blocks for Large Data:
```python
from prefect import flow
from prefect.blocks.system import JSON

@flow
def your_flow():
    # However the large data is produced (loaded, computed, ...)
    your_large_data = {"example": "large payload"}

    # Store the large data in a block
    json_block = JSON(value=your_large_data)
    json_block.save("large-data-block", overwrite=True)

    # Pass the block name instead of the data
    process_data(block_name="large-data-block")

@flow
def process_data(block_name: str):
    # Retrieve the data from the block
    data = JSON.load(block_name).value
    # Process data
```
b) Use File Storage:
```python
import json
from pathlib import Path

from prefect import flow

@flow
def your_flow():
    # However the large data is produced (loaded, computed, ...)
    your_large_data = {"example": "large payload"}

    # Save the large data to a file
    temp_path = Path("/tmp/large_data.json")
    with open(temp_path, "w") as f:
        json.dump(your_large_data, f)

    # Pass the file path instead of the data
    process_data(file_path=str(temp_path))
```
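For completeness, a minimal sketch of the matching `process_data` subflow that reads the file back in (this assumes the parent flow and subflow run on the same machine or a shared filesystem):

```python
import json

from prefect import flow

@flow
def process_data(file_path: str):
    # Only the small path string was sent as a flow run parameter;
    # the large data is read back from disk here.
    with open(file_path) as f:
        data = json.load(f)
    # Process data
```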
c) Break Down into Smaller Chunks:
```python
from prefect import flow

@flow
def main_flow(large_data):
    # Break the data into chunks
    chunks = [large_data[i:i + 1000] for i in range(0, len(large_data), 1000)]

    # Process each chunk separately
    for chunk in chunks:
        process_chunk(chunk)

@flow
def process_chunk(chunk):
    # Process a smaller amount of data
    pass
```
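One caveat: `process_chunk` is itself a flow in this sketch, so each chunk is still sent to the API as a flow run parameter and each chunk must stay under the 512KB limit on its own. Making `process_chunk` a task instead avoids that, since task run parameters are never sent to the API.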
4. Monitoring and Debugging
To better track these issues:
- Use the Prefect UI's logs view for successful runs
- Check worker logs for deployment issues
- Set up external logging for detailed debugging
- Use `prefect flow-run logs <flow-run-id>` for specific flow runs

Would you like me to elaborate on any of these solutions or provide more specific examples for your use case? Also, it would be helpful to know more about the nature of your large parameters - are they data files, configuration settings, or something else?
j
@Kevin Grismore I'm on cloud, but the log was too long to be sent back to prefect, see second note above
(I'm on cloud, but local infrastructure)
k
that log being the error from the worker? was it a huge stack trace
j
Not a terribly huge stack trace, but a terribly huge parameter that was trying to be printed
and logged
and serialized
Maybe "quote" it?
k
ahhh I see
quoting just skips in-memory recursive walking of the param to see if it was returned from another task
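For reference, a minimal sketch of what quoting a parameter looks like at a call site, using `prefect.utilities.annotations.quote`; as discussed below, it only skips that recursive inspection and does not change what is sent to the API:

```python
from prefect import flow, task
from prefect.utilities.annotations import quote

@task
def process(df):
    ...

@flow
def my_flow(df):
    # quote() skips Prefect's recursive walk of the argument looking for
    # futures/states; it does not shrink what gets serialized for the API.
    process(quote(df))
```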
if it's large data, you could write it out to blob storage, pass the path to it as a param, and read it in in your subflow
j
So, if I move pulling the parameter from the database from the parent flow, to the start of the subflow, and then "quote" it for each task in the subflow, that might work?
@Kevin Grismore FYI - Logged #18545 and #18546 on github related to this problem.
n
quoting the parameters will not make a difference in terms of what is sent to the api, like kevin said it will just skip recursive exploration of the object that is the parameter (used to automatically resolve futures etc). also worth mentioning that task run parameters are never sent to the API, whereas flow parameters are
taking a look at the issues, thanks!
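Pulling the thread's conclusion together, a rough sketch of the restructuring discussed above: the parent flow passes only a small reference, the subflow loads the large data itself and hands it directly to its tasks (task run parameters are never sent to the API). `load_big_data` is a hypothetical stand-in for the database query mentioned earlier:

```python
from prefect import flow, task

def load_big_data(record_id: int):
    # Hypothetical stand-in for the database query that returns the
    # large object; replace with your own data-access code.
    ...

@task
def transform(data):
    ...

@flow
def subflow(record_id: int):
    # Only the small record_id is sent to the API as a flow run parameter.
    data = load_big_data(record_id)
    # Passing the large object to tasks is fine: task run parameters
    # are never sent to the API.
    transform(data)

@flow
def parent_flow(record_id: int):
    subflow(record_id)
```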