# marvin-ai
n
Hi! New to controlflow. There is lots of great documentation to read and lots to learn. For fun I thought I'd try to clone NotebookLM deep dive's script generation. I'm not getting the output I hoped for--with an error about half the time and a short back-and-forth otherwise. Is it a bad practice to pass very long context? What should I change in my prompt to avoid "openai.APIError: The model produced invalid content. Consider modifying your prompt if you are seeing this error persistently."
Here's some example output from when there isn't an error. I was surprised to see a turn repeated.
I thought it might be helpful to see the API traces. It wasn't obvious to me how to turn on logging (skill issue šŸ™ƒ) but I did eventually arrive at the following to see what's going on:
import logging
logging.getLogger("controlflow").setLevel(logging.DEBUG)
logging.getLogger("openai").setLevel(logging.DEBUG)
I was also pleased that langtrace worked with no hiccups and gave a nice UI, via this:
from langtrace_python_sdk import langtrace
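# `config` here is just the keyword arguments for langtrace.init; a hypothetical
# minimal example (the exact options depend on your langtrace_python_sdk version):
config = {"api_key": "your-langtrace-api-key"}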
langtrace.init(**config)
Anyway, I'm humbled -- I thought I'd pick up this agentic stuff fast šŸ˜‚
j
Hey @Nat Taylor! This is really cool and I love the idea
n
Thanks! I feel the same way about controlflow 😊
j
šŸ™‚
Quick notes:
• controlflow.settings.log_level = 'DEBUG' will be slightly cleaner than going via the logging module (see the sketch just below)
• that OpenAI error is EXTREMELY frustrating -- there is some bug on the API side where the model attempts to generate an invalid tool call. We have been unable to figure out how or why it happens, though we have been able to demonstrate it is deterministic by recreating it with the OpenAI native library (no ControlFlow). Because of that, and the fact that it's on the API side, we haven't been able to find a way around it. Sorry you hit it, I wish I had a more constructive piece of advice. As you iterate it will hopefully just naturally disappear.
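A minimal sketch of that settings-based approach (assuming a ControlFlow version where settings.log_level is exposed):

import controlflow

# equivalent to the logging-module approach above, but via ControlFlow's own settings
controlflow.settings.log_level = "DEBUG"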
Now on to the meat of your questions -- from a strictly "framework" point of view, CF is doing what you asked it to do, but here are a couple of ways you can get better results
n
BRB
j
The double "turn" is a little tricky - Bill is posting a message to the internal thread, and then all other conversation is happening as the agents pass the virtual mic back and forth. From a technical perspective, Bill's initial message and the subsequent delegation to Hillary are both part of the same "turn", as a turn is defined as any single agent being invoked one or more times. The reason a turn and an invocation aren't the same is that when an agent uses a tool, it frequently needs to be shown the tool result in a subsequent call. So both of those LLM invocations would constitute a single "turn". On the main branch I've actually been adding a utility that prevents agents from positing messages like that since it sometimes leads to redundant outcomes like this! Should be in a release sometime in the next few days.
You might get better results if you split up your single task into many tasks. The agents are nominally complying with your instructions, but it's all very implicit. Something like:
@cf.flow
def podcast():
    lines = []
    while True:
        # each agent contributes the next line in turn
        next_line = bill.run("generate the next line in the podcast")
        lines.append(next_line)
        next_line = hillary.run("generate the next line in the podcast")
        lines.append(next_line)

        # break somehow -- e.g. after a fixed number of lines, or via a third task
        if len(lines) >= 20:
            break
    return '\n\n'.join(lines)
In this setup, it's much more explicit that you want the agents to generate a line, though you give up the more natural yielding to each other via the delegation tool. However, it might improve your ability to collect and introspect the outcome.
The other advantage is that you can determine when to break the loop yourself, or with a third task; in your single-task setup, the task is going to end the second the agents believe they have satisfied your objective, which may be too short. This is one of those fuzzy "we're all learning agent best practices" zones - I'm not sure which approach will get better outcomes, but I wanted to show you different ways of thinking about how to engage agents. Your approach is a one-shot ask ("hey agents, please do this") while the explicit loop takes much tighter control of the situation. There are many approaches in between!
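For instance, here's a rough sketch of the "third task" idea (assuming bill and hillary are your agents from above, and that cf.run accepts result_type and context as in recent ControlFlow releases):

import controlflow as cf

@cf.flow
def podcast(max_lines: int = 20):
    lines = []
    while True:
        lines.append(bill.run("generate the next line in the podcast"))
        lines.append(hillary.run("generate the next line in the podcast"))

        # a third task judges whether the script feels finished,
        # rather than relying on a hard-coded line count alone
        done = cf.run(
            "Has this podcast script reached a natural ending?",
            result_type=bool,
            context={"script": "\n\n".join(lines)},
        )
        if done or len(lines) >= max_lines:
            break
    return "\n\n".join(lines)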
By the way, if you want the agents to go back and forth instead of delegating to each other, this is a relevant example in the docs
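Not the docs example itself, but the general shape there is a single shared task with both agents attached, so the orchestrator alternates between them rather than relying on delegation tools -- roughly:

# assumes `import controlflow as cf` and the bill/hillary agents defined earlier
script = cf.run(
    "Write the podcast script as a back-and-forth dialogue between the two hosts",
    agents=[bill, hillary],
    result_type=str,
)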
Oh, and it is not bad practice to pass long context, though now I am wondering if the fact that the context is shown to the agent after the task definition could affect performance. I don't think it should.
d
OK, so in my experience this error often happens when you try to send a very long, repetitive string into the model -- aka one word over and over again.
n
The text I am using has some repetitions. I will try removing those.
...it...it...it...it...it...it...
What kind of subjects do you remember? I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember. I remember.
I was like, oh, I'm sorry. I was like, oh, I'm sorry. I was like, oh, I'm sorry. I was like, oh, I'm sorry. I was like, oh, I'm sorry. I was like, oh, I'm sorry. I was like, oh, I'm sorry. I was like, oh, I'm sorry.
They were for women. And they were for women. And they were for women. And they were for women. And they were for women. And they were for women. And they were for women. And they were for women. And they were for women. And they were for women. And they were for women. And they were for women. And they were for women. And they were for women.
This text is the output of mlx-whisper, so maybe it's a strange feedback cycle where the model gets tripped up during transcription and then tripped up again by the repeated input tokens (although I don't have any hypotheses on what "tripped up" really means).
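In case it helps anyone else, here's a blunt little sketch for collapsing back-to-back repeats before passing the transcript in as context (just to test whether the repetition is what trips up the model):

import re

def collapse_repeats(text: str) -> str:
    # Collapse a short phrase repeated back-to-back (e.g. "I remember. I remember. ...")
    # down to a single occurrence. Crude, but enough for an experiment.
    pattern = re.compile(r"(\b.{3,80}?)(?:\s*\1)+", re.DOTALL)
    return pattern.sub(r"\1", text)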
I will try all these suggestions. Thank you!
On "openai.APIError: The model produced invalid content. Consider modifying your prompt if you are seeing this error persistently." -- at least in my case it was often (always?) the result of an EMPTY response from the API, so I followed this thread: https://community.openai.com/t/empty-text-in-the-response-from-the-api-after-few-calls/2067/4 which says a space/newline at the end of the prompt can cause issues. So I added the string "Please follow the instructions." to the end of llm_instructions.jinja as a lazy way to avoid trailing newlines, and I haven't hit the error since. Maybe there's an opportunity to add a strip() around the prompt as an experiment?
Looping is producing much better results - thank you!
j
@Nat Taylor sorry for not seeing this -- if you've solved the mystery of that OpenAI error message that would be INCREDIBLE
d
I ran into it today too while trying gpt-4o-mini
So if you want to repro it, that might be an easy way
n
šŸ˜• my suggestion may help but it's not the antidote
d
We really want controlflow to do managed retries though, right?
j
Hmm, I just ran into this today. And I can't reproduce it. Did anyone ever figure this out?
d
This is something that has always happened with the OpenAI models (aka not just controlflow), and it's something that you can really only handle by catching the exception and retrying.
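Something like this rough sketch, where run_my_task is a placeholder for whatever task or flow you're invoking:

import openai

def run_with_retries(run_my_task, max_attempts: int = 3):
    # Catch the API-side "invalid content" failure and simply try again;
    # not a fix, just a way to keep the flow moving.
    for attempt in range(1, max_attempts + 1):
        try:
            return run_my_task()
        except openai.APIError as exc:
            if attempt == max_attempts:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying...")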
j
@Jason (or anyone in this thread) - do you have any example of the code that produced it? My code that used to cause the error is working now so I'm having trouble testing a solution
j
It's just running a task within a cf.Flow block. I presumed maybe it has to do with my prompt formatting(?), but that wouldn't necessarily explain the rarity of it.
d
Switch to gpt-4o-mini and it will probably be easy to repro
j
@Jason is there any way to share what your code is, or a stripped-down version? It has something to do with formatting rather than complexity. I used to have an example that was literally a single one-line task that I could replicate as a raw call to the OpenAI API and get the error.
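Not that original example, but for anyone who wants to chase it: the raw-call shape is roughly the sketch below -- the model name, message, and tool schema are placeholders, so substitute the exact ones from your trace.

from openai import OpenAI

client = OpenAI()

# Placeholder repro shape: replay the exact messages and tool schemas from your
# ControlFlow/langtrace trace directly against the API and see if the error recurs.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "<the exact prompt from your trace>"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "example_tool",
                "description": "Substitute the real tool schema from your trace.",
                "parameters": {"type": "object", "properties": {}},
            },
        }
    ],
)
print(response.choices[0].message)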