# marvin-ai
c
This doesn't look right to me....
```python
import controlflow as cf
import controlflow.tools.code as tc

agent = cf.Agent(tools=[tc.shell, tc.python])

await cf.run_async(
    objective="What tools can you invoke?",
    agent=agent,
    handlers=...
)
```
Result
```
I can invoke the functions for marking tasks successful or failed, specifically `mark_task_0fd0eb0b_successful` and `mark_task_0fd0eb0b_failed`, as well as use the `multi_tool_use.parallel` function to execute tools in parallel.
```
🤔
I can see what's going on, though: it manages Prefect tasks under the hood, and they're passed as tools to the Agent. However, we do need some separation of concerns between what I would call "system tools" and "user-agent tools".
Am I misusing controlflow here? @Jeremiah
If, instead of parametrizing `cf.run_async` with an `Agent`, I populate the `tools` argument directly, I do get the user-defined tools in the output:
```
1. shell - to execute shell commands.
2. python - to execute Python code locally.
3. mark_task_f86d644d_successful - to mark task ID f86d644d as successful.
4. mark_task_f86d644d_failed - to mark task ID f86d644d as failed.
```
However, the system tools are still showing up. That doesn't look great, and I presume it could be hard to address.
j
multi_tool_use.parallel is a (not common but not uncommon) OpenAI hallucination
I think we have special system instructions that tell it not to attempt to use it, but we might have deleted them because they were ineffective
ah ok i see
pass `agents=[agent]`, not singular
^ `agents` will work because it's passed to the task directly. I consider `agent` not working to be a bug, because it should work, and I will get a new release out today to fix it. The kwarg just isn't making it to the task
i'll double check though
c
You managed to reproduce the underlying issue I was trying to point out: you're invoking an agent expecting a generation, yet the feedback you get is that there's been a tool call, namely `mark_task_xyz_successful`, which Prefect uses to signal task completion. I get it: it does the job of managing Prefect API comms. The drawback is that everything goes through `cf.run`, so even the simplest "say hello" agent interaction comes out with tool calls, and users who haven't seen this before get confused quickly.

When invoking an agent, we expect to receive feedback on invocations of the tools we provided in `tools=[..]`, i.e. the user-defined agent tools. Prefect tools, on the other hand, belong to a "system" domain which, from a separation-of-concerns standpoint, should not be displayed to the user. A useful question to ask is whether that entity is meant to be consumed by the user or kept internal and hidden.

If you ask me, I wouldn't mind applying a facade to filter out the `mark_task_*` glob, though we could likely all agree this is a patch on the wound. What would make more sense is for this [pre]filter to be managed by controlflow's stream process instead. It's an interesting challenge and I'd be keen to take part in its development, especially to understand whether there are other, more complex solutions worth exploring (imagine forked sessions). Let me know what's on your mind
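A minimal sketch of the facade idea above, assuming tool calls are available as plain dicts with a `name` key. The `mark_task_*` pattern comes from the tool names seen in this thread; `is_system_tool_name` and `visible_tool_calls` are hypothetical helpers, not part of ControlFlow.

```python
from fnmatch import fnmatchcase

# Glob taken from the completion-tool names seen in this thread,
# e.g. mark_task_f86d644d_successful. Treat it as an assumption.
SYSTEM_TOOL_PATTERN = "mark_task_*"


def is_system_tool_name(name: str) -> bool:
    """Return True when the tool name matches the completion-tool glob."""
    return fnmatchcase(name, SYSTEM_TOOL_PATTERN)


def visible_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Keep only the tool calls whose name does not look like a system tool."""
    return [call for call in tool_calls if not is_system_tool_name(call["name"])]


calls = [
    {"name": "shell", "args": {"command": "ls"}},
    {"name": "mark_task_f86d644d_successful", "args": {"result": "done"}},
]
print([call["name"] for call in visible_tool_calls(calls)])
```

Only the `shell` call survives the filter. As noted above, this is a patch: it depends on a naming convention and runs after the events have already been produced.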
j
Hey @Constantin Teo -- For your pragmatic question, tools support a metadata field and we can put something like `is_completion_tool=True` or something similar there to facilitate filtering out those events. We used to have that; I see it's no longer in the codebase, but it's simple to add.

In general though I think you're thinking of agents as "things you chat with", and ControlFlow will be a far more powerful framework if you think of them as "things you delegate to". ControlFlow's core abstraction is a task (note: not a Prefect task. CF would work exactly the same, including `mark_successful` tools, without Prefect). This is because the problem the framework solves is how to move between the structured world of a script and the unstructured world of agentic behavior. The task becomes a contract for determining control at any time. As a user, you create tasks to represent concrete outcomes you require. When you `.run()` the task, control is yielded to the agentic loop. The agent can do whatever it wants, with whatever tools it has, but ultimately needs to use a `mark_task` tool to indicate that it is finished and return a result. This satisfies the task contract, and control is returned to the workflow script.

Managing that back and forth of control is the core gameplay loop, so to speak, of ControlFlow. That's why the "system" tools are actually of primary interest to you as the developer, and the "agent" tools are actually secondary from the framework's perspective. Granted, that's a slightly exaggerated characterization; they are obviously critically important as well for understanding behavior, but the task's status is the thing the CF world revolves around.

Now, if you want to use CF to build a chatbot, that is a perfectly valid approach. @Marvin in this Slack is an example of such a bot! But the small mental model shift is that you're not asking the agent for a generation; you're giving the agent a task of generating a response to an external user. The agent is explicitly told that it is interacting with an orchestrator, not an external user, and so you should consider the CF developer as the primary person for whom information is intended, not the person who thinks they are chatting directly with the AI. That's why `cf.run("Say hi")` involving a tool shouldn't come as a surprise: under the hood, we're creating a contract for an agent to fulfil, and it's using a structured tool instead of a message to do it. The fact that it's also possible to satisfy that without a task-centric agent framework (e.g. just a single chat completion message) shouldn't dictate expectations for the entire framework.

I realize this is a long explanation for what may seem like a very small semantic adjustment, but it informs a lot of the design choices in the framework ("code first, chat second"). tldr: agree, more control over outputs is good. Also, your conception of who the primary "user" of CF is, and what information they need, is not wrong but slightly different from the framework's intent, so I want to avoid misalignment there.
Agentic frameworks in general are a super live space, so I appreciate the chance to dig in and try to capture some of the philosophy here, and I welcome the discussion - I actually owe a blog post on why we take this approach.
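To make the "task as contract" loop concrete, here is a toy sketch in plain Python. It is not ControlFlow's implementation: `ToyTask`, `run`, and the scripted list of agent steps are all hypothetical stand-ins, and a real agent would be an LLM choosing which tool to call next.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task: an objective plus status and result."""
    objective: str
    status: str = "pending"  # becomes "successful" or "failed"
    result: Optional[Any] = None

    def mark_successful(self, result: Any) -> None:
        self.status, self.result = "successful", result

    def mark_failed(self, reason: str) -> None:
        self.status, self.result = "failed", reason


def run(
    task: ToyTask,
    agent_steps: Iterable[tuple[str, Any]],
    tools: dict[str, Callable[[Any], Any]],
) -> Any:
    """Yield control to the 'agent' until it settles the task contract."""
    # Completion tools are created per task and offered next to the user tools.
    all_tools = {
        **tools,
        "mark_successful": task.mark_successful,
        "mark_failed": task.mark_failed,
    }
    for tool_name, argument in agent_steps:
        all_tools[tool_name](argument)
        if task.status != "pending":
            break  # contract settled: control returns to the script
    if task.status != "successful":
        raise RuntimeError(f"Task did not succeed: {task.result!r}")
    return task.result


# A scripted "agent": it uses one user tool, then marks the task successful.
shell_log: list[str] = []
steps = [("shell", "echo hi"), ("mark_successful", "hi")]
print(run(ToyTask("Say hi"), steps, tools={"shell": shell_log.append}))
```

The script only gets a value back once a completion tool has been called, which is why even a "Say hi" objective ends with a tool call in this model.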
Separately, I've already wired up everything to exclude the completion tool calls, but there is one little UX thing I'm trying to work out: we start streaming the tool call to the terminal as soon as the agent starts to make it, but we don't find out it's a completion tool until it writes the whole name (e.g. at some point it's just streaming `tool_calls: [{` and nothing more; only when it says `tool_calls: [{tool_1: <args>}, {mark_task_123_successful:` do we have enough info to actually exclude the latter call). I don't want to show the beginning of the tool call and then just hide it, so I might try to just show the fact that it marked the task complete or failed without rendering it as a full tool call. It'll take a little more thought than I expected because I haven't looked at the formatter code in a while, so it may not be in today's release, but I will try to get it in the next one
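One way to picture the buffering problem described above: hold each tool call back until its name is complete, then either render it normally or emit a one-line task update. This is a sketch under stated assumptions, not the formatter's actual code. The chunk shape (dicts with `index`, `name`, `args`), the `TOOL_METADATA` registry, and `render_stream` are hypothetical; the `is_completion_tool` key follows the metadata suggestion earlier in the thread.

```python
# Hypothetical registry: tool name -> metadata. The real metadata layout
# in ControlFlow is an assumption here.
TOOL_METADATA = {
    "shell": {},
    "mark_task_123_successful": {"is_completion_tool": True},
}


def render_stream(chunks):
    """Yield display strings, holding each tool call back until its name is known."""
    names = {}   # call index -> name accumulated so far
    hidden = {}  # call index -> True once classified as a completion tool
    for chunk in chunks:
        idx = chunk["index"]
        names[idx] = names.get(idx, "") + (chunk.get("name") or "")
        fragment = chunk.get("args") or ""
        if not fragment:
            continue  # only name fragments so far: render nothing yet
        if idx not in hidden:
            # First argument fragment: assume the name is complete and classify it.
            metadata = TOOL_METADATA.get(names[idx], {})
            hidden[idx] = metadata.get("is_completion_tool", False)
            if hidden[idx]:
                yield f"[task] {names[idx]}"
            else:
                yield f"[tool] {names[idx]}"
        if not hidden[idx]:
            yield fragment  # stream the arguments of ordinary tool calls


chunks = [
    {"index": 0, "name": "sh"},
    {"index": 0, "name": "ell"},
    {"index": 0, "args": '{"command": '},
    {"index": 0, "args": '"ls"}'},
    {"index": 1, "name": "mark_task_123_"},
    {"index": 1, "name": "successful"},
    {"index": 1, "args": '{"result": "done"}'},
]
for line in render_stream(chunks):
    print(line)
```

The `shell` call is rendered with its arguments, while the completion call collapses to a single `[task]` line. The cost is a short delay before any tool call appears, and a call that never sends arguments would not be rendered by this sketch at all.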
c
To give a conceptual idea, could session-level request metadata be used as a discriminator, or is that too far out of reach with the current design?
We probably want to know it's a system tool before streaming even starts, which is why I believe it belongs at the session layer
```python
def is_system_tool(
    self,
    event: AgentMessage | AgentMessageDelta | ToolCallEvent | ToolResultEvent,
    chunk: ToolCallChunk | ToolResult | ToolCall | InvalidToolCall | int,
) -> bool:
    """Return True if the tool call is not one of the agent's own tools."""

    if isinstance(chunk, int):
        # An int is treated as an index into the event's list of tool calls
        idx = chunk
        if isinstance(event, AgentMessageDelta):
            tool_call = event.delta_message.tool_call_chunks[idx]
        elif isinstance(event, AgentMessage):
            tool_call = event.ai_message.tool_calls[idx]
        else:
            raise ValueError(
                f"Can't parse tool call from invalid event type: {type(event)}"
            )
    else:
        tool_call = chunk

    agent_tools = event.agent.tools

    # Any tool the agent was not explicitly given counts as a system tool
    return not any(tool_call["name"] == tool.name for tool in agent_tools)
```
For now I have come up with this. The idea is simple: if the agent does not have that tool in its `tools` property, I identify it as a system tool
👍 1
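The membership rule above can be tried in isolation with stand-in types. `ToyTool` and `ToyAgent` below are hypothetical and only mimic the `name` and `tools` attributes the helper relies on; this standalone `is_system_tool` takes a name directly rather than an event.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTool:
    """Stand-in for a tool object; only the name matters here."""
    name: str


@dataclass
class ToyAgent:
    """Stand-in for an agent exposing the tools it was explicitly given."""
    tools: list[ToyTool] = field(default_factory=list)


def is_system_tool(tool_name: str, agent: ToyAgent) -> bool:
    """Treat any tool the agent was not explicitly given as a system tool."""
    return all(tool_name != tool.name for tool in agent.tools)


agent = ToyAgent(tools=[ToyTool("shell"), ToyTool("python")])
for name in ["shell", "mark_task_f86d644d_successful", "multi_tool_use.parallel"]:
    print(name, is_system_tool(name, agent))
```

Under this rule `shell` is a user tool and the `mark_task_*` call is a system tool, but so is the hallucinated `multi_tool_use.parallel`, since the check only knows what the agent owns. Tools attached to the task rather than the agent would presumably be classified as system tools too, which may or may not be the desired behavior.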