Prefect Community #marvin-ai

Constantin Teo

10/30/2024, 3:24 AM

This doesn't look right to me....

import controlflow as cf
import controlflow.tools.code as tc
agent = cf.Agent(tools=[tc.shell, tc.python])

await cf.run_async(
    objective="What tools can you invoke?",
    agent=agent,
    handlers=...
)

Result

I can invoke the functions for marking tasks successful or failed, specifically `mark_task_0fd0eb0b_successful` and `mark_task_0fd0eb0b_failed`, as well as use the `multi_tool_use.parallel` function to execute tools in parallel.

🤔

Constantin Teo

10/30/2024, 3:26 AM

I can understand what's going on, though. It manages Prefect tasks under the hood, and they're passed as tools to the Agent. However, we do need some sort of separation of concerns between what I would call "system tools" and "user-agent tools".

Constantin Teo

10/30/2024, 3:26 AM

Am I misusing controlflow here? @Jeremiah

Constantin Teo

10/30/2024, 3:32 AM

If, instead of parametrizing `cf.run_async` with an `Agent`, I populate the `tools` argument directly, I do get the user-defined tools in the output:

1. shell - to execute shell commands.
2. python - to execute Python code locally.
3. mark_task_f86d644d_successful - to mark task ID f86d644d as successful.
4. mark_task_f86d644d_failed - to mark task ID f86d644d as failed.

However, system tools are still showing up. That doesn't look great; I presume this could be hard to address.

Jeremiah

10/30/2024, 2:02 PM

`multi_tool_use.parallel` is a (not common but not uncommon) OpenAI hallucination.

Jeremiah

10/30/2024, 2:02 PM

I think we have special system instructions that tell it not to attempt to use it, but we might have deleted them bc they were ineffective.

Jeremiah

10/30/2024, 2:03 PM

ah ok i see

Jeremiah

10/30/2024, 2:03 PM

pass `agents=[agent]`, not singular

Jeremiah

10/30/2024, 2:05 PM

^ `agents` will work because it's passed to the task directly. I consider `agent` not working to be a bug, because it should work, and I'll get a new release out today to fix it. The kwarg just isn't making it to the task.

Jeremiah

10/30/2024, 2:05 PM

i'll double check though

Jeremiah

10/30/2024, 2:06 PM

image.png

Constantin Teo

10/30/2024, 4:44 PM

You managed to reproduce the background issue I was trying to point out: you're invoking an agent expecting a generation, yet the feedback you get is that there's been a tool call, namely `mark_task_xyz_successful`, which Prefect uses to notify task completion. I get it: it does the job of managing Prefect API comms. But this also comes with a drawback: since everything goes through `cf.run`, even the simplest "say hello" agent interaction comes out with tool calls, and users who are new to it get confused quickly. When invoking an agent, we expect to receive feedback only on invocations of the tools we provided as arguments in `tools=[..]`, aka the user-defined agent tools. Prefect tools, on the other hand, belong to a "system" domain which, from a separation-of-concerns standpoint, should not be displayed to the user.

One useful brainstorming question is whether an entity is meant to be consumed by the user or rather kept internal and hidden. If you ask me, I wouldn't mind applying a facade that filters out the `mark_task_*` glob, though we could likely all agree that this is a patch on the wound. What would make more sense is for this [pre]filter to be managed by ControlFlow's stream process instead. It's an interesting challenge and I'd be keen to engage in its development, especially to understand whether there are other, more complex solutions worth exploring (imagine forked sessions). Let me know what's on your mind.
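A minimal sketch of the facade Constantin mentions, assuming the only system tools are the ones whose names match the `mark_task_*` glob. `ToolCallEvent`, `matches_system_glob`, and `user_facing` are hypothetical stand-ins, not ControlFlow's event classes or API.

```python
import fnmatch
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class ToolCallEvent:
    """Hypothetical stand-in for a streamed tool-call event."""
    tool_name: str
    args: dict = field(default_factory=dict)


# Assumed naming convention for completion ("system") tools.
SYSTEM_TOOL_GLOBS = ("mark_task_*",)


def matches_system_glob(name: str, globs: Iterable[str] = SYSTEM_TOOL_GLOBS) -> bool:
    """Return True if the tool name matches any system-tool glob (case-sensitive)."""
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in globs)


def user_facing(events: Iterable[ToolCallEvent]) -> Iterator[ToolCallEvent]:
    """Facade over an event stream: drop calls to system tools, pass everything else through."""
    for event in events:
        if not matches_system_glob(event.tool_name):
            yield event


# Usage with tool names from the thread.
events = [
    ToolCallEvent("shell", {"command": "ls"}),
    ToolCallEvent("mark_task_f86d644d_successful", {"result": "done"}),
]
visible = [e.tool_name for e in user_facing(events)]  # ['shell']
```

As Constantin says, this is a patch on the wound: it relies on a naming convention rather than on the framework itself knowing which tools are internal.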

Jeremiah

10/31/2024, 12:45 PM

Hey @Constantin Teo, for your pragmatic question: tools support a metadata field, and we can put something like `is_completion_tool=True` (or something similar) there to facilitate filtering out those events. We used to have that; I see it's no longer in the codebase, but it's simple to add.

In general, though, I think you're thinking of agents as "things you chat with", and ControlFlow will be a far more powerful framework if you think of them as "things you delegate to". ControlFlow's core abstraction is a task (note: not a Prefect task; CF would work exactly the same, including the mark_successful tools, without Prefect). This is because the problem the framework solves is how to move between the structured world of a script and the unstructured world of agentic behavior. The task becomes a contract for determining control at any time.

As a user, you create tasks to represent concrete outcomes you require. When you `.run()` the task, control is yielded to the agentic loop. The agent can do whatever it wants, with whatever tools it has, but ultimately it needs to use a `mark_task` tool to indicate that it is finished and return a result. This satisfies the task contract, and control is returned to the workflow script. Managing that back-and-forth of control is the core gameplay loop, so to speak, of ControlFlow. That's why the "system" tools are actually of primary interest to you as the developer, and the "agent" tools are secondary from the framework's perspective. Granted, that's a slightly exaggerated characterization (they are obviously critically important as well for understanding behavior), but the task's status is the thing the CF world revolves around.

Now, if you want to use CF to build a chatbot, that is a perfectly valid approach. @Marvin in this Slack is an example of such a bot! But the small mental-model shift is that you're not asking the agent for a generation; you're giving the agent a task of generating a response to an external user. The agent is explicitly told that it is interacting with an orchestrator, not an external user, so you should consider the CF developer the primary person for whom information is intended, not the person who thinks they are chatting directly with the AI. That's why `cf.run("Say hi")` involving a tool shouldn't come as a surprise: under the hood, we're creating a contract for an agent to fulfil, and it's using a structured tool instead of a message to do it. The fact that it's also possible to satisfy that without a task-centric agent framework (e.g. with just a single chat completion message) shouldn't dictate expectations for the entire framework.

I realize this is a long explanation for what may seem like a very small semantic adjustment, but it informs a lot of the design choices in the framework ("code first, chat second"). tl;dr: agree, more control over outputs is good. Also, your conception of who the primary "user" of CF is, and what information they need, is not wrong but slightly different from the framework's intent, so I want to avoid misalignment there.

Agentic frameworks in general are a super live space, so I appreciate the chance to dig in and try to capture some of the philosophy here, and I welcome the discussion. I actually owe a blog post on why we take this approach.
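A toy sketch of the task-contract loop described above, combined with the `is_completion_tool` metadata flag suggested for filtering. Everything here (`Tool`, `Task`, `run_task`, `visible`, and the scripted call list standing in for the LLM) is hypothetical and illustrative only; it is not ControlFlow's implementation or API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    """Hypothetical tool with a free-form metadata dict."""
    name: str
    fn: Callable[..., Any]
    metadata: dict = field(default_factory=dict)


@dataclass
class Task:
    """Hypothetical task: the contract the agent must satisfy."""
    id: str
    objective: str
    status: str = "pending"
    result: Any = None


def completion_tools(task: Task) -> list[Tool]:
    """Build the mark-successful / mark-failed tools for one task, flagged via metadata."""
    def succeed(result: Any) -> str:
        task.status, task.result = "successful", result
        return "task marked successful"

    def fail(reason: str) -> str:
        task.status, task.result = "failed", reason
        return "task marked failed"

    return [
        Tool(f"mark_task_{task.id}_successful", succeed, {"is_completion_tool": True}),
        Tool(f"mark_task_{task.id}_failed", fail, {"is_completion_tool": True}),
    ]


def run_task(task: Task, user_tools: list[Tool], scripted_calls: list[tuple[str, dict]]):
    """Yield control to a (scripted) agent until it satisfies the task contract.

    `scripted_calls` stands in for the LLM's tool calls, in order.
    """
    tools = {t.name: t for t in user_tools + completion_tools(task)}
    events = []
    for name, kwargs in scripted_calls:
        tool = tools[name]
        output = tool.fn(**kwargs)
        events.append({"tool": name, "output": output, "metadata": tool.metadata})
        if task.status != "pending":
            # Contract satisfied: control returns to the calling script.
            return task.result, events
    raise RuntimeError(f"agent never completed task {task.id!r}")


def visible(events: list[dict]) -> list[dict]:
    """Filter out completion-tool events using the metadata flag, not the tool name."""
    return [e for e in events if not e["metadata"].get("is_completion_tool")]


# Usage: the agent calls a user tool, then satisfies the contract.
task = Task(id="0fd0eb0b", objective="Say hi")
echo = Tool("echo", lambda text: text)
result, events = run_task(
    task,
    [echo],
    [("echo", {"text": "hi"}), ("mark_task_0fd0eb0b_successful", {"result": "hi"})],
)
shown = [e["tool"] for e in visible(events)]  # ['echo']
```

Flagging completion tools in metadata, rather than matching on a name glob, means the filter keeps working even if the naming scheme for completion tools changes.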

Jeremiah

10/31/2024, 3:26 PM

Separately, I've already wired up everything to exclude the completion tool calls, but there is one little UX thing I'm trying to work out. We start streaming the tool call to the terminal as soon as the agent starts to make it, but we don't find out it's a completion tool until it writes the whole name. For example, at some point it's just streaming `tool_calls: [{` and nothing more; only when it says `tool_calls: [{tool_1: <args>}, {mark_task_123_successful:` do we have enough info to actually exclude the latter call. I don't want to show the beginning of the tool call and then just hide it, so I might try to just show the fact that it marked the task complete or failed without rendering it as a full tool call. It'll take a little more thought than I expected because I haven't looked at the formatter code in a while, so it may not be in today's release, but I'll try to get it into the next one.
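A rough sketch of the "show a status line instead of the call" idea, assuming that a tool call's name is complete once its first argument fragment arrives. `Chunk` (loosely modeled on the shape of a streamed tool-call chunk) and `render` are hypothetical names, not ControlFlow's formatter.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Chunk:
    """Hypothetical streamed fragment of a tool call."""
    index: int                  # which tool call in the message this fragment belongs to
    name: Optional[str] = None  # a piece of the tool name
    args: Optional[str] = None  # a piece of the JSON arguments


def render(chunks: Iterable[Chunk], is_system: Callable[[str], bool]) -> Iterator[str]:
    """Stream user tool calls as usual; replace completion-tool calls with a one-line status.

    Assumption: a call's name is complete once its first argument fragment arrives,
    so nothing is rendered for a call until then.
    """
    names: dict[int, str] = {}
    hidden: dict[int, bool] = {}  # index -> True if this call is a system tool
    for chunk in chunks:
        if chunk.name:
            names[chunk.index] = names.get(chunk.index, "") + chunk.name
        if chunk.args is None:
            continue  # still receiving the name: buffer, render nothing yet
        if chunk.index not in hidden:
            name = names.get(chunk.index, "")
            hidden[chunk.index] = is_system(name)
            if hidden[chunk.index]:
                outcome = "successful" if name.endswith("_successful") else "failed"
                yield f"\ntask marked {outcome}"
            else:
                yield f"\ncalling {name} with "
        if not hidden[chunk.index]:
            yield chunk.args


# Usage: one user tool call followed by a completion tool call, streamed in pieces.
stream = [
    Chunk(0, name="sh"), Chunk(0, name="ell"),
    Chunk(0, args='{"cmd": '), Chunk(0, args='"ls"}'),
    Chunk(1, name="mark_task_123_"), Chunk(1, name="successful"),
    Chunk(1, args='{"result": "ok"}'),
]
output = "".join(render(stream, lambda n: n.startswith("mark_task_")))
```

Only the name is buffered, so user tool calls are delayed by at most the time it takes to stream their name, and the beginning of a completion call is never shown and then hidden.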

Constantin Teo

11/01/2024, 7:32 PM

To give a conceptual idea: could session-level request metadata be used as a discriminator, or is that too far out of reach with the current design?

Constantin Teo

11/01/2024, 7:33 PM

We probably want to know it's a system tool before streaming even starts, which is why I believe it belongs at the session layer.
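One way to read the session-layer suggestion, sketched under assumptions: since completion-tool names are generated when a task is created, a hypothetical `Session` could register them before any request is sent. A streaming renderer could then classify a partial name as a user tool as soon as it stops being a prefix of every registered system name, instead of waiting for the full name. `Session`, `Kind`, and `classify_partial` are illustrative names only, and the `mark_task_<id>_successful` / `_failed` pattern is taken from the names seen in this thread.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    USER = "user"
    SYSTEM = "system"
    UNDECIDED = "undecided"


@dataclass
class Session:
    """Hypothetical session holding metadata known before any LLM request is sent."""
    system_tool_names: set[str] = field(default_factory=set)

    def register_task(self, task_id: str) -> None:
        # Completion-tool names are generated by the framework, so they are known up front
        # (naming pattern assumed from the thread).
        self.system_tool_names.update(
            {f"mark_task_{task_id}_successful", f"mark_task_{task_id}_failed"}
        )

    def classify_partial(self, partial_name: str) -> Kind:
        """Classify a tool name that may still be streaming."""
        if partial_name in self.system_tool_names:
            return Kind.SYSTEM
        if any(name.startswith(partial_name) for name in self.system_tool_names):
            return Kind.UNDECIDED  # could still turn into a system tool: keep buffering
        return Kind.USER  # can no longer match any system tool: safe to render now


session = Session()
session.register_task("123")
session.classify_partial("sh")                    # Kind.USER
session.classify_partial("mark_")                 # Kind.UNDECIDED
session.classify_partial("mark_task_123_failed")  # Kind.SYSTEM
```

This assumes no user tool name extends a system tool name (e.g. `mark_task_123_failed_retry`); such a tool would be classified as a system tool at the exact-match point.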

Constantin Teo

11/03/2024, 12:47 PM

# Method excerpt; assumes the ControlFlow event types and LangChain tool-call types are imported.
def is_system_tool(
    self,
    event: AgentMessage | AgentMessageDelta | ToolCallEvent | ToolResultEvent,
    chunk: ToolCallChunk | ToolResult | ToolCall | InvalidToolCall | int,
) -> bool:
    """Return True if the tool call matches none of the agent's registered tools (i.e. a system tool)."""

    if isinstance(chunk, int):
        # An int is treated as an index into the event's tool calls.
        idx = chunk
        if isinstance(event, AgentMessageDelta):
            tool_call = event.delta_message.tool_call_chunks[idx]
        elif isinstance(event, AgentMessage):
            tool_call = event.ai_message.tool_calls[idx]
        else:
            raise ValueError(
                f"Can't parse tool call from invalid event type: {type(event)}"
            )
    else:
        tool_call = chunk

    agent_tools = event.agent.tools

    # A tool is "system" if the agent was not given a tool with that name.
    return not any(tool_call["name"] == tool.name for tool in agent_tools)

For now I've come up with this. The idea is simple: if the agent does not have that tool in its `tools` property, I identify it as a system tool.
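A framework-agnostic sketch of the same rule, with one change worth considering. In LangChain-style streaming, continuation chunks can carry `name=None`; in the method above, `tool_call["name"]` would then match none of the agent's tools, and such a chunk would be reported as a system tool. This hypothetical `SystemToolClassifier` instead remembers (and concatenates) the name per tool-call index, and uses plain strings rather than ControlFlow's event types.

```python
from typing import Iterable, Optional


class SystemToolClassifier:
    """Hypothetical helper for the "not in agent.tools" rule.

    Names are accumulated per tool-call index, because streamed continuation
    chunks can carry name=None (or only part of the name).
    """

    def __init__(self, agent_tool_names: Iterable[str]):
        self.agent_tool_names = frozenset(agent_tool_names)  # O(1) membership checks
        self._names: dict[int, str] = {}

    def observe(self, index: int, name: Optional[str]) -> None:
        """Record a streamed chunk for the tool call at `index`."""
        if name:
            self._names[index] = self._names.get(index, "") + name

    def is_system(self, index: int) -> Optional[bool]:
        """True/False once a name has been seen for `index`; None if no name yet.

        Only meaningful once the full name has streamed.
        """
        name = self._names.get(index)
        if name is None:
            return None
        return name not in self.agent_tool_names


# Usage with tool names from the thread.
clf = SystemToolClassifier(["shell", "python"])
for index, name in [(0, "she"), (0, "ll"), (0, None), (1, "mark_task_f86d644d_successful"), (1, None)]:
    clf.observe(index, name)
```

Returning `None` when no name has been seen yet lets the caller keep buffering instead of guessing.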
