# marvin-ai
**Mike Lev:** @Marvin are there any examples of multimodal LLM usage with ControlFlow? I can see that under the hood, prompts are being structured as LangChain message objects, but this seems very abstracted.
**Marvin:** ControlFlow provides examples of using multiple large language models (LLMs) within a single workflow. This approach is useful for leveraging different models for specific tasks, optimizing for both speed and quality. A brief overview:

1. **Multiple LLM models:** GPT-4o-mini is used for efficient classification tasks, while GPT-4o handles more complex synthesis tasks like generating comprehensive summaries.
2. **Specialized agents:** Separate agents are created for different tasks, each using its own LLM model, keeping the workflow both efficient and high-quality.
3. **Structured data:** Pydantic models ensure type safety and consistent data structures throughout the workflow.
4. **Task-specific result types:** Each task declares a result type that matches its expected output, ensuring agents return the correct information.

For more details, you can explore the Multi-LLM Workflows example. Additionally, ControlFlow supports any LangChain LLM that supports chat and function calling, and it uses provider-specific rules to compile messages properly for each agent's API; more information can be found in the ControlFlow LLMs guide. No relevant GitHub issues were found, which suggests the existing examples and documentation should cover this well.
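For illustration, a minimal sketch of the multi-LLM pattern described above, assuming ControlFlow's `cf.Agent` and `cf.run` API with OpenAI model strings; the `Summary` model, agent names, and task objectives are invented for the example:

```python
import controlflow as cf
from pydantic import BaseModel

# Structured result type, per the "structured data" point above.
class Summary(BaseModel):
    headline: str
    key_points: list[str]

# A fast, cheap model for simple classification...
classifier = cf.Agent(name="Classifier", model="openai/gpt-4o-mini")
# ...and a stronger model for synthesis.
writer = cf.Agent(name="Writer", model="openai/gpt-4o")

@cf.flow
def summarize(text: str) -> Summary:
    # Constraining result_type to a list of labels makes this a classification task.
    category = cf.run(
        "Classify the text",
        agents=[classifier],
        result_type=["news", "opinion", "other"],
        context={"text": text},
    )
    # The synthesis task returns the Pydantic model directly.
    return cf.run(
        "Write a comprehensive summary of the text",
        agents=[writer],
        result_type=Summary,
        context={"text": text, "category": category},
    )
```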
**Mike Lev:** @Marvin I think you misunderstood… multimodal, i.e. VLM, as in image-text-to-text.
**Marvin:** It seems the ControlFlow documentation does not provide specific examples or guidance on using multimodal models, such as vision-language models (VLMs), for image-text-to-text tasks; the focus is primarily on text-based LLMs and their applications within workflows. For multimodal capabilities, you might integrate other libraries or frameworks that support vision-language tasks alongside ControlFlow, for example by using a specialized model to handle images and then feeding its output into a ControlFlow workflow for further processing or decision-making. For more specific guidance or examples, consider reaching out to the ControlFlow community or support channels.
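Since ControlFlow compiles prompts to LangChain messages, one workaround along these lines (a sketch, not an official ControlFlow feature) is to call a vision-capable LangChain chat model directly using LangChain's multimodal message format, then hand the resulting text to a ControlFlow task; the `describe_image` helper and the prompts are hypothetical:

```python
import controlflow as cf
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

vlm = ChatOpenAI(model="gpt-4o")  # gpt-4o accepts image inputs

def describe_image(image_url: str) -> str:
    # LangChain's standard multimodal content format: mixed text and image parts.
    message = HumanMessage(
        content=[
            {"type": "text", "text": "Describe this image in detail."},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    )
    return vlm.invoke([message]).content

@cf.flow
def analyze_image(image_url: str) -> str:
    # The vision call happens outside ControlFlow; only text enters the workflow.
    description = describe_image(image_url)
    return cf.run(
        "Summarize what is notable about this image",
        context={"description": description},
    )
```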
**Mike Lev:** @Marvin do you have a link to these channels?
**Marvin:** Unfortunately, I couldn't locate specific links to ControlFlow community support or forums. You might explore the ControlFlow website for contact information or support channels; such platforms typically have a "Contact Us" or "Support" section where you can reach out with more detailed inquiries.
**n:** Hi @Mike Lev, I'd use Marvin at this point, not ControlFlow: https://github.com/PrefectHQ/marvin
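For what it's worth, Marvin shipped beta multimodal helpers; here is a sketch assuming the Marvin 2.x beta vision interface (`marvin.beta.Image`, `marvin.beta.caption`, `marvin.beta.classify`), which may have changed since, so verify the names against the repo's README:

```python
import marvin

# Hypothetical image URL; see the Marvin docs for supported image sources.
img = marvin.beta.Image("https://example.com/photo.jpg")

# Image-text-to-text: generate a natural-language caption.
caption = marvin.beta.caption(img)

# Classify the image against a fixed label set.
label = marvin.beta.classify(img, labels=["indoor", "outdoor"])
```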