# marvin-ai
d
@Marvin can you recommend an open source monitoring or tracing project for Marvin? Something like LangSmith or Langfuse
m
thought for 48.0 seconds
Great question! If you're looking for open-source monitoring/tracing for LLM apps (à la LangSmith/Langfuse), here are solid options with different strengths:

- Langfuse (OSS)
  - What it's good for: LangSmith-style traces/spans, prompt/version management, evals, PII redaction, sampling. Great DX and fast to get value.
  - Stack: Python/JS SDKs; works with OpenAI, LangChain, LlamaIndex, Vercel AI SDK, etc.
  - Links: Site · GitHub · Docs
- OpenLLMetry (Traceloop) + OpenTelemetry backend (Jaeger/SigNoz/Grafana Tempo)
  - What it's good for: Vendor-neutral tracing using OTel semantic conventions for LLMs. Send traces anywhere OTel is supported and correlate with your broader app/APM traces.
  - Pair with: Jaeger (traces), SigNoz (traces + metrics + logs, OSS APM), or Grafana Tempo.
  - Links: OpenLLMetry · OpenTelemetry · Jaeger · SigNoz · Grafana Tempo · OTel LLM conventions: spec
- Arize Phoenix (OSS)
  - What it's good for: Deep analysis and evaluation of LLM/RAG systems, datasets, embeddings, and traces. Excels at post-hoc debugging, evaluation, and exploration.
  - Links: Site · GitHub · Docs
- Helicone (OSS)
  - What it's good for: Drop-in proxy for OpenAI-compatible APIs to capture requests/responses, costs, and latency with minimal code changes. Simpler than full tracing but very practical.
  - Links: Site · GitHub · Docs
- Braintrust (OSS)
  - What it's good for: Experimentation/evaluation datasets, regression testing, and logging. Less "trace tree" oriented, more eval/experimentation-first.
  - Links: Site · GitHub · Docs
- TruLens (OSS)
  - What it's good for: Quality-focused feedback functions and evals with a simple UI; can complement other tracing tools.
  - Links: Site · GitHub · Docs

Quick guidance on choosing:
- Want "LangSmith-like" and self-hostable: choose Langfuse.
- Want to standardize on OpenTelemetry and view traces alongside your app/APM: OpenLLMetry + Jaeger/SigNoz.
- Want deep debugging/evaluation for RAG and model quality: Phoenix.
- Want quick logging/costs with minimal code changes: Helicone.
- Want an experimentation/evals workflow: Braintrust (and/or TruLens).

If you're orchestrating with Prefect, an easy win is to:
- Add request/trace IDs as Prefect run tags/labels and log the Prefect flow/task run IDs to your tracing system so you can click back and forth between Prefect runs and LLM traces.
- For OTel, propagate trace context in your Prefect tasks so LLM spans appear under the same trace as your workflow logic (see the sketch at the end of this message).

A couple of questions to tailor a recommendation:
- Do you need to self-host, or is a managed option OK?
- What stack are you using (plain OpenAI client, Marvin library, LangChain/LlamaIndex, JS vs Python)?
- Are you prioritizing live production monitoring, cost tracking, and alerts, or deeper offline evaluation/debugging?
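For the OTel route, here's a rough sketch (not Marvin-specific) of opening a span inside a Prefect task and tagging it with the flow run ID. It assumes Prefect 3.x plus the `opentelemetry-api`/`opentelemetry-sdk` packages; `call_llm` and `my_flow` are placeholder names, and you'd still need to configure an exporter (e.g., via the OpenTelemetry SDK or OpenLLMetry's init) for the spans to actually be shipped anywhere.
```python
from opentelemetry import trace
from prefect import flow, task
from prefect.runtime import flow_run

tracer = trace.get_tracer("llm-workflow")

@task
def call_llm(prompt: str) -> str:
    # Any auto-instrumented LLM client calls made inside this span (e.g., via
    # OpenLLMetry) become its children, so they share the workflow's trace.
    with tracer.start_as_current_span("call_llm") as span:
        # flow_run.id is only populated inside a Prefect run context
        span.set_attribute("prefect.flow_run_id", str(flow_run.id))
        span.set_attribute("llm.prompt_preview", prompt[:80])
        return "model response goes here"  # real model call elided

@flow
def my_flow():
    return call_llm("Write a haiku about tracing.")
```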
d
@Marvin instructions on how to setup langfuse with marvin
m
thought for 214.9 seconds
Awesome! Here's a minimal, reliable way to get Langfuse tracing on Marvin's LLM calls. The quickest path is to auto-instrument OpenAI via Langfuse's OpenAI patch and (optionally) add a lightweight decorator to create a nice top-level trace for your Marvin run.

Links to keep handy:
- Langfuse: Site · GitHub · Docs
- Self-hosting: Self-host guide

Step-by-step

1) Deploy Langfuse and get keys
- Use Langfuse Cloud or self-host (guide).
- Create a project and copy your public/secret keys from the UI.

2) Install dependencies
```bash
pip install marvin langfuse openai
```
3) Configure environment variables
- For Langfuse Cloud:
```bash
export LANGFUSE_PUBLIC_KEY=your_public_key
export LANGFUSE_SECRET_KEY=your_secret_key
export LANGFUSE_HOST=https://cloud.langfuse.com
```
- For self-hosted, set `LANGFUSE_HOST` to your instance URL.

4) Auto-instrument OpenAI (captures Marvin's model calls)
Add this as early as possible in your app startup, before any Marvin calls:
```python
# Importing Langfuse's OpenAI drop-in wraps the OpenAI SDK so that OpenAI
# calls (including the ones Marvin makes under the hood) are traced to
# Langfuse. Import it before any OpenAI client is created.
from langfuse.openai import openai  # noqa: F401
```
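If you want to sanity-check the wiring before involving Marvin, here's a quick hedged sketch; it assumes the v3 Langfuse Python SDK (for `get_client`), an `OPENAI_API_KEY` in the environment, and the Langfuse env vars from step 3 (the model name is just an example).
```python
# Make one direct OpenAI call through the patched SDK, then flush so the
# buffered trace is sent before the script exits. A new trace should appear
# in the Langfuse UI.
from langfuse import get_client
from langfuse.openai import openai

resp = openai.OpenAI().chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)

get_client().flush()
```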
5) Optional: wrap your Marvin entrypoint with a Langfuse trace
This creates a clean top-level trace and will include nested spans from the OpenAI auto-instrumentation.
```python
import marvin
from langfuse import observe  # on Langfuse SDK v2: from langfuse.decorators import observe

# The decorator reads the LANGFUSE_* env vars; no explicit client setup needed.
@observe(name="marvin-demo")
def run_marvin():
    # marvin.run takes a plain instruction; swap in Agent, generate, say, etc. as needed
    return marvin.run("Write a short haiku about Langfuse + Marvin.")

if __name__ == "__main__":
    print(run_marvin())
```
That's it. Now when Marvin uses OpenAI under the hood, those requests/responses, latencies, and costs will appear in Langfuse. The decorator gives you a top-level trace around the Marvin call; you can add more `@observe(name="...")` decorators on other functions if you want additional spans.

Optional: correlate with Prefect runs (if you're orchestrating with Prefect 3.x)
- Pull the Prefect run ID at call time and attach it to the trace so you can search/link in Langfuse:
```python
from prefect.runtime import flow_run
from langfuse import get_client, observe  # assumes Langfuse SDK v3

@observe(name="marvin.flow-run-job")
def run_marvin_job():
    # flow_run.id is an attribute (not a callable) and is only populated
    # inside a Prefect run context, so read it at call time and attach it
    # as trace metadata/tags instead of baking it into the decorator name.
    get_client().update_current_trace(
        metadata={"prefect_flow_run_id": flow_run.id},
        tags=["prefect"],
    )
    ...
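```
For completeness, a small usage sketch (the flow name is a placeholder): calling the observed function from inside a Prefect flow ensures `flow_run.id` is actually populated when the trace metadata is recorded.
```python
from prefect import flow

@flow
def marvin_flow():
    # Inside the flow, prefect.runtime.flow_run.id resolves to the real run ID
    return run_marvin_job()

if __name__ == "__main__":
    marvin_flow()
```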
- For structured correlation (e.g., metadata/tags on the trace), see the Langfuse docs for adding metadata to observations: Docs.

Notes and tips
- Importing `langfuse.openai` instruments the official OpenAI SDK. If you're using non-OpenAI providers in Marvin, you'll still get the decorator-level trace, but not auto-captured LLM details; add more `@observe` decorators or create manual spans/events per the Langfuse docs (a rough sketch follows below).
- Place the `langfuse.openai` import as early as possible (before any OpenAI client instantiation).
- If you run in containers/workers, ensure the LANGFUSE_* env vars are set in that environment too.

Want me to tailor the example to your specific Marvin usage (e.g., `Agent`, tools, streaming), or to your deployment setup (Prefect deployments, Docker, K8s)?
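If you do need manual instrumentation for a non-OpenAI provider, here is a rough, hedged sketch assuming the v3 Langfuse Python SDK; the function, span name, and model are placeholders, and the actual provider call is elided.
```python
from langfuse import get_client

langfuse = get_client()

def call_other_provider(prompt: str) -> str:
    # Record the call as a Langfuse "generation", nested under whatever trace
    # or span is currently active (e.g., an @observe-decorated function).
    with langfuse.start_as_current_generation(
        name="other-provider-call",
        model="example-model",  # placeholder model name
        input=prompt,
    ) as generation:
        output = "provider response goes here"  # real provider SDK call elided
        generation.update(output=output)
        return output
```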