opik

Repository: comet-ml/opik
Description: Debug, evaluate, and monitor LLM applications, RAG systems, and agentic workflows.

Key Features:

  • Comprehensive Tracing: Log and visualize every step of your LLM pipeline.
  • Automated Evaluations: Includes "LLM as a judge" metrics like Hallucination and Relevance.
  • Production Monitoring: Dashboards for tracking performance and accuracy in real time.
  • Dataset Management: Manage and version test datasets for systematic experimentation.
  • Broad Integrations: Supports LangChain, LlamaIndex, OpenAI, Anthropic, and more.
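
To make the tracing feature concrete, here is a minimal standard-library sketch of step-level tracing. It is not Opik's API: the `traced` decorator, the `TRACE` list, and the `retrieve`/`generate`/`pipeline` functions are hypothetical names used only to show the idea of recording one span per pipeline step.

```python
import functools
import time

# Hypothetical in-memory span store; a real tool would ship spans to a backend.
TRACE = []

def traced(fn):
    """Record the inputs, output, and duration of each call as a span."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        TRACE.append({
            "step": fn.__name__,
            "input": args,
            "output": output,
            "seconds": time.perf_counter() - start,
        })
        return output
    return wrapper

@traced
def retrieve(question):
    # Stand-in for a retrieval step in a RAG pipeline.
    return ["Opik is maintained by Comet."]

@traced
def generate(question, context):
    # Stand-in for an LLM call.
    return f"Answer based on {len(context)} document(s)."

@traced
def pipeline(question):
    return generate(question, retrieve(question))

answer = pipeline("Who maintains Opik?")
# Spans are appended when each step finishes, so inner steps come first.
steps = [span["step"] for span in TRACE]
```

Each entry in `TRACE` captures a single step, which is the kind of per-step record a tracing tool can visualize when debugging a multi-step workflow.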

Primary Use Cases:

  • Debugging complex multi-step agentic workflows.
  • Benchmarking LLM application accuracy before deployment.
  • Monitoring production systems for regressions or hallucinations.

Tags: #observability #evaluation #llmops #monitoring
Added: 2026-06-18
Source: GitHub

Notes / Why Notable

Opik (by Comet ML) provides the necessary "LLMOps" infrastructure to move from "vibe coding" to systematic engineering and monitoring of AI applications.

Maintained with Yeda — Karpathy LLM Wiki paradigm.