The LLM Observatory

Trace every call. Score every agent. Guard every output.

Open-source observability built for the agentic era. Native MCP tracing, real-time drift detection, and composable guardrails — in one SDK.

Get Started Free View on GitHub

instrument.ts

MCP-native tracingAgent reliability scoringReal-time drift detectionComposable guardrailsHallucination detectionSelf-hosted in 60s22 model cost trackingSIEM exportMIT LicensedMCP-native tracingAgent reliability scoringReal-time drift detectionComposable guardrailsHallucination detectionSelf-hosted in 60s22 model cost trackingSIEM exportMIT Licensed

Observability

See what your agents actually do

Native MCP Tracing

The only platform that traces MCP tool calls as first-class spans. 4-span OTel hierarchy with W3C traceparent propagation across server boundaries.

Multi-Provider Instrumentation

One line to instrument Anthropic, OpenAI, Vercel AI SDK. Every call traced with latency, tokens, cost, and hallucination scores.

Trace Waterfall

Flame chart visualization of your entire agent workflow. Click any span to inspect prompts, completions, and tool inputs/outputs.

instrument.ts

import { Vigil } from '@vigil/sdk';

const vigil = new Vigil({

apiKey: 'vgl_...',

projectId: 'proj_...'

});

vigil.instrumentOpenAI(openai);

vigil.instrumentAnthropic(anthropic);

vigil.instrumentMCP(mcpClient);

// Every call now traced with latency,

// tokens, cost, and hallucination scores

vigil score --agent checkout-bot

> Agent Reliability Report

────────────────────────────────────────

Agent: checkout-bot

Period: Last 24h

Traces: 12,847

Reliability Score 0.94

────────────────────────────────────────

Hallucination rate: 0.02

Error rate: 0.01

Tool success: 0.98

Latency health: 0.87

! Drift alert: latency +2.1 sigma (p95: 3.2s)

Reliability

Know when your agents break

Agent Reliability Score

Composite 0-1 metric combining hallucination rate, error rate, tool success rate, and latency health. Track it in CI. Alert when it drops.

Real-Time Drift Detection

EWMA + z-score anomaly detection on latency, cost, quality, and errors. Alerts you before dashboards do.

Self-Consistency Checking

Detects internal contradictions in LLM outputs — negation, numeric, temporal, entity, and sentiment conflicts.

Guardrails

Stop bad outputs before they ship

Composable Pipeline

Chain safety checks declaratively: injection detection, PII redaction, topic filtering, output validation.

Prompt Injection Detection

28 pattern categories across system overrides, role manipulation, encoding tricks, and jailbreak attempts. < 5ms per check.

Built-in PII Redaction

Hash, mask, or remove sensitive data before it hits your logs. 7 PII patterns with Web Crypto SHA-256.

guardrails.ts

import { pipeline } from '@vigil/guardrails';

const guard = pipeline([

detectInjection({

threshold: 0.85,

categories: 'all'

}),

redactPII({

mode: 'hash',

patterns: ['email', 'ssn', 'phone']

}),

filterTopics('medical', 'legal'),

validateOutput(schema)

]);

const result = await guard.run(output);

// { safe: true, redacted: 3, latency: '4ms' }

Comparison

Built different

Feature	Vigil	Langfuse	Arize	Helicone	LangSmith
MCP Tracing
Agent Scoring
Drift Detection
Guardrail Chains
Hallucination Detection			Paid
Self-hosted Binary		Docker	Docker
License	MIT	MIT	Apache	Source-avail	Proprietary

Get Started

Start tracing in 60 seconds

terminal

Get Started Free Read the Docs