The LLM Observatory

Trace every call. Score every agent. Guard every output.

Open-source observability built for the agentic era. Native MCP tracing, real-time drift detection, and composable guardrails — in one SDK.

instrument.ts
|
MCP-native tracingAgent reliability scoringReal-time drift detectionComposable guardrailsHallucination detectionSelf-hosted in 60s22 model cost trackingSIEM exportMIT LicensedMCP-native tracingAgent reliability scoringReal-time drift detectionComposable guardrailsHallucination detectionSelf-hosted in 60s22 model cost trackingSIEM exportMIT Licensed
Observability

See what your agents actually do

01

Native MCP Tracing

The only platform that traces MCP tool calls as first-class spans. 4-span OTel hierarchy with W3C traceparent propagation across server boundaries.

02

Multi-Provider Instrumentation

One line to instrument Anthropic, OpenAI, Vercel AI SDK. Every call traced with latency, tokens, cost, and hallucination scores.

03

Trace Waterfall

Flame chart visualization of your entire agent workflow. Click any span to inspect prompts, completions, and tool inputs/outputs.

instrument.ts
import { Vigil } from '@vigil/sdk';
const vigil = new Vigil({
apiKey: 'vgl_...',
projectId: 'proj_...'
});
vigil.instrumentOpenAI(openai);
vigil.instrumentAnthropic(anthropic);
vigil.instrumentMCP(mcpClient);
// Every call now traced with latency,
// tokens, cost, and hallucination scores
vigil score --agent checkout-bot
> Agent Reliability Report
────────────────────────────────────────
Agent: checkout-bot
Period: Last 24h
Traces: 12,847
Reliability Score 0.94
────────────────────────────────────────
Hallucination rate: 0.02
Error rate: 0.01
Tool success: 0.98
Latency health: 0.87
! Drift alert: latency +2.1 sigma (p95: 3.2s)
Reliability

Know when your agents break

01

Agent Reliability Score

Composite 0-1 metric combining hallucination rate, error rate, tool success rate, and latency health. Track it in CI. Alert when it drops.

02

Real-Time Drift Detection

EWMA + z-score anomaly detection on latency, cost, quality, and errors. Alerts you before dashboards do.

03

Self-Consistency Checking

Detects internal contradictions in LLM outputs — negation, numeric, temporal, entity, and sentiment conflicts.

Guardrails

Stop bad outputs before they ship

01

Composable Pipeline

Chain safety checks declaratively: injection detection, PII redaction, topic filtering, output validation.

02

Prompt Injection Detection

28 pattern categories across system overrides, role manipulation, encoding tricks, and jailbreak attempts. < 5ms per check.

03

Built-in PII Redaction

Hash, mask, or remove sensitive data before it hits your logs. 7 PII patterns with Web Crypto SHA-256.

guardrails.ts
import { pipeline } from '@vigil/guardrails';
const guard = pipeline([
detectInjection({
threshold: 0.85,
categories: 'all'
}),
redactPII({
mode: 'hash',
patterns: ['email', 'ssn', 'phone']
}),
filterTopics('medical', 'legal'),
validateOutput(schema)
]);
const result = await guard.run(output);
// { safe: true, redacted: 3, latency: '4ms' }
Comparison

Built different

FeatureVigilLangfuseArizeHeliconeLangSmith
MCP Tracing
Agent Scoring
Drift Detection
Guardrail Chains
Hallucination DetectionPaid
Self-hosted BinaryDockerDocker
LicenseMITMITApacheSource-availProprietary
Get Started

Start tracing in 60 seconds

terminal