
Routing analyzer

npx @amit641/llmmeter-cli analyze --since 14d --include-untested

The analyzer looks back over the --since window and finds opportunities to save money:

  1. Tested suggestions. Groups calls by (feature, prompt_hash) and looks for clusters where the same prompt was actually run against multiple models. If the cheaper model succeeded reliably, it suggests routing the rest of the traffic to it (see the sketch after this list). High confidence: based on real, observed traffic.
  2. Untested same-provider alternatives. For each (feature, model) pair, finds cheaper models in the same provider family that are operation-compatible (chat ↔ chat, embedding ↔ embedding) and would have cost less for the average request profile. Speculative: labelled "A/B before switching".
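In outline, the first pass is a group-and-compare over recorded calls. A minimal sketch in TypeScript, where the record shape, the key format, and the 99% success threshold are illustrative assumptions rather than llmmeter internals:

// Illustrative sketch of the "tested suggestions" pass.
// Record shape and thresholds are assumptions, not the library's actual schema.
type CallRecord = {
  feature: string;
  promptHash: string;
  model: string;
  costUsd: number;
  success: boolean;
};

function testedSuggestions(calls: CallRecord[], minSuccessRate = 0.99) {
  // Cluster by (feature, promptHash): only identical prompts are comparable.
  const groups = new Map<string, CallRecord[]>();
  for (const c of calls) {
    const key = `${c.feature}:${c.promptHash}`;
    const bucket = groups.get(key);
    if (bucket) bucket.push(c);
    else groups.set(key, [c]);
  }

  const suggestions: { feature: string; from: string; to: string; savingsPerCall: number }[] = [];
  for (const cluster of groups.values()) {
    // Aggregate cost and success per model within the cluster.
    const byModel = new Map<string, { n: number; cost: number; ok: number }>();
    for (const r of cluster) {
      const s = byModel.get(r.model) ?? { n: 0, cost: 0, ok: 0 };
      s.n += 1;
      s.cost += r.costUsd;
      if (r.success) s.ok += 1;
      byModel.set(r.model, s);
    }
    if (byModel.size < 2) continue; // need real, observed multi-model traffic

    const ranked = [...byModel.entries()]
      .map(([model, s]) => ({ model, avgCost: s.cost / s.n, successRate: s.ok / s.n }))
      .sort((a, b) => a.avgCost - b.avgCost);
    const cheapest = ranked[0];
    const priciest = ranked[ranked.length - 1];

    // Only suggest when the cheaper model actually succeeded reliably.
    if (cheapest.successRate >= minSuccessRate) {
      suggestions.push({
        feature: cluster[0].feature,
        from: priciest.model,
        to: cheapest.model,
        savingsPerCall: priciest.avgCost - cheapest.avgCost,
      });
    }
  }
  return suggestions;
}

Keying on the prompt hash keeps the comparison apples-to-apples: a cheaper model only gets credit for prompts it has actually handled.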
Example output:

Routing suggestions (window: 14d, 2 found)
--------------------------------------------------------------------------------

support
openai/gpt-4o → openai/gpt-4o-mini
calls=1240 current=$0.012/call candidate=$0.0008/call
estimated savings: $13.91 over the window (confidence 99.5%)
reason: 213 historical calls handled the same prompt at 93% lower cost with 99.5% success.
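The savings line is plain arithmetic: call volume times the per-call cost delta. Computed from the displayed figures it comes to about $13.89, slightly off the reported $13.91, most likely because the per-call costs are rounded for display:

// Back-of-envelope check against the example above.
const calls = 1240;
const deltaPerCall = 0.012 - 0.0008;      // ≈ $0.0112 saved per call
const estimated = calls * deltaPerCall;   // ≈ $13.89 over the window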

Programmatic API:

import { analyzeRouting, suggestUntestedAlternatives } from "@amit641/llmmeter-cli";

// `storage`: your configured llmmeter storage instance holding the recorded calls.
const tested = await analyzeRouting({ storage });
const speculative = await suggestUntestedAlternatives({ storage });
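Continuing from the snippet above, a typical follow-up is to print or alert on whatever comes back. The exact shape of a suggestion object isn't documented in this section, so the example below just serializes the results rather than assuming field names:

// Dump results without assuming a schema; inspect before wiring into alerts.
for (const s of tested) {
  console.log("tested:", JSON.stringify(s));
}
for (const s of speculative) {
  console.log("A/B before switching:", JSON.stringify(s));
}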

Suggestions are heuristic; production teams should still A/B test before flipping a model on critical paths. But the analyzer surfaces the obvious wins ("80% of /support traffic could go to gpt-4o-mini") without any extra instrumentation.