Skip to main content

What is Semantic Router?

Semantic Router is an intelligent routing layer that dynamically selects the most suitable language model for each query based on multiple signals extracted from the request.

The Problem​

Traditional LLM deployments use a single model for all tasks:

User Query → Single LLM → Response

Problems:

  • High cost for simple queries
  • Suboptimal performance for specialized tasks
  • No security or compliance controls
  • Poor resource utilization

The Solution​

Semantic Router uses signal-driven decision making to route queries intelligently:

User Query → Signal Extraction → Decision Engine → Best Model → Response

Benefits:

  • Cost-effective routing (use smaller models for simple tasks)
  • Better quality (use specialized models for their strengths)
  • Built-in security (jailbreak detection, PII filtering)
  • Flexible and extensible (plugin architecture)

How It Works​

1. Signal Extraction​

The router extracts multiple types of signals from each request:

Signal TypeWhat It DetectsExample
keywordSpecific terms and patterns"calculate", "prove", "debug"
embeddingSemantic meaningMath intent, code intent, creative intent
domainKnowledge domainMathematics, computer science, history
fact_checkNeed for verificationFactual claims, medical advice
user_feedbackUser satisfaction"That's wrong", "try again"
preferenceRoute preferenceComplex intent matching

2. Decision Making​

Signals are combined using logical rules to make routing decisions:

decisions:
- name: math_routing
rules:
operator: "AND"
conditions:
- type: "keyword"
name: "math_keywords"
- type: "domain"
name: "mathematics"
modelRefs:
- model: qwen-math
weight: 1.0

How it works: If the query contains math keywords AND is classified as mathematics domain, route to the math model.

3. Model Selection​

Based on the decision, the router selects the best model:

  • Math queries → Math-specialized model (e.g., Qwen-Math)
  • Code queries → Code-specialized model (e.g., DeepSeek-Coder)
  • Creative queries → Creative model (e.g., Claude)
  • Simple queries → Lightweight model (e.g., Llama-3-8B)

4. Plugin Chain​

Before and after model execution, plugins process the request/response:

plugins:
- type: "semantic-cache" # Check cache first
- type: "jailbreak" # Detect adversarial prompts
- type: "pii" # Filter sensitive data
- type: "system_prompt" # Add context
- type: "hallucination" # Verify facts

Key Concepts​

Mixture of Models (MoM)​

Unlike Mixture of Experts (MoE) which operates within a single model, Mixture of Models operates at the system level:

AspectMixture of Experts (MoE)Mixture of Models (MoM)
ScopeWithin a single modelAcross multiple models
RoutingInternal gating networkExternal semantic router
ModelsShared architectureIndependent models
FlexibilityFixed at training timeDynamic at runtime
Use CaseModel efficiencySystem intelligence

Signal-Driven Decisions​

Traditional routing uses simple rules:

# Traditional: Simple keyword matching
if "math" in query:
route_to_math_model()

Signal-driven routing uses multiple signals:

# Signal-driven: Multiple signals combined
if (has_math_keywords AND is_math_domain) OR has_high_math_embedding:
route_to_math_model()

Benefits:

  • More accurate routing
  • Handles edge cases better
  • Adapts to context
  • Reduces false positives

Real-World Example​

User Query: "Prove that the square root of 2 is irrational"

Signal Extraction:

  • keyword: ["prove", "square root", "irrational"] ✓
  • embedding: 0.89 similarity to math queries ✓
  • domain: "mathematics" ✓

Decision: Route to qwen-math (all math signals agree)

Plugins Applied:

  • semantic-cache: Cache miss, proceed
  • jailbreak: No adversarial patterns
  • system_prompt: Added "Provide rigorous mathematical proof"
  • hallucination: Enabled for verification

Result: High-quality mathematical proof from specialized model

Next Steps​