
MCP Server Overview

What is MCP?

Model Context Protocol (MCP) is Solatis's centralized AI orchestration layer that manages interactions with multiple AI models, handles context, and optimizes costs.

Architecture

```
┌────────────────────────────────────────┐
│         Applications                   │
│  (Web, Mobile, API Clients)            │
└─────────────┬──────────────────────────┘
              │
┌─────────────▼──────────────────────────┐
│         MCP Server                     │
│  - Request routing                     │
│  - Context management                  │
│  - Model selection                     │
│  - Cost optimization                   │
│  - Rate limiting                       │
└─────────────┬──────────────────────────┘
              │
       ┌──────┴──────┬──────────────┐
       │             │              │
┌──────▼─────┐  ┌────▼─────┐  ┌─────▼─────┐
│  OpenAI    │  │ Anthropic│  │   Local   │
│  GPT-4     │  │  Claude  │  │  Models   │
└────────────┘  └──────────┘  └───────────┘
```

Key Features

Model Routing

Intelligent Selection:

```typescript
// MCP automatically selects the best model
const response = await mcp.complete({
  prompt: "Summarize this document",
  context: documentText,
  // MCP chooses: GPT-4 for quality, Claude for long docs, etc.
});
```

Routing Logic:

  • Short query (< 500 tokens) → GPT-4-mini (fast, cheap)
  • Long document (> 8K tokens) → Claude (large context)
  • Code generation → GPT-4 (best for code)
  • Creative writing → Claude (nuanced)
  • Cost-sensitive → GPT-3.5 (cheapest)
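
The routing table above can be expressed as a simple selection function. A minimal sketch — the precedence order, task labels, and lowercase model identifiers are assumptions for illustration, not MCP's actual internals:

```typescript
type Task = "code" | "creative" | "chat";

interface RoutingInput {
  promptTokens: number;
  task: Task;
  costSensitive?: boolean;
}

// Mirrors the routing table: context size first, then cost, then task type.
function selectModel(req: RoutingInput): string {
  if (req.promptTokens > 8000) return "claude"; // large context
  if (req.costSensitive) return "gpt-3.5-turbo"; // cheapest
  if (req.task === "code") return "gpt-4"; // best for code
  if (req.task === "creative") return "claude"; // nuanced
  if (req.promptTokens < 500) return "gpt-4-mini"; // fast, cheap
  return "gpt-4"; // default to quality
}
```

In practice the adaptive strategy weighs quality, cost, and latency (see the routing section of mcp.config.yaml) rather than following a fixed cascade like this.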

Context Management

Automatic Context Window:

```typescript
const conversation = mcp.createConversation({
  maxTokens: 128000,  // Claude's limit
  documents: [doc1, doc2, doc3],
  history: previousMessages
});

// MCP manages context automatically
await conversation.ask("What are the key points?");
await conversation.ask("Compare doc1 and doc2");
// Context preserved across calls
```

Context Strategies:

  • Sliding window (keep recent)
  • Summarization (compress old)
  • Hierarchical (detailed recent, summarized old)
  • Semantic selection (keep relevant)
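
Of these, the sliding window is the simplest to illustrate: keep the most recent messages that fit the token budget and drop the oldest. A minimal sketch — the message shape and token accounting are assumptions, not MCP internals:

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
  tokens: number;
}

// Walk backwards from the newest message, keeping messages until the
// token budget is exhausted; everything older falls out of the window.
function slidingWindow(history: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    if (used + history[i].tokens > budget) break;
    kept.unshift(history[i]);
    used += history[i].tokens;
  }
  return kept;
}
```

The other strategies trade recency for coverage: summarization and hierarchical keep a compressed form of what this window would discard, and semantic selection keeps by relevance instead of age.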

Cost Optimization

Automatic Optimization:

```typescript
// Dev/test: Use cheaper models
if (env === 'development') {
  mcp.setDefaultModel('gpt-3.5-turbo');
}

// Production: Balance cost and quality
mcp.setStrategy({
  model: 'adaptive',  // Auto-select
  maxCostPerRequest: 0.10,  // Budget limit
  fallback: 'gpt-3.5-turbo'  // If over budget
});
```

Cost Tracking:

Request ID: req_abc123
Model: gpt-4
Input tokens: 1,247
Output tokens: 456
Cost: $0.0523

Monthly Usage:
Total requests: 15,234
Total cost: $156.78
Average: $0.0103/request
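
The per-request cost above is just input and output tokens multiplied by per-token rates. A sketch with illustrative per-1K-token prices — real provider pricing varies by model and changes over time, so these numbers are placeholders, not the rates MCP uses:

```typescript
// Illustrative per-1K-token prices; NOT actual provider pricing.
const PRICES: Record<string, { input: number; output: number }> = {
  "gpt-4": { input: 0.03, output: 0.06 },
  "gpt-3.5-turbo": { input: 0.0005, output: 0.0015 },
};

// Cost = input tokens at the input rate + output tokens at the output rate.
function requestCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  return (inputTokens / 1000) * p.input + (outputTokens / 1000) * p.output;
}
```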

Rate Limiting

Per-Model Limits:

  • OpenAI GPT-4: 10,000 TPM (tokens per minute)
  • Anthropic Claude: 100,000 TPM
  • Local models: unlimited

Overflow Handling:

  • MCP queues overflow requests
  • Retries with exponential backoff
  • Falls back to alternative models
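
The retry behavior can be sketched as a generic wrapper; the delay schedule (1s, 2s, 4s, ...) and parameter names are assumptions, not MCP's actual implementation:

```typescript
// Retry a request with exponential backoff, doubling the delay each
// attempt, and rethrow the last error once maxRetries is exceeded.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A production version would typically retry only on rate-limit or transient errors and add jitter to the delay to avoid thundering herds.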

API Examples

Basic Usage

```typescript
import { MCPClient } from '@solatis/mcp';

const mcp = new MCPClient({
  apiKey: process.env.SOLATIS_API_KEY
});

// Simple completion
const response = await mcp.complete({
  prompt: "Explain quantum computing",
  maxTokens: 500
});

console.log(response.text);
```

With Context

```typescript
// Chat with documents
const chat = await mcp.chat({
  messages: [
    { role: 'user', content: 'What are the revenue numbers?' }
  ],
  context: {
    documents: ['q3-report-uuid'],
    workspace: 'finance-workspace'
  }
});
```

Streaming

```typescript
// Stream response for better UX
const stream = await mcp.complete({
  prompt: "Write a long article about AI",
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}
```

Batch Processing

```typescript
// Process multiple items efficiently
const results = await mcp.batch({
  requests: documents.map(doc => ({
    operation: 'summarize',
    input: doc.content,
    options: { maxLength: 200 }
  })),
  parallelism: 5  // Process 5 at a time
});
```
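
Under the hood, a parallelism cap like this can be implemented as a small worker pool. A sketch of one common approach — not MCP's actual implementation:

```typescript
// Run async tasks with at most `limit` in flight at once. Each worker
// repeatedly claims the next unclaimed task index until none remain.
async function runLimited<T>(
  tasks: (() => Promise<T>)[],
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // safe: JS is single-threaded between awaits
      results[i] = await tasks[i]();
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    worker
  );
  await Promise.all(workers);
  return results;
}
```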

Configuration

Server Setup

```yaml
# mcp.config.yaml
server:
  port: 8080
  host: 0.0.0.0

models:
  - name: gpt-4
    provider: openai
    apiKey: ${OPENAI_API_KEY}
    rateLimit: 10000  # TPM
    priority: high

  - name: claude-3-opus
    provider: anthropic
    apiKey: ${ANTHROPIC_API_KEY}
    rateLimit: 100000
    priority: medium

  - name: gpt-3.5-turbo
    provider: openai
    apiKey: ${OPENAI_API_KEY}
    rateLimit: 90000
    priority: low
    costMultiplier: 0.1  # 10x cheaper than GPT-4

routing:
  strategy: adaptive
  factors:
    - quality: 0.5
    - cost: 0.3
    - latency: 0.2

cache:
  enabled: true
  ttl: 3600  # seconds
  maxSize: 1GB

monitoring:
  enabled: true
  metrics: prometheus
  logging: structured
```

Monitoring

Metrics

```typescript
// Request metrics
{
  requestId: 'req_123',
  model: 'gpt-4',
  latency: 1234,  // ms
  tokens: {
    input: 500,
    output: 200,
    total: 700
  },
  cost: 0.035,
  cache: 'miss',
  status: 'success'
}

// Aggregate metrics
{
  period: '1h',
  requests: 156,
  avgLatency: 892,
  p95Latency: 1567,
  totalCost: 5.45,
  cacheHitRate: 0.23,
  errorRate: 0.02
}
```
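
The p95 latency in the aggregate block is the 95th percentile of per-request latencies: the value below which 95% of requests fall. A minimal sketch using the nearest-rank method (other interpolation schemes exist; this is illustrative, not how MCP computes it):

```typescript
// Nearest-rank percentile: sort ascending, then take the value at
// index ceil(p * n) - 1, clamped to a valid index.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}
```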

Dashboard

Key Metrics:

  • Requests per second
  • Average latency
  • Cost per hour/day/month
  • Model distribution
  • Error rate
  • Cache hit rate
  • Queue depth

Best Practices

Efficiency:

  • Cache frequent queries
  • Batch similar requests
  • Use cheaper models for drafts
  • Stream for long responses
  • Set appropriate timeouts
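
Caching frequent queries can be as simple as a TTL map keyed by the prompt. A client-side sketch — the class name and key scheme are assumptions; MCP's own cache is configured in mcp.config.yaml:

```typescript
// In-memory TTL cache for completion results, keyed by prompt text.
class CompletionCache {
  private store = new Map<string, { value: string; expires: number }>();

  constructor(private ttlMs: number) {}

  // Return the cached value, or undefined on a miss or expired entry.
  get(prompt: string): string | undefined {
    const entry = this.store.get(prompt);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.store.delete(prompt);
      return undefined;
    }
    return entry.value;
  }

  set(prompt: string, value: string): void {
    this.store.set(prompt, { value, expires: Date.now() + this.ttlMs });
  }
}
```

Note that a plain prompt-keyed cache only helps with exact repeats; including model name and context identifiers in the key avoids serving stale or mismatched results.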

Reliability:

  • Implement retry logic
  • Handle rate limits gracefully
  • Use fallback models
  • Monitor error rates
  • Log all requests

Cost Control:

  • Set budget limits
  • Monitor usage daily
  • Use tiered models
  • Optimize prompts
  • Cache aggressively

Troubleshooting

High Costs:

  • Review usage patterns
  • Optimize prompts (shorter)
  • Use cheaper models
  • Increase caching
  • Set budget alerts

Slow Responses:

  • Check model selection
  • Review prompt length
  • Use streaming
  • Optimize context
  • Increase parallelism

Rate Limits:

  • Implement queuing
  • Add retry logic
  • Use multiple API keys
  • Request limit increase
  • Fallback to alternatives

Next Steps


Last Updated: October 11, 2025

Released under the MIT License.