MCP Server Overview
What is MCP?
Model Context Protocol (MCP) is Solatis's centralized AI orchestration layer that manages interactions with multiple AI models, handles context, and optimizes costs.
Architecture
```
┌────────────────────────────────────────┐
│              Applications              │
│       (Web, Mobile, API Clients)       │
└─────────────┬──────────────────────────┘
              │
┌─────────────▼──────────────────────────┐
│               MCP Server               │
│  - Request routing                     │
│  - Context management                  │
│  - Model selection                     │
│  - Cost optimization                   │
│  - Rate limiting                       │
└─────────────┬──────────────────────────┘
              │
       ┌──────┴──────┬──────────┐
       │             │          │
┌──────▼─────┐  ┌────▼────┐  ┌──▼──────┐
│   OpenAI   │  │Anthropic│  │  Local  │
│   GPT-4    │  │ Claude  │  │ Models  │
└────────────┘  └─────────┘  └─────────┘
```

Key Features
Model Routing
Intelligent Selection:
```typescript
// MCP automatically selects best model
const response = await mcp.complete({
  prompt: "Summarize this document",
  context: documentText,
  // MCP chooses: GPT-4 for quality, Claude for long docs, etc.
});
```

Routing Logic:
- Short query (< 500 tokens) → GPT-4-mini (fast, cheap)
- Long document (> 8K tokens) → Claude (large context)
- Code generation → GPT-4 (best for code)
- Creative writing → Claude (nuanced)
- Cost-sensitive → GPT-3.5 (cheapest)

Context Management
Automatic Context Window:
```typescript
const conversation = mcp.createConversation({
  maxTokens: 128000, // Claude's limit
  documents: [doc1, doc2, doc3],
  history: previousMessages
});

// MCP manages context automatically
await conversation.ask("What are the key points?");
await conversation.ask("Compare doc1 and doc2");
// Context preserved across calls
```

Context Strategies:
- Sliding window (keep recent)
- Summarization (compress old)
- Hierarchical (detailed recent, summarized old)
- Semantic selection (keep relevant)
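
As an illustration, the sliding-window strategy can be sketched as a function that keeps the most recent messages within a token budget. This is a minimal sketch, not MCP's implementation: the `Message` shape and the characters-per-token heuristic in `estimateTokens` are assumptions.

```typescript
interface Message { role: 'user' | 'assistant'; content: string; }

// Rough token estimate: ~4 characters per token (heuristic, not a real tokenizer)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sliding window: walk backwards from the newest message and keep
// as many recent messages as fit within the token budget.
function slidingWindow(history: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

Older messages fall out of the window first; the summarization and hierarchical strategies differ only in what happens to the dropped messages.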
Cost Optimization
Automatic Optimization:
```typescript
// Dev/test: Use cheaper models
if (env === 'development') {
  mcp.setDefaultModel('gpt-3.5-turbo');
}

// Production: Balance cost and quality
mcp.setStrategy({
  model: 'adaptive',         // Auto-select
  maxCostPerRequest: 0.10,   // Budget limit
  fallback: 'gpt-3.5-turbo'  // If over budget
});
```

Cost Tracking:
```
Request ID: req_abc123
Model: gpt-4
Input tokens: 1,247
Output tokens: 456
Cost: $0.0523

Monthly Usage:
Total requests: 15,234
Total cost: $156.78
Average: $0.0103/request
```

Rate Limiting
Per-Model Limits:
- OpenAI GPT-4: 10,000 TPM (tokens per minute)
- Anthropic Claude: 100,000 TPM
- Local models: unlimited

When a limit is hit:
- MCP queues overflow requests
- Retries with exponential backoff
- Falls back to alternative models

API Examples
Basic Usage
```typescript
import { MCPClient } from '@solatis/mcp';

const mcp = new MCPClient({
  apiKey: process.env.SOLATIS_API_KEY
});

// Simple completion
const response = await mcp.complete({
  prompt: "Explain quantum computing",
  maxTokens: 500
});

console.log(response.text);
```

With Context
```typescript
// Chat with documents
const chat = await mcp.chat({
  messages: [
    { role: 'user', content: 'What are the revenue numbers?' }
  ],
  context: {
    documents: ['q3-report-uuid'],
    workspace: 'finance-workspace'
  }
});
```

Streaming
```typescript
// Stream response for better UX
const stream = await mcp.complete({
  prompt: "Write a long article about AI",
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}
```

Batch Processing
```typescript
// Process multiple items efficiently
const results = await mcp.batch({
  requests: documents.map(doc => ({
    operation: 'summarize',
    input: doc.content,
    options: { maxLength: 200 }
  })),
  parallelism: 5 // Process 5 at a time
});
```

Configuration
Server Setup
```yaml
# mcp.config.yaml
server:
  port: 8080
  host: 0.0.0.0

models:
  - name: gpt-4
    provider: openai
    apiKey: ${OPENAI_API_KEY}
    rateLimit: 10000 # TPM
    priority: high
  - name: claude-3-opus
    provider: anthropic
    apiKey: ${ANTHROPIC_API_KEY}
    rateLimit: 100000
    priority: medium
  - name: gpt-3.5-turbo
    provider: openai
    apiKey: ${OPENAI_API_KEY}
    rateLimit: 90000
    priority: low
    costMultiplier: 0.1 # 10x cheaper than GPT-4

routing:
  strategy: adaptive
  factors:
    - quality: 0.5
    - cost: 0.3
    - latency: 0.2

cache:
  enabled: true
  ttl: 3600 # seconds
  maxSize: 1GB

monitoring:
  enabled: true
  metrics: prometheus
  logging: structured
```

Monitoring
Metrics
```typescript
// Request metrics
{
  requestId: 'req_123',
  model: 'gpt-4',
  latency: 1234, // ms
  tokens: {
    input: 500,
    output: 200,
    total: 700
  },
  cost: 0.035,
  cache: 'miss',
  status: 'success'
}

// Aggregate metrics
{
  period: '1h',
  requests: 156,
  avgLatency: 892,
  p95Latency: 1567,
  totalCost: 5.45,
  cacheHitRate: 0.23,
  errorRate: 0.02
}
```

Dashboard
Key Metrics:
- Requests per second
- Average latency
- Cost per hour/day/month
- Model distribution
- Error rate
- Cache hit rate
- Queue depth
Best Practices
Efficiency:
- Cache frequent queries
- Batch similar requests
- Use cheaper models for drafts
- Stream for long responses
- Set appropriate timeouts
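
For example, caching frequent queries can be as simple as a TTL-keyed map in front of the completion call. This is a minimal in-memory sketch under stated assumptions: `TTLCache` and `cachedComplete` are illustrative names, and a production setup would hash the full request and share the cache across instances.

```typescript
interface CacheEntry<T> { value: T; expiresAt: number; }

class TTLCache<T> {
  private store = new Map<string, CacheEntry<T>>();
  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) { // expired: drop and report a miss
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Wrap any async completion function with the cache
async function cachedComplete(
  cache: TTLCache<string>,
  complete: (prompt: string) => Promise<string>,
  prompt: string
): Promise<string> {
  const hit = cache.get(prompt);
  if (hit !== undefined) return hit;
  const result = await complete(prompt);
  cache.set(prompt, result);
  return result;
}
```

Repeated identical prompts then cost one upstream call per TTL window instead of one per request.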
Reliability:
- Implement retry logic
- Handle rate limits gracefully
- Use fallback models
- Monitor error rates
- Log all requests
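
The first three points can be combined in one wrapper: retry the current model with exponential backoff, then fall back to the next model in the list. This is a sketch, not MCP's internals; the model names and error handling are illustrative.

```typescript
type CompleteFn = (model: string, prompt: string) => Promise<string>;

// Try each model in order; retry each one a few times before moving on.
async function completeWithFallback(
  complete: CompleteFn,
  models: string[],          // e.g. ['gpt-4', 'gpt-3.5-turbo']
  prompt: string,
  retriesPerModel = 2
): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    for (let attempt = 0; attempt <= retriesPerModel; attempt++) {
      try {
        return await complete(model, prompt);
      } catch (err) {
        lastError = err;
        // Exponential backoff between attempts: 100ms, 200ms, 400ms, ...
        await new Promise(r => setTimeout(r, 100 * 2 ** attempt));
      }
    }
  }
  throw lastError; // every model exhausted
}
```

In production the caught error should also be logged with the request ID so error rates stay observable.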
Cost Control:
- Set budget limits
- Monitor usage daily
- Use tiered models
- Optimize prompts
- Cache aggressively
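
A budget limit can also be enforced client-side with a running spend counter. This is a sketch under the assumption that per-request cost is known after each call; `BudgetGuard` is an illustrative name, not part of the MCP API.

```typescript
class BudgetGuard {
  private spent = 0;
  constructor(private dailyLimitUsd: number) {}

  // Record a completed request's cost; returns the remaining budget.
  record(costUsd: number): number {
    this.spent += costUsd;
    return this.dailyLimitUsd - this.spent;
  }

  // Check before sending: refuse when the estimated cost would exceed the limit.
  allows(estimatedCostUsd: number): boolean {
    return this.spent + estimatedCostUsd <= this.dailyLimitUsd;
  }
}
```

When `allows` returns false, route the request to a cheaper fallback model instead of dropping it.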
Troubleshooting
High Costs:
- Review usage patterns
- Optimize prompts (shorter)
- Use cheaper models
- Increase caching
- Set budget alerts
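
When reviewing usage patterns, it helps to estimate a request's cost from its token counts. A minimal sketch follows; the per-1K-token prices are illustrative placeholders, not current vendor pricing.

```typescript
// Illustrative per-1K-token prices in USD (placeholders, not real pricing)
const PRICES: Record<string, { input: number; output: number }> = {
  'gpt-4':         { input: 0.03,   output: 0.06 },
  'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  return (inputTokens / 1000) * p.input + (outputTokens / 1000) * p.output;
}
```

Shortening prompts cuts the input term directly; switching models changes both rates at once.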
Slow Responses:
- Check model selection
- Review prompt length
- Use streaming
- Optimize context
- Increase parallelism
Rate Limits:
- Implement queuing
- Add retry logic
- Use multiple API keys
- Request limit increase
- Fallback to alternatives
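
Client-side queuing against a TPM limit can be sketched with a token bucket sized to the model's rate. This is a simplified model of the behavior described above, not MCP's server-side queue; the continuous-refill assumption is the sketch's own.

```typescript
// Token bucket: capacity = tokens per minute, refilled continuously over time.
class TokenBucket {
  private available: number;
  private lastRefill: number;

  constructor(private tokensPerMinute: number, now = Date.now()) {
    this.available = tokensPerMinute;
    this.lastRefill = now;
  }

  // Returns true and consumes if the request fits; false means queue or fall back.
  tryConsume(tokens: number, now = Date.now()): boolean {
    const elapsedMin = (now - this.lastRefill) / 60000;
    this.available = Math.min(
      this.tokensPerMinute,
      this.available + elapsedMin * this.tokensPerMinute
    );
    this.lastRefill = now;
    if (tokens > this.available) return false;
    this.available -= tokens;
    return true;
  }
}
```

A rejected request goes back on the queue and is retried after a backoff, or is sent to an alternative model.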
Next Steps
- API Authentication - API security
- Data Flow - System architecture
- Building Agents - AI automation
Last Updated: October 11, 2025