June 20, 2025
LiteLLM and MCP: One Gateway to Rule All AI Models
Picture this: You’ve built a sophisticated AI tool integration, but your client suddenly wants to switch from OpenAI to Claude for cost reasons. Or maybe they need to use local models for sensitive data while using cloud models for general queries. Without proper abstraction, each change means rewriting your integration code. LiteLLM combined with the Model Context Protocol (MCP) transforms this nightmare into a simple configuration change.
This article demonstrates how LiteLLM’s universal LLM gateway integrates with MCP to create truly portable AI tool integrations. Whether you’re using OpenAI, Anthropic, AWS Bedrock, or local models through Ollama, your MCP tools work seamlessly across all of them.
About Our MCP Server: The Customer Service Assistant
Before exploring how to connect LiteLLM to MCP, it’s helpful to understand what we’re connecting to. In our comprehensive MCP guide, we built a complete customer service MCP server using FastMCP. This server serves as our foundation for demonstrating different client integrations.
Our MCP server exposes three powerful tools that any AI system can use:
Available Tools:
- get_recent_customers: Retrieves a list of recently active customers with their current status. This tool helps AI agents understand customer history and patterns.
- create_support_ticket: Creates new support tickets with customizable priority levels. The tool validates customer existence and generates unique ticket IDs.
- calculate_account_value: Analyzes purchase history to calculate total account value and average purchase amounts. This helps in customer segmentation and support prioritization.
The server also provides a customer resource (customer://{customer_id}) for direct customer data access and includes a prompt template for generating professional customer service responses.
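To make this concrete, here is a minimal sketch of what such a FastMCP server looks like. The tool names match the list above, but the bodies are placeholders rather than the actual implementation from the comprehensive guide:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Customer Service Assistant")

@mcp.tool()
def get_recent_customers(limit: int = 10) -> list[dict]:
    """Retrieve recently active customers with their current status."""
    return [{"id": "CUST-001", "status": "active"}]  # placeholder data

@mcp.tool()
def create_support_ticket(customer_id: str, issue: str, priority: str = "normal") -> dict:
    """Create a support ticket after validating that the customer exists."""
    return {"ticket_id": "TICK-0001", "customer_id": customer_id, "priority": priority}

@mcp.tool()
def calculate_account_value(customer_id: str) -> dict:
    """Calculate total and average purchase amounts from purchase history."""
    return {"total_value": 0.0, "average_purchase": 0.0}

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default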
What makes this special is that these tools work with any MCP-compatible client—whether you’re using OpenAI, Claude, LangChain, DSPy, or any other framework. The same server, built once, serves them all. This is the power of standardization that MCP brings to AI tool integration.
In this article, we’ll explore how LiteLLM connects to this server and enables these tools to work with over 100 different LLM providers.
Understanding LiteLLM: The Universal LLM Gateway
LiteLLM is more than just another AI library—it’s a universal translator for language models. Think of it as the Rosetta Stone of AI APIs, enabling you to write code once and run it with any supported model. Key features include:
- 100+ Model Support: From OpenAI and Anthropic to local models and specialized providers
- Unified Interface: Same code works across all providers
- Load Balancing: Distribute requests across multiple providers
- Cost Tracking: Monitor usage and costs across providers
- Fallback Support: Automatically switch providers on failure
- Format Translation: Converts between different API formats seamlessly
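To see what “same code works across all providers” means in practice, here is a minimal sketch; the model names are illustrative, and only the model string changes between calls:

import litellm

messages = [{"role": "user", "content": "Summarize our refund policy."}]

# Identical call shape for every provider; only the model identifier changes
openai_response = litellm.completion(model="gpt-4o", messages=messages)
claude_response = litellm.completion(model="claude-3-opus-20240229", messages=messages)
local_response = litellm.completion(model="ollama/llama2", messages=messages)

print(openai_response.choices[0].message.content)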
For deeper dives into LiteLLM’s capabilities, check out my articles on building a multi-provider chat application and enhancing it with RAG and streaming.
The Power of LiteLLM + MCP
Combining LiteLLM with MCP creates unprecedented flexibility:
- Write Once, Deploy Anywhere: Your MCP tools work with any LLM provider
- Provider Agnostic: Switch between models without changing tool integration code
- Cost Optimization: Route requests to the most cost-effective provider
- Compliance Friendly: Use local models for sensitive data, cloud for general queries
- Future Proof: New LLM providers automatically work with existing tools
Building Your First LiteLLM + MCP Integration
Let’s create an integration that demonstrates LiteLLM’s ability to use MCP tools across different providers.
Step 1: Setting Up the Integration
Here’s the core setup that connects LiteLLM to your MCP server:
import asyncio
import json

import litellm
from litellm import experimental_mcp_client
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

from config import Config


async def setup_litellm_mcp():
    """Set up LiteLLM with MCP tools."""
    # Create MCP server connection
    server_params = StdioServerParameters(
        command="poetry",
        args=["run", "python", "src/main.py"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the MCP connection
            await session.initialize()

            # Load MCP tools in OpenAI format
            tools = await experimental_mcp_client.load_mcp_tools(
                session=session,
                format="openai"
            )

            print(f"Loaded {len(tools)} MCP tools")
The key insight here is the format="openai" parameter. LiteLLM’s experimental MCP client loads tools in OpenAI’s function calling format, which LiteLLM can then translate to any provider’s format.
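For reference, a tool loaded in OpenAI format is just a JSON schema describing the function. One entry in the tools list looks roughly like this (field values are illustrative):

example_tool = {
    "type": "function",
    "function": {
        "name": "create_support_ticket",
        "description": "Creates new support tickets with customizable priority levels.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "issue": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "normal", "high"]},
            },
            "required": ["customer_id", "issue"],
        },
    },
}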
Step 2: Multi-Model Testing
One of LiteLLM’s strengths is enabling the same code to work with multiple models:
# Dynamically select models based on configuration
models_to_test = []

if Config.LLM_PROVIDER == "openai":
    models_to_test.append(Config.OPENAI_MODEL)
elif Config.LLM_PROVIDER == "anthropic":
    models_to_test.append(Config.ANTHROPIC_MODEL)
else:
    # Test with multiple providers
    models_to_test = [Config.OPENAI_MODEL, Config.ANTHROPIC_MODEL]

for model in models_to_test:
    print(f"\nTesting with {model}...")

    # Same code works for any model
    response = await litellm.acompletion(
        model=model,
        messages=messages,
        tools=tools,
    )
This flexibility means you can:
- Test tools with different models to compare performance
- Switch providers based on availability or cost
- Use different models for different types of queries
Step 3: Handling Tool Execution
LiteLLM standardizes tool execution across providers:
# Check if the model made tool calls
if hasattr(message, "tool_calls") and message.tool_calls:
    print(f"🔧 Tool calls made: {len(message.tool_calls)}")

    # Process each tool call
    for call in message.tool_calls:
        print(f" - Executing {call.function.name}")

        # Execute the tool through MCP
        arguments = json.loads(call.function.arguments)
        result = await session.call_tool(
            call.function.name,
            arguments
        )

        # Add tool result to conversation
        messages.append({
            "role": "tool",
            "content": str(result.content),
            "tool_call_id": call.id,
        })

    # Get final response with tool results
    final_response = await litellm.acompletion(
        model=model,
        messages=messages,
        tools=tools,
    )
This code demonstrates how LiteLLM:
- Receives tool calls in a standardized format
- Executes them through MCP
- Formats results appropriately for each provider
- Handles the complete conversation flow
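If you want the whole round trip in one place, a small helper along these lines works; run_with_tools is a hypothetical name, and the exact serialization of the assistant message may vary between LiteLLM versions:

async def run_with_tools(session, model, messages, tools):
    """Hypothetical helper: one completion round trip with MCP tool execution."""
    response = await litellm.acompletion(model=model, messages=messages, tools=tools)
    message = response.choices[0].message

    if not getattr(message, "tool_calls", None):
        return response  # no tools needed

    # Keep the assistant turn in history so the tool results have context
    messages.append(message.model_dump())

    for call in message.tool_calls:
        arguments = json.loads(call.function.arguments)
        result = await session.call_tool(call.function.name, arguments)
        messages.append({
            "role": "tool",
            "content": str(result.content),
            "tool_call_id": call.id,
        })

    # Second pass: let the model read the tool results and answer
    return await litellm.acompletion(model=model, messages=messages, tools=tools)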
Understanding the Flow
Let’s visualize how LiteLLM orchestrates communication between different LLM providers and MCP tools:
sequenceDiagram
    participant User
    participant LiteLLM
    participant Provider
    participant MCP
    participant Tools

    User->>LiteLLM: Request with model selection
    LiteLLM->>LiteLLM: Load provider adapter
    LiteLLM->>Provider: Translate to provider format
    Provider-->>LiteLLM: Response with tool calls
    LiteLLM->>LiteLLM: Standardize tool call format

    loop For each tool call
        LiteLLM->>MCP: Execute tool
        MCP->>Tools: Run tool logic
        Tools-->>MCP: Tool results
        MCP-->>LiteLLM: Formatted results
    end

    LiteLLM->>Provider: Send tool results
    Provider-->>LiteLLM: Final response
    LiteLLM->>LiteLLM: Standardize response
    LiteLLM-->>User: Unified response format
This diagram reveals LiteLLM’s role as a universal translator, converting between different provider formats while maintaining a consistent interface for your application.
Real-World Scenarios
Scenario 1: Cost-Optimized Routing
# Route simple queries to cheaper models
if is_simple_query(message):
    model = "gpt-3.5-turbo"  # Cheaper
else:
    model = "gpt-4"  # More capable

response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools
)
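Note that is_simple_query is not part of LiteLLM; it stands in for whatever heuristic fits your traffic. A toy version might look like this:

def is_simple_query(message: str) -> bool:
    """Toy heuristic: short messages without support keywords go to the cheaper model."""
    keywords = ("ticket", "refund", "account value", "escalate")
    return len(message) < 200 and not any(k in message.lower() for k in keywords)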
Scenario 2: Compliance-Based Routing
# Use local models for sensitive data
if contains_pii(message):
    model = "ollama/llama2"  # Local model
else:
    model = "claude-3-opus-20240229"  # Cloud model

# Same MCP tools work with both
response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools
)
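Here too, contains_pii is your own check rather than a LiteLLM feature. A placeholder using simple regular expressions (a real deployment would use a proper PII detection service) could be:

import re

def contains_pii(message: str) -> bool:
    """Placeholder PII check: email addresses and US-style SSNs only."""
    patterns = [
        r"[\w.+-]+@[\w-]+\.[\w.]+",   # email address
        r"\b\d{3}-\d{2}-\d{4}\b",     # SSN-like number
    ]
    return any(re.search(p, message) for p in patterns)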
Scenario 3: Fallback Handling
# Try providers in order until one succeeds
models = ["gpt-4", "claude-4-latest", "ollama/mixtral"]

for model in models:
    try:
        response = await litellm.acompletion(
            model=model,
            messages=messages,
            tools=tools
        )
        break  # Success, exit loop
    except Exception as e:
        print(f"Failed with {model} ({e}), trying next...")
        continue
else:
    raise RuntimeError("All providers failed")
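The manual loop works, but LiteLLM also ships a Router that can handle retries and fallbacks for you. A sketch, assuming the Router configuration shape described in the LiteLLM docs (check the documentation for exact parameters):

from litellm import Router

# Each deployment gets an alias; fallbacks map an alias to its backups
router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "gpt-4"}},
        {"model_name": "backup", "litellm_params": {"model": "claude-3-opus-20240229"}},
    ],
    fallbacks=[{"primary": ["backup"]}],
)

response = await router.acompletion(model="primary", messages=messages, tools=tools)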
Architectural Insights
The complete architecture shows how LiteLLM bridges multiple worlds:
graph TB
    subgraph "Your Application"
        App[Application Logic]
        Config[Configuration]
    end

    subgraph "LiteLLM Gateway"
        Router[Intelligent Router]
        Translator[Format Translator]
        Monitor[Usage Monitor]
    end

    subgraph "MCP Integration"
        MCPClient[MCP Client]
        ToolManager[Tool Manager]
    end

    subgraph "Provider Ecosystem"
        subgraph "Cloud Providers"
            OpenAI[OpenAI]
            Anthropic[Anthropic]
            Bedrock[AWS Bedrock]
        end
        subgraph "Local Models"
            Ollama[Ollama]
            VLLM[vLLM]
        end
    end

    subgraph "MCP Tools"
        CustomerDB[Customer Service]
        Analytics[Analytics]
        Ticketing[Ticketing]
    end

    App --> Router
    Config --> Router
    Router --> Translator
    Translator --> OpenAI
    Translator --> Anthropic
    Translator --> Bedrock
    Translator --> Ollama
    Translator --> VLLM
    Router --> MCPClient
    MCPClient --> ToolManager
    ToolManager --> CustomerDB
    ToolManager --> Analytics
    ToolManager --> Ticketing
    Monitor -.->|Track Usage| Router

    style Router fill:#3498db,color:black
    style MCPClient fill:#2ecc71,color:black
    style CustomerDB fill:#e74c3c,color:black
This architecture provides several key benefits:
- Single Integration Point: Your application only needs to know LiteLLM’s interface
- Provider Independence: Switch or add providers without changing application code
- Unified Tool Access: MCP tools work identically across all providers
- Centralized Monitoring: Track usage and costs across all providers
Advanced Patterns
Pattern 1: Provider-Specific Optimizations
# Customize parameters per provider
provider_configs = {
    "gpt-4": {"temperature": 0.7, "max_tokens": 2000},
    "claude-3": {"temperature": 0.5, "max_tokens": 4000},
    "ollama/mixtral": {"temperature": 0.8, "max_tokens": 1000}
}

model = select_best_model(query)
config = provider_configs.get(model, {})

response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools,
    **config  # Provider-specific parameters
)
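Here select_best_model is a placeholder for your own routing logic; a trivial version might be:

def select_best_model(query: str) -> str:
    """Placeholder router: long or error-log-heavy queries go to the larger-context model."""
    if len(query) > 1000 or "traceback" in query.lower():
        return "claude-3"
    return "gpt-4"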
Pattern 2: Cost and Performance Tracking
# LiteLLM tracks costs automatically
from litellm import completion_cost

response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools
)

cost = completion_cost(completion_response=response)
print(f"This request cost: ${cost:.4f}")
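A natural follow-on is to aggregate costs per model, for example to compare providers over a test run:

from collections import defaultdict

costs_by_model = defaultdict(float)

for model in ["gpt-4", "claude-3-opus-20240229"]:
    response = await litellm.acompletion(model=model, messages=messages, tools=tools)
    costs_by_model[model] += completion_cost(completion_response=response)

for model, total in costs_by_model.items():
    print(f"{model}: ${total:.4f}")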
Pattern 3: Streaming with Tool Support
# Stream responses while maintaining tool support
async for chunk in await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools,
    stream=True
):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

    # Handle streaming tool calls
    if hasattr(chunk.choices[0].delta, "tool_calls"):
        # Process tool calls in real-time
        pass
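Streaming tool calls arrive as fragments, so the arguments string has to be stitched back together before you can parse it. A sketch of that accumulation, assuming OpenAI-style streaming deltas (which LiteLLM mirrors):

# Accumulate streamed tool-call fragments by index, then parse once the stream ends
pending_calls = {}

async for chunk in await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools,
    stream=True
):
    delta = chunk.choices[0].delta
    for tool_call in (getattr(delta, "tool_calls", None) or []):
        entry = pending_calls.setdefault(tool_call.index, {"name": "", "arguments": ""})
        if tool_call.function.name:
            entry["name"] = tool_call.function.name
        if tool_call.function.arguments:
            entry["arguments"] += tool_call.function.arguments

completed = {
    index: {"name": call["name"], "arguments": json.loads(call["arguments"])}
    for index, call in pending_calls.items()
}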
Getting Started
- Clone the example repository:

  git clone https://github.com/RichardHightower/mcp_article1
  cd mcp_article1

- Install LiteLLM and dependencies (follow the instructions in README.md):

  poetry add litellm mcp

- Configure your providers in a .env file:

  OPENAI_API_KEY=your-key
  ANTHROPIC_API_KEY=your-key
  # Add any other provider keys

- Run the integration:

  poetry run python src/litellm_integration.py
Key Takeaways
The combination of LiteLLM and MCP represents the ultimate flexibility in AI tool integration:
- True Portability: Write once, run with any LLM provider
- Cost Optimization: Route to the most economical provider for each query
- Risk Mitigation: No vendor lock-in, easy provider switching
- Compliance Ready: Use appropriate models for different data sensitivities
- Future Proof: New providers automatically work with existing tools
By abstracting both the tool layer (MCP) and the model layer (LiteLLM), you create AI systems that adapt to changing requirements without code changes. This is enterprise-grade flexibility at its finest.
References
- GitHub Repository: MCP Article Examples - Complete working code for all integrations
- Comprehensive MCP Guide: MCP: From Chaos to Harmony - Deep dive into MCP architecture and server development
- LiteLLM Articles:
- Multi-Provider Chat App - Building with LiteLLM and Streamlit
- Beyond Chat: RAG and Streaming - Advanced LiteLLM patterns
- Official Documentation:
- LiteLLM Docs - Complete API reference
- MCP Specification - Protocol details
Next Steps
Ready to build provider-agnostic AI tools? Here’s your roadmap:
- Explore the example code to understand the integration patterns
- Experiment with different providers to find the best fit for your use case
- Implement cost tracking and optimization strategies
- Build fallback chains for mission-critical applications
The future of AI isn’t tied to a single provider—it’s about choosing the right tool for each job. With LiteLLM and MCP, you have the freedom to make that choice without rewriting your code.
Want to explore more integration patterns? Check out our articles on OpenAI + MCP, DSPy’s self-optimizing approach, and LangChain workflows. For the complete guide to building MCP servers, see our comprehensive guide.
About the Author
Rick Hightower brings extensive enterprise experience as a former executive and distinguished engineer at a Fortune 100 company, where he specialized in Machine Learning and AI solutions that deliver intelligent customer experiences. His expertise spans both the theoretical foundations and practical applications of AI technologies.
As a TensorFlow certified professional and graduate of Stanford University’s comprehensive Machine Learning Specialization, Rick combines academic rigor with real-world implementation experience. His training includes mastery of supervised learning techniques, neural networks, and advanced AI concepts, which he has successfully applied to enterprise-scale solutions.
With a deep understanding of both the business and technical aspects of AI implementation, Rick bridges the gap between theoretical machine learning concepts and practical business applications, helping organizations leverage AI to create tangible value.
If you like this article, follow Rick on LinkedIn or on Medium.