LiteLLM and MCP: One Gateway to Rule All AI Models

June 20, 2025


LiteLLM and MCP: One Gateway to Rule All AI Models


Picture this: You’ve built a sophisticated AI tool integration, but your client suddenly wants to switch from OpenAI to Claude for cost reasons. Or maybe they need to use local models for sensitive data while using cloud models for general queries. Without proper abstraction, each change means rewriting your integration code. LiteLLM combined with the Model Context Protocol (MCP) transforms this nightmare into a simple configuration change.

This article demonstrates how LiteLLM’s universal LLM gateway integrates with MCP to create truly portable AI tool integrations. Whether you’re using OpenAI, Anthropic, AWS Bedrock, or local models through Ollama, your MCP tools work seamlessly across all of them.

About Our MCP Server: The Customer Service Assistant

Before exploring how to connect LiteLLM to MCP, it’s helpful to understand what we’re connecting to. In our comprehensive MCP guide, we built a complete customer service MCP server using FastMCP. That server is our foundation for demonstrating different client integrations.

Our MCP server exposes three powerful tools that any AI system can use:

Available Tools:

  • get_recent_customers: Retrieves a list of recently active customers with their current status. This tool helps AI agents understand customer history and patterns.
  • create_support_ticket: Creates new support tickets with customizable priority levels. The tool validates customer existence and generates unique ticket IDs.
  • calculate_account_value: Analyzes purchase history to calculate total account value and average purchase amounts. This helps in customer segmentation and support prioritization.

The server also provides a customer resource (customer://{customer_id}) for direct customer data access and includes a prompt template for generating professional customer service responses.
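For orientation, here is an abbreviated sketch of how a server like this can be defined with FastMCP. The tool bodies and field names below are placeholders, not the actual implementation from the guide:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Customer Service Assistant")

@mcp.tool()
def get_recent_customers(limit: int = 10) -> list[dict]:
    """Retrieve recently active customers with their current status."""
    # Placeholder data; the real server reads from its own data store
    return [{"customer_id": "CUST-001", "status": "active"}]

@mcp.resource("customer://{customer_id}")
def get_customer(customer_id: str) -> dict:
    """Direct access to a single customer record."""
    return {"customer_id": customer_id, "status": "active"}

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default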

What makes this special is that these tools work with any MCP-compatible client—whether you’re using OpenAI, Claude, LangChain, DSPy, or any other framework. The same server, built once, serves them all. This is the power of standardization that MCP brings to AI tool integration.

In this article, we’ll explore how LiteLLM connects to this server and enables these tools to work with over 100 different LLM providers.

Understanding LiteLLM: The Universal LLM Gateway

LiteLLM is more than just another AI library—it’s a universal translator for language models. Think of it as the Rosetta Stone of AI APIs, enabling you to write code once and run it with any supported model. Key features include:

  • 100+ Model Support: From OpenAI and Anthropic to local models and specialized providers
  • Unified Interface: Same code works across all providers
  • Load Balancing: Distribute requests across multiple providers
  • Cost Tracking: Monitor usage and costs across providers
  • Fallback Support: Automatically switch providers on failure
  • Format Translation: Converts between different API formats seamlessly

For deeper dives into LiteLLM’s capabilities, check out my articles on building a multi-provider chat application and enhancing it with RAG and streaming.
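To make that unified interface concrete, here is a minimal sketch: the call shape never changes, only the model string does. The model names are examples; any LiteLLM-supported identifier works, and the Ollama entry assumes a local Ollama server is running.

from litellm import completion

messages = [{"role": "user", "content": "Summarize our support backlog."}]

# Same call, three different providers; only the model string changes
for model in ["gpt-4o", "claude-3-opus-20240229", "ollama/llama2"]:
    response = completion(model=model, messages=messages)
    print(f"{model}: {response.choices[0].message.content[:80]}")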

The Power of LiteLLM + MCP

Combining LiteLLM with MCP creates unprecedented flexibility:

  1. Write Once, Deploy Anywhere: Your MCP tools work with any LLM provider
  2. Provider Agnostic: Switch between models without changing tool integration code
  3. Cost Optimization: Route requests to the most cost-effective provider
  4. Compliance Friendly: Use local models for sensitive data, cloud for general queries
  5. Future Proof: New LLM providers automatically work with existing tools

Building Your First LiteLLM + MCP Integration

Let’s create an integration that demonstrates LiteLLM’s ability to use MCP tools across different providers.

Step 1: Setting Up the Integration

Here’s the core setup that connects LiteLLM to your MCP server:

import asyncio
import json
import litellm
from litellm import experimental_mcp_client
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from config import Config

async def setup_litellm_mcp():
    """Set up LiteLLM with MCP tools."""

    # Create MCP server connection
    server_params = StdioServerParameters(
        command="poetry",
        args=["run", "python", "src/main.py"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the MCP connection
            await session.initialize()

            # Load MCP tools in OpenAI format
            tools = await experimental_mcp_client.load_mcp_tools(
                session=session,
                format="openai"
            )

            print(f"Loaded {len(tools)} MCP tools")

The key insight here is the format="openai" parameter. LiteLLM’s experimental MCP client loads tools in OpenAI’s function calling format, which LiteLLM can then translate to any provider’s format.
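Concretely, each entry in tools ends up as an OpenAI-style function definition whose JSON schema is generated from the MCP server’s tool signatures. The field values below are illustrative, not the exact schema the server emits:

# One illustrative entry from the `tools` list (OpenAI function-calling format)
{
    "type": "function",
    "function": {
        "name": "create_support_ticket",
        "description": "Creates new support tickets with customizable priority levels.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "subject": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},
            },
            "required": ["customer_id", "subject"],
        },
    },
}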

Step 2: Multi-Model Testing

One of LiteLLM’s strengths is enabling the same code to work with multiple models:


# Dynamically select models based on configuration
models_to_test = []

if Config.LLM_PROVIDER == "openai":
    models_to_test.append(Config.OPENAI_MODEL)
elif Config.LLM_PROVIDER == "anthropic":
    models_to_test.append(Config.ANTHROPIC_MODEL)
else:
    # Test with multiple providers
    models_to_test = [Config.OPENAI_MODEL, Config.ANTHROPIC_MODEL]

for model in models_to_test:
    print(f"\nTesting with {model}...")

    # Same code works for any model
    response = await litellm.acompletion(
        model=model,
        messages=messages,
        tools=tools,
    )

This flexibility means you can:

  • Test tools with different models to compare performance
  • Switch providers based on availability or cost
  • Use different models for different types of queries
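The Step 2 snippet reads its model names from a Config class in the example repository. That file isn’t shown in this article, but a hypothetical config.py consistent with those references could be as simple as:

# config.py (hypothetical sketch; match the names used in the repository's README)
import os
from dotenv import load_dotenv

load_dotenv()

class Config:
    LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai")
    OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4")
    ANTHROPIC_MODEL = os.getenv("ANTHROPIC_MODEL", "claude-3-opus-20240229")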

Step 3: Handling Tool Execution

LiteLLM standardizes tool execution across providers:


# The assistant message from the first acompletion call
message = response.choices[0].message

# Check if the model made tool calls
if hasattr(message, "tool_calls") and message.tool_calls:
    print(f"🔧 Tool calls made: {len(message.tool_calls)}")

    # Keep the assistant message (and its tool calls) in the history so the
    # follow-up request has the full conversation context
    messages.append(message.model_dump())

    # Process each tool call
    for call in message.tool_calls:
        print(f"   - Executing {call.function.name}")

        # Execute the tool through MCP
        arguments = json.loads(call.function.arguments)
        result = await session.call_tool(
            call.function.name,
            arguments
        )

        # Add tool result to conversation
        messages.append({
            "role": "tool",
            "content": str(result.content),
            "tool_call_id": call.id,
        })

    # Get final response with tool results
    final_response = await litellm.acompletion(
        model=model,
        messages=messages,
        tools=tools,
    )

This code demonstrates how LiteLLM:

  1. Receives tool calls in a standardized format
  2. Executes them through MCP
  3. Formats results appropriately for each provider
  4. Handles the complete conversation flow

Understanding the Flow

Let’s visualize how LiteLLM orchestrates communication between different LLM providers and MCP tools:

sequenceDiagram
    participant User
    participant LiteLLM
    participant Provider
    participant MCP
    participant Tools

    User->>LiteLLM: Request with model selection
    LiteLLM->>LiteLLM: Load provider adapter
    LiteLLM->>Provider: Translate to provider format
    Provider-->>LiteLLM: Response with tool calls
    LiteLLM->>LiteLLM: Standardize tool call format

    loop For each tool call
        LiteLLM->>MCP: Execute tool
        MCP->>Tools: Run tool logic
        Tools-->>MCP: Tool results
        MCP-->>LiteLLM: Formatted results
    end

    LiteLLM->>Provider: Send tool results
    Provider-->>LiteLLM: Final response
    LiteLLM->>LiteLLM: Standardize response
    LiteLLM-->>User: Unified response format

This diagram reveals LiteLLM’s role as a universal translator, converting between different provider formats while maintaining a consistent interface for your application.

Real-World Scenarios

Scenario 1: Cost-Optimized Routing


# Route simple queries to cheaper models
if is_simple_query(message):
    model = "gpt-3.5-turbo"  # Cheaper
else:
    model = "gpt-4"  # More capable

response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools
)

Scenario 2: Compliance-Based Routing


# Use local models for sensitive data
if contains_pii(message):
    model = "ollama/llama2"  # Local model
else:
    model = "claude-3-opus-20240229"  # Cloud model


# Same MCP tools work with both
response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools
)

Scenario 3: Fallback Handling


# LiteLLM can automatically handle fallbacks
models = ["gpt-4", "claude-4-latest", "ollama/mixtral"]

for model in models:
    try:
        response = await litellm.acompletion(
            model=model,
            messages=messages,
            tools=tools
        )
        break  # Success, exit loop
    except Exception as e:
        print(f"Failed with {model}, trying next...")
        continue
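The manual loop above is explicit about what happens, but LiteLLM’s Router can declare fallbacks for you. A minimal sketch, reusing the same model names (the deployment aliases here are my own):

from litellm import Router

router = Router(
    model_list=[
        {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4"}},
        {"model_name": "claude", "litellm_params": {"model": "claude-3-opus-20240229"}},
        {"model_name": "local", "litellm_params": {"model": "ollama/mixtral"}},
    ],
    # If gpt-4 fails, retry the request on claude, then on the local model
    fallbacks=[{"gpt-4": ["claude", "local"]}],
)

response = await router.acompletion(model="gpt-4", messages=messages, tools=tools)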

Architectural Insights

The complete architecture shows how LiteLLM bridges multiple worlds:

graph TB
    subgraph "Your Application"
        App[Application Logic]
        Config[Configuration]
    end

    subgraph "LiteLLM Gateway"
        Router[Intelligent Router]
        Translator[Format Translator]
        Monitor[Usage Monitor]
    end

    subgraph "MCP Integration"
        MCPClient[MCP Client]
        ToolManager[Tool Manager]
    end

    subgraph "Provider Ecosystem"
        subgraph "Cloud Providers"
            OpenAI[OpenAI]
            Anthropic[Anthropic]
            Bedrock[AWS Bedrock]
        end

        subgraph "Local Models"
            Ollama[Ollama]
            VLLM[vLLM]
        end
    end

    subgraph "MCP Tools"
        CustomerDB[Customer Service]
        Analytics[Analytics]
        Ticketing[Ticketing]
    end

    App --> Router
    Config --> Router
    Router --> Translator
    Translator --> OpenAI
    Translator --> Anthropic
    Translator --> Bedrock
    Translator --> Ollama
    Translator --> VLLM

    Router --> MCPClient
    MCPClient --> ToolManager
    ToolManager --> CustomerDB
    ToolManager --> Analytics
    ToolManager --> Ticketing

    Monitor -.->|Track Usage| Router

    style Router fill:#3498db,color:black
    style MCPClient fill:#2ecc71,color:black
    style CustomerDB fill:#e74c3c,color:black

This architecture provides several key benefits:

  1. Single Integration Point: Your application only needs to know LiteLLM’s interface
  2. Provider Independence: Switch or add providers without changing application code
  3. Unified Tool Access: MCP tools work identically across all providers
  4. Centralized Monitoring: Track usage and costs across all providers

Advanced Patterns

Pattern 1: Provider-Specific Optimizations


# Customize parameters per provider
provider_configs = {
    "gpt-4": {"temperature": 0.7, "max_tokens": 2000},
    "claude-3": {"temperature": 0.5, "max_tokens": 4000},
    "ollama/mixtral": {"temperature": 0.8, "max_tokens": 1000}
}

model = select_best_model(query)
config = provider_configs.get(model, {})

response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools,
    **config  # Provider-specific parameters
)

Pattern 2: Cost and Performance Tracking


# LiteLLM tracks costs automatically
from litellm import completion_cost

response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools
)

cost = completion_cost(completion_response=response)
print(f"This request cost: ${cost:.4f}")

Pattern 3: Streaming with Tool Support


# Stream responses while maintaining tool support
async for chunk in await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools,
    stream=True
):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

    # Handle streaming tool calls
    if hasattr(chunk.choices[0].delta, "tool_calls"):
        # Process tool calls in real-time
        pass
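Streaming tool calls arrive as partial deltas: the function name typically comes first, then the JSON arguments trickle in as string fragments. Here is a sketch of accumulating them before execution, assuming the OpenAI-style delta format LiteLLM emits:

# Accumulate partial tool calls across chunks before executing them
pending_calls: dict[int, dict] = {}

async for chunk in await litellm.acompletion(
    model=model, messages=messages, tools=tools, stream=True
):
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
    for tc in (getattr(delta, "tool_calls", None) or []):
        entry = pending_calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function.name:
            entry["name"] = tc.function.name
        if tc.function.arguments:
            entry["arguments"] += tc.function.arguments

# After the stream ends, pending_calls holds complete tool names and JSON
# argument strings, ready for session.call_tool() as in Step 3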

Getting Started

  1. Clone the example repository:

    git clone https://github.com/RichardHightower/mcp_article1
    cd mcp_article1
    
  2. Install LiteLLM and dependencies (follow instructions in README.md):

    poetry add litellm mcp
    
  3. Configure your providers:

    # .env file
    OPENAI_API_KEY=your-key
    ANTHROPIC_API_KEY=your-key
    # Add any other provider keys
    
  4. Run the integration:

    poetry run python src/litellm_integration.py
    

Key Takeaways

The combination of LiteLLM and MCP represents the ultimate flexibility in AI tool integration:

  • True Portability: Write once, run with any LLM provider
  • Cost Optimization: Route to the most economical provider for each query
  • Risk Mitigation: No vendor lock-in, easy provider switching
  • Compliance Ready: Use appropriate models for different data sensitivities
  • Future Proof: New providers automatically work with existing tools

By abstracting both the tool layer (MCP) and the model layer (LiteLLM), you create AI systems that adapt to changing requirements without code changes. This is enterprise-grade flexibility at its finest.

Next Steps

Ready to build provider-agnostic AI tools? Here’s your roadmap:

  1. Explore the example code to understand the integration patterns
  2. Experiment with different providers to find the best fit for your use case
  3. Implement cost tracking and optimization strategies
  4. Build fallback chains for mission-critical applications

The future of AI isn’t tied to a single provider—it’s about choosing the right tool for each job. With LiteLLM and MCP, you have the freedom to make that choice without rewriting your code.


Want to explore more integration patterns? Check out our articles on OpenAI + MCP, DSPy’s self-optimizing approach, and LangChain workflows. For the complete guide to building MCP servers, see our comprehensive guide.


About the Author

Rick Hightower brings extensive enterprise experience as a former executive and distinguished engineer at a Fortune 100 company, where he specialized in delivering Machine Learning and AI solutions that create intelligent customer experiences. His expertise spans both the theoretical foundations and practical applications of AI technologies.

As a TensorFlow certified professional and graduate of Stanford University’s comprehensive Machine Learning Specialization, Rick combines academic rigor with real-world implementation experience. His training includes mastery of supervised learning techniques, neural networks, and advanced AI concepts, which he has successfully applied to enterprise-scale solutions.

With a deep understanding of both the business and technical aspects of AI implementation, Rick bridges the gap between theoretical machine learning concepts and practical business applications, helping organizations leverage AI to create tangible value.

If you like this article, follow Rick on LinkedIn or on Medium.
