July 8, 2025
Design of the application
1. Project Overview
This project consists of two main components:
- Multi-Provider Chat Application: A Streamlit-based chat interface that enables users to interact with multiple Large Language Model (LLM) providers through a unified interface. It supports OpenAI, Anthropic Claude, Google Gemini, Perplexity, Ollama (for local models), and AWS Bedrock.
- Vector-RAG (Retrieval-Augmented Generation) System: A complementary system that enhances the chat application with context-aware responses by retrieving relevant information from a document database. It uses vector embeddings stored in PostgreSQL with pgvector for semantic search capabilities.
Purpose and Features
The system solves several core problems:
- Unified LLM Access: Provides a single interface to interact with multiple AI providers, allowing users to switch between different models and services seamlessly.
- Context-Aware Responses: Enhances AI responses with relevant information from a document database using RAG technology.
- Real-Time Interaction: Streams responses so AI output appears as it is generated, improving user experience.
- Conversation Management: Allows saving, loading, and managing conversations across different AI providers.
- Enterprise Integration: Supports AWS Bedrock for organizations that prefer to use Amazon's infrastructure for AI services.
- Local Model Support: Integrates with Ollama to enable running open-source models locally for privacy and reduced API costs.
2. Features
- Multi-Provider Support
- Connect to multiple LLM providers (OpenAI, Anthropic, Google, Perplexity, AWS Bedrock)
- Run local models through Ollama
- Switch between providers/models seamlessly during conversations
- Conversation Management
- Maintain conversation history across provider switches
- Save, load, and delete conversations
- Generate conversation titles automatically
- Export conversations as text files
- Document Management and RAG
- Process and store documents in a vector database
- Split documents into chunks for improved retrieval
- Generate vector embeddings for semantic search
- Search documents based on similarity to queries
- Include relevant document context in LLM prompts
- Response Streaming
- Display LLM responses in real-time as they’re generated
- Provide fallback mechanisms if streaming fails
- Show visual indicators during streaming
- Provider-Specific Settings
- Configure model-specific parameters (temperature, context size)
- Handle provider-specific authentication requirements
- Adjust parameters based on model capabilities
- AWS Bedrock Integration
- Connect to AWS Bedrock service for accessing foundation models
- Manage AWS authentication and region settings
- Check available models in the user's AWS account
3. Technology Stack
Programming Languages
- Python: Primary language for both the chat application and vector-RAG system
Frameworks and Tools
- Streamlit: Web framework for building the user interface
- PostgreSQL: Database for storing vectors, documents, and conversation data
- pgvector: PostgreSQL extension for vector similarity search
- SQLAlchemy: ORM for database operations
- Pydantic: Data validation and settings management
- Docker: Containerization for the PostgreSQL database
Libraries and Packages
Chat Application
- litellm: Unified Python interface for multiple LLM providers
- python-dotenv: Environment variable management
- requests: HTTP client for API calls
- boto3: AWS SDK for Python, used for Bedrock integration
- asyncio: Asynchronous programming support
Vector-RAG System
- pgvector: PostgreSQL extension for vector embeddings
- sentence-transformers: Local embedding generation
- openai: OpenAI API client for embeddings
- torch: Backend for sentence-transformers
- sqlalchemy: Database ORM
- psycopg2-binary: PostgreSQL adapter for Python
- jsonschema: JSON schema validation
4. Architecture and Design
Overall Architecture
The project uses a modular layered architecture with clear separation of concerns:
- Presentation Layer: Streamlit web interface for user interaction
- Service Layer: Provider classes that handle communication with various LLM APIs
- Data Layer: Conversation storage and vector database components
Key Components for Chat Application
- LLM Provider System
- Abstract base class (LLMProvider) defining the interface for all providers (see the interface sketch after this list)
- Concrete implementations for each supported provider (OpenAI, Anthropic, etc.)
- Provider manager for initializing and managing provider instances
- Conversation Management
- Conversation class for managing message history
- ConversationStorage for saving/loading conversations to/from disk
- UI components for displaying and managing conversations
- User Interface
- Chat display and input handling
- Provider settings sidebar
- Streaming response handling
- Utilities
- JSON processing utilities
- Streaming utilities
- Logging configuration
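To make the provider abstraction concrete, here is a minimal sketch of what the abstract base class might look like. The method names come from the class diagram in section 5; the signatures, defaults, and docstrings are assumptions rather than the project's actual code.

from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Common interface that every provider (OpenAI, Anthropic, Bedrock, ...) implements."""

    @abstractmethod
    def generate_completion(self, prompt, output_format="text", options=None, conversation=None):
        """Return the complete response as a single string."""

    @abstractmethod
    def generate_json(self, prompt, schema, options=None, conversation=None):
        """Return a response validated against the given JSON schema."""

    @abstractmethod
    def generate_completion_stream(self, prompt, output_format="text", options=None,
                                   conversation=None, callback=None):
        """Yield response chunks as they arrive; callback is invoked per chunk."""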
Key Components for Vector-RAG System
- Document Processing
- Project, File, and Chunk models for representing documents and their segments
- Chunking strategies (LineChunker, SizeChunker, WordChunker); see the chunker sketch after this list
- Vector Embedding
- Embedder interface for generating vector embeddings
- Implementations for different embedding providers (OpenAI, SentenceTransformers)
- Database Management
- PostgreSQL with pgvector extension
- DBFileHandler for managing files, chunks, and embeddings
- Query utilities for semantic similarity search
- Search and Retrieval
- Similarity search functionality
- Metadata filtering
- Pagination for search results
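The chunking strategies above differ mainly in how they measure a chunk (lines, characters, or words) and how much overlap they keep between consecutive chunks. A simplified word-based sketch, operating on plain strings rather than the project's File/Chunk models:

def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping word-based chunks (simplified WordChunker stand-in)."""
    words = text.split()
    step = max(chunk_size - overlap, 1)
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):   # last window reached the end of the text
            break
    return chunks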
5. Diagram Generation
Flowchart: Main Chat Application Workflow
flowchart TD
A[User Opens Application] --> B[Initialize Providers]
B --> C[Select Provider and Model]
C --> D[Enter Message]
D --> E{Streaming Enabled?}
E -- Yes --> F[Stream Response in Real-time]
E -- No --> G[Generate Complete Response]
F --> H[Update Conversation History]
G --> H
H --> I{Save Conversation?}
I -- Yes --> J[Store Conversation to Disk]
I -- No --> D
%% Add RAG integration
D --> K{RAG Enabled?}
K -- Yes --> L[Search Vector Database]
L --> M[Retrieve Relevant Context]
M --> N[Add Context to Prompt]
N --> E
K -- No --> E
Class Diagram: LLM Provider System
classDiagram
class LLMProvider {
<<abstract>>
+generate_completion(prompt, output_format, options, conversation)
+generate_json(prompt, schema, options, conversation)
+generate_completion_stream(prompt, output_format, options, conversation, callback)
}
LLMProvider <|-- OpenAIProvider
LLMProvider <|-- AnthropicProvider
LLMProvider <|-- GoogleGeminiProvider
LLMProvider <|-- PerplexityProvider
LLMProvider <|-- OllamaProvider
LLMProvider <|-- BedrockProvider
class OpenAIProvider {
-api_key
-model
-client
+generate_completion()
+generate_json()
+generate_completion_stream()
-_generate_completion_gpt4_series()
-_generate_completion_o_series()
}
class AnthropicProvider {
-api_key
-model
-original_model_name
-client
+generate_completion()
+generate_completion_stream()
}
class GoogleGeminiProvider {
-api_key
-model
-client
+generate_completion()
+generate_completion_stream()
}
class PerplexityProvider {
-api_key
-model
-client
+generate_completion()
+generate_completion_stream()
-_validate_message_sequence()
-_is_online_model()
}
class OllamaProvider {
-model
-original_model_name
-base_url
-client
+generate_completion()
+generate_completion_stream()
}
class BedrockProvider {
-aws_access_key_id
-aws_secret_access_key
-aws_session_token
-aws_region
-original_model
-model
-inference_profile
-bedrock_runtime
-use_direct_api
+generate_completion()
+generate_completion_stream()
-_generate_with_boto3()
-_generate_with_litellm()
}
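Because the technology stack uses litellm as the unified client, a concrete provider's streaming path can be quite thin. The sketch below is illustrative only: the model name, option handling, and callback behavior are assumptions, and the real providers add provider-specific validation, authentication, and fallback logic.

import litellm

def generate_completion_stream(prompt, model="gpt-4o", options=None, callback=None):
    """Yield text chunks from a streaming completion via litellm (illustrative sketch)."""
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        **(options or {}),
    )
    for chunk in response:
        delta = chunk.choices[0].delta.content or ""
        if delta:
            if callback:
                callback(delta)   # e.g. push the chunk to the UI as it arrives
            yield delta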
Class Diagram: Conversation Management
classDiagram
class Conversation {
+id: str
+title: Optional[str]
+messages: List[Message]
+created_at: datetime
+updated_at: datetime
+add_message(content, message_type, role)
+to_llm_messages()
+ensure_alternating_messages()
}
class Message {
+timestamp: datetime
+message_type: MessageType
+content: str
+role: str
+to_llm_message()
}
class MessageType {
<<enum>>
INPUT
OUTPUT
}
class ConversationStorage {
-storage_dir: Path
+save_conversation(conversation)
+load_conversation(conversation_id)
+delete_conversation(conversation_id)
+list_conversations()
+generate_conversation_title(conversation)
+update_conversation_title(conversation_id, new_title)
}
Conversation "1" *-- "many" Message
Message *-- MessageType
ConversationStorage -- Conversation : manages >
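ConversationStorage in the diagram persists conversations to disk. A minimal JSON-on-disk sketch, assuming conversations are already serializable dictionaries (the real class works with the Conversation model and adds title generation):

import json
from pathlib import Path

class ConversationStorage:
    """Sketch: store each conversation as a JSON file under storage_dir."""

    def __init__(self, storage_dir: str = "conversations"):
        self.storage_dir = Path(storage_dir)
        self.storage_dir.mkdir(parents=True, exist_ok=True)

    def save_conversation(self, conversation: dict) -> Path:
        path = self.storage_dir / f"{conversation['id']}.json"
        path.write_text(json.dumps(conversation, indent=2, default=str))
        return path

    def load_conversation(self, conversation_id: str) -> dict:
        return json.loads((self.storage_dir / f"{conversation_id}.json").read_text())

    def delete_conversation(self, conversation_id: str) -> None:
        (self.storage_dir / f"{conversation_id}.json").unlink(missing_ok=True)

    def list_conversations(self) -> list[str]:
        return sorted(p.stem for p in self.storage_dir.glob("*.json"))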
Class Diagram: Vector-RAG System
classDiagram
class Chunker {
<<abstract>>
+chunk_text(file)
}
class LineChunker {
-chunk_size
-overlap
+chunk_text(file)
}
class SizeChunker {
-chunk_size
-overlap
+chunk_text(file)
}
class WordChunker {
-chunk_size
-overlap
+chunk_text(file)
}
class Embedder {
<<abstract>>
-model_name
-dimension
+get_dimension()
+embed_texts(texts)
}
class OpenAIEmbedder {
-client
-batch_size
+embed_texts(chunks)
}
class SentenceTransformersEmbedder {
-model
-batch_size
+embed_texts(chunks)
}
class MockEmbedder {
+embed_texts(texts)
}
class FileHandler {
<<abstract>>
+create_project(name, description)
+add_file(project_id, file_model)
+get_file(project_id, file_path, filename)
+delete_file(file_id)
+list_files(project_id)
+search_chunks_by_text(project_id, query_text, page, page_size, similarity_threshold)
}
class DBFileHandler {
-engine
-embedder
-Session
-chunker
+session_scope()
+get_or_create_project(name, description)
+add_chunks(file_id, chunks)
+add_file(project_id, file_model)
+search_chunks_by_embedding(project_id, embedding, page, page_size, similarity_threshold, file_id, metadata_filter)
}
Chunker <|-- LineChunker
Chunker <|-- SizeChunker
Chunker <|-- WordChunker
Embedder <|-- OpenAIEmbedder
Embedder <|-- SentenceTransformersEmbedder
Embedder <|-- MockEmbedder
FileHandler <|-- DBFileHandler
DBFileHandler --> Chunker : uses
DBFileHandler --> Embedder : uses
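Putting the classes in this diagram together, document ingestion and retrieval through DBFileHandler might look roughly like the following. The method names come from the class diagram; the import paths mirror the directory structure in section 6, but the constructor arguments, connection string, and file-model handling are assumptions and will differ from the real package.

# Hypothetical usage sketch; see the class diagram above for the actual method signatures.
from vector_rag.db.db_file_handler import DBFileHandler
from vector_rag.embeddings.sentence_transformers_embedder import SentenceTransformersEmbedder

embedder = SentenceTransformersEmbedder()
handler = DBFileHandler(
    db_url="postgresql://postgres:postgres@localhost:5432/vector_rag",  # assumed argument
    embedder=embedder,
)

project = handler.get_or_create_project("docs", "Product documentation")
# handler.add_file(project.id, file_model)   # chunks, embeds, and stores the document

results = handler.search_chunks_by_text(
    project_id=project.id,
    query_text="How do I configure AWS Bedrock?",
    page=1,
    page_size=5,
    similarity_threshold=0.7,
)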
Sequence Diagram: Chat Interaction with Streaming
sequenceDiagram
participant User
participant UI as Streamlit UI
participant Provider as LLM Provider
participant Conversation as Conversation Manager
User->>UI: Enter message
UI->>Provider: generate_completion_stream(prompt)
activate Provider
Provider->>Provider: Initialize streaming
Provider-->>UI: Begin streaming chunks
loop For each chunk
Provider-->>UI: Send text chunk
UI->>UI: Update display with chunk
end
Provider-->>UI: Complete streaming
deactivate Provider
UI->>Conversation: Add message to history
Conversation->>Conversation: Update conversation
UI->>UI: Display full response
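On the Streamlit side, the per-chunk loop above maps naturally onto a placeholder that is re-rendered as each chunk arrives. A minimal sketch, assuming the provider exposes a generator of text chunks:

import streamlit as st

def render_streaming_response(chunk_generator):
    """Accumulate streamed chunks and re-render them in place."""
    placeholder = st.empty()                         # reserve a slot in the chat area
    full_response = ""
    for chunk in chunk_generator:
        full_response += chunk
        placeholder.markdown(full_response + "▌")    # trailing cursor signals streaming
    placeholder.markdown(full_response)              # final render without the cursor
    return full_response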
Sequence Diagram: RAG Integration Flow
sequenceDiagram
participant User
participant UI as Streamlit UI
participant VectorDB as Vector Database
participant LLM as LLM Provider
User->>UI: Ask question
UI->>VectorDB: search_chunks_by_text(query)
activate VectorDB
VectorDB->>VectorDB: Generate query embedding
VectorDB->>VectorDB: Find similar chunks
VectorDB-->>UI: Return relevant chunks
deactivate VectorDB
UI->>UI: Combine query with context
UI->>LLM: generate_completion(enhanced_prompt)
activate LLM
LLM->>LLM: Process with context
LLM-->>UI: Return response
deactivate LLM
UI->>UI: Display response to user
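The "Combine query with context" step is essentially prompt assembly: the retrieved chunks are prepended to the user's question before the provider is called. A simplified sketch (the chunk fields and prompt wording are assumptions):

def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Prepend retrieved document chunks to the user's question."""
    context = "\n\n".join(
        f"[Source: {c.get('filename', 'unknown')}]\n{c['content']}" for c in chunks
    )
    return (
        "Answer the question using the context below. "
        "If the context is not relevant, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# enhanced_prompt = build_rag_prompt(user_question, relevant_chunks)
# response = provider.generate_completion(enhanced_prompt)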
Schema Diagram: Vector-RAG Database
erDiagram
PROJECTS {
int id PK
string name
string description
datetime created_at
datetime updated_at
}
FILES {
int id PK
int project_id FK
string filename
string file_path
string crc
int file_size
datetime last_updated
datetime last_ingested
datetime created_at
}
CHUNKS {
int id PK
int file_id FK
string content
vector embedding
int chunk_index
jsonb chunk_metadata
datetime created_at
}
PROJECTS ||--o{ FILES : contains
FILES ||--o{ CHUNKS : divided_into
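Expressed with SQLAlchemy (the ORM listed in the technology stack) and the pgvector extension, the CHUNKS table above might be declared roughly as follows. The 1536 dimension is an assumption that matches OpenAI's text-embedding models; the real db_model.py may use a different value.

from datetime import datetime

from pgvector.sqlalchemy import Vector
from sqlalchemy import Column, DateTime, ForeignKey, Integer, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Chunk(Base):
    __tablename__ = "chunks"

    id = Column(Integer, primary_key=True)
    file_id = Column(Integer, ForeignKey("files.id"), nullable=False)
    content = Column(Text, nullable=False)
    embedding = Column(Vector(1536))                 # assumed dimension; must match the embedder
    chunk_index = Column(Integer, nullable=False)
    chunk_metadata = Column(JSONB, default=dict)
    created_at = Column(DateTime, default=datetime.utcnow)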
Entity Relationship Diagram
erDiagram
Project ||--o{ File : contains
File ||--o{ Chunk : contains
Conversation ||--o{ Message : contains
Project {
int id
string name
string description
datetime created_at
datetime updated_at
}
File {
int id
int project_id
string filename
string file_path
string crc
int file_size
datetime created_at
}
Chunk {
int id
int file_id
string content
vector embedding
int chunk_index
jsonb metadata
datetime created_at
}
Conversation {
string id
string title
datetime created_at
datetime updated_at
}
Message {
datetime timestamp
enum message_type
string content
string role
}
User Journey Diagram
journey
title Chat Application User Journey
section Initial Setup
Log in to Application: 5: User
Select LLM Provider: 3: User
Configure Provider Settings: 3: User
section Basic Chat
Type Question: 5: User
View Streaming Response: 4: User
Follow Up Question: 5: User
section Working with RAG
Ask Domain-Specific Question: 5: User
Review Sources from Documents: 4: User
Refine Query Based on Context: 4: User
section Conversation Management
Save Conversation: 5: User
Load Previous Conversation: 4: User
Export Conversation: 3: User
section Advanced Features
Try Different Providers: 4: User
Compare Model Responses: 5: User
Use AWS Bedrock Integration: 3: User
Use Case Diagram
graph TD
subgraph Actors
User([User])
end
subgraph Chat Application
UC1[Chat with LLMs]
UC2[Select Provider/Model]
UC3[Manage Conversations]
UC4[Configure Provider Settings]
UC5[View Streaming Responses]
UC6[Export Conversations]
end
subgraph RAG System
UC7[Search Knowledge Base]
UC8[Add Documents to KB]
UC9[Receive Context-Enhanced Responses]
UC10[Filter by Metadata]
end
User --> UC1
User --> UC2
User --> UC3
User --> UC4
User --> UC5
User --> UC6
User --> UC7
User --> UC8
User --> UC9
User --> UC10
Mind Map
mindmap
root((Multi-Provider<br>Chat App))
LLM Providers
OpenAI
GPT-4o
GPT-4.1
Anthropic
Claude 3 Opus
Claude 3 Sonnet
Claude 3 Haiku
Google Gemini
Perplexity
Ollama
Local Models
Custom Settings
AWS Bedrock
Claude Models
Llama Models
Titan Models
Features
Streaming Responses
Conversation Management
Save/Load
Export
Auto-Title
RAG Integration
Vector Search
Document Chunking
Metadata Filtering
Provider Settings
Temperature Control
Context Length Adjustment
UI Components
Chat Interface
Sidebar Settings
Conversation History
Provider Selection
Vector-RAG System
Database
PostgreSQL
pgvector Extension
Embedding Generation
OpenAI Embeddings
Sentence Transformers
Chunking Strategies
Line-based
Size-based
Word-based
Architecture Diagram
graph TD
subgraph "User Interface Layer"
A[Streamlit Web Interface]
B[Sidebar Controls]
C[Chat Display]
end
subgraph "Service Layer"
D[Provider Manager]
E[LLM Provider Interface]
F[Conversation Manager]
G[Streaming Controller]
H[Vector Search API]
end
subgraph "Provider createations"
I[OpenAI]
J[Anthropic]
K[Google Gemini]
L[Perplexity]
M[Ollama]
N[AWS Bedrock]
end
subgraph "Data Layer"
O[Conversation Storage]
P[PostgreSQL + pgvector]
Q[Document Processor]
R[Embedding Generation]
end
subgraph "External Services"
S[OpenAI API]
T[Claude API]
U[Gemini API]
V[Perplexity API]
W[Local Ollama Server]
X[AWS Bedrock Service]
end
A <--> B
A <--> C
B <--> D
C <--> F
C <--> G
D <--> E
E <--> I
E <--> J
E <--> K
E <--> L
E <--> M
E <--> N
F <--> O
G <--> E
I <--> S
J <--> T
K <--> U
L <--> V
M <--> W
N <--> X
H <--> P
H <--> Q
H <--> R
E <--> H
6. Directory Structure
Chat Application Directory Structure
chat/
├── ./
│ └── pyproject.toml # Project metadata and dependencies
├── test/ # Test directory
│ ├── test_all_providers_streaming.py # Test streaming feature
│ ├── test_bedrock.py # Test AWS Bedrock provider
│ ├── test_bedrock_access.py # Check AWS Bedrock model access
│ ├── test_litellm_streaming.py # Test litellm streaming independently
│ ├── test_stream_anthropic.py # Test Anthropic streaming
│ ├── test_stream_openai.py # Test OpenAI streaming
│ ├── test_stream_openai_json.py # Test OpenAI JSON streaming
│ ├── test_streaming.py # General streaming tests
│ └── chat/ # Additional chat tests
├── docs/ # Documentation
│ └── images/ # Images for documentation
└── src/ # Source code
└── chat/ # Main application code
├── __init__.py # Package initialization
├── app.py # Main application entry point
├── ai/ # LLM provider implementations
│ ├── __init__.py
│ ├── anthropic.py # Anthropic Claude provider
│ ├── bedrock.py # `AWS` Bedrock provider
│ ├── google_gemini.py # Google Gemini provider
│ ├── llm_provider.py # Abstract provider interface
│ ├── ollama.py # Ollama local model provider
│ ├── open_ai.py # OpenAI provider
│ ├── perplexity.py # Perplexity provider
│ └── provider_manager.py # Provider management
├── conversation/ # Conversation management
│ ├── __init__.py
│ ├── conversation.py # Conversation and message models
│ └── conversation_storage.py # Conversation persistence
├── ui/ # UI components
│ ├── __init__.py
│ ├── chat.py # Chat display and input handling
│ ├── chat_utils.py # Chat-specific utilities
│ ├── conversation_manager.py # UI for conversation management
│ └── sidebar.py # Sidebar UI components
└── util/ # Utility functions
├── __init__.py
├── json_util.py # JSON processing utilities
├── logging_util.py # Logging configuration
└── streaming_util.py # Streaming helper functions
Vector-RAG Directory Structure
vector-rag/
├── ./
│ └── pyproject.toml # Project metadata and dependencies
├── tests/ # Test directory
│ ├── conftest.py # Test configuration
│ ├── embeddings/ # Embedder tests
│ │ ├── test_embedders.py # General embedder tests
│ │ ├── test_openai_embedder.py # OpenAI embedder tests
│ │ └── test_sentence_transformers_embedder.py # Local embedder tests
│ ├── integration/ # Integration tests
│ │ └── test_ingestion.py # Document ingestion testing
│ ├── chunking/ # Chunking strategy tests
│ │ ├── test_line_chunker.py # Line-based chunker tests
│ │ ├── test_size_chunker.py # Size-based chunker tests
│ │ └── test_word_chunker.py # Word-based chunker tests
│ ├── db/ # Database tests
│ │ ├── test_add_chunk_direct.py # Chunk addition tests
│ │ ├── test_db_file_handler.py # File handler tests
│ │ ├── test_dimension_utils.py # Vector dimension utilities tests
│ │ ├── test_semantic_search.py # Semantic search tests
│ │ └── test_semantic_search_with_metadata.py # Metadata filtering tests
│ └── api/ # API tests
│ └── test_search_api.py # Search API tests
├── db/ # Database scripts
│ ├── scripts/ # Database management scripts
│ │ ├── create_index.py # Index creation script
│ │ └── import_data.py # Data import script
│ └── sql/ # SQL scripts
│ └── init.sql # Database initialization SQL
├── environment/ # Environment configuration
├── src/ # Source code
├── scripts/ # Utility scripts
│ ├── init_db.py # Database initialization script
│ └── run_example.py # Example usage script
└── vector_rag/ # Main package
├── __init__.py # Package initialization
├── config.py # Configuration management
├── logging_config.py # Logging setup
├── model.py # Core data models
├── embeddings/ # Embedding generation
│ ├── __init__.py
│ ├── base.py # Base embedder interface
│ ├── mock_embedder.py # Mock embedder for testing
│ ├── openai_embedder.py # OpenAI embeddings
│ └── sentence_transformers_embedder.py # Local embeddings
├── chunking/ # Document chunking strategies
│ ├── __init__.py
│ ├── base_chunker.py # Base chunker interface
│ ├── line_chunker.py # Line-based chunking
│ ├── size_chunker.py # Size-based chunking
│ └── word_chunker.py # Word-based chunking
├── db/ # Database operations
│ ├── __init__.py
│ ├── base_file_handler.py # Abstract file handler
│ ├── db_file_handler.py # Concrete file handler
│ ├── db_model.py # Database models
│ └── dimension_utils.py # Vector dimension utilities
└── api/ # API interfaces
└── __init__.py # API package initialization
Main Components Description
Chat Application
- src/chat/ai/: Contains implementations for different LLM providers. Each provider (OpenAI, Anthropic, etc.) implements the abstract LLMProvider interface, allowing for consistent interaction regardless of the backend service.
- src/chat/conversation/: Manages conversation history and persistence. The Conversation class represents a chat session, while ConversationStorage handles saving and loading conversations from disk.
- src/chat/ui/: Houses the Streamlit UI components. The chat.py module handles chat display and user input, while sidebar.py manages provider selection and settings.
- src/chat/util/: Contains utility functions for JSON processing, logging configuration, and streaming response handling.
- src/chat/app.py: The main entry point that ties all components together, defining the Streamlit application flow.
Vector-RAG System
- src/vector_rag/chunking/: Contains different strategies for breaking documents into chunks. These include LineChunker (line-based), SizeChunker (character-based), and WordChunker (word-based).
- src/vector_rag/embeddings/: Houses the embedding generation components. It includes OpenAIEmbedder for using OpenAI's API and SentenceTransformersEmbedder for local embedding using Hugging Face models (a local-embedding sketch follows this list).
- src/vector_rag/db/: Manages database operations. The DBFileHandler class handles file, chunk, and embedding storage in PostgreSQL, while dimension_utils.py ensures the vector dimensions are properly configured.
- src/vector_rag/api/: Provides a simplified API for integrating the RAG system with other applications, such as the chat interface.
- src/vector_rag/model.py: Defines the core data models, including Project, File, and Chunk.
- src/vector_rag/config.py: Handles configuration management, loading settings from environment variables and .env files.
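For the local embedding path, sentence-transformers keeps the core of SentenceTransformersEmbedder small. A simplified stand-in (the model name and batch size are assumptions):

from sentence_transformers import SentenceTransformer

class LocalEmbedder:
    """Simplified stand-in for SentenceTransformersEmbedder."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2", batch_size: int = 32):
        self.model = SentenceTransformer(model_name)
        self.batch_size = batch_size

    def get_dimension(self) -> int:
        return self.model.get_sentence_embedding_dimension()

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        vectors = self.model.encode(texts, batch_size=self.batch_size)
        return [v.tolist() for v in vectors]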
Key Features and Implementation Highlights
- Provider Abstraction: The LLMProvider interface abstracts away the differences between various LLM services, allowing for consistent interaction regardless of the backend.
- Streaming Support: All providers implement generate_completion_stream for real-time, token-by-token response generation, with proper error handling and fallback mechanisms.
- AWS Bedrock Integration: The BedrockProvider class enables access to foundation models through AWS's infrastructure, supporting both direct API calls and LiteLLM integration.
- Semantic Search: The vector-RAG system implements similarity search with metadata filtering, allowing for contextually relevant document retrieval (see the metadata-filtered search sketch after this list).
- Flexible Chunking: Multiple chunking strategies enable optimal document processing for different content types, with configurable chunk sizes and overlap.
- Local and Remote Embeddings: Support for both OpenAI's API and local SentenceTransformers models for generating vector embeddings.
- Conversation Management: Comprehensive features for saving, loading, and managing conversation history, with automatic title generation.
- Provider-Specific Settings: Tailored settings for each provider, including specialized options for Ollama's local models and AWS Bedrock configuration.
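As an example of the metadata filtering mentioned above, a call to search_chunks_by_embedding might look like this. The parameter names follow the DBFileHandler class diagram in section 5, and the handler, embedder, and project objects continue the earlier ingestion sketch; the filter keys and threshold are assumptions.

query_embedding = embedder.embed_texts(["How do I rotate AWS credentials?"])[0]

results = handler.search_chunks_by_embedding(
    project_id=project.id,
    embedding=query_embedding,
    page=1,
    page_size=10,
    similarity_threshold=0.75,
    file_id=None,                               # or restrict the search to a single file
    metadata_filter={"doc_type": "runbook"},    # assumed metadata key
)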
This comprehensive project shows a sophisticated approach to integrating multiple LLM providers with a RAG system, creating a powerful and flexible chat application with context-aware responses.
Conclusion
The enhancements we’ve made to our multi-provider chat application—streaming responses, RAG capabilities, and AWS Bedrock integration—transform it from a simple chat interface into a powerful knowledge access tool. These features work together to provide a more responsive, informative, and flexible experience for users across a variety of use cases.
By combining the generative capabilities of leading LLMs with the context-awareness of RAG and the real-time feedback of streaming responses, all through a clean and intuitive interface, our application shows the potential of modern AI tools for practical applications.
The complete source code for this project is available on GitHub at https://github.com/RichardHightower/chat, with the RAG implementation at https://github.com/SpillwaveSolutions/vector-rag.
Whether you’re building internal tools for your organization, exploring AI capabilities, or just looking for a flexible way to interact with multiple LLM providers, we hope this implementation provides a valuable starting point for your own projects.
About the Author
Rick Hightower is an AI specialist with a background in ML and data engineering and a passion for AI and natural language processing. He has extensive experience in building scalable, distributed systems and is currently focused on AI integration in enterprise applications.
Connect with Rick on LinkedIn or follow his articles on Medium.