Part 5: Smart Memory Management
Recap: Parts 1-4 built our AI application from foundation to intelligent conversation memory. Our assistant now remembers context, but unlimited memory can become expensive in production.
Perfect memory sounds great until you do the math: with the full history replayed on every call, the 50th message in a conversation carries roughly ten times the prompt tokens of the 5th. Today we’ll implement smart memory that maintains context while controlling costs.
The Cost Problem
Our current BufferMemory from Part 4 remembers everything perfectly. Great for short chats, problematic for long conversations. Every message adds tokens, and you pay for all of them with each AI call.
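Some back-of-the-envelope math makes the problem concrete. The per-message token count below is an illustrative assumption, not a measured figure:

// With full BufferMemory, every call resends the entire history.
const tokensPerMessage = 50; // illustrative average, not measured
const promptTokensForCall = (n) => n * tokensPerMessage; // the nth call carries n messages

// Cumulative prompt tokens grow quadratically with conversation length:
const totalPromptTokens = (N) =>
  Array.from({ length: N }, (_, i) => promptTokensForCall(i + 1))
    .reduce((sum, t) => sum + t, 0);

console.log(totalPromptTokens(5));  // 750
console.log(totalPromptTokens(50)); // 63750 -- roughly 85x the 5-message total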
Window Memory Solution
BufferWindowMemory keeps only the last N messages—like a sliding window of recent context. This maintains relevant conversation history while automatically forgetting older exchanges.
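Conceptually, a window of k exchanges keeps only the newest 2k messages; this mirrors how BufferWindowMemory trims its history before each call:

// A window of k exchanges = the last k user messages + the last k AI replies
const k = 6;
const windowedHistory = allMessages.slice(-k * 2); // only the 12 newest messages are sent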
Create src/services/smartMemoryService.js:
const { AzureChatOpenAI } = require('@langchain/openai');
// BufferWindowMemory and ConversationChain ship with the main langchain package
const { BufferWindowMemory } = require('langchain/memory');
const { ConversationChain } = require('langchain/chains');
class SmartMemoryService {
  constructor() {
    this.model = new AzureChatOpenAI({
      azureOpenAIApiKey: process.env.AZURE_OPENAI_API_KEY,
      azureOpenAIApiInstanceName: this.extractInstanceName(),
      azureOpenAIApiDeploymentName: process.env.AZURE_OPENAI_DEPLOYMENT,
      azureOpenAIApiVersion: '2023-12-01-preview'
    });

    // Keep only the last 6 exchanges (12 total messages).
    // memoryKey must be 'history' to match the {history} slot in
    // ConversationChain's default prompt.
    this.memory = new BufferWindowMemory({
      k: 6,
      memoryKey: 'history'
    });

    this.chain = new ConversationChain({
      llm: this.model,
      memory: this.memory
    });

    this.messageCount = 0;
  }

  // Derive the instance name from the endpoint URL,
  // e.g. https://my-instance.openai.azure.com -> 'my-instance'
  extractInstanceName() {
    const endpoint = process.env.AZURE_OPENAI_ENDPOINT || '';
    const match = endpoint.match(/https:\/\/(.+?)\.openai\.azure\.com/);
    return match ? match[1] : '';
  }

  async processMessage(userMessage) {
    try {
      const response = await this.chain.predict({ input: userMessage });
      this.messageCount++;
      console.log(`Messages processed: ${this.messageCount}, memory window: last 6 exchanges`);
      return response;
    } catch (error) {
      console.error('Smart Memory Error:', error);
      return 'Sorry, I encountered an issue.';
    }
  }

  async clearMemory() {
    await this.memory.clear(); // clear() is async; await it so history is really gone
    this.messageCount = 0;
  }

  getStats() {
    return {
      messageCount: this.messageCount,
      memoryType: 'window',
      windowSize: 6
    };
  }
}

module.exports = new SmartMemoryService();
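One design note: module.exports = new SmartMemoryService() exports a single shared instance, so every caller of /chat-smart talks to the same memory window. That keeps this part simple, but for true multi-user support you would key one service (or at least one BufferWindowMemory) per session ID, for example in a Map.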
Adding Smart Memory Route
Update src/app.js to include the cost-optimized endpoint:
const smartMemoryService = require('./services/smartMemoryService');

// Add this new route
app.post('/chat-smart', async (req, res) => {
  const { message, clearHistory } = req.body;

  // Guard against empty requests so processMessage never sees undefined
  if (!message) {
    return res.status(400).json({ error: 'message is required' });
  }

  if (clearHistory) {
    await smartMemoryService.clearMemory();
  }

  const response = await smartMemoryService.processMessage(message);
  const stats = smartMemoryService.getStats();

  res.json({ response, stats });
});
Testing Smart Memory
Test with a longer conversation to see how it maintains recent context while forgetting older messages:
curl -X POST http://localhost:3000/chat-smart \
-H "Content-Type: application/json" \
-d '{"message": "Hi, I am John and I have a billing question"}'
Continue with 8-10 more messages. Notice how it remembers recent context but forgets the very first messages, keeping costs predictable.
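To start a fresh conversation, send the clearHistory flag the route already supports:

curl -X POST http://localhost:3000/chat-smart \
  -H "Content-Type: application/json" \
  -d '{"message": "New topic: how do I update my payment method?", "clearHistory": true}'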
Production Benefits
Window memory provides the sweet spot for customer support:
• Cost Control: Predictable token usage regardless of conversation length
• Relevant Context: Maintains the last 6 exchanges (usually sufficient for support)
• Performance: Faster responses with shorter context windows
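If six exchanges turns out to be too few (or too many) for your support flows, the window size is a one-line change. A minimal sketch for the constructor, assuming a hypothetical AI_MEMORY_WINDOW environment variable:

// Hypothetical: make the window size configurable, defaulting to 6
const windowSize = parseInt(process.env.AI_MEMORY_WINDOW, 10) || 6;

this.memory = new BufferWindowMemory({
  k: windowSize,
  memoryKey: 'history'
});

Remember to report the same value from getStats() so the endpoint’s stats stay truthful.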
What We’ve Built
We now have three AI endpoints:
• /chat – Basic AI responses (no memory)
• /chat-memory – Perfect memory (expensive for long conversations)
• /chat-smart – Production-ready with cost-controlled memory
Series Foundation Complete
You now have a production-ready foundation for an AI customer support assistant, with smart memory management, cost control, and proper error handling. Pair it with per-session memory (as noted above) and it can serve many simultaneous conversations efficiently.
Future parts can explore advanced features like prompt engineering, external integrations, and specialized customer support workflows—all building on this solid foundation.