Building AI Applications: Smart Memory Management

Part 5: Smart Memory Management

Recap: Parts 1-4 built our AI application from foundation to intelligent conversation memory. Our assistant now remembers context, but unlimited memory can become expensive in production.

Perfect memory sounds great until you realize that a 50-message conversation sends roughly 10x more tokens with each request than a 5-message one, and you pay for that history again on every call. Today we’ll implement smart memory that maintains context while controlling costs.

The Cost Problem

Our current BufferMemory from Part 4 remembers everything perfectly. Great for short chats, problematic for long conversations. Every message adds tokens, and you pay for all of them with each AI call.
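To make that concrete, here is a back-of-the-envelope sketch. The ~30 tokens per message is an assumed figure for illustration; the point is the shape of the growth: per-call tokens grow linearly with the history, so cumulative spend grows quadratically.

// Back-of-the-envelope cost sketch (illustrative numbers, not measured):
// assume ~30 tokens per message, full history resent on every call.
const TOKENS_PER_MESSAGE = 30;

// With full buffer memory, each call carries all prior messages plus the new one.
function tokensForCall(priorMessages) {
  return (priorMessages + 1) * TOKENS_PER_MESSAGE;
}

let total = 0;
for (let n = 0; n < 50; n++) total += tokensForCall(n);

console.log(tokensForCall(4));  // 150 tokens on the 5th message
console.log(tokensForCall(49)); // 1500 tokens on the 50th: the 10x from above
console.log(total);             // 38250 tokens for the whole conversation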

Window Memory Solution

BufferWindowMemory keeps only the last N messages—like a sliding window of recent context. This maintains relevant conversation history while automatically forgetting older exchanges.
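Conceptually, the window is just a slice over the stored messages. Here is a minimal sketch of the idea, not LangChain’s actual source, though BufferWindowMemory behaves equivalently in LangChain JS: k counts exchanges, i.e. human/AI pairs.

// Conceptual sketch of window memory (not LangChain's source code):
// each exchange is a human/AI pair, so k exchanges = k * 2 messages.
function windowedHistory(messages, k) {
  return messages.slice(-k * 2); // keep only the most recent k exchanges
}

const history = ['H1', 'A1', 'H2', 'A2', 'H3', 'A3', 'H4', 'A4'];
console.log(windowedHistory(history, 2)); // [ 'H3', 'A3', 'H4', 'A4' ]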

Create src/services/smartMemoryService.js:

const { AzureChatOpenAI } = require('@langchain/openai');
const { BufferWindowMemory } = require('langchain/memory');
const { ConversationChain } = require('langchain/chains');

class SmartMemoryService {
  constructor() {
    this.model = new AzureChatOpenAI({
      azureOpenAIApiKey: process.env.AZURE_OPENAI_API_KEY,
      azureOpenAIApiInstanceName: this.extractInstanceName(),
      azureOpenAIApiDeploymentName: process.env.AZURE_OPENAI_DEPLOYMENT,
      azureOpenAIApiVersion: '2023-12-01-preview'
    });

    // Keep only the last 6 exchanges (12 total messages); older ones slide out
    this.memory = new BufferWindowMemory({
      k: 6,
      memoryKey: 'history' // must match the {history} slot in ConversationChain's default prompt
    });

    this.chain = new ConversationChain({
      llm: this.model,
      memory: this.memory
    });

    this.messageCount = 0;
  }

  extractInstanceName() {
    const endpoint = process.env.AZURE_OPENAI_ENDPOINT || ''; // avoid crashing if unset
    const match = endpoint.match(/https:\/\/(.+?)\.openai\.azure\.com/);
    return match ? match[1] : '';
  }

  async processMessage(userMessage) {
    try {
      const response = await this.chain.predict({
        input: userMessage
      });
      
      this.messageCount++;
      console.log(`Messages processed: ${this.messageCount}, Memory window: last 6 exchanges`);
      
      return response;
    } catch (error) {
      console.error('Smart Memory Error:', error);
      return 'Sorry, I encountered an issue.';
    }
  }

  clearMemory() {
    this.memory.clear();
    this.messageCount = 0;
  }

  getStats() {
    return {
      messageCount: this.messageCount,
      memoryType: 'window',
      windowSize: 6
    };
  }
}

module.exports = new SmartMemoryService();
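
Before wiring up the route, you can sanity-check the service directly. A quick throwaway script works well (the scripts/try-memory.js path is just a suggestion, and it assumes you load environment variables with dotenv as in earlier parts):

// Quick sanity check for the service (run with: node scripts/try-memory.js)
require('dotenv').config(); // assumes dotenv is used for env vars, as in earlier parts
const smartMemory = require('../src/services/smartMemoryService');

(async () => {
  console.log(await smartMemory.processMessage('Hi, I am John.'));
  console.log(await smartMemory.processMessage('What is my name?')); // should still know
  console.log(smartMemory.getStats()); // { messageCount: 2, memoryType: 'window', windowSize: 6 }
})();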

Adding Smart Memory Route

Update src/app.js to include the cost-optimized endpoint:

const smartMemoryService = require('./services/smartMemoryService');

// Add this new route
app.post('/chat-smart', async (req, res) => {
  const { message, clearHistory } = req.body;
  
  if (clearHistory) {
    smartMemoryService.clearMemory();
  }
  
  // Guard against empty requests before spending tokens
  if (!message) {
    return res.status(400).json({ error: 'message is required' });
  }
  
  const response = await smartMemoryService.processMessage(message);
  const stats = smartMemoryService.getStats();
  
  res.json({ 
    response, 
    stats 
  });
});

Testing Smart Memory

Test with a longer conversation to see how it maintains recent context while forgetting older messages:

curl -X POST http://localhost:3000/chat-smart \
  -H "Content-Type: application/json" \
  -d '{"message": "Hi, I am John and I have a billing question"}'

Continue with 8-10 more messages. Notice how it remembers recent context but forgets the very first messages, keeping costs predictable.
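
If you’d rather not type ten curl commands, a small Node script can drive the endpoint (Node 18+ for the built-in fetch; the messages and email address below are placeholders):

// Drive /chat-smart through a longer conversation (Node 18+ for built-in fetch).
// All messages and the email address below are placeholders.
const messages = [
  'Hi, I am John and I have a billing question',
  'I was charged twice last month',
  'The duplicate charge was $49.99',
  'My account email is john@example.com',
  'Can you confirm the refund timeline?',
  'By the way, what is my name?',         // likely still inside the window
  'And what was my very first question?', // may have slid out of the window
];

(async () => {
  for (const message of messages) {
    const res = await fetch('http://localhost:3000/chat-smart', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message }),
    });
    const { response, stats } = await res.json();
    console.log(`[${stats.messageCount}] ${response}`);
  }
})();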

Production Benefits

Window memory provides the sweet spot for customer support:

Cost Control: Predictable token usage regardless of conversation length (quantified in the sketch below)
Relevant Context: Maintains the last 6 exchanges (usually sufficient for support)
Performance: Faster responses with shorter context windows
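
To quantify the cost-control point: with k = 6 the context is capped at 12 remembered messages, so per-call history tokens plateau instead of growing. Using the same illustrative ~30 tokens per message as earlier:

// Per-call history size: full buffer vs. a k = 6 window (illustrative numbers).
const TOKENS_PER_MESSAGE = 30;
const K = 6; // exchanges kept, i.e. 12 messages

const bufferTokens = (messagesSoFar) => messagesSoFar * TOKENS_PER_MESSAGE; // unbounded
const windowTokens = (messagesSoFar) =>
  Math.min(messagesSoFar, K * 2) * TOKENS_PER_MESSAGE; // capped at 360

console.log(bufferTokens(100)); // 3000: keeps climbing with every message
console.log(windowTokens(100)); // 360: flat after the 12th message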

What We’ve Built

We now have three AI endpoints:

/chat – Basic AI responses (no memory)
/chat-memory – Perfect memory (expensive for long conversations)
/chat-smart – Production-ready with cost-controlled memory

Series Foundation Complete

You now have an AI customer support assistant foundation with smart memory management, cost control, and basic error handling, ready for real-world use. One caveat before scaling up: the service exports a single shared memory instance, so handling many simultaneous conversations means creating one memory (or service instance) per user session rather than sharing this one.

Future parts can explore advanced features like prompt engineering, external integrations, and specialized customer support workflows—all building on this solid foundation.
