Building AI Applications: Smart Memory Management

Part 5: Smart Memory Management

Recap: Parts 1-4 built our AI application from foundation to intelligent conversation memory. Our assistant now remembers context, but unlimited memory can become expensive in production.

Perfect memory sounds great until you realize that a 50-message conversation sends roughly 10x more tokens with each request than a 5-message one, and you pay for that history again on every call. Today we’ll implement smart memory that maintains context while controlling costs.

The Cost Problem

Our current BufferMemory from Part 4 remembers everything perfectly. Great for short chats, problematic for long conversations. Every message adds tokens, and you pay for all of them with each AI call.
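To make that concrete, here is a back-of-the-envelope sketch. The ~30 tokens per message is an assumed figure for illustration; the point is the shape of the growth: per-call tokens grow linearly with the history, so cumulative spend grows quadratically.

// Back-of-the-envelope cost sketch (illustrative numbers, not measured):
// assume ~30 tokens per message, full history resent on every call.
const TOKENS_PER_MESSAGE = 30;

// With full buffer memory, each call carries all prior messages plus the new one.
function tokensForCall(priorMessages) {
  return (priorMessages + 1) * TOKENS_PER_MESSAGE;
}

let total = 0;
for (let n = 0; n < 50; n++) total += tokensForCall(n);

console.log(tokensForCall(4));  // 150 tokens on the 5th message
console.log(tokensForCall(49)); // 1500 tokens on the 50th: the 10x from above
console.log(total);             // 38250 tokens for the whole conversation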

Window Memory Solution

BufferWindowMemory keeps only the last N messages—like a sliding window of recent context. This maintains relevant conversation history while automatically forgetting older exchanges.
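Conceptually, the window is just a slice over the stored messages. Here is a minimal sketch of the idea, not LangChain’s actual source, though BufferWindowMemory behaves equivalently in LangChain JS: k counts exchanges, i.e. human/AI pairs.

// Conceptual sketch of window memory (not LangChain's source code):
// each exchange is a human/AI pair, so k exchanges = k * 2 messages.
function windowedHistory(messages, k) {
  return messages.slice(-k * 2); // keep only the most recent k exchanges
}

const history = ['H1', 'A1', 'H2', 'A2', 'H3', 'A3', 'H4', 'A4'];
console.log(windowedHistory(history, 2)); // [ 'H3', 'A3', 'H4', 'A4' ]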

Create src/services/smartMemoryService.js:

const { AzureChatOpenAI } = require('@langchain/openai');
const { BufferWindowMemory } = require('langchain/memory');
const { ConversationChain } = require('langchain/chains');

class SmartMemoryService {
  constructor() {
    this.model = new AzureChatOpenAI({
      azureOpenAIApiKey: process.env.AZURE_OPENAI_API_KEY,
      azureOpenAIApiInstanceName: this.extractInstanceName(),
      azureOpenAIApiDeploymentName: process.env.AZURE_OPENAI_DEPLOYMENT,
      azureOpenAIApiVersion: '2023-12-01-preview'
    });

    // Keep only the last 6 exchanges (12 total messages); older ones slide out
    this.memory = new BufferWindowMemory({
      k: 6,
      memoryKey: 'history' // must match the {history} slot in ConversationChain's default prompt
    });

    this.chain = new ConversationChain({
      llm: this.model,
      memory: this.memory
    });

    this.messageCount = 0;
  }

  extractInstanceName() {
    const endpoint = process.env.AZURE_OPENAI_ENDPOINT || ''; // avoid crashing if unset
    const match = endpoint.match(/https:\/\/(.+?)\.openai\.azure\.com/);
    return match ? match[1] : '';
  }

  async processMessage(userMessage) {
    try {
      const response = await this.chain.predict({
        input: userMessage
      });
      
      this.messageCount++;
      console.log(`Messages processed: ${this.messageCount}, Memory window: last 6 exchanges`);
      
      return response;
    } catch (error) {
      console.error('Smart Memory Error:', error);
      return 'Sorry, I encountered an issue.';
    }
  }

  clearMemory() {
    this.memory.clear();
    this.messageCount = 0;
  }

  getStats() {
    return {
      messageCount: this.messageCount,
      memoryType: 'window',
      windowSize: 6
    };
  }
}

module.exports = new SmartMemoryService();
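
Before wiring up the route, you can sanity-check the service directly. A quick throwaway script works well (the scripts/try-memory.js path is just a suggestion, and it assumes you load environment variables with dotenv as in earlier parts):

// Quick sanity check for the service (run with: node scripts/try-memory.js)
require('dotenv').config(); // assumes dotenv is used for env vars, as in earlier parts
const smartMemory = require('../src/services/smartMemoryService');

(async () => {
  console.log(await smartMemory.processMessage('Hi, I am John.'));
  console.log(await smartMemory.processMessage('What is my name?')); // should still know
  console.log(smartMemory.getStats()); // { messageCount: 2, memoryType: 'window', windowSize: 6 }
})();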

Adding Smart Memory Route

Update src/app.js to include the cost-optimized endpoint:

const smartMemoryService = require('./services/smartMemoryService');

// Add this new route
app.post('/chat-smart', async (req, res) => {
  const { message, clearHistory } = req.body;
  
  if (clearHistory) {
    smartMemoryService.clearMemory();
  }
  
  // Guard against empty requests before spending tokens
  if (!message) {
    return res.status(400).json({ error: 'message is required' });
  }
  
  const response = await smartMemoryService.processMessage(message);
  const stats = smartMemoryService.getStats();
  
  res.json({ 
    response, 
    stats 
  });
});

Testing Smart Memory

Test with a longer conversation to see how it maintains recent context while forgetting older messages:

curl -X POST http://localhost:3000/chat-smart \
  -H "Content-Type: application/json" \
  -d '{"message": "Hi, I am John and I have a billing question"}'

Continue with 8-10 more messages. Notice how it remembers recent context but forgets the very first messages, keeping costs predictable.
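
If you’d rather not type ten curl commands, a small Node script can drive the endpoint (Node 18+ for the built-in fetch; the messages and email address below are placeholders):

// Drive /chat-smart through a longer conversation (Node 18+ for built-in fetch).
// All messages and the email address below are placeholders.
const messages = [
  'Hi, I am John and I have a billing question',
  'I was charged twice last month',
  'The duplicate charge was $49.99',
  'My account email is john@example.com',
  'Can you confirm the refund timeline?',
  'By the way, what is my name?',         // likely still inside the window
  'And what was my very first question?', // may have slid out of the window
];

(async () => {
  for (const message of messages) {
    const res = await fetch('http://localhost:3000/chat-smart', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message }),
    });
    const { response, stats } = await res.json();
    console.log(`[${stats.messageCount}] ${response}`);
  }
})();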

Production Benefits

Window memory provides the sweet spot for customer support:

Cost Control: Predictable token usage regardless of conversation length (quantified in the sketch below)
Relevant Context: Maintains the last 6 exchanges (usually sufficient for support)
Performance: Faster responses with shorter context windows
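
To quantify the cost-control point: with k = 6 the context is capped at 12 remembered messages, so per-call history tokens plateau instead of growing. Using the same illustrative ~30 tokens per message as earlier:

// Per-call history size: full buffer vs. a k = 6 window (illustrative numbers).
const TOKENS_PER_MESSAGE = 30;
const K = 6; // exchanges kept, i.e. 12 messages

const bufferTokens = (messagesSoFar) => messagesSoFar * TOKENS_PER_MESSAGE; // unbounded
const windowTokens = (messagesSoFar) =>
  Math.min(messagesSoFar, K * 2) * TOKENS_PER_MESSAGE; // capped at 360

console.log(bufferTokens(100)); // 3000: keeps climbing with every message
console.log(windowTokens(100)); // 360: flat after the 12th message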

What We’ve Built

We now have three AI endpoints:

/chat – Basic AI responses (no memory)
/chat-memory – Perfect memory (expensive for long conversations)
/chat-smart – Production-ready with cost-controlled memory

Series Foundation Complete

You now have an AI customer support assistant foundation with smart memory management, cost control, and basic error handling, ready for real-world use. One caveat before scaling up: the service exports a single shared memory instance, so handling many simultaneous conversations means creating one memory (or service instance) per user session rather than sharing this one.

Future parts can explore advanced features like prompt engineering, external integrations, and specialized customer support workflows—all building on this solid foundation.
