Building production-ready applications with Claude in Azure AI Foundry requires understanding the Node.js SDK, proper authentication patterns, error handling strategies, and performance optimization techniques. This comprehensive guide walks through creating a complete Node.js application that leverages Claude Sonnet 4.5 for intelligent conversations, code generation, and complex reasoning tasks.
Parts 1 and 2 of this series covered strategic overview and deployment fundamentals. Part 3 focuses entirely on practical Node.js implementation, providing production-ready code examples that you can adapt for your specific use cases.
Environment Setup
Proper environment configuration ensures smooth development and production deployments. We will set up a TypeScript-based Node.js project with all necessary dependencies and configuration files.
Prerequisites
Ensure you have Node.js 18 LTS or later installed. TypeScript 4.7 or later is required, since the project configuration below uses NodeNext module resolution. You should have completed the deployment steps from Part 2, with at least one Claude model deployed in Azure AI Foundry.
Project Initialization
Create a new Node.js project and install required dependencies:
# Create project directory
mkdir claude-azure-app
cd claude-azure-app
# Initialize Node.js project
npm init -y
# Install Anthropic Foundry SDK
npm install @anthropic-ai/foundry-sdk
# Install Azure Identity for Entra ID auth
npm install @azure/identity
# Install TypeScript and type definitions
npm install --save-dev typescript @types/node
# Install dotenv for environment variables
npm install dotenv
# Install development tools
npm install --save-dev tsx nodemon
TypeScript Configuration
Create a tsconfig.json file for TypeScript compilation settings:
{
"compilerOptions": {
"target": "ES2022",
"module": "NodeNext",
"moduleResolution": "NodeNext",
"lib": ["ES2022"],
"outDir": "./dist",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist"]
}
Environment Variables
Create a .env file to store configuration. Never commit this file to source control:
# Azure Foundry Resource Configuration
AZURE_FOUNDRY_RESOURCE=your-resource-name
AZURE_FOUNDRY_BASE_URL=https://your-resource-name.services.ai.azure.com
# API Key Authentication (Option 1)
ANTHROPIC_FOUNDRY_API_KEY=your-api-key-here
# Entra ID Authentication (Option 2)
# These are automatically detected by DefaultAzureCredential
# AZURE_CLIENT_ID=your-client-id
# AZURE_TENANT_ID=your-tenant-id
# AZURE_CLIENT_SECRET=your-client-secret
# Model Configuration
DEFAULT_MODEL=claude-sonnet-4-5
MAX_TOKENS=4096
Create a .env.example file with dummy values to commit to source control, showing required configuration without exposing secrets.
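It also helps to fail fast at startup when a required variable is missing. The snippet below is an optional sketch, not part of the original setup; the variable names match the .env file above, and the API key check only applies when you use API key authentication:
import dotenv from 'dotenv';

dotenv.config();

// Optional startup check: throw immediately if required configuration is missing.
const requiredVars = ['AZURE_FOUNDRY_BASE_URL', 'ANTHROPIC_FOUNDRY_API_KEY'];
for (const name of requiredVars) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}

// Centralized view of the configuration used throughout the examples.
export const appConfig = {
  baseURL: `${process.env.AZURE_FOUNDRY_BASE_URL}/anthropic`,
  model: process.env.DEFAULT_MODEL || 'claude-sonnet-4-5',
  maxTokens: parseInt(process.env.MAX_TOKENS || '4096', 10),
};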
Package Scripts
Update package.json with helpful development scripts:
{
"name": "claude-azure-app",
"version": "1.0.0",
"type": "module",
"scripts": {
"dev": "tsx watch src/index.ts",
"build": "tsc",
"start": "node dist/index.js",
"type-check": "tsc --noEmit"
}
}
Basic Chat Implementation
Let’s build a basic chat implementation that demonstrates core SDK usage. This example uses API key authentication for simplicity.
Simple Chat Example
Create src/index.ts with a basic chat implementation:
import { AnthropicFoundry } from '@anthropic-ai/foundry-sdk';
import dotenv from 'dotenv';
// Load environment variables
dotenv.config();
// Initialize client with API key
const client = new AnthropicFoundry({
apiKey: process.env.ANTHROPIC_FOUNDRY_API_KEY,
baseURL: `${process.env.AZURE_FOUNDRY_BASE_URL}/anthropic`,
});
async function simpleChat(userMessage: string) {
try {
const response = await client.messages.create({
model: process.env.DEFAULT_MODEL || 'claude-sonnet-4-5',
max_tokens: parseInt(process.env.MAX_TOKENS || '1024'),
messages: [
{
role: 'user',
content: userMessage,
},
],
});
// Extract text content from response
const textContent = response.content
.filter((block) => block.type === 'text')
.map((block) => block.text)
.join('\n');
console.log('Claude:', textContent);
console.log('\nToken usage:', response.usage);
return textContent;
} catch (error) {
console.error('Error:', error);
throw error;
}
}
// Test the function
simpleChat('Explain quantum computing in 3 sentences.');
Run the example with npm run dev. You should see Claude’s response and token usage statistics.
Understanding the Response Structure
Claude API responses include multiple components. The content array contains response blocks of different types (text, tool use, etc.). The usage object provides token consumption metrics including input tokens, output tokens, and cache statistics. The model field confirms which deployment processed the request. The stop reason indicates why generation ended (end_turn, max_tokens, stop_sequence).
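As a quick illustration, the sketch below inspects these fields on a non-streaming response; it assumes the client initialized in the previous example:
async function inspectResponse() {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 1024,
    messages: [{ role: 'user', content: 'Summarize the SOLID principles.' }],
  });

  // Which deployment handled the request and why generation stopped
  console.log('Model:', response.model);
  console.log('Stop reason:', response.stop_reason); // end_turn, max_tokens, or stop_sequence

  // Walk the content blocks; text blocks carry the generated prose
  for (const block of response.content) {
    if (block.type === 'text') {
      console.log('Text block:', block.text);
    } else {
      console.log('Other block type:', block.type); // e.g. tool_use
    }
  }

  // Token consumption metrics, including cache statistics when caching is used
  console.log('Usage:', response.usage);
}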
Entra ID Authentication
For production deployments, Microsoft Entra ID provides superior security compared to API keys. Here’s how to implement it:
import { AnthropicFoundry } from '@anthropic-ai/foundry-sdk';
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import dotenv from 'dotenv';
dotenv.config();
// Create Azure credential
const credential = new DefaultAzureCredential();
// Get token provider for AI Foundry scope
const scope = 'https://ai.azure.com/.default';
const azureADTokenProvider = getBearerTokenProvider(credential, scope);
// Initialize client with Entra ID
const client = new AnthropicFoundry({
azureADTokenProvider,
baseURL: `${process.env.AZURE_FOUNDRY_BASE_URL}/anthropic`,
});
async function authenticatedChat(userMessage: string) {
const response = await client.messages.create({
model: process.env.DEFAULT_MODEL || 'claude-sonnet-4-5',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
});
return response.content
.filter((block) => block.type === 'text')
.map((block) => block.text)
.join('\n');
}
export { authenticatedChat };
DefaultAzureCredential automatically discovers credentials from environment variables, managed identity, Azure CLI, Visual Studio, and other sources in a specific order. This makes the code portable across development and production environments without changes.
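If you want to restrict which credential sources are consulted rather than rely on the full discovery chain, @azure/identity lets you compose a chain explicitly. The following sketch reuses the scope and base URL from the example above:
import { AnthropicFoundry } from '@anthropic-ai/foundry-sdk';
import {
  AzureCliCredential,
  ChainedTokenCredential,
  ManagedIdentityCredential,
  getBearerTokenProvider,
} from '@azure/identity';

// Try managed identity first (production), then the Azure CLI (local development).
const credential = new ChainedTokenCredential(
  new ManagedIdentityCredential(),
  new AzureCliCredential()
);

const explicitClient = new AnthropicFoundry({
  azureADTokenProvider: getBearerTokenProvider(credential, 'https://ai.azure.com/.default'),
  baseURL: `${process.env.AZURE_FOUNDRY_BASE_URL}/anthropic`,
});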
Multi-Turn Conversations
Real applications require multi-turn conversations that maintain context. Here’s a conversation manager implementation:
import { AnthropicFoundry } from '@anthropic-ai/foundry-sdk';
import type { MessageParam } from '@anthropic-ai/foundry-sdk/resources/messages';
interface ConversationOptions {
systemPrompt?: string;
maxTokens?: number;
}
class ConversationManager {
private client: AnthropicFoundry;
private messages: MessageParam[] = [];
private systemPrompt?: string;
private maxTokens: number;
private model: string;
constructor(
client: AnthropicFoundry,
options: ConversationOptions = {}
) {
this.client = client;
this.systemPrompt = options.systemPrompt;
this.maxTokens = options.maxTokens || 1024;
this.model = process.env.DEFAULT_MODEL || 'claude-sonnet-4-5';
}
async sendMessage(userMessage: string): Promise<string> {
// Add user message to conversation history
this.messages.push({
role: 'user',
content: userMessage,
});
// Call Claude API
const response = await this.client.messages.create({
model: this.model,
max_tokens: this.maxTokens,
system: this.systemPrompt,
messages: this.messages,
});
// Extract assistant response
const assistantMessage = response.content
.filter((block) => block.type === 'text')
.map((block) => block.text)
.join('\n');
// Add assistant response to conversation history
this.messages.push({
role: 'assistant',
content: assistantMessage,
});
return assistantMessage;
}
getHistory(): MessageParam[] {
return [...this.messages];
}
clearHistory(): void {
this.messages = [];
}
getTokenCount(): number {
// Estimate token count (rough approximation)
const allText = this.messages
.map((m) =>
typeof m.content === 'string'
? m.content
: m.content.map((c) =>
c.type === 'text' ? c.text : ''
).join('')
)
.join('');
// Rough estimate: 4 characters per token
return Math.ceil(allText.length / 4);
}
}
// Usage example
async function conversationExample() {
const client = new AnthropicFoundry({
apiKey: process.env.ANTHROPIC_FOUNDRY_API_KEY,
baseURL: `${process.env.AZURE_FOUNDRY_BASE_URL}/anthropic`,
});
const conversation = new ConversationManager(client, {
systemPrompt: 'You are a helpful coding assistant specializing in TypeScript and Node.js.',
maxTokens: 2048,
});
// Multi-turn conversation
const response1 = await conversation.sendMessage(
'How do I implement error handling in async functions?'
);
console.log('Claude:', response1);
const response2 = await conversation.sendMessage(
'Can you show me a practical example?'
);
console.log('Claude:', response2);
// Check conversation length
console.log('Estimated tokens:', conversation.getTokenCount());
}
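// Optional pattern (an illustrative addition, not part of the class above):
// use the public getTokenCount() and clearHistory() methods to keep the
// conversation history from growing without bound.
async function sendWithBudget(
  conversation: ConversationManager,
  message: string,
  maxEstimatedTokens = 150_000
): Promise<string> {
  if (conversation.getTokenCount() > maxEstimatedTokens) {
    // A simple strategy: reset once the estimate gets large. Real applications
    // might summarize or trim only the oldest turns instead.
    conversation.clearHistory();
  }
  return conversation.sendMessage(message);
}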
export { ConversationManager };
Streaming Responses
For real-time user experiences, streaming provides incremental response delivery. This dramatically improves perceived latency for long responses:
import { AnthropicFoundry } from '@anthropic-ai/foundry-sdk';
async function streamingChat(client: AnthropicFoundry, userMessage: string) {
const stream = await client.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 2048,
messages: [{ role: 'user', content: userMessage }],
stream: true,
});
console.log('Claude: ');
let fullResponse = '';
for await (const event of stream) {
if (event.type === 'content_block_delta') {
if (event.delta.type === 'text_delta') {
process.stdout.write(event.delta.text);
fullResponse += event.delta.text;
}
}
if (event.type === 'message_stop') {
console.log('\n\nStream completed.');
}
}
return fullResponse;
}
// Advanced streaming with event handlers
async function advancedStreaming(client: AnthropicFoundry, userMessage: string) {
const stream = client.messages
.stream({
model: 'claude-sonnet-4-5',
max_tokens: 2048,
messages: [{ role: 'user', content: userMessage }],
})
.on('text', (text) => {
process.stdout.write(text);
})
.on('message', (message) => {
console.log('\n\nFull message:', message);
})
.on('error', (error) => {
console.error('Stream error:', error);
});
// Wait for completion
const finalMessage = await stream.finalMessage();
console.log('\n\nToken usage:', finalMessage.usage);
return finalMessage;
}The streaming API provides two approaches. The low-level approach iterates through events manually, giving fine-grained control. The high-level approach uses event handlers for cleaner code and automatic message accumulation.
Error Handling and Retry Logic
Production applications require robust error handling. Here’s a comprehensive error handling wrapper:
import { AnthropicFoundry } from '@anthropic-ai/foundry-sdk';
import type { Message, MessageCreateParams } from '@anthropic-ai/foundry-sdk/resources/messages';
interface RetryOptions {
maxRetries?: number;
baseDelay?: number;
maxDelay?: number;
}
class ClaudeClient {
private client: AnthropicFoundry;
private retryOptions: Required<RetryOptions>;
constructor(client: AnthropicFoundry, retryOptions: RetryOptions = {}) {
this.client = client;
this.retryOptions = {
maxRetries: retryOptions.maxRetries || 3,
baseDelay: retryOptions.baseDelay || 1000,
maxDelay: retryOptions.maxDelay || 10000,
};
}
private async sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
private calculateBackoff(attempt: number): number {
const delay = this.retryOptions.baseDelay * Math.pow(2, attempt);
return Math.min(delay, this.retryOptions.maxDelay);
}
private isRetryableError(error: any): boolean {
// Retry on rate limits and temporary server errors
if (error.status === 429) return true; // Rate limit
if (error.status >= 500) return true; // Server errors
if (error.code === 'ECONNRESET') return true; // Network errors
if (error.code === 'ETIMEDOUT') return true; // Timeout errors
return false;
}
async createMessage(
params: MessageCreateParams,
attempt: number = 0
): Promise<Message> {
try {
const response = await this.client.messages.create(params);
return response;
} catch (error: any) {
// Log error details
console.error(`API error (attempt ${attempt + 1}):`, {
status: error.status,
code: error.code,
message: error.message,
});
// Check if we should retry
if (
this.isRetryableError(error) &&
attempt < this.retryOptions.maxRetries
) {
const delay = this.calculateBackoff(attempt);
console.log(`Retrying in ${delay}ms...`);
await this.sleep(delay);
return this.createMessage(params, attempt + 1);
}
// Handle specific error types
if (error.status === 401) {
throw new Error(
'Authentication failed. Check your API key or Entra ID credentials.'
);
}
if (error.status === 429) {
throw new Error(
'Rate limit exceeded. Consider implementing request queuing or requesting quota increase.'
);
}
if (error.status === 404) {
throw new Error(
'Model deployment not found. Verify deployment name and region.'
);
}
// Re-throw original error
throw error;
}
}
}
// Usage
async function robustChatExample() {
const baseClient = new AnthropicFoundry({
apiKey: process.env.ANTHROPIC_FOUNDRY_API_KEY,
baseURL: `${process.env.AZURE_FOUNDRY_BASE_URL}/anthropic`,
});
const client = new ClaudeClient(baseClient, {
maxRetries: 3,
baseDelay: 1000,
maxDelay: 10000,
});
try {
const response = await client.createMessage({
model: 'claude-sonnet-4-5',
max_tokens: 1024,
messages: [
{
role: 'user',
content: 'Explain the benefits of TypeScript.',
},
],
});
const text = response.content
.filter((block) => block.type === 'text')
.map((block) => block.text)
.join('\n');
console.log('Success:', text);
} catch (error) {
console.error('Final error:', error);
}
}
export { ClaudeClient };
Cost Optimization with Prompt Caching
Prompt caching can reduce costs by up to 90% for applications with repeated context. Here's how to implement it effectively:
import { AnthropicFoundry } from '@anthropic-ai/foundry-sdk';
async function cachedContextChat(
client: AnthropicFoundry,
largeContext: string,
userQuery: string
) {
const response = await client.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 1024,
system: [
{
type: 'text',
text: 'You are a helpful assistant analyzing the provided documentation.',
},
{
type: 'text',
text: largeContext,
cache_control: { type: 'ephemeral' }, // Cache this block
},
],
messages: [
{
role: 'user',
content: userQuery,
},
],
});
// Check cache usage
console.log('Cache statistics:', {
cacheCreationTokens: response.usage.cache_creation_input_tokens || 0,
cacheReadTokens: response.usage.cache_read_input_tokens || 0,
inputTokens: response.usage.input_tokens,
outputTokens: response.usage.output_tokens,
});
return response.content
.filter((block) => block.type === 'text')
.map((block) => block.text)
.join('\n');
}
// Example: Documentation Q&A with caching
class DocumentationAssistant {
private client: AnthropicFoundry;
private documentation: string;
constructor(client: AnthropicFoundry, documentation: string) {
this.client = client;
this.documentation = documentation;
}
async ask(question: string): Promise<string> {
const response = await this.client.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 2048,
system: [
{
type: 'text',
text: 'You are a documentation expert. Answer questions based on the provided documentation.',
},
{
type: 'text',
text: `Documentation:\n\n${this.documentation}`,
cache_control: { type: 'ephemeral' },
},
],
messages: [{ role: 'user', content: question }],
});
return response.content
.filter((block) => block.type === 'text')
.map((block) => block.text)
.join('\n');
}
}
export { cachedContextChat, DocumentationAssistant };
Cache control markers indicate which content blocks should be cached. The first request with new cache content pays cache write costs (1.25x for 5-minute cache, 2x for 1-hour cache). Subsequent requests within the cache TTL pay only 0.1x for cache reads, providing 90% cost savings on repeated context.
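To make the economics concrete, the helper below estimates input cost for a single request from the usage statistics returned by the API, using the multipliers above and the Sonnet 4.5 input rate quoted later in this guide ($3 per million input tokens). Treat it as an illustrative sketch and substitute your own pricing:
function estimateInputCost(
  usage: {
    input_tokens: number;
    cache_creation_input_tokens?: number | null;
    cache_read_input_tokens?: number | null;
  },
  baseRatePerMTok = 3 // USD per million uncached input tokens
) {
  const writes = usage.cache_creation_input_tokens ?? 0; // billed at 1.25x (5-minute cache)
  const reads = usage.cache_read_input_tokens ?? 0; // billed at 0.1x
  const uncached = usage.input_tokens; // billed at the base rate

  const withCaching =
    ((uncached + writes * 1.25 + reads * 0.1) / 1_000_000) * baseRatePerMTok;
  const withoutCaching =
    ((uncached + writes + reads) / 1_000_000) * baseRatePerMTok;

  return { withCaching, withoutCaching };
}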
Complete Production Application
Here's a complete example combining all best practices into a production-ready application:
import { AnthropicFoundry } from '@anthropic-ai/foundry-sdk';
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import dotenv from 'dotenv';
dotenv.config();
interface AppConfig {
useEntraID: boolean;
enableCaching: boolean;
enableRetry: boolean;
maxRetries: number;
}
class ClaudeApp {
private client: AnthropicFoundry;
private config: AppConfig;
constructor(config: Partial<AppConfig> = {}) {
this.config = {
useEntraID: config.useEntraID ?? true,
enableCaching: config.enableCaching ?? true,
enableRetry: config.enableRetry ?? true,
maxRetries: config.maxRetries ?? 3,
};
this.client = this.initializeClient();
}
private initializeClient(): AnthropicFoundry {
const baseURL = `${process.env.AZURE_FOUNDRY_BASE_URL}/anthropic`;
if (this.config.useEntraID) {
const credential = new DefaultAzureCredential();
const scope = 'https://ai.azure.com/.default';
const azureADTokenProvider = getBearerTokenProvider(credential, scope);
return new AnthropicFoundry({
azureADTokenProvider,
baseURL,
});
} else {
return new AnthropicFoundry({
apiKey: process.env.ANTHROPIC_FOUNDRY_API_KEY,
baseURL,
});
}
}
async chat(
userMessage: string,
options: {
systemPrompt?: string;
maxTokens?: number;
stream?: boolean;
} = {}
): Promise<string> {
const params = {
model: process.env.DEFAULT_MODEL || 'claude-sonnet-4-5',
max_tokens: options.maxTokens || 1024,
system: options.systemPrompt,
messages: [{ role: 'user' as const, content: userMessage }],
};
if (options.stream) {
return this.streamingChat(params);
}
const response = await this.client.messages.create(params);
return response.content
.filter((block) => block.type === 'text')
.map((block) => block.text)
.join('\n');
}
private async streamingChat(params: any): Promise<string> {
let fullResponse = '';
const stream = await this.client.messages.create({
...params,
stream: true,
});
for await (const event of stream) {
if (event.type === 'content_block_delta') {
if (event.delta.type === 'text_delta') {
process.stdout.write(event.delta.text);
fullResponse += event.delta.text;
}
}
}
console.log('\n');
return fullResponse;
}
}
// Application entry point
async function main() {
const app = new ClaudeApp({
useEntraID: true,
enableCaching: true,
enableRetry: true,
});
try {
// Example 1: Simple chat
const response1 = await app.chat(
'Write a TypeScript function to calculate factorial.'
);
console.log('Response:', response1);
// Example 2: Streaming chat
console.log('\nStreaming response:');
await app.chat('Explain async/await in JavaScript.', {
stream: true,
maxTokens: 2048,
});
// Example 3: Custom system prompt
const response3 = await app.chat(
'How do I optimize database queries?',
{
systemPrompt: 'You are a database optimization expert specializing in PostgreSQL.',
maxTokens: 2048,
}
);
console.log('Expert response:', response3);
} catch (error) {
console.error('Application error:', error);
process.exit(1);
}
}
// Run if executed directly
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
export { ClaudeApp };
Testing and Debugging
Proper testing ensures reliability. Here's a testing setup using Jest:
# Install testing dependencies
npm install --save-dev jest @types/jest ts-jest
# Create Jest configuration
npx ts-jest config:init
Create a test file src/claude.test.ts:
import { ClaudeApp } from './index';
describe('ClaudeApp', () => {
let app: ClaudeApp;
beforeAll(() => {
app = new ClaudeApp({
useEntraID: false, // Use API key for testing
});
});
test('should respond to simple question', async () => {
const response = await app.chat('What is 2+2?');
expect(response).toBeTruthy();
expect(response.length).toBeGreaterThan(0);
}, 30000); // 30 second timeout
test('should handle system prompts', async () => {
const response = await app.chat('Tell me about cats', {
systemPrompt: 'You are a veterinarian.',
});
expect(response).toBeTruthy();
}, 30000);
});
Performance Monitoring
Track API performance and costs with custom monitoring:
interface Metrics {
totalRequests: number;
totalInputTokens: number;
totalOutputTokens: number;
totalCost: number;
averageLatency: number;
}
class MetricsTracker {
private metrics: Metrics = {
totalRequests: 0,
totalInputTokens: 0,
totalOutputTokens: 0,
totalCost: 0,
averageLatency: 0,
};
private latencies: number[] = [];
trackRequest(
inputTokens: number,
outputTokens: number,
latencyMs: number
): void {
this.metrics.totalRequests++;
this.metrics.totalInputTokens += inputTokens;
this.metrics.totalOutputTokens += outputTokens;
// Calculate cost (Sonnet 4.5 pricing)
const inputCost = (inputTokens / 1_000_000) * 3;
const outputCost = (outputTokens / 1_000_000) * 15;
this.metrics.totalCost += inputCost + outputCost;
// Track latency
this.latencies.push(latencyMs);
this.metrics.averageLatency =
this.latencies.reduce((a, b) => a + b, 0) / this.latencies.length;
}
getMetrics(): Metrics {
return { ...this.metrics };
}
reset(): void {
this.metrics = {
totalRequests: 0,
totalInputTokens: 0,
totalOutputTokens: 0,
totalCost: 0,
averageLatency: 0,
};
this.latencies = [];
}
}
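// Example usage (illustrative, not part of the original tracker): time a request
// and feed the token counts from response.usage into the tracker. Assumes
// `import { AnthropicFoundry } from '@anthropic-ai/foundry-sdk'` at the top of the file.
async function trackedRequest(
  client: AnthropicFoundry,
  tracker: MetricsTracker,
  userMessage: string
) {
  const start = Date.now();
  const response = await client.messages.create({
    model: process.env.DEFAULT_MODEL || 'claude-sonnet-4-5',
    max_tokens: 1024,
    messages: [{ role: 'user', content: userMessage }],
  });
  tracker.trackRequest(
    response.usage.input_tokens,
    response.usage.output_tokens,
    Date.now() - start
  );
  return response;
}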
export { MetricsTracker };
Conclusion
This comprehensive guide covered building production-ready Node.js applications with Claude in Azure AI Foundry. We explored environment setup, basic and advanced chat implementations, Entra ID authentication, multi-turn conversations, streaming responses, error handling with retry logic, cost optimization through prompt caching, and complete application examples.
The patterns and code examples provided form a solid foundation for building sophisticated AI applications. In Part 4, we will explore Python implementation, demonstrating how to leverage Claude's capabilities using Python's rich ecosystem and async capabilities.
