Prompt engineering has evolved from simple question-and-answer interactions to sophisticated patterns that unlock Claude’s full potential in Azure AI Foundry. As enterprises deploy Claude at scale, mastering advanced prompt patterns becomes critical for achieving consistent, high-quality results while optimizing costs and latency.
This comprehensive guide explores production-ready prompt engineering techniques specifically designed for Claude in Microsoft Foundry, covering extended thinking, structured outputs, context management, and multi-turn workflows that drive real business value.
Understanding Claude’s Reasoning Architecture in Azure
Claude in Microsoft Foundry pairs Anthropic's Constitutional AI training with Azure's enterprise infrastructure. The model's architecture enables sophisticated reasoning through extended thinking capabilities, allowing it to work through complex problems step-by-step before producing outputs.
Extended thinking represents a fundamental shift in how language models approach problem-solving. Rather than generating immediate responses, Claude can allocate computational resources to internal reasoning processes, exploring multiple approaches and validating solutions before presenting final outputs.
Extended Thinking Capabilities
Extended thinking in Azure AI Foundry allows Claude to use up to 128K tokens for internal reasoning on complex tasks. This capability is particularly valuable for coding, mathematical reasoning, multi-step problem solving, and agentic workflows where accuracy matters more than immediate response time.
The system automatically manages thinking tokens separately from output tokens, enabling you to optimize costs while maintaining high-quality results. For production deployments, understanding when to enable extended thinking versus standard mode becomes crucial for balancing performance and economics.
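Because thinking tokens are billed in addition to the output you actually receive, it helps to model the cost trade-off before enabling extended thinking broadly. The sketch below is a rough cost model; the per-token prices are placeholders, so substitute the current Azure AI Foundry rates for your model and region.

```python
# Rough cost model for deciding when extended thinking pays off.
# Prices are placeholders -- substitute your deployment's actual rates.
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015  # thinking tokens are typically billed at the output rate

def estimate_request_cost(input_tokens: int, output_tokens: int,
                          thinking_tokens: int = 0) -> float:
    """Estimate the cost of a single request in dollars."""
    billed_output = output_tokens + thinking_tokens
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + \
           (billed_output / 1000) * OUTPUT_PRICE_PER_1K
```

Comparing the same request with and without a 10K-token thinking budget makes the reasoning premium explicit, which is useful when deciding which task types justify it.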
Core Prompt Engineering Patterns
Pattern 1: XML-Structured Prompts
XML tags provide Claude with clear structural boundaries, improving parsing accuracy and enabling complex nested instructions. This pattern is particularly effective for enterprise scenarios requiring precise control over input processing and output formatting.
const { AnthropicFoundry } = require("@anthropic-ai/foundry-sdk");

const client = new AnthropicFoundry({
  foundryResource: process.env.ANTHROPIC_FOUNDRY_RESOURCE
});

async function analyzeFinancialDocument(document, regulations) {
  const prompt = `Analyze the following financial document for compliance.

<document>
${document}
</document>

<regulations>
${regulations}
</regulations>

<instructions>
1. Identify all regulatory requirements mentioned in the regulations section
2. For each requirement, check if the document provides sufficient evidence of compliance
3. Flag any potential violations or ambiguities
4. Provide recommendations for addressing gaps
</instructions>

<output_format>
Return your analysis in this structure:
<compliance_analysis>
  <requirement id="...">
    <status>compliant|non_compliant|unclear</status>
    <evidence>...</evidence>
    <recommendation>...</recommendation>
  </requirement>
</compliance_analysis>
</output_format>`;

  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 4000,
    thinking: {
      type: "enabled",
      budget_tokens: 10000
    },
    messages: [{
      role: "user",
      content: prompt
    }]
  });

  // With extended thinking enabled, the first content block may be a
  // thinking block, so return the first text block rather than index 0
  const textBlock = response.content.find((block) => block.type === "text");
  return textBlock.text;
}

module.exports = { analyzeFinancialDocument };

Pattern 2: Chain-of-Thought with Verification
Chain-of-thought prompting combined with explicit verification steps dramatically improves accuracy for complex reasoning tasks. This pattern instructs Claude to show its work and validate conclusions before finalizing outputs.
using Azure.AI.Foundry;
using Azure.Identity;

public class ChainOfThoughtAnalyzer
{
    private readonly AnthropicFoundryClient _client;

    public ChainOfThoughtAnalyzer(string resourceName)
    {
        _client = new AnthropicFoundryClient(
            resourceName,
            new DefaultAzureCredential());
    }

    public async Task<string> AnalyzeBusinessDecisionAsync(
        string scenario,
        string constraints,
        string objectives)
    {
        var prompt = $@"I need you to analyze a business decision thoroughly.

<scenario>
{scenario}
</scenario>

<constraints>
{constraints}
</constraints>

<objectives>
{objectives}
</objectives>

Please think through this decision step by step:
1. First, identify all stakeholders and their interests
2. List potential options and their implications
3. Evaluate each option against the constraints
4. Score options against objectives
5. Identify risks and mitigation strategies
6. Make a recommendation with justification

After your analysis, verify your recommendation by:
- Checking if it satisfies all hard constraints
- Confirming alignment with top objectives
- Ensuring risks are adequately addressed

Show your complete reasoning process, then provide your final recommendation.";

        var response = await _client.CreateMessageAsync(
            model: "claude-sonnet-4-5",
            maxTokens: 8000,
            thinking: new ThinkingConfig
            {
                Type = ThinkingType.Enabled,
                BudgetTokens = 20000
            },
            messages: new[]
            {
                new Message
                {
                    Role = "user",
                    Content = prompt
                }
            });

        return response.Content[0].Text;
    }
}

Pattern 3: Few-Shot Learning with Examples
Providing concrete examples helps Claude understand exactly what you expect. This pattern is especially effective for specialized domains, consistent formatting requirements, and scenarios where nuanced judgment is required.
import os

from anthropic_foundry import AnthropicFoundry

client = AnthropicFoundry(
    foundry_resource=os.environ["ANTHROPIC_FOUNDRY_RESOURCE"]
)

def classify_support_ticket(ticket_text: str) -> dict:
    prompt = f"""Classify this support ticket into category, priority, and required expertise.

Here are examples of correct classifications:

Example 1:
Ticket: "Cannot access my account after password reset. Getting error 500."
Classification:
Category: Authentication
Priority: High
Expertise: Backend Engineering
Reasoning: Authentication failures block user access and may indicate system issue

Example 2:
Ticket: "Would like to change my email preferences for newsletters."
Classification:
Category: Account Settings
Priority: Low
Expertise: Customer Support
Reasoning: Non-urgent preference change, no system impact

Example 3:
Ticket: "Database query returning null for customer ID 12345 in production."
Classification:
Category: Data Integrity
Priority: Critical
Expertise: Database Engineering
Reasoning: Production data issue affecting specific customer requires immediate attention

Now classify this ticket:
Ticket: {ticket_text}

Provide classification in the same format, with clear reasoning."""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        thinking={
            "type": "enabled",
            "budget_tokens": 5000
        },
        messages=[{
            "role": "user",
            "content": prompt
        }]
    )
    # With thinking enabled, the first content block may be a thinking
    # block, so take the first text block rather than assuming index 0
    text = next(b.text for b in response.content if b.type == "text")
    return parse_classification(text)

def parse_classification(response_text: str) -> dict:
    # Pull the labeled fields out of Claude's formatted response
    result = {}
    for line in response_text.splitlines():
        for field in ("Category", "Priority", "Expertise", "Reasoning"):
            if line.strip().startswith(f"{field}:"):
                result[field.lower()] = line.split(":", 1)[1].strip()
    return result

Advanced Patterns for Production Systems
Pattern 4: Prefilled Responses for Consistency
Prefilling Claude’s response establishes the format and tone immediately, ensuring consistency across outputs. This technique is valuable for structured data generation, API-like interactions, and maintaining brand voice.
const generateStructuredReport = async (data) => {
  const messages = [
    {
      role: "user",
      content: `Generate a financial summary report for this data: ${JSON.stringify(data)}`
    },
    {
      role: "assistant",
      content: `{
  "report_type": "financial_summary",
  "generated_at": "${new Date().toISOString()}",
  "sections": [`
    }
  ];

  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 3000,
    messages: messages
  });

  // Claude continues from the prefilled structure
  const fullResponse = messages[1].content + response.content[0].text;
  return JSON.parse(fullResponse);
};

Pattern 5: Multi-Turn Context Management
For long-running conversations or agent workflows, proper context management prevents token bloat while maintaining coherence. This pattern implements rolling context windows with intelligent summarization.
public class ContextManagedAgent
{
    private List<Message> _conversationHistory = new();
    private const int MaxContextTokens = 150000;
    private readonly AnthropicFoundryClient _client;

    public async Task<string> ProcessMessageAsync(string userMessage)
    {
        _conversationHistory.Add(new Message
        {
            Role = "user",
            Content = userMessage
        });

        // Check context size and compress if needed
        if (EstimateTokenCount(_conversationHistory) > MaxContextTokens)
        {
            await CompressHistoryAsync();
        }

        var response = await _client.CreateMessageAsync(
            model: "claude-sonnet-4-5",
            maxTokens: 4000,
            messages: _conversationHistory.ToArray()
        );

        _conversationHistory.Add(new Message
        {
            Role = "assistant",
            Content = response.Content[0].Text
        });

        return response.Content[0].Text;
    }

    private async Task CompressHistoryAsync()
    {
        // Keep the opening message and recent messages, summarize the middle
        var firstMessage = _conversationHistory[0];
        var recentMessages = _conversationHistory.TakeLast(10).ToList();
        var middleMessages = _conversationHistory.Skip(1)
            .Take(_conversationHistory.Count - 11).ToList();

        if (middleMessages.Count > 0)
        {
            var summary = await SummarizeConversationAsync(middleMessages);

            _conversationHistory = new List<Message>
            {
                firstMessage,
                new Message
                {
                    Role = "user",
                    Content = $"<conversation_summary>{summary}</conversation_summary>"
                }
            };
            _conversationHistory.AddRange(recentMessages);
        }
    }

    private async Task<string> SummarizeConversationAsync(
        List<Message> messages)
    {
        var conversationText = string.Join("\n\n",
            messages.Select(m => $"{m.Role}: {m.Content}"));

        var summary = await _client.CreateMessageAsync(
            model: "claude-haiku-4-5", // Use faster model for summarization
            maxTokens: 1000,
            messages: new[]
            {
                new Message
                {
                    Role = "user",
                    Content = $@"Summarize this conversation concisely,
preserving key facts, decisions, and context:

{conversationText}"
                }
            });

        return summary.Content[0].Text;
    }

    private int EstimateTokenCount(List<Message> messages)
    {
        // Rough estimation: 4 characters per token
        return messages.Sum(m => m.Content.Length) / 4;
    }
}

Pattern 6: Prompt Caching for Repeated Context
Prompt caching in Azure AI Foundry dramatically reduces costs and latency when reusing large context blocks like documentation, code bases, or policy documents. Cache blocks persist for 5 minutes, making this ideal for interactive applications.
def analyze_code_with_standards(code: str, coding_standards: str):
    # Load the large coding standards document once; the cache_control
    # entry on the system block below marks it for reuse across calls
    system_prompt = f"""You are a code reviewer following these standards:

<coding_standards>
{coding_standards}
</coding_standards>

Review code for compliance with these standards."""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=3000,
        system=[
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"}
            }
        ],
        messages=[
            {
                "role": "user",
                "content": f"Review this code:\n\n```\n{code}\n```"
            }
        ]
    )
    # Subsequent calls with the same standards hit the cache,
    # substantially reducing latency and input-token cost
    return response.content[0].text

Extended Thinking Optimization
Extended thinking is Claude’s most powerful reasoning capability but requires careful configuration to balance quality, cost, and latency.
When to Enable Extended Thinking
- Complex coding tasks requiring architecture decisions
- Mathematical proofs or multi-step calculations
- Strategic analysis with multiple trade-offs
- Agentic workflows with tool chaining
- Creative tasks requiring exploration of alternatives
When to Use Standard Mode
- Simple classification or extraction tasks
- Formatting or transformation operations
- Lookups against provided context
- High-throughput, latency-sensitive applications
Configuring Thinking Budgets
const configureThinkingForTask = (taskComplexity) => {
  const configs = {
    simple: {
      type: "enabled",
      budget_tokens: 2000 // Minimal thinking for straightforward tasks
    },
    moderate: {
      type: "enabled",
      budget_tokens: 10000 // Standard for most business logic
    },
    complex: {
      type: "enabled",
      budget_tokens: 50000 // Deep reasoning for critical decisions
    },
    maximum: {
      type: "enabled",
      budget_tokens: 128000 // Full capacity for hardest problems
    }
  };

  return configs[taskComplexity] || configs.moderate;
};

// Usage
const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 4000,
  thinking: configureThinkingForTask("complex"),
  messages: [{ role: "user", content: complexProblem }]
});

Production Deployment Patterns
Pattern 7: Fallback and Retry Logic
Production systems require robust error handling and graceful degradation when API calls fail or rate limits are hit.
public class ResilientClaudeClient
{
    private readonly AnthropicFoundryClient _primaryClient;
    private readonly AnthropicFoundryClient _fallbackClient;

    public async Task<string> ExecuteWithRetryAsync(
        string prompt,
        int maxRetries = 3)
    {
        var attempt = 0;
        var baseDelay = TimeSpan.FromSeconds(1);

        while (attempt < maxRetries)
        {
            try
            {
                var response = await _primaryClient.CreateMessageAsync(
                    model: "claude-sonnet-4-5",
                    maxTokens: 4000,
                    messages: new[] { new Message
                    {
                        Role = "user",
                        Content = prompt
                    }}
                );

                return response.Content[0].Text;
            }
            catch (RateLimitException)
            {
                attempt++;
                if (attempt >= maxRetries)
                    throw;

                // Exponential backoff
                var delay = baseDelay * Math.Pow(2, attempt);
                await Task.Delay(delay);
            }
            catch (ServiceException ex) when (ex.StatusCode >= 500)
            {
                // Try fallback model for server errors
                return await TryFallbackModelAsync(prompt);
            }
        }

        throw new Exception("Max retries exceeded");
    }

    private async Task<string> TryFallbackModelAsync(string prompt)
    {
        // Fallback to Haiku for better availability
        var response = await _fallbackClient.CreateMessageAsync(
            model: "claude-haiku-4-5",
            maxTokens: 4000,
            messages: new[] { new Message
            {
                Role = "user",
                Content = prompt
            }}
        );

        return response.Content[0].Text;
    }
}

Pattern 8: Response Validation and Safety
Enterprise deployments require validation of Claude’s outputs before using them in business processes or showing them to users.
async function validateAndUseResponse(prompt, validationRules, attempt = 0) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 4000,
    messages: [{ role: "user", content: prompt }]
  });

  const output = response.content[0].text;

  // Validate output meets requirements
  const validation = {
    hasRequiredFields: checkRequiredFields(output, validationRules.fields),
    followsFormat: validateFormat(output, validationRules.format),
    passesContentFilter: await checkContentSafety(output),
    meetsLengthRequirements: checkLength(output, validationRules.length)
  };

  if (!Object.values(validation).every((v) => v === true)) {
    // Log validation failure
    console.error("Response validation failed:", validation);

    // Cap retries so a persistently failing prompt cannot recurse forever
    if (attempt >= 2) {
      throw new Error("Response failed validation after retries");
    }

    // Retry with more explicit instructions
    const refinedPrompt = `${prompt}\n\nIMPORTANT: ${generateValidationPrompt(validation)}`;
    return validateAndUseResponse(refinedPrompt, validationRules, attempt + 1);
  }

  return output;
}

async function checkContentSafety(text) {
  // Integrate with Azure Content Safety
  const contentSafetyClient = new ContentSafetyClient(
    endpoint,
    new DefaultAzureCredential()
  );

  const result = await contentSafetyClient.analyzeText(text);
  return result.categoriesAnalysis.every((c) => c.severity < 2);
}

Monitoring and Optimization
Production prompt engineering requires continuous monitoring and optimization based on actual usage patterns.
Key Metrics to Track
- Token usage distribution (input vs output vs thinking)
- Latency percentiles (p50, p95, p99)
- Cache hit rates for prompt caching
- Retry rates and failure modes
- Output validation failure rates
- Cost per successful completion
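The last metric in the list, cost per successful completion, deserves its own helper: it folds retries and validation failures into the number you actually pay per usable result. A minimal sketch:

```python
def cost_per_successful_completion(total_cost: float,
                                   total_requests: int,
                                   validation_failures: int) -> float:
    """Total spend divided by completions that passed validation.

    Retries and failed validations inflate this number, which makes it a
    better optimization target than raw cost per request.
    """
    successes = total_requests - validation_failures
    if successes == 0:
        return float("inf")
    return total_cost / successes
```

Tracking this alongside raw token spend surfaces prompts that are cheap per call but expensive per usable answer.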
import numpy as np

class PromptMetricsCollector:
    def __init__(self):
        self.metrics = {
            "total_requests": 0,
            "token_usage": {"input": 0, "output": 0, "thinking": 0},
            "latency_samples": [],
            "cache_hits": 0,
            "validation_failures": 0
        }

    def record_completion(self, response, latency_ms, cache_hit,
                          validation_passed):
        self.metrics["total_requests"] += 1

        usage = response.usage
        self.metrics["token_usage"]["input"] += usage.input_tokens
        self.metrics["token_usage"]["output"] += usage.output_tokens
        if hasattr(usage, "thinking_tokens"):
            self.metrics["token_usage"]["thinking"] += usage.thinking_tokens

        self.metrics["latency_samples"].append(latency_ms)

        if cache_hit:
            self.metrics["cache_hits"] += 1
        if not validation_passed:
            self.metrics["validation_failures"] += 1

    def get_summary(self):
        return {
            "total_requests": self.metrics["total_requests"],
            "avg_latency_ms": np.mean(self.metrics["latency_samples"]),
            "p95_latency_ms": np.percentile(
                self.metrics["latency_samples"], 95
            ),
            "cache_hit_rate": (
                self.metrics["cache_hits"] /
                self.metrics["total_requests"]
            ),
            "validation_success_rate": (
                1 - (self.metrics["validation_failures"] /
                     self.metrics["total_requests"])
            ),
            "token_distribution": self.metrics["token_usage"]
        }

Best Practices Summary
- Start with Clear Instructions: Be explicit about what you want. Claude performs best with detailed, unambiguous prompts.
- Use Structure Appropriately: XML tags help Claude understand complex inputs, but overuse adds unnecessary tokens.
- Enable Thinking Strategically: Extended thinking improves quality on complex tasks but adds cost and latency.
- Leverage Examples: Few-shot learning with quality examples dramatically improves consistency.
- Implement Caching: For repeated context like documentation or policies, prompt caching can cut costs by up to 90%.
- Validate Outputs: Never trust LLM outputs blindly in production systems.
- Monitor and Iterate: Track metrics and continuously refine prompts based on real performance.
- Handle Failures Gracefully: Implement retries, fallbacks, and error handling for production reliability.
Conclusion
Advanced prompt engineering patterns transform Claude from a helpful assistant into a production-grade reasoning engine. By mastering extended thinking, structured prompts, context management, and validation patterns, you can build enterprise applications that consistently deliver high-quality results at scale.
The key to success is matching prompt patterns to your specific use case, monitoring performance metrics, and iterating based on real-world results. Start with simple patterns, validate their effectiveness, then gradually introduce more sophisticated techniques as your requirements evolve.
References
- Microsoft Learn - Deploy and use Claude models in Microsoft Foundry
- Anthropic - Extended thinking tips
- Anthropic - Prompt engineering best practices
- Anthropic - Prompt engineering for business performance
- Microsoft Azure Blog - Introducing Anthropic's Claude models in Microsoft Foundry
- Anthropic - Claude in Microsoft Foundry
- Netguru - Azure Prompt Engineering Best Practices
