Deploying Claude models in Azure AI Foundry is straightforward once you understand the platform’s architecture and requirements. This guide provides step-by-step instructions for creating Azure AI Foundry resources, deploying Claude models, configuring authentication, and verifying your deployment. By the end of this tutorial, you will have a fully functional Claude deployment ready for development and production workloads.
Part 1 of this series covered the strategic overview and business value of Azure AI Foundry with Claude. Part 2 focuses entirely on practical implementation, walking through every deployment step with screenshots, code examples, and troubleshooting guidance.
Prerequisites Checklist
Before beginning deployment, verify you have the following prerequisites in place. Missing any of these will block your deployment progress.
Azure Subscription Requirements
You need a paid Azure subscription with a valid payment method. Free trial subscriptions work for initial testing, but production deployments require a paid subscription. The following subscription types are currently restricted and cannot deploy Claude models: Cloud Solution Provider (CSP) subscriptions, sponsored accounts with Azure credits, enterprise accounts in Singapore and South Korea, and Microsoft employee accounts.
If you do not have an appropriate Azure subscription, create a pay-as-you-go subscription through the Azure portal. The subscription must have a billing account in a country or region where Anthropic offers Claude models for purchase.
Required Permissions
You need sufficient permissions to create and manage Azure resources. At minimum, you require Contributor role at the subscription or resource group level. For Marketplace model deployments, you also need permissions to create Azure Marketplace subscriptions, which typically requires Contributor or Owner role.
To verify your permissions, navigate to the Azure portal, select your subscription, click Access Control (IAM), and check your role assignments. If you lack necessary permissions, contact your Azure administrator.
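The same role check can be scripted with Azure CLI. A sketch, assuming you are signed in with `az login`; `YOUR_SUBSCRIPTION_ID` is a placeholder:

```shell
# List the role names assigned to the signed-in user at subscription scope.
# YOUR_SUBSCRIPTION_ID is a placeholder; replace it with your subscription ID.
az role assignment list \
  --assignee "$(az ad signed-in-user show --query id -o tsv)" \
  --scope "/subscriptions/YOUR_SUBSCRIPTION_ID" \
  --query "[].roleDefinitionName" \
  --output tsv
```

If the output includes Contributor or Owner, you have sufficient rights to proceed.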
Regional Availability
Claude models are available in specific Azure regions. At launch, supported regions include East US 2 and Sweden Central for global standard deployment. The US DataZone deployment option is coming soon but not yet available. Plan to create your Azure AI Foundry resources in one of the supported regions.
Regional selection impacts latency, compliance, and data residency. For North American customers, East US 2 typically provides optimal performance. European customers should consider Sweden Central for GDPR compliance and reduced latency.
Azure Marketplace Access
Foundry Models from partners and community require Azure Marketplace access. Ensure your organization allows Marketplace purchases. Some enterprises disable Marketplace access by policy. If you cannot access Marketplace offerings, work with your Azure governance team to enable access for AI Foundry models.
Deployment Architecture Overview
Understanding the deployment architecture helps clarify the steps ahead. The following diagram illustrates the resource hierarchy and deployment flow:
graph TB
Sub["Azure Subscription"]
RG["Resource Group"]
Hub["AI Foundry Hub"]
Project["AI Foundry Project<br/>(East US 2 or Sweden Central)"]
Catalog["Model Catalog"]
Deploy["Model Deployment"]
Sonnet["Claude Sonnet 4.5<br/>Deployment"]
Opus["Claude Opus 4.5<br/>Deployment"]
Haiku["Claude Haiku 4.5<br/>Deployment"]
API["API Endpoint"]
Auth["Authentication<br/>(Entra ID or API Key)"]
Sub --> RG
RG --> Hub
Hub --> Project
Project --> Catalog
Catalog --> Deploy
Deploy --> Sonnet
Deploy --> Opus
Deploy --> Haiku
Sonnet --> API
Opus --> API
Haiku --> API
API --> Auth
style Sub fill:#0078D4,color:#fff
style Hub fill:#0078D4,color:#fff
style Project fill:#0078D4,color:#fff
style Sonnet fill:#FF6B35,color:#fff
style Opus fill:#FF6B35,color:#fff
style Haiku fill:#FF6B35,color:#fff
The hierarchy flows from Azure Subscription to Resource Group to AI Foundry Hub to AI Foundry Project. Each project can contain multiple model deployments. Understanding this structure is critical because permissions, billing, and network configurations cascade through this hierarchy.
Step 1: Create Azure AI Foundry Hub
The AI Foundry Hub serves as the top-level organizational container for your AI projects. It manages security configurations, billing settings, and network isolation.
Create Hub via Azure Portal
Navigate to the Azure portal at portal.azure.com. In the search bar, type “Azure AI Foundry” and select it from the results. Click Create to begin hub creation. Alternatively, search for “AI + Machine Learning” in the Azure Marketplace and locate Azure AI Foundry.
On the Basics tab, configure the following settings. Select your subscription from the dropdown. Choose an existing resource group or create a new one. Resource groups provide logical organization for related Azure resources. For production deployments, create a dedicated resource group for AI Foundry resources to simplify billing, access control, and resource lifecycle management.
Enter a unique hub name. The name must be globally unique across Azure and follow naming conventions: 3 to 24 characters, alphanumeric and hyphens only, must start with a letter. Choose East US 2 or Sweden Central as the region. This selection determines where your hub metadata and configuration are stored.
Configure networking settings. For initial deployments, select Public endpoint (all networks). This allows access from any internet-connected location. For production deployments requiring enhanced security, select Private with Internet Outbound to create a hub with private endpoints and outbound internet access, or Private with Approved Outbound for fully isolated environments.
Click Review + Create. Azure validates your configuration. Review the estimated costs and configuration summary. Click Create to provision the hub. Hub creation typically takes 5 to 10 minutes. Monitor deployment progress in the Notifications panel.
Create Hub via Azure CLI
For automated deployments, use Azure CLI. First, ensure you have Azure CLI installed and authenticated:
# Login to Azure
az login
# Set your subscription
az account set --subscription "YOUR_SUBSCRIPTION_ID"
# Create resource group
az group create \
--name "rg-foundry-prod" \
--location "eastus2"
# Create AI Foundry Hub
az ml workspace create \
--kind "hub" \
--resource-group "rg-foundry-prod" \
--name "hub-foundry-prod" \
--location "eastus2"
The command creates a hub with default settings. For production deployments, consider additional configuration options for managed identity, network isolation, and customer-managed encryption keys.
Step 2: Create AI Foundry Project
Projects sit within hubs and contain your actual model deployments, experiments, and application code. Each project has its own workspace, permissions, and billing tracking.
Create Project via Portal
Navigate to your newly created AI Foundry Hub in the Azure portal. In the hub overview page, locate Projects in the left navigation menu. Click Create project. Enter a project name following the same naming conventions as the hub. Optionally, provide a description to help team members understand the project’s purpose.
Configure project settings. The project inherits the hub’s region and networking configuration but can have independent access controls. Assign Azure RBAC roles to control who can deploy models, submit experiments, and manage the project.
Click Create. Project creation is faster than hub creation, typically completing in 2 to 3 minutes. Once created, you can access the project workspace through the AI Foundry portal.
Create Project via Azure CLI
# Create AI Foundry Project
az ml workspace create \
--kind "project" \
--resource-group "rg-foundry-prod" \
--name "project-claude-prod" \
--location "eastus2" \
--hub-id "/subscriptions/YOUR_SUB_ID/resourceGroups/rg-foundry-prod/providers/Microsoft.MachineLearningServices/workspaces/hub-foundry-prod"
Replace YOUR_SUB_ID with your actual subscription ID. The hub-id parameter links the project to your hub, ensuring proper resource hierarchy.
Step 3: Deploy Claude Models
With your project created, you can now deploy Claude models from the Model Catalog. This section covers deployment for each Claude model variant.
Access Model Catalog
Navigate to ai.azure.com and sign in with your Azure credentials. Select your AI Foundry project from the project list. In the left navigation, click Model catalog under the Assets section. The catalog displays thousands of available models organized by provider, capability, and task type.
Filter the catalog to find Claude models quickly. Click the Filters button and select Anthropic under Publishers. You should see Claude Sonnet 4.5, Claude Opus 4.5, Claude Haiku 4.5, and Claude Opus 4.1 in the filtered results.
Deploy Claude Sonnet 4.5
Click on Claude Sonnet 4.5 in the catalog. The model detail page displays key information including model description, capabilities, pricing, and deployment options. Review the model card to understand capabilities and limitations. Click Deploy to begin the deployment process.
Configure deployment settings. Enter a deployment name such as “claude-sonnet-4-5” or “sonnet-prod”. The deployment name becomes part of your API endpoint and should be descriptive but concise. Deployment names must be unique within your project.
Select deployment type. Claude models support Global Standard deployment, which routes requests to Anthropic’s infrastructure with Azure handling authentication, billing, and governance. This deployment type offers the best balance of simplicity and enterprise features.
Configure rate limits and quotas. Default rate limits vary by model and are measured in Tokens Per Minute (TPM) and Requests Per Minute (RPM). For Sonnet 4.5, typical defaults are 100,000 TPM and 100 RPM. You can request quota increases through the Azure portal if your application requires higher throughput.
Review pricing. The deployment displays estimated costs based on Anthropic’s standard pricing: $3 per million input tokens and $15 per million output tokens for Sonnet 4.5. Actual costs depend on your usage patterns, prompt caching efficiency, and batch API usage.
Click Create. Deployment provisioning takes 3 to 5 minutes. Monitor deployment status in the Deployments section of your project. Once status shows “Succeeded”, the model is ready for API calls.
Deploy Additional Models
Repeat the deployment process for Claude Opus 4.5 and Claude Haiku 4.5 if your use cases require multiple models. Deploying all three models enables dynamic model routing based on query complexity and cost optimization strategies.
For Claude Opus 4.5, use deployment name “claude-opus-4-5”. Pricing is $5 per million input tokens and $25 per million output tokens. For Claude Haiku 4.5, use deployment name “claude-haiku-4-5”. Pricing is $1 per million input tokens and $5 per million output tokens.
Step 4: Configure Authentication
Azure AI Foundry supports two authentication methods: Microsoft Entra ID (recommended) and API keys. Choose based on your security requirements and deployment architecture.
Microsoft Entra ID Authentication (Recommended)
Microsoft Entra ID provides enterprise-grade authentication with support for conditional access, multi-factor authentication, and Azure RBAC. This is the recommended approach for production deployments.
Entra ID authentication uses Azure’s default credential chain, which automatically discovers credentials from multiple sources in the following order: environment variables, managed identity, Azure CLI, Visual Studio, and interactive browser authentication. This flexibility simplifies development workflows while maintaining security.
To configure Entra ID authentication, assign appropriate Azure RBAC roles. Navigate to your AI Foundry resource in the Azure portal. Click Access Control (IAM) in the left navigation. Click Add role assignment. Select Azure AI Developer or Cognitive Services User role. These roles grant permissions to invoke model endpoints without full resource management access.
Add the users, service principals, or managed identities that require access. For application deployments, create a managed identity and assign it the appropriate role. Managed identities eliminate the need to store credentials in code or configuration files.
The required Entra ID scope for API calls is “https://ai.azure.com/.default”. This scope is used when requesting access tokens for AI Foundry resources.
API Key Authentication
API key authentication provides simpler setup for development and testing environments. However, keys have broader permissions and require manual rotation, making them less suitable for production.
To retrieve API keys, navigate to your deployed Claude model in the AI Foundry portal. Click on the deployment name to open deployment details. Locate the Keys and Endpoint section. You will see two keys (primary and secondary) and the endpoint URL.
Copy the primary key and endpoint URL. Store the key securely using Azure Key Vault, environment variables, or secure configuration management. Never commit API keys to source control.
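As one option, the key can live in Azure Key Vault and be read back into the environment at deployment time. A sketch; the vault name `kv-foundry-prod` and secret name are placeholders:

```shell
# Store the key once, from a trusted admin session (placeholders throughout).
az keyvault secret set \
  --vault-name "kv-foundry-prod" \
  --name "claude-api-key" \
  --value "YOUR_API_KEY"

# Read it back into an environment variable when the application starts.
export CLAUDE_API_KEY=$(az keyvault secret show \
  --vault-name "kv-foundry-prod" \
  --name "claude-api-key" \
  --query value -o tsv)
```

The application then reads the key from the environment rather than from configuration files.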
API keys have no expiration by default but should be rotated periodically as security best practice. Azure provides two keys to enable zero-downtime rotation. Update your application to use the secondary key, regenerate the primary key, then update the application back to the primary key and regenerate the secondary.
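The rotation sequence can be driven from Azure CLI. A sketch, assuming your Foundry resource surfaces as a Cognitive Services account named `YOUR-RESOURCE` in `rg-foundry-prod` (both placeholders):

```shell
# Step 1: after switching your application to the secondary key (key2),
# regenerate the primary key.
az cognitiveservices account keys regenerate \
  --name "YOUR-RESOURCE" \
  --resource-group "rg-foundry-prod" \
  --key-name key1

# Step 2: after switching the application back to the fresh primary key,
# regenerate the secondary key.
az cognitiveservices account keys regenerate \
  --name "YOUR-RESOURCE" \
  --resource-group "rg-foundry-prod" \
  --key-name key2
```

At no point does the application hold a key that is about to be invalidated, so rotation causes no downtime.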
Step 5: Verify Deployment
After deployment and authentication configuration, verify everything works correctly before proceeding to application development.
Test in AI Foundry Playground
The AI Foundry Playground provides a web-based interface for testing deployments without writing code. Navigate to your deployed model in the AI Foundry portal. Click the Playground or Test button. This opens an interactive chat interface.
Enter a test prompt such as “Explain quantum computing in simple terms.” Click Submit. The model should respond within a few seconds. Verify the response quality and that token usage is tracked correctly.
Experiment with different prompts to test various capabilities. Try multi-turn conversations, long-context inputs, and requests for code generation. Check the response metadata to confirm token counts, model version, and timing information.
Test via cURL
For command-line verification, use cURL to send HTTP requests directly to your Claude deployment. This tests the deployment outside the Foundry interface.
Using API key authentication:
curl -X POST \
"https://YOUR-RESOURCE.services.ai.azure.com/anthropic/v1/messages" \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-5",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "Hello, Claude! How are you today?"
}
]
}'
Replace YOUR-RESOURCE with your actual resource name and YOUR_API_KEY with your deployment API key. The response should include the model’s message along with usage statistics.
Using Entra ID authentication requires obtaining an access token first:
# Get access token
TOKEN=$(az account get-access-token --resource "https://ai.azure.com" --query accessToken -o tsv)
# Call API with token
curl -X POST \
"https://YOUR-RESOURCE.services.ai.azure.com/anthropic/v1/messages" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-5",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "Hello, Claude! How are you today?"
}
]
}'
Verify Token Counting
Accurate token counting is critical for cost management. Claude provides a Token Count API for pre-call token estimation:
curl -X POST \
"https://YOUR-RESOURCE.services.ai.azure.com/anthropic/v1/messages/count_tokens" \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-5",
"messages": [
{
"role": "user",
"content": "This is a test message for token counting."
}
]
}'
The response includes the input token count, allowing you to estimate costs before making actual API calls.
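Once you have an input token count, the cost estimate is simple arithmetic. A minimal sketch using Sonnet 4.5’s $3 per million input tokens; the token count here is a made-up example value:

```shell
# Estimate input cost in USD from a token count.
input_tokens=12500
price_per_million=3   # Sonnet 4.5 input price in USD per million tokens
cost=$(awk -v t="$input_tokens" -v p="$price_per_million" \
  'BEGIN { printf "%.4f", t / 1000000 * p }')
echo "Estimated input cost: \$$cost"
```

The same formula with the output price gives a ceiling estimate when you multiply by your configured max_tokens.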
Understanding Rate Limits and Quotas
Claude deployments in Azure AI Foundry have default rate limits that control throughput and prevent runaway costs. Understanding these limits is essential for production planning.
Default Rate Limits
Rate limits are measured in two dimensions: Tokens Per Minute (TPM) and Requests Per Minute (RPM). TPM limits control total token throughput across all requests, while RPM limits control the number of individual API calls.
Default limits vary by model and deployment type. Typical starting quotas for Global Standard deployments range from 50,000 to 100,000 TPM and 50 to 100 RPM. These limits apply per deployment, not per user or application.
Rate limit headers are not included in responses from Claude deployments in Foundry. Unlike direct Anthropic API calls, Azure manages rate limiting through its own monitoring infrastructure. Monitor rate limit consumption through Azure Monitor metrics rather than response headers.
Requesting Quota Increases
To increase quotas beyond default limits, submit a request through the Azure portal. Navigate to your AI Foundry resource, select Quotas in the left navigation, identify the quota you need to increase, click Request quota increase, provide justification for the increase, and submit the request.
Quota increase requests typically process within 1 to 3 business days. For urgent requests, contact Azure support with your business justification. Include details about your use case, expected traffic patterns, and timeline requirements.
Monitoring Rate Limit Usage
Set up Azure Monitor alerts to track quota consumption and prevent service disruptions. Key metrics to monitor include tokens processed per minute, requests per minute, throttled requests count, and average response latency.
Configure alerts to notify when consumption reaches 80% of quota limits, giving you time to request increases before hitting hard limits.
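Such an alert can be created with Azure CLI. This is a sketch only: the metric name `ProcessedPromptTokens`, the threshold, and the resource ID are assumptions, so check the metric names Azure Monitor actually exposes for your resource before using it:

```shell
# Alert when token throughput passes 80% of an assumed 100k TPM quota.
# Metric name and resource path are illustrative placeholders.
az monitor metrics alert create \
  --name "claude-tpm-80pct" \
  --resource-group "rg-foundry-prod" \
  --scopes "/subscriptions/YOUR_SUB_ID/resourceGroups/rg-foundry-prod/providers/Microsoft.CognitiveServices/accounts/YOUR-RESOURCE" \
  --condition "total ProcessedPromptTokens > 80000" \
  --window-size 1m \
  --evaluation-frequency 1m \
  --description "Token throughput above 80% of TPM quota"
```

Attach an action group to the alert to route notifications to email, Teams, or an incident management tool.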
Deployment Best Practices
Following these best practices ensures reliable, secure, and cost-effective deployments.
Environment Separation
Maintain separate deployments for development, staging, and production environments. Create dedicated resource groups and AI Foundry projects for each environment. This isolation prevents development activities from impacting production workloads and simplifies billing analysis.
Use naming conventions that clearly indicate environment. Examples include “project-claude-dev”, “project-claude-staging”, and “project-claude-prod”. Apply consistent naming across all Azure resources for easier management.
Network Security
For production deployments handling sensitive data, configure private endpoints to restrict network access. Private endpoints ensure traffic between your applications and AI Foundry resources never traverses the public internet.
Configure network security groups and firewall rules to limit inbound connections. Use Azure Virtual Network service endpoints to secure traffic between Azure services.
Cost Management
Enable cost alerts in Azure Cost Management to track AI Foundry spending. Set budgets for each environment and configure alerts at 50%, 80%, and 100% of budget thresholds.
Tag all AI Foundry resources with appropriate metadata for cost allocation. Common tags include environment, project, cost center, and owner. Consistent tagging enables detailed cost reporting and chargeback scenarios.
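Tags can be applied in bulk with Azure CLI. A sketch using the hub resource ID from earlier in this guide; the tag values are illustrative, and note that `az resource tag` replaces any existing tags on the resource:

```shell
# Apply cost-allocation tags to the AI Foundry hub (values are examples).
az resource tag \
  --ids "/subscriptions/YOUR_SUB_ID/resourceGroups/rg-foundry-prod/providers/Microsoft.MachineLearningServices/workspaces/hub-foundry-prod" \
  --tags environment=prod project=claude costcenter=ai-platform owner=ml-team
```

Apply the same tag set to the project workspace and any supporting resources so cost reports group cleanly.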
Review Azure Cost Management reports weekly to identify unexpected spending patterns. Claude pricing is token-based, so sudden cost increases often indicate application bugs, prompt inefficiencies, or unexpected traffic spikes.
Deployment Documentation
Document your deployment configuration including resource names, regions, deployment names, rate limits, authentication methods, and network configuration. Store this documentation in your team’s knowledge base or wiki.
Include runbooks for common operational tasks such as rotating API keys, requesting quota increases, troubleshooting authentication failures, and monitoring cost spikes. Well-documented deployments reduce mean time to resolution when issues occur.
Troubleshooting Common Issues
Even with careful deployment, you may encounter issues. This section covers common problems and solutions.
Deployment Failures
If model deployment fails with subscription restriction errors, verify your subscription type is not CSP, sponsored, or in a restricted region. If marketplace access errors occur, ensure your organization allows Azure Marketplace purchases and you have necessary permissions. For region availability errors, confirm you are deploying to East US 2 or Sweden Central.
Authentication Errors
401 Unauthorized errors typically indicate authentication problems. For API key authentication, verify the key is current and has not been regenerated. Ensure you are using the x-api-key header with the correct value. For Entra ID authentication, verify your access token is current and uses the correct scope. Check that your principal has appropriate RBAC role assignments.
Rate Limit Errors
429 Too Many Requests errors indicate you have exceeded rate limits. Implement exponential backoff and retry logic in your application code. Monitor your quota consumption in Azure Monitor. Submit quota increase requests if your legitimate usage exceeds default limits.
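A minimal shell sketch of the retry pattern: wrap the HTTP call in a function that backs off exponentially whenever the command reports a 429 status. The wrapped command is expected to print an HTTP status code, as the cURL examples above do with `-w '%{http_code}'`:

```shell
# Retry a command with exponential backoff while it reports HTTP 429.
call_with_backoff() {
  local max_retries=5
  local delay=1
  local status
  for _ in $(seq 1 "$max_retries"); do
    status=$("$@")
    if [ "$status" != "429" ]; then
      echo "$status"   # success or a non-throttling error: pass it through
      return 0
    fi
    sleep "$delay"
    delay=$((delay * 2))   # 1s, 2s, 4s, 8s, ...
  done
  echo "$status"
  return 1   # still throttled after all retries
}
```

For example, `call_with_backoff curl -s -o /dev/null -w '%{http_code}' ...` retries the message call from the verification section. Production code should also honor any Retry-After hint and add jitter so many clients do not retry in lockstep.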
Model Not Found Errors
404 errors with messages about model not found typically result from incorrect deployment names. Verify you are using the exact deployment name, not the model ID. Deployment names are case-sensitive. Check that the deployment completed successfully and shows “Succeeded” status in the portal.
Next Steps
With your Claude models successfully deployed in Azure AI Foundry, you are ready to begin application development. Part 3 of this series covers building your first Claude application using Node.js, including environment setup, SDK installation, implementing chat completions, handling streaming responses, and production-ready error handling patterns.
The deployment configuration you completed in this guide provides the foundation for all subsequent development work. Your deployment endpoint URLs, authentication configuration, and rate limits will be referenced throughout the remaining parts of this series.
Conclusion
Deploying Claude models in Azure AI Foundry involves several steps, but the platform’s integration makes the process straightforward. By following this guide, you have created an AI Foundry hub and project, deployed Claude models from the catalog, configured authentication using either Entra ID or API keys, and verified your deployment through multiple testing methods.
The deployment infrastructure you built provides enterprise-grade security, governance, and observability. Azure manages the complexity of model hosting, scaling, and maintenance while you focus on building applications that leverage Claude’s capabilities.
In Part 3, we will put this deployment to work, building a complete Node.js application that interacts with your Claude deployment. You will learn to implement chat completions, handle streaming responses, manage conversation context, and optimize for cost and performance.
References
- Microsoft Learn – Deploy and use Claude models in Microsoft Foundry
- Claude Documentation – Claude in Microsoft Foundry
- Anthropic – Claude now available in Microsoft Foundry
- Claude Code Documentation – Azure AI Foundry Configuration
- Microsoft Learn – Azure AI Foundry Models Overview
- Azure Samples – Claude with OpenAI Responses API
