azure-ai-foundry
cloud
infrastructure
mlops
sagemaker
vertex-ai
Managed AI platforms trade some control for speed, governance, and operational leverage. For many teams, that trade is worth it.
★ TL;DR
What: The major managed cloud platforms used to build, deploy, and operate ML and GenAI systems.
Why: These platforms bundle notebooks, training, deployment, evaluation, security, and governance into one operating environment.
Key point: The best platform choice depends less on raw features and more on team context, cloud alignment, and governance needs.
★ Overview
Definition
Cloud ML services are managed platforms that support parts or all of the ML and GenAI lifecycle, including data prep, training, experimentation, deployment, monitoring, and governance.
Scope
This note focuses on platform categories and the major hyperscaler offerings rather than deep vendor-specific setup instructions.
Significance
- Most enterprise AI teams use a managed platform somewhere in the stack.
- Platform choice affects velocity, compliance posture, and operating model.
- ML engineer and MLOps interviews often expect a point of view here.
Prerequisites
Last verified for major platform naming and positioning: 2026-04.
★ Deep Dive
Core Capabilities

| Capability | Examples |
| --- | --- |
| Development | notebooks, prompt studios, SDKs |
| Training | managed jobs, tuning, distributed runs |
| Model management | registry, versioning, approval workflows |
| Deployment | endpoints, scaling, online and batch inference |
| Observability | logs, traces, metrics, monitoring |
| Governance | IAM, audit, approval, policy controls |
Major Platforms

| Platform | Description | Typical Strength |
| --- | --- | --- |
| AWS SageMaker AI | Broad managed ML/AI platform on AWS | strong AWS integration and end-to-end workflow support |
| Google Vertex AI | Unified platform for ML and GenAI on GCP | strong generative AI, model garden, and GCP workflow integration |
| Azure AI Foundry + Azure ML | Microsoft's platform family for AI apps, agents, and ML workflows | strong enterprise governance and Microsoft ecosystem fit |
Platform Selection Questions

| Question | Why It Matters |
| --- | --- |
| Does the team already live in this cloud? | biggest practical force in most decisions |
| Do we need managed training, managed inference, or both? | some teams only need part of the stack |
| How strong are governance and identity needs? | enterprise adoption depends on this |
| Are we building classic ML, GenAI apps, or both? | platform depth differs by workload |
| How much portability do we need? | affects lock-in risk |
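To make the context-first logic concrete, here is a toy scoring sketch; the weights are illustrative assumptions, not vendor guidance:

# Toy sketch: rank platforms by organizational fit, not feature count.
# Weights here are illustrative assumptions, not vendor guidance.
def shortlist(cloud_footprint: str, needs_governance: bool, genai_heavy: bool) -> list[str]:
    scores = {"SageMaker AI": 0, "Vertex AI": 0, "Azure AI Foundry": 0}
    footprint = {"aws": "SageMaker AI", "gcp": "Vertex AI", "azure": "Azure AI Foundry"}
    if cloud_footprint in footprint:
        scores[footprint[cloud_footprint]] += 3  # existing cloud dominates the decision
    if needs_governance:
        scores["Azure AI Foundry"] += 1  # enterprise governance strength
    if genai_heavy:
        scores["Vertex AI"] += 1  # generative AI and model garden strength
    return sorted(scores, key=scores.get, reverse=True)

print(shortlist("aws", needs_governance=True, genai_heavy=True))
# ['SageMaker AI', 'Vertex AI', 'Azure AI Foundry']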
Typical Adoption Patterns

| Pattern | Example |
| --- | --- |
| All-in platform | training, registry, deployment, monitoring in one cloud |
| Hybrid | managed training plus custom serving stack |
| Managed GenAI only | prompt tooling and model access on cloud, custom app layer elsewhere |
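The hybrid pattern in miniature: pull a training artifact produced on a managed platform and serve it from a custom stack. A minimal sketch; the bucket, key, and loader are hypothetical placeholders:

# pip install boto3 fastapi
# Hybrid sketch: artifact trained on a managed platform (here stored in S3),
# served from a custom FastAPI stack. Bucket/key and loader are placeholders.
import boto3
from fastapi import FastAPI

def load_model(path: str):
    # Placeholder loader; a real stack would deserialize the framework artifact.
    return lambda payload: {"echo": payload}

s3 = boto3.client("s3")
s3.download_file("my-training-bucket", "runs/42/model.tar.gz", "/tmp/model.tar.gz")
model = load_model("/tmp/model.tar.gz")

app = FastAPI()

@app.post("/predict")
def predict(payload: dict) -> dict:
    return model(payload)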
When They Help
- fast team setup
- tighter IAM and compliance integration
- shared workflows across data science and engineering
- reduced platform maintenance burden
When They Feel Heavy
- small teams shipping fast prototypes
- highly custom serving stacks
- multi-cloud portability requirements
- cost-sensitive workloads where custom infra is leaner
Quick CLI Examples
# AWS SageMaker AI
aws sagemaker list-endpoints
# Google Vertex AI
gcloud ai endpoints list --region=us-central1
# Azure AI / Azure ML
az ml online-endpoint list
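The same inventory checks are available through each cloud's Python SDK. A minimal sketch, assuming credentials are already configured; the project, subscription, and workspace identifiers are placeholders:

# pip install boto3 google-cloud-aiplatform azure-ai-ml azure-identity
import boto3
from google.cloud import aiplatform
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# AWS SageMaker AI
sm = boto3.client("sagemaker")
for ep in sm.list_endpoints()["Endpoints"]:
    print("aws:", ep["EndpointName"])

# Google Vertex AI (placeholder project/location)
aiplatform.init(project="my-project", location="us-central1")
for ep in aiplatform.Endpoint.list():
    print("gcp:", ep.display_name)

# Azure ML (placeholder subscription, resource group, workspace)
ml = MLClient(DefaultAzureCredential(), "sub-id", "my-rg", "my-workspace")
for ep in ml.online_endpoints.list():
    print("azure:", ep.name)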
◆ Quick Reference
| Need | Good Direction |
| --- | --- |
| already on AWS | evaluate SageMaker AI first |
| already on GCP | evaluate Vertex AI first |
| already on Microsoft enterprise stack | evaluate Azure AI Foundry and Azure ML first |
| need full custom control | managed platform may be partial, not primary |
| need enterprise governance fast | managed platform usually helps |
○ Gotchas & Common Mistakes
- Teams overestimate how much of the platform they will actually use.
- Platform convenience can turn into lock-in if abstraction boundaries are weak (see the sketch after this list).
- Billing complexity can hide in adjacent services, not only in the platform's headline cost.
- Managed does not mean no architecture decisions.
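One way to keep that abstraction boundary firm is a thin provider-agnostic interface owned by the team. A minimal sketch with hypothetical names; only the adapter touches a vendor SDK:

# Hypothetical abstraction boundary: app code depends on Predictor,
# never on a vendor SDK, so swapping clouds means swapping one adapter.
import json
from typing import Protocol

class Predictor(Protocol):
    def predict(self, payload: dict) -> dict: ...

class SageMakerPredictor:
    def __init__(self, endpoint_name: str):
        import boto3
        self._rt = boto3.client("sagemaker-runtime")
        self._endpoint = endpoint_name

    def predict(self, payload: dict) -> dict:
        resp = self._rt.invoke_endpoint(
            EndpointName=self._endpoint,
            ContentType="application/json",
            Body=json.dumps(payload),
        )
        return json.loads(resp["Body"].read())

class LocalPredictor:
    # Stand-in for tests or a custom serving stack.
    def predict(self, payload: dict) -> dict:
        return {"echo": payload}

def handle_request(model: Predictor, payload: dict) -> dict:
    return model.predict(payload)  # app logic sees only the interface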
○ Interview Angles
Q: How would you choose between SageMaker, Vertex AI, and Azure AI Foundry?
A: I would start with the existing cloud footprint, governance requirements, workload type, and team skills. The best choice is usually the platform that fits the organization's operating context, not the one with the longest feature list.
Q: When would you avoid a full managed platform?
A: When the team needs extreme portability, a highly custom serving stack, or when the platform overhead outweighs the operational value for the size of the workload.
★ Code & Implementation
Multi-Cloud LLM API Comparison
# pip install openai>=1.60 anthropic>=0.40 google-generativeai>=0.8
# ⚠️ Last tested: 2026-04 | Requires: OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY env vars
import os
import time

import anthropic
import google.generativeai as genai
from openai import OpenAI

prompt = "Explain the difference between RAG and fine-tuning in 2 sentences."

# OpenAI (Azure-compatible: point base_url at an Azure endpoint)
oai = OpenAI()
start = time.monotonic()
oai_r = oai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=120,
)
print(f"OpenAI ({time.monotonic() - start:.2f}s): {oai_r.choices[0].message.content[:100]}")

# Anthropic Claude
ant = anthropic.Anthropic()
start = time.monotonic()
ant_r = ant.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=120,
    messages=[{"role": "user", "content": prompt}],
)
print(f"Anthropic ({time.monotonic() - start:.2f}s): {ant_r.content[0].text[:100]}")

# Google Gemini
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gem = genai.GenerativeModel("gemini-2.0-flash")
start = time.monotonic()
gem_r = gem.generate_content(prompt)
print(f"Gemini ({time.monotonic() - start:.2f}s): {gem_r.text[:100]}")
◆ Production Failure Modes
| Failure | Symptoms | Root Cause | Mitigation |
| --- | --- | --- | --- |
| Vendor lock-in | Cannot migrate workloads between clouds | Proprietary APIs, custom runtimes | Use open standards (ONNX, containers), abstract the service layer |
| Cost overrun | Monthly bill 5-10x expected | Idle GPU instances, no auto-shutdown | Spot instances, auto-scaling to zero, budget alerts |
| Region availability | GPU instance type unavailable in target region | Limited GPU supply in specific regions | Multi-region fallback, reserved capacity, spot pools |
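As one concrete instance of the cost-overrun mitigations, a budget alert on AWS. A minimal sketch; the account ID, budget amount, and email address are placeholders:

# pip install boto3
# Minimal sketch: email alert when actual spend crosses 80% of a monthly budget.
import boto3

budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "ml-platform-monthly",
        "BudgetLimit": {"Amount": "500", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "team@example.com"}],
    }],
)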
◆ Hands-On Exercises
Exercise 1: Deploy the Same Model on Two Clouds
Goal: Deploy an LLM endpoint on AWS and GCP, then compare cost and latency.
Time: 45 minutes
Steps:
1. Deploy a small model on AWS SageMaker and GCP Vertex AI.
2. Run 50 inference requests against each.
3. Compare cold start time, p95 latency, and per-request cost (see the harness sketch after this exercise).
4. Document migration considerations.
Expected Output: Cloud comparison table with latency, cost, and ease-of-use scores.
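A minimal timing harness for steps 2-3; call_endpoint stands in for whichever cloud client you wire up:

# Minimal harness: n sequential calls, first call approximates cold start.
# call_endpoint is a placeholder for a real SageMaker / Vertex AI invocation.
import time

def measure(call_endpoint, payload: dict, n: int = 50) -> dict:
    latencies = []
    for _ in range(n):
        start = time.monotonic()
        call_endpoint(payload)
        latencies.append(time.monotonic() - start)
    warm = sorted(latencies[1:])
    p95 = warm[int(0.95 * (len(warm) - 1))]
    return {"cold_start_s": round(latencies[0], 3), "p95_s": round(p95, 3)}

# Usage: measure(lambda p: predictor.predict(p), {"prompt": "hello"})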
★ Sources
- AWS SageMaker AI documentation and overview pages
- Google Cloud Vertex AI documentation and overview pages
- Microsoft Azure AI Foundry and Azure Machine Learning documentation