AWS AI/ML Services Overview

AWS organizes its AI and ML offerings into three layers. Picking the right layer is primarily about how much of the ML stack you want to own: more managed at the top, more flexibility at the bottom.


Layer 1 — Generative AI (Foundation Models)

Amazon Bedrock: serverless API access to foundation models (Anthropic Claude, Meta Llama, Mistral, Amazon Titan/Nova, and others), plus managed building blocks — Knowledge Bases for RAG, Agents, and Guardrails.

Layer 2 — Task-Specific AI APIs

Pre-trained, single-purpose APIs: Comprehend (NLP), Textract (document extraction), Rekognition (vision), Transcribe (speech-to-text), Polly (text-to-speech), Translate. No model training required — call and go.

Layer 3 — ML Platform

Amazon SageMaker: build, train, tune, and host your own models, with full control over data, architecture, and infrastructure. JumpStart provides pre-trained and foundation models on dedicated capacity.

Choosing Between Layers:

  1. Start with task-specific APIs if a managed API matches your use case — fastest to production, no training data required.
  2. Use Bedrock when the task is generative, open-ended, or benefits from foundation-model reasoning. Add Knowledge Bases for RAG and Guardrails for safety.
  3. Drop to SageMaker when you need full control over training, hosting, or custom model architectures.


Service Limits & Quotas (Common Patterns):

  1. Quotas are per account, per region; many are adjustable through the Service Quotas console.
  2. Bedrock on-demand inference is limited by requests-per-minute and tokens-per-minute quotas that vary by model.
  3. Task APIs are limited by transactions-per-second quotas and per-request size caps (e.g., document size for Textract).
  4. Throttled calls fail with throttling errors — retry with exponential backoff rather than hammering the API.

Pricing Model (Layer Patterns):

  1. Bedrock: per-token pricing (input and output billed separately) on demand, or provisioned throughput for committed capacity.
  2. Task APIs: per unit processed — characters, pages, images, or seconds of audio.
  3. SageMaker: per instance-hour for training and hosting, independent of request volume.

Code Example — Picking the Right Layer:

Same conceptual task ("understand this customer review") at three layers:


import boto3, json

# Layer 2 (task API) — predictable cost, structured output
comprehend = boto3.client("comprehend")
review = "The screen is gorgeous but battery life is awful."
sent = comprehend.detect_sentiment(Text=review, LanguageCode="en")["Sentiment"]
ents = comprehend.detect_entities(Text=review, LanguageCode="en")["Entities"]

# Layer 1 (Bedrock) — open-ended reasoning, more flexible output
bedrock = boto3.client("bedrock-runtime")
prompt = (
    "Extract per-feature sentiment from this review as JSON list of "
    "{feature, sentiment, evidence}.\n\n" + review
)
resp = bedrock.invoke_model(
    # Model ID is illustrative; availability (and whether an
    # inference-profile prefix like "us." is required) varies by region.
    modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }),
)
print(json.loads(resp["body"].read())["content"][0]["text"])

# Layer 3 (SageMaker) — deploy a fine-tuned model when you need
# full control over architecture, latency, or custom output schemas.


Common Interview Questions:

How do you choose between Bedrock and a task-specific API like Comprehend?

Pick the task API when your problem is well-defined and the API output matches what you need (NER, sentiment, OCR, labels) — predictable cost, low latency, structured response. Pick Bedrock when the task is open-ended, requires reasoning, or needs free-form generation. Often the best architecture combines both.

When is SageMaker the right answer?

When neither task APIs nor Bedrock cover your need: custom architectures, training on proprietary data at scale, custom inference logic, on-device export, or fine-grained latency/cost control. SageMaker also hosts JumpStart foundation models when you need dedicated capacity.

How does Amazon Q relate to Bedrock?

Q is built on Bedrock but exposes a higher-level assistant experience. Q Business connects to your enterprise data (S3, SharePoint, Confluence, Salesforce, ServiceNow). Q Developer is the Copilot-style coding assistant inside IDEs and the AWS console. Use Bedrock directly when you want to build your own assistant.

What's the difference between Bedrock Knowledge Bases and Kendra?

Both do retrieval over enterprise content. Knowledge Bases is purpose-built for RAG with foundation models — chunk, embed, retrieve, and augment a model prompt. Kendra is a full enterprise search product with deep connectors and ranking models — useful when you need a search experience first and RAG second.

Which of these services are HIPAA-eligible?

Most production AWS AI services (Bedrock, SageMaker, Comprehend Medical, Textract, Rekognition, Transcribe Medical) are HIPAA-eligible under a signed BAA. Always verify the current eligibility list and configure region/data residency appropriately.

How do you keep generative AI costs predictable?

Cache responses where possible, use prompt caching when supported, choose smaller models (Haiku, Mistral 7B) for simple tasks, set max-token caps, monitor token usage via CloudWatch, and consider provisioned throughput when traffic is steady enough to justify the commit.
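A simple guardrail is to log token usage per request (the Converse API returns it in the response's usage field) and multiply by per-token rates. The rates below are placeholders — look up current per-1K-token pricing for your model and region:

```python
# Placeholder USD rates per 1K tokens — NOT current AWS pricing.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough per-request cost from token counts and per-1K rates."""
    return round(
        input_tokens / 1000 * PRICE_PER_1K["input"]
        + output_tokens / 1000 * PRICE_PER_1K["output"],
        6,
    )
```

Emit the estimate as a CloudWatch metric and alarm when a single request or a rolling window exceeds your budget threshold.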


The AWS AI/ML stack is layered for a reason: start at the highest layer that solves your problem and only drop down when the abstraction stops fitting. Most production systems mix and match — task APIs for primitives, Bedrock for reasoning, SageMaker for the few cases that need full control.