Amazon Comprehend

Amazon Comprehend is a managed natural-language processing (NLP) service that extracts insights from unstructured text. It exposes task-specific APIs — entity recognition, sentiment, key phrases, syntax, language detection, topic modeling, and PII redaction — without requiring ML expertise or model training. Comprehend Medical adds clinical concept extraction and ICD-10/RxNorm coding.


Key Features:


Common Use Cases:


Service Limits & Quotas:


Pricing Model:


Code Example:


import boto3

comprehend = boto3.client("comprehend", region_name="us-west-2")
text = "Order #A-482 shipped from Seattle on Tuesday and arrived damaged."

print(comprehend.detect_sentiment(Text=text, LanguageCode="en")["Sentiment"])
# Illustrative output (actual labels depend on the model):
# NEGATIVE

for ent in comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]:
    print(ent["Type"], "->", ent["Text"])
# Illustrative output:
# COMMERCIAL_ITEM -> Order #A-482
# LOCATION        -> Seattle
# DATE            -> Tuesday

# Detect PII; the call returns entity types and offsets only, so redaction is left to the caller
pii = comprehend.detect_pii_entities(Text="John Doe lives at 1 Main St, SSN 123-45-6789",
                                      LanguageCode="en")
for e in pii["Entities"]:
    print(e["Type"], e["BeginOffset"], e["EndOffset"])
  


Common Interview Questions:

When should you use Comprehend instead of a Bedrock LLM?

Use Comprehend for well-defined NLP primitives (NER, sentiment, PII redaction, language detection) where you need predictable cost, low latency, and a structured API response. Use Bedrock when the task is open-ended, requires reasoning, or benefits from instruction-following over arbitrary text.

What's the difference between standard sentiment and targeted sentiment?

Standard sentiment scores a whole document (positive/negative/neutral/mixed). Targeted sentiment associates sentiment with each detected entity in the text — essential for review analytics where one product can have positive sentiment for one feature and negative for another.
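To make the per-entity idea concrete, here is a minimal sketch that groups sentiments by mention. It assumes the DetectTargetedSentiment response shape (Entities, each with a list of Mentions carrying a MentionSentiment). The helper name summarize_targeted_sentiment and the sample response are hypothetical and hand-written, not real service output.

```python
from collections import defaultdict


def summarize_targeted_sentiment(response):
    """Map each mention text (lowercased) to the sentiments attached to it."""
    summary = defaultdict(list)
    for entity in response.get("Entities", []):
        for mention in entity.get("Mentions", []):
            sentiment = mention["MentionSentiment"]["Sentiment"]
            summary[mention["Text"].lower()].append(sentiment)
    return dict(summary)


# With AWS credentials, the response would come from something like:
#   boto3.client("comprehend").detect_targeted_sentiment(Text=review, LanguageCode="en")
# The dict below is a hand-written stand-in with the same shape.
sample_response = {
    "Entities": [
        {"Mentions": [{"Text": "battery", "Type": "ATTRIBUTE",
                       "MentionSentiment": {"Sentiment": "POSITIVE"}}]},
        {"Mentions": [{"Text": "screen", "Type": "ATTRIBUTE",
                       "MentionSentiment": {"Sentiment": "NEGATIVE"}}]},
    ]
}

print(summarize_targeted_sentiment(sample_response))
```

A document-level call on the same review would likely return a single MIXED or NEUTRAL label, which hides the split between the two features.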

When do you train a custom classifier or entity recognizer?

When the generic taxonomies don't match your domain — e.g., classifying support tickets into your internal categories, or extracting entities like SKUs, contract clauses, or medical specialties beyond Comprehend Medical's coverage.
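As a rough illustration of preparing training data for a custom classifier, the sketch below writes labeled tickets to CSV. It assumes the two-column, header-less layout (labels first, document text second) with "|" separating multiple labels in multi-label mode. Check the current Comprehend documentation before relying on that format. The helper name to_training_csv and the ticket data are made up for the example.

```python
import csv
import io


def to_training_csv(examples, label_delimiter="|"):
    """Serialize (labels, text) pairs as header-less CSV rows: labels,text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for labels, text in examples:
        writer.writerow([label_delimiter.join(labels), text])
    return buf.getvalue()


# Hypothetical support tickets with internal categories as labels.
tickets = [
    (["BILLING"], "I was charged twice this month."),
    (["SHIPPING", "DAMAGE"], "The box arrived crushed, two days late."),
]

training_csv = to_training_csv(tickets)
print(training_csv)
```

The csv module quotes any text containing commas, so documents with punctuation stay in a single column.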

How does PII detection differ from PII redaction?

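DetectPiiEntities returns offsets and types but does not modify the text; you apply the redaction in your own code. The asynchronous PII detection job (StartPiiEntitiesDetectionJob), run in redaction mode, writes redacted copies of the documents to S3, replacing each entity with either a mask character or its entity type label. This is useful for scrubbing data for compliance before downstream analytics.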
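Client-side redaction from the returned offsets can be sketched as follows. The redact helper is hypothetical, and the entity list is hand-built to mimic the BeginOffset/EndOffset/Type fields of a DetectPiiEntities response rather than taken from a real call.

```python
def redact(text, entities, mask_char=None):
    """Replace each PII span with [TYPE], or with mask_char repeated to the span length."""
    result = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        begin, end = ent["BeginOffset"], ent["EndOffset"]
        if mask_char:
            replacement = mask_char * (end - begin)
        else:
            replacement = "[" + ent["Type"] + "]"
        result = result[:begin] + replacement + result[end:]
    return result


def span(text, fragment, pii_type):
    """Build a stand-in entity dict for a fragment known to be in the text."""
    begin = text.index(fragment)
    return {"Type": pii_type, "BeginOffset": begin, "EndOffset": begin + len(fragment)}


text = "John Doe lives at 1 Main St, SSN 123-45-6789"
entities = [
    span(text, "John Doe", "NAME"),
    span(text, "1 Main St", "ADDRESS"),
    span(text, "123-45-6789", "SSN"),
]

print(redact(text, entities))
print(redact(text, entities, mask_char="*"))
```

Masking keeps the text length unchanged, which helps when other systems store offsets into the same document. Type labels keep more context for analytics.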

Is Comprehend Medical HIPAA eligible?

Yes. Comprehend Medical is on the AWS HIPAA-eligible services list, so it can process protected health information once you have a signed BAA with AWS. It extracts conditions, medications, dosages, and anatomy, and links them to ICD-10-CM, RxNorm, and SNOMED CT codes.

How do you keep custom-classifier inference costs low?

Use async batch jobs when latency permits (cheaper per character). For real-time use, right-size inference units to actual peak throughput, delete idle dev/test endpoints (an endpoint bills for its provisioned inference units for as long as it exists), and combine multiple labels into a single multi-label classifier instead of multiple binary models.
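A back-of-the-envelope sizing calculation might look like the sketch below. It assumes each inference unit provides roughly 100 characters per second of throughput. Treat that constant as an assumption to verify against current Comprehend documentation. The function name and traffic figures are illustrative.

```python
import math

# Assumption: throughput per inference unit, in characters per second.
CHARS_PER_SECOND_PER_IU = 100


def inference_units_needed(peak_requests_per_second, avg_chars_per_request):
    """Estimate the inference units needed to absorb peak character throughput."""
    peak_chars = peak_requests_per_second * avg_chars_per_request
    # An endpoint needs at least one unit, so never return less than 1.
    return max(1, math.ceil(peak_chars / CHARS_PER_SECOND_PER_IU))


# 3 requests/s at about 250 characters each is 750 chars/s, which rounds up to 8 units.
print(inference_units_needed(3, 250))
```

Sizing by characters rather than request count matters because long documents consume throughput faster than short ones at the same QPS.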


Comprehend complements Bedrock and SageMaker by handling the well-defined NLP primitives that many applications need — reach for it before training a custom model when a task-specific API will do.