AWS Services Reference
AI & Machine Learning
Amazon Bedrock
Managed foundation models (Claude, Llama, Mistral, Titan) with Knowledge Bases, Agents, Guardrails.
Amazon SageMaker
End-to-end ML platform — train, tune, deploy, monitor custom models.
Amazon Comprehend
NLP — entities, sentiment, PII redaction, medical concept extraction.
Amazon Textract
Document AI — OCR, forms, tables, invoice/ID/lending extraction.
Amazon Rekognition
Computer vision — labels, faces, moderation, custom image classifiers.
AI Services Overview
How Bedrock, task-specific APIs, and SageMaker fit together.
Compute
Amazon EC2
Virtual machines — instance families, Spot, Savings Plans, Auto Scaling.
AWS Lambda
Event-driven serverless functions billed per request and execution duration.
ECS, EKS & Fargate
Containers — ECS (AWS-native orchestrator) or EKS (managed Kubernetes), running on EC2 or serverless Fargate.
Amazon EMR
Managed Hadoop, Spark, Hive, Presto, and Trino clusters.
Auto Scaling
Scale EC2 and EMR processing resources up or down based on load.
AWS Step Functions
State-machine orchestration for multi-step serverless workflows.
Database & Data Warehouse
Amazon RDS
Managed relational DBs — MySQL, PostgreSQL, MariaDB, Oracle, SQL Server.
Amazon Aurora
Cloud-native MySQL/Postgres engine with distributed storage & Global Database.
Amazon DynamoDB
Serverless NoSQL key-value / document DB at any scale.
Amazon ElastiCache
In-memory cache — Valkey, Redis OSS, Memcached.
Amazon Redshift
Cloud data warehouse — MPP columnar SQL for analytics.
Data Mesh & Data Fabric
Data architecture patterns for large organizations.
Networking & Content Delivery
Amazon VPC
Isolated virtual networks — subnets, route tables, security groups, endpoints.
Amazon API Gateway
Managed REST, HTTP, and WebSocket APIs.
Amazon CloudFront
Global CDN with edge compute (CloudFront Functions, Lambda@Edge).
Amazon Route 53
Managed DNS, domain registration, health checks, traffic routing.
Storage
Amazon S3
Object storage — data lake foundation, storage classes, lifecycle rules.
S3 Transfer Acceleration
Accelerated uploads over the CloudFront edge network.
Building an ETL Pipeline on AWS
Data Ingestion (Extract)
Amazon S3
Store raw data in S3 buckets — the foundation of AWS data lakes.
Kinesis Data Streams
Durable, replayable real-time stream with multiple consumers.
Data Firehose
Serverless streaming delivery to S3, Redshift, OpenSearch, Snowflake.
Data Transformation
AWS Glue (ETL)
Managed serverless Spark for ETL — clean, format, enrich.
AWS Lambda
Simple transformations in real-time or small batches.
Amazon EMR
Large-scale Spark/Hadoop for complex transformations.
Query & Load
Amazon Athena
Serverless SQL directly over S3 — Parquet, Iceberg, Hudi.
Amazon Redshift
Load transformed data for analytical queries and BI reporting.
Amazon QuickSight
AWS-native BI and dashboards with generative-AI features (Amazon Q in QuickSight).
Orchestration & Metadata
Glue Data Catalog
Central metadata store for tables across Athena, Redshift Spectrum, EMR.
Lake Formation
Fine-grained access control over S3-based data lakes built on Glue Catalog.
Lake Formation vs CloudFormation
Side-by-side reference for two services with confusingly similar names.
AWS Glue Workflows
Orchestrate Glue crawlers, jobs, and triggers as a DAG.
Step Functions
General-purpose state machines across any AWS service.
Security, Monitoring & Governance
Security & Compliance
IAM
Identities, roles, and policies controlling access to every AWS resource.
Secrets Manager
Store, rotate, and retrieve credentials and API keys.
KMS
Encryption keys and cryptographic operations across AWS.
AWS CloudTrail
Audit log of API activity across your accounts.
Config & Inspector
Resource configuration tracking and vulnerability scanning.
Monitoring
Amazon CloudWatch
Metrics, logs, alarms, dashboards, and anomaly detection.
CloudWatch Events
Event-driven triggers (now EventBridge) for schedules and state changes.
CloudFormation
Declarative infrastructure-as-code templates.