AWS Services Reference


AI & Machine Learning

  • Amazon Bedrock Managed foundation models (Claude, Llama, Mistral, Titan) with Knowledge Bases, Agents, Guardrails.
  • Amazon SageMaker End-to-end ML platform — train, tune, deploy, monitor custom models.
  • Amazon Comprehend NLP — entities, sentiment, PII redaction, medical concept extraction.
  • Amazon Textract Document AI — OCR, forms, tables, invoice/ID/lending extraction.

Compute

  • Amazon EC2 Virtual machines — instance families, Spot, Savings Plans, Auto Scaling.
  • AWS Lambda Event-driven serverless functions billed per request and execution duration.
  • ECS, EKS & Fargate Containers on an AWS-native orchestrator (ECS) or Kubernetes (EKS), with Fargate as the serverless compute option for both.
  • Amazon EMR Managed Hadoop, Spark, Hive, Presto, and Trino clusters.
  • Auto Scaling Scale EC2 and EMR processing resources up or down based on load.
  • AWS Step Functions State-machine orchestration for multi-step serverless workflows.

Database & Data Warehouse

  • Amazon RDS Managed relational DBs — MySQL, PostgreSQL, MariaDB, Oracle, SQL Server.
  • Amazon Aurora Cloud-native MySQL- and PostgreSQL-compatible engine with distributed storage & Global Database.

Networking & Content Delivery

  • Amazon VPC Isolated virtual networks — subnets, route tables, security groups, endpoints.
  • Amazon Route 53 Managed DNS, domain registration, health checks, traffic routing.

Storage

  • Amazon S3 Object storage — data lake foundation, storage classes, lifecycle rules.

Building an ETL Pipeline on AWS


Data Ingestion (Extract)

  • Amazon S3 Store raw data in S3 buckets — the foundation of AWS data lakes.
  • Data Firehose Serverless streaming delivery to S3, Redshift, OpenSearch, Snowflake.

Data Transformation

  • AWS Glue (ETL) Managed serverless Spark for ETL — clean, format, enrich.
  • AWS Lambda Simple transformations in real-time or small batches.
  • Amazon EMR Large-scale Spark/Hadoop for complex transformations.

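As a rough illustration of the "simple transformations" role Lambda plays in this stage, the sketch below shows a handler written against the record format that Data Firehose passes to a transformation function: each incoming record carries a recordId and base64-encoded data, and each returned record carries the same recordId, a result status, and re-encoded data. The cleanup rule itself (lower-casing keys and dropping null values) is a made-up example, not something prescribed by this reference.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation handler: normalize each JSON record in the batch."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical cleanup rule: lower-case keys and drop null values.
            cleaned = {k.lower(): v for k, v in payload.items() if v is not None}
            # Newline-delimit the output so downstream readers can split records.
            encoded = base64.b64encode((json.dumps(cleaned) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": encoded.decode("utf-8"),
            })
        except (ValueError, AttributeError):
            # Unparseable or non-object payloads are returned unchanged and
            # marked as failed so the delivery stream can handle them separately.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Anything heavier than per-record cleanup (joins, aggregations, large batches) is usually a better fit for Glue or EMR from the list above.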
Query & Load

  • Amazon Athena Serverless SQL directly over S3 — Parquet, Iceberg, Hudi.
  • Amazon Redshift Load transformed data for analytical queries and BI reporting.

Orchestration & Metadata

  • Glue Data Catalog Central metadata store for tables across Athena, Redshift Spectrum, EMR.
  • Lake Formation Fine-grained access control over S3-based data lakes built on Glue Catalog.
  • Lake Formation vs CloudFormation Side-by-side reference for two services with confusingly similar names.
  • Step Functions General-purpose state machines across any AWS service.
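To make the orchestration step more concrete, here is a minimal sketch of a Step Functions definition, in Amazon States Language, that chains the transform and query stages above: a Glue job runs first, then an Athena query. It is built as a plain Python dictionary so it can be inspected or serialized without AWS credentials. The state names, job name, query, and bucket are hypothetical placeholders, and the retry settings are illustrative rather than recommended values.

```python
import json


def build_etl_state_machine(glue_job_name, athena_query, athena_output):
    """Return a States Language definition: run a Glue job, then an Athena query."""
    return {
        "Comment": "Illustrative ETL workflow: transform with Glue, then query with Athena",
        "StartAt": "TransformWithGlue",
        "States": {
            "TransformWithGlue": {
                "Type": "Task",
                # The .sync suffix makes the state wait until the Glue job finishes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                # Illustrative retry policy for failed job runs.
                "Retry": [{
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 60,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }],
                "Next": "QueryWithAthena",
            },
            "QueryWithAthena": {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": athena_query,
                    "ResultConfiguration": {"OutputLocation": athena_output},
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # All three arguments are placeholder values for illustration only.
    definition = build_etl_state_machine(
        "clean-raw-events",
        "SELECT COUNT(*) FROM events",
        "s3://example-athena-results/",
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document one would supply as the state machine definition; the IAM role that lets the state machine call Glue and Athena is left out of this sketch.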

Security, Monitoring & Governance


Security & Compliance

  • IAM Identities, roles, and policies controlling access to every AWS resource.
  • Secrets Manager Store, rotate, and retrieve credentials and API keys.
  • KMS Encryption keys and cryptographic operations across AWS.
  • AWS CloudTrail Audit log of every API call in your accounts.

Monitoring

  • Amazon CloudWatch Metrics, logs, alarms, dashboards, and anomaly detection.
  • CloudWatch Events Event-driven triggers (now EventBridge) for schedules and state changes.