Amazon S3 (Simple Storage Service) is a scalable object storage service provided by Amazon Web Services (AWS). It is designed for storing and retrieving any amount of data from anywhere on the internet, offering a range of features that make it suitable for a wide variety of use cases, from data backup to serving large-scale applications.
S3 is designed to handle virtually unlimited amounts of data with no capacity to provision: a bucket can hold any number of objects, and individual objects can be up to 5 TB.
S3 Standard provides 99.999999999% (11 nines) durability by redundantly storing data across multiple geographically separated Availability Zones within a region. It is also designed for 99.99% availability, backed by an SLA, ensuring that data is accessible when needed.
S3 offers storage classes optimized for different access patterns: S3 Standard for frequently accessed data, S3 Intelligent-Tiering for unknown or shifting patterns, S3 Standard-IA and One Zone-IA for infrequently accessed data, and the S3 Glacier classes (Instant Retrieval, Flexible Retrieval, Deep Archive) for archival.
S3 provides multiple layers of security, including encryption at rest and in transit (SSE-S3 default since 2023), fine-grained access controls through IAM (Identity and Access Management) policies, bucket policies, and S3 Block Public Access enabled by default on new buckets.
S3 offers features like versioning, which allows you to keep multiple versions of an object; lifecycle policies, which enable automated transition of objects to different storage classes; replication (CRR/SRR) to other regions or buckets; and Object Lock for WORM compliance.
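Versioning, for instance, is a single call to turn on. A minimal boto3 sketch (the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")

# Keep every version of every object subsequently written to the bucket
s3.put_bucket_versioning(
    Bucket="my-data-lake-prod",
    VersioningConfiguration={"Status": "Enabled"},
)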
S3 integrates seamlessly with many other AWS services, such as AWS Lambda for serverless computing, Amazon Athena for querying data stored in S3 using SQL, and Amazon CloudFront for content delivery.
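A common integration pattern is triggering Lambda whenever an object lands under a prefix. A sketch, assuming a hypothetical function ARN and that the function's resource policy already allows s3.amazonaws.com to invoke it:

import boto3

s3 = boto3.client("s3")

# Fire the function for every object created under incoming/
s3.put_bucket_notification_configuration(
    Bucket="my-data-lake-prod",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:process-upload",  # hypothetical
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "incoming/"}]}},
        }]
    },
)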
S3 provides a RESTful API and SDKs for multiple programming languages, making it easy to integrate S3 with custom applications.
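Listing a large prefix, for example, takes a few lines with the SDK's built-in pagination (bucket and prefix are placeholders):

import boto3

s3 = boto3.client("s3")

# list_objects_v2 returns at most 1,000 keys per call; the paginator follows continuation tokens
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-data-lake-prod", Prefix="incoming/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])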
Since 2020, S3 provides strong read-after-write consistency for all PUT and DELETE operations on every object — no more eventual-consistency surprises after overwrites.
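In practice, a read issued immediately after a write is guaranteed to see the new data. A tiny illustration (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")

s3.put_object(Bucket="my-data-lake-prod", Key="config/flag.json", Body=b'{"enabled": true}')
# Strong read-after-write consistency: this GET returns the bytes just written
body = s3.get_object(Bucket="my-data-lake-prod", Key="config/flag.json")["Body"].read()
assert body == b'{"enabled": true}'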
Uploading a file with server-side encryption and a lifecycle-friendly storage class, then generating a presigned URL:
import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    region_name="us-west-2",
    config=Config(signature_version="s3v4"),
)

bucket = "my-data-lake-prod"
key = "incoming/2026-04-25/events.parquet"

# Multipart upload happens automatically for large files via upload_file
s3.upload_file(
    Filename="events.parquet",
    Bucket=bucket,
    Key=key,
    ExtraArgs={
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": "alias/data-lake",
        "StorageClass": "INTELLIGENT_TIERING",
        "Metadata": {"source": "ingest-service", "date": "2026-04-25"},
    },
)

# Generate a 15-minute presigned download link
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": bucket, "Key": key},
    ExpiresIn=900,
)
print(url)
A lifecycle policy that transitions to IA at 30 days, Glacier at 90, and expires after 7 years:
{
  "Rules": [{
    "ID": "tier-and-expire",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"}
    ],
    "Expiration": {"Days": 2555},
    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
  }]
}
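To apply it, pass the same document as a Python dict to put_bucket_lifecycle_configuration; a minimal sketch reusing the rule above:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake-prod",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-and-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 2555},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    },
)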
S3 Standard synchronously stores each object redundantly across at least three Availability Zones, verifies data integrity with checksums (MD5, CRC32C, and others), and runs background scrubbing that detects and repairs bit rot. The 11-nines figure means you'd statistically expect to lose about one object out of 100 billion per year.
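You can also opt into end-to-end integrity verification by having S3 store an additional checksum at upload time; a sketch with illustrative key and payload:

import boto3

s3 = boto3.client("s3")

# The SDK computes a CRC32C client-side; S3 validates it on receipt and stores it with the object
resp = s3.put_object(
    Bucket="my-data-lake-prod",
    Key="incoming/sample.bin",
    Body=b"example payload",
    ChecksumAlgorithm="CRC32C",
)
print(resp["ChecksumCRC32C"])

# Later requests can ask S3 to return the stored checksum for verification
head = s3.head_object(Bucket="my-data-lake-prod", Key="incoming/sample.bin", ChecksumMode="ENABLED")
print(head["ChecksumCRC32C"])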
Intelligent-Tiering is best when access patterns are unknown or vary per object — S3 monitors per-object access and moves objects between Frequent/Infrequent/Archive tiers automatically (small monitoring fee per object). Lifecycle rules are best when you know the access pattern (logs are hot for 30 days then cold) — no monitoring fee, deterministic transitions.
IAM policies are attached to identities (users, roles) and define what those identities can do across AWS. Bucket policies are attached to the bucket and define who can access it, including cross-account principals or anonymous public access. They evaluate together: within a single account, a request is allowed if either the identity policy or the bucket policy permits it and nothing explicitly denies it; for cross-account access, both the caller's identity policy and the bucket policy must allow the request (and SCPs/permission boundaries must not deny it).
Enable S3 Block Public Access at the account and bucket level (default since 2023), require SSE-KMS encryption, use bucket policies that deny non-TLS requests (aws:SecureTransport: false), set up AWS Config rules to alert on public buckets, and enable Access Analyzer for S3 to detect cross-account exposure. Use VPC endpoints to keep traffic off the public internet.
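Two of those controls sketched in boto3 (bucket name is a placeholder):

import json
import boto3

s3 = boto3.client("s3")
bucket = "my-data-lake-prod"

# Turn on all four Block Public Access settings at the bucket level
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Deny any request that arrives over plain HTTP
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))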
Since 2018, S3 automatically scales request rates per partitioned prefix (at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second each), so simply distribute keys across many prefixes (e.g., year=2026/month=04/day=25/) and S3 partitions them transparently. The old advice of randomizing key prefixes is no longer needed. For peak traffic spikes, put CloudFront in front of S3 to absorb GETs at the edge.
With versioning enabled, a DELETE inserts a "delete marker" rather than removing the object — older versions remain billable. To actually free space, lifecycle rules should expire noncurrent versions (e.g., delete noncurrent versions after 30 days) and clean up expired delete markers. MFA Delete adds a second factor on permanent version deletion for high-security buckets.
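A rule covering both cleanup steps might look like this (the day counts are illustrative):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake-prod",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "reclaim-versions",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            # Permanently delete versions 30 days after they stop being current
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            # Remove delete markers once no noncurrent versions remain behind them
            "Expiration": {"ExpiredObjectDeleteMarker": True},
        }]
    },
)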
Amazon S3 is a cornerstone of cloud storage in AWS, offering flexibility, reliability, and security for storing data at any scale.