Amazon S3 (Simple Storage Service) is a scalable object storage service provided by Amazon Web Services (AWS). It is designed for storing and retrieving any amount of data from anywhere on the internet, offering a range of features that make it suitable for a wide variety of use cases, from data backup to serving large-scale applications.
S3 is designed to handle virtually unlimited amounts of data with no capacity to provision: a bucket can hold any number of objects, and individual objects can be up to 5 TB.
S3 Standard provides 99.999999999% (11 nines) durability by redundantly storing data across multiple geographically separated Availability Zones within a region. It is also designed for 99.99% availability, backed by an SLA, ensuring that data is accessible when needed.
S3 offers storage classes optimized for different access patterns: S3 Standard for frequently accessed data, S3 Intelligent-Tiering for unknown or shifting patterns, S3 Standard-IA and One Zone-IA for infrequently accessed data, and the S3 Glacier classes (Instant Retrieval, Flexible Retrieval, Deep Archive) for archival.
S3 provides multiple layers of security, including encryption at rest and in transit (SSE-S3 default since 2023), fine-grained access controls through IAM (Identity and Access Management) policies, bucket policies, and S3 Block Public Access enabled by default on new buckets.
S3 offers features like versioning, which allows you to keep multiple versions of an object; lifecycle policies, which enable automated transition of objects to different storage classes; replication (CRR/SRR) to other regions or buckets; and Object Lock for WORM compliance.
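Versioning, for instance, is a single call to turn on. A minimal boto3 sketch (the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")

# Keep every version of every object subsequently written to the bucket
s3.put_bucket_versioning(
    Bucket="my-data-lake-prod",
    VersioningConfiguration={"Status": "Enabled"},
)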
S3 integrates seamlessly with many other AWS services, such as AWS Lambda for serverless computing, Amazon Athena for querying data stored in S3 using SQL, and Amazon CloudFront for content delivery.
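A common integration pattern is triggering Lambda whenever an object lands under a prefix. A sketch, assuming a hypothetical function ARN and that the function's resource policy already allows s3.amazonaws.com to invoke it:

import boto3

s3 = boto3.client("s3")

# Fire the function for every object created under incoming/
s3.put_bucket_notification_configuration(
    Bucket="my-data-lake-prod",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:process-upload",  # hypothetical
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "incoming/"}]}},
        }]
    },
)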
S3 provides a RESTful API and SDKs for multiple programming languages, making it easy to integrate S3 with custom applications.
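Listing a large prefix, for example, takes a few lines with the SDK's built-in pagination (bucket and prefix are placeholders):

import boto3

s3 = boto3.client("s3")

# list_objects_v2 returns at most 1,000 keys per call; the paginator follows continuation tokens
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-data-lake-prod", Prefix="incoming/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])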
Since 2020, S3 provides strong read-after-write consistency for all PUT and DELETE operations on every object — no more eventual-consistency surprises after overwrites.
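In practice, a read issued immediately after a write is guaranteed to see the new data. A tiny illustration (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")

s3.put_object(Bucket="my-data-lake-prod", Key="config/flag.json", Body=b'{"enabled": true}')
# Strong read-after-write consistency: this GET returns the bytes just written
body = s3.get_object(Bucket="my-data-lake-prod", Key="config/flag.json")["Body"].read()
assert body == b'{"enabled": true}'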
Uploading a file with server-side encryption and a lifecycle-friendly storage class, then generating a presigned URL:
import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    region_name="us-west-2",
    config=Config(signature_version="s3v4"),
)

bucket = "my-data-lake-prod"
key = "incoming/2026-04-25/events.parquet"

# Multipart upload happens automatically for large files via upload_file
s3.upload_file(
    Filename="events.parquet",
    Bucket=bucket,
    Key=key,
    ExtraArgs={
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": "alias/data-lake",
        "StorageClass": "INTELLIGENT_TIERING",
        "Metadata": {"source": "ingest-service", "date": "2026-04-25"},
    },
)

# Generate a 15-minute presigned download link
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": bucket, "Key": key},
    ExpiresIn=900,
)
print(url)
A lifecycle policy that transitions to IA at 30 days, Glacier at 90, and expires after 7 years:
{
  "Rules": [{
    "ID": "tier-and-expire",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"}
    ],
    "Expiration": {"Days": 2555},
    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
  }]
}
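To apply it, pass the same document as a Python dict to put_bucket_lifecycle_configuration; a minimal sketch reusing the rule above:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake-prod",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-and-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 2555},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    },
)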
S3 Standard synchronously stores each object redundantly across at least three Availability Zones, verifies data integrity with checksums (MD5, CRC32C, and others), and runs background scrubbing that detects and repairs bit rot. The 11-nines figure means you'd statistically expect to lose about one object out of 100 billion per year.
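You can also opt into end-to-end integrity verification by having S3 store an additional checksum at upload time; a sketch with illustrative key and payload:

import boto3

s3 = boto3.client("s3")

# The SDK computes a CRC32C client-side; S3 validates it on receipt and stores it with the object
resp = s3.put_object(
    Bucket="my-data-lake-prod",
    Key="incoming/sample.bin",
    Body=b"example payload",
    ChecksumAlgorithm="CRC32C",
)
print(resp["ChecksumCRC32C"])

# Later requests can ask S3 to return the stored checksum for verification
head = s3.head_object(Bucket="my-data-lake-prod", Key="incoming/sample.bin", ChecksumMode="ENABLED")
print(head["ChecksumCRC32C"])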
Intelligent-Tiering is best when access patterns are unknown or vary per object — S3 monitors per-object access and moves objects between Frequent/Infrequent/Archive tiers automatically (small monitoring fee per object). Lifecycle rules are best when you know the access pattern (logs are hot for 30 days then cold) — no monitoring fee, deterministic transitions.
IAM policies are attached to identities (users, roles) and define what those identities can do across AWS. Bucket policies are attached to the bucket and define who can access it, including cross-account principals or anonymous public access. They evaluate together: within a single account, a request is allowed if either the identity policy or the bucket policy permits it and nothing explicitly denies it; for cross-account access, both the caller's identity policy and the bucket policy must allow the request (and SCPs/permission boundaries must not deny it).
Enable S3 Block Public Access at the account and bucket level (default since 2023), require SSE-KMS encryption, use bucket policies that deny non-TLS requests (aws:SecureTransport: false), set up AWS Config rules to alert on public buckets, and enable Access Analyzer for S3 to detect cross-account exposure. Use VPC endpoints to keep traffic off the public internet.
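Two of those controls sketched in boto3 (bucket name is a placeholder):

import json
import boto3

s3 = boto3.client("s3")
bucket = "my-data-lake-prod"

# Turn on all four Block Public Access settings at the bucket level
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Deny any request that arrives over plain HTTP
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))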
Since 2018, S3 automatically scales request rates per partitioned prefix (at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second each), so simply distribute keys across many prefixes (e.g., year=2026/month=04/day=25/) and S3 partitions them transparently. The old advice of randomizing key prefixes is no longer needed. For peak traffic spikes, put CloudFront in front of S3 to absorb GETs at the edge.
With versioning enabled, a DELETE inserts a "delete marker" rather than removing the object — older versions remain billable. To actually free space, lifecycle rules should expire noncurrent versions (e.g., delete noncurrent versions after 30 days) and clean up expired delete markers. MFA Delete adds a second factor on permanent version deletion for high-security buckets.
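A rule covering both cleanup steps might look like this (the day counts are illustrative):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake-prod",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "reclaim-versions",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            # Permanently delete versions 30 days after they stop being current
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            # Remove delete markers once no noncurrent versions remain behind them
            "Expiration": {"ExpiredObjectDeleteMarker": True},
        }]
    },
)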
Amazon S3 is a cornerstone of cloud storage in AWS, offering flexibility, reliability, and security for storing data at any scale.