AWS S3 - Advanced
Moving Between Storage Classes
You can move objects between storage classes:
- Manually via AWS CLI/SDK
- Automatically using Lifecycle Rules
- Examples:
  - S3 Standard → S3 Standard-IA (Infrequent Access) after 30 days
  - S3 Standard-IA → S3 Glacier after 90 days
CLI Example:
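A minimal sketch of the manual route: copying an object onto itself with a new `--storage-class` changes its class in place (bucket and key names below are placeholders):

```shell
# Move an object to Standard-IA by copying it onto itself.
# "my-bucket" and "data/report.csv" are placeholder names.
aws s3 cp s3://my-bucket/data/report.csv s3://my-bucket/data/report.csv \
    --storage-class STANDARD_IA
```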
Lifecycle Rules
Lifecycle rules automate transitions and deletions of S3 objects based on age, size, or version.
| Rule Type | Description |
|---|---|
| Transition | Move to a cheaper storage class after N days |
| Expiration | Delete objects after N days |
| Non-current Version Expiration | Delete old versions of objects |
Useful for:
- Archiving
- Cost optimization
- Data retention policies
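The transition and expiration rules above can be combined in one lifecycle policy. A hedged sketch via the AWS CLI (bucket name, rule ID, and day counts are illustrative):

```shell
# Hypothetical policy: Standard-IA after 30 days, Glacier after 90 days,
# and delete non-current versions after 365 days. "my-bucket" is a placeholder.
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-bucket \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "archive-and-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Transitions": [
          {"Days": 30, "StorageClass": "STANDARD_IA"},
          {"Days": 90, "StorageClass": "GLACIER"}
        ],
        "NoncurrentVersionExpiration": {"NoncurrentDays": 365}
      }]
    }'
```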
Storage Class Analysis
S3 analyzes object access patterns to recommend storage class transitions.
- Helps identify infrequently accessed data
- Based on object size, last access time, frequency
- Works with lifecycle rules for automation
Enable it in the bucket's Management tab → after the analysis runs, use its recommendations with lifecycle rules to automate transitions.
Requester Pays
Normally, the bucket owner pays for all access.
With Requester Pays, the user downloading data pays the transfer cost.
| Use Case | Description |
|---|---|
| Data lakes or public datasets | Useful when you host large files |
| Shared resources | Offload cost to API users |
Enable it under the bucket's Properties tab, or with the AWS CLI.
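A minimal CLI sketch (bucket and key names are placeholders). Note that requesters must explicitly acknowledge the charge on each request:

```shell
# Turn on Requester Pays for a placeholder bucket.
aws s3api put-bucket-request-payment \
    --bucket my-bucket \
    --request-payment-configuration Payer=Requester

# Requesters then opt in to paying on each request:
aws s3api get-object --bucket my-bucket --key data/file.csv out.csv \
    --request-payer requester
```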
Event Notifications with Amazon EventBridge
You can configure S3 to emit events to EventBridge for actions like:
- Object created
- Object deleted
- Restore completed
| Target Service | Use Case |
|---|---|
| Lambda | Process uploaded images |
| SNS | Send SMS/email notifications |
| SQS | Queue object creation events |
| EventBridge | Trigger complex workflows or apps |
This allows real-time processing and event-driven architecture.
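Routing all bucket events to EventBridge is a single configuration call. A sketch with a placeholder bucket name:

```shell
# Send all S3 events for this bucket to EventBridge;
# rules on the default event bus then fan out to targets.
aws s3api put-bucket-notification-configuration \
    --bucket my-bucket \
    --notification-configuration '{"EventBridgeConfiguration": {}}'
```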
Baseline Performance
- S3 automatically scales for massive concurrency
- Recommended best practices:
  - Use parallel uploads/downloads
  - Use multipart upload for files >100 MB
  - Optimize prefixes (S3 now supports high parallelism even under the same prefix)
Each S3 prefix supports 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second.
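The multipart-upload practice above can be tuned through standard AWS CLI configuration settings (the values shown are illustrative, not recommendations):

```shell
# Threshold at which the CLI switches to multipart transfers,
# and the size of each part it uploads in parallel.
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 16MB
```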
S3 Select & Glacier Select
S3 Select allows you to query part of an object using SQL, instead of downloading the entire file.
Example:
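A hedged sketch of a Select query over a CSV object (bucket, key, and column names are placeholders):

```shell
# Retrieve only matching rows instead of the whole object;
# results are written to output.csv.
aws s3api select-object-content \
    --bucket my-bucket \
    --key logs/2024.csv \
    --expression "SELECT s.* FROM s3object s WHERE s.status = '500'" \
    --expression-type SQL \
    --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"}' \
    --output-serialization '{"CSV": {}}' \
    output.csv
```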
- Works with CSV, JSON, and Parquet
- Glacier Select enables similar queries on archived data (Glacier)
Reduces cost and speeds up processing for analytics workloads
S3 Batch Operations
Used to perform large-scale operations on many S3 objects:
- Replace object metadata
- Restore objects from Glacier
- Copy objects across buckets
- Trigger Lambda functions on each object
Supports millions of objects in a single job
You specify:
- Manifest (list of objects)
- Operation (e.g., PUT, DELETE, COPY)
- Optional Lambda function
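The pieces above come together in a `create-job` call. A sketch of a copy job; the account ID, ARNs, bucket names, and ETag are all placeholders:

```shell
# Batch-copy every object listed in a CSV manifest to a destination bucket.
aws s3control create-job \
    --account-id 111122223333 \
    --operation '{"S3PutObjectCopy": {"TargetResource": "arn:aws:s3:::dest-bucket"}}' \
    --manifest '{"Spec": {"Format": "S3BatchOperations_CSV_20180820",
                          "Fields": ["Bucket", "Key"]},
                 "Location": {"ObjectArn": "arn:aws:s3:::my-bucket/manifest.csv",
                              "ETag": "example-etag"}}' \
    --report '{"Bucket": "arn:aws:s3:::my-bucket", "Format": "Report_CSV_20180820",
               "Enabled": true, "ReportScope": "AllTasks", "Prefix": "batch-reports"}' \
    --priority 10 \
    --role-arn arn:aws:iam::111122223333:role/batch-ops-role \
    --no-confirmation-required
```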
S3 Storage Lens
A centralized analytics dashboard that provides:
- Usage metrics
- Cost optimization suggestions
- Object count, size, age, storage class distribution
Useful for:
- Understanding storage trends
- Identifying unused data
- Enforcing data governance policies
Found under: S3 → Storage Lens
Summary Table
| Feature | Purpose |
|---|---|
| Moving between storage classes | Cost optimization, manually or via lifecycle rules |
| Lifecycle Rules | Automate transitions/deletion of objects |
| Storage Class Analysis | Suggests best class based on access patterns |
| Requester Pays | Shifts cost to the user requesting the object |
| EventBridge Integration | Enables real-time processing with serverless workflows |
| Baseline Performance | High concurrency with optimized access patterns |
| S3 Select / Glacier Select | Query object content without full download |
| Batch Operations | Bulk operations on millions of objects |
| S3 Storage Lens | Visibility into storage usage and optimization |