What is AWS SQS (Amazon Simple Queue Service)?
AWS SQS (Simple Queue Service) is a fully managed message queuing service that enables decoupling and asynchronous communication between microservices, distributed systems, or serverless applications.
Key Features:
Feature | Description |
---|---|
Fully Managed | No infrastructure to manage — AWS handles scaling & reliability |
Message Queues | Stores and transmits messages between components |
Decoupling | Services don’t need to know about each other's availability |
At-least-once Delivery | Each message is delivered at least once, so consumers may occasionally see duplicates |
Durable | Messages are stored across multiple AZs |
Use Case Example:
Imagine an e-commerce app:
- User places an order → Message sent to SQS
- Order processing system polls SQS → Processes order asynchronously
This helps ensure:
- Loose coupling
- Scalability
- Reliability even if processing is delayed
Types of SQS Queues:
Queue Type | Description |
---|---|
Standard Queue | Default; high throughput, at-least-once delivery, may have duplicate messages |
FIFO Queue | Preserves First-In-First-Out order within each message group and supports exactly-once processing through deduplication |
Sample Workflow (Using Java or Boto3):
- Send message: SendMessage API → Queue
- Receive message: ReceiveMessage API → Consumer polls queue
- Delete message after processing: DeleteMessage API
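The three calls in this workflow can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes `sqs` is a Boto3 SQS client (for example `boto3.client("sqs")`), and the function names, the `handler` callback, and the JSON message format are hypothetical choices made for the example.

```python
import json


def send_order(sqs, queue_url, order):
    """Producer side: serialize the order and enqueue it (SendMessage)."""
    response = sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(order))
    return response["MessageId"]


def process_one_batch(sqs, queue_url, handler):
    """Consumer side: receive, process, then delete each message."""
    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
    processed = 0
    # The "Messages" key is absent when nothing was received.
    for message in response.get("Messages", []):
        handler(json.loads(message["Body"]))
        # Delete only after the handler succeeds; if it raises, the message
        # becomes visible again later and is retried.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed
```

Passing the client in as a parameter keeps the functions easy to exercise against a stub before pointing them at a real queue.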
Security & Management:
- Access Control via IAM
- Message Encryption with KMS
- Dead Letter Queues for failed messages
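As a rough sketch of how a Dead Letter Queue is wired up, the source queue gets a `RedrivePolicy` attribute that names the DLQ and a maximum receive count. The snippet assumes `sqs` is a Boto3 SQS client; the function name and the default of 5 receives are illustrative assumptions, not recommendations from this article.

```python
import json


def attach_dead_letter_queue(sqs, source_queue_url, dlq_arn, max_receive_count=5):
    """Send messages to the DLQ after `max_receive_count` unsuccessful receives."""
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    sqs.set_queue_attributes(
        QueueUrl=source_queue_url,
        # Queue attribute values are strings, so the policy is passed as JSON text.
        Attributes={"RedrivePolicy": json.dumps(redrive_policy)},
    )
    return redrive_policy
```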
Summary:
Feature | Description |
---|---|
Service | Managed message queuing |
Use Case | Microservices, async processing |
Types | Standard & FIFO queues |
Integration | Works with Lambda, EC2, ECS, etc. |
What is Visibility Timeout in AWS SQS?
Visibility Timeout is the period during which a message is hidden from other consumers after it has been retrieved from the queue.
It prevents other consumers from processing the same message while it’s being worked on.
How It Works:
- A consumer calls ReceiveMessage → gets a message from the SQS queue.
- The message becomes invisible for the duration of the visibility timeout.
- If the consumer successfully processes and deletes the message → all good.
- If it fails to delete (e.g. crash), after timeout, the message becomes visible again → another consumer can process it.
Default & Range:
Setting | Value |
---|---|
Default timeout | 30 seconds |
Range | 0 to 12 hours (43,200 sec) |
Real-Life Analogy:
Think of it like taking a test paper from a pile. You get exclusive access for a time. If you submit it, it’s done. If not, someone else gets a chance after your time runs out.
Best Practices:
- Set the visibility timeout to cover the time it normally takes to process and delete a message.
- Use the ChangeMessageVisibility API to extend the timeout if processing takes longer.
- Combine with Dead Letter Queues (DLQ) for failed retries.
🧬 Example:
# Step 1: Receive message (starts 30s visibility timeout)
ReceiveMessage
# Step 2: Process message
# Step 3: Delete if done
DeleteMessage
# If not deleted → message reappears after timeout
Important Note:
Visibility timeout is not a permanent lock; it only hides the message from other consumers temporarily.
You must still explicitly delete the message after processing to prevent reprocessing.
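For a job that may outlast the configured timeout, one possible pattern is to extend the timeout before each processing step and delete the message at the end. The sketch below assumes `sqs` is a Boto3 SQS client and `message` is one entry from a `ReceiveMessage` response; the function name, the `steps` list, and the 60-second extension are hypothetical.

```python
def process_with_extension(sqs, queue_url, message, steps, extend_to=60):
    """Run each processing step, extending the visibility timeout before it."""
    receipt = message["ReceiptHandle"]
    for step in steps:
        # Reset the timeout to `extend_to` seconds from now so the message
        # stays hidden from other consumers while this step runs.
        sqs.change_message_visibility(
            QueueUrl=queue_url, ReceiptHandle=receipt, VisibilityTimeout=extend_to
        )
        step(message["Body"])
    # Explicit delete: without it the message reappears after the timeout.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt)
```

If a step raises, the delete is never reached, so the message becomes visible again once the last extension expires.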
What is SQS Long Polling?
SQS Long Polling is a feature that reduces cost and improves efficiency by waiting for messages to arrive in the queue instead of returning immediately if the queue is empty.
Key Difference:
Polling Type | Behavior |
---|---|
Short Polling | Instantly checks for messages and returns — even if queue is empty |
Long Polling | Waits up to 20 seconds for messages to arrive before returning |
How It Works:
- You set WaitTimeSeconds (1–20 seconds; 0 means short polling) in the ReceiveMessage API call.
- If the queue is empty, SQS waits up to that time for a message.
- If a message arrives during that time → it is returned immediately.
Example (Using AWS CLI):
aws sqs receive-message \
--queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
--wait-time-seconds 20
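A consumer loop that uses long polling might look roughly like this in Python. It assumes `sqs` is a Boto3 SQS client; the function name, the `handler` callback, and the `max_polls` limit (added so the loop terminates in a demo) are hypothetical.

```python
def long_poll(sqs, queue_url, handler, wait_seconds=20, max_polls=3):
    """Long-poll the queue `max_polls` times; return how many messages were handled."""
    handled = 0
    for _ in range(max_polls):
        response = sqs.receive_message(
            QueueUrl=queue_url,
            WaitTimeSeconds=wait_seconds,  # 1-20 enables long polling; 0 is short polling
            MaxNumberOfMessages=10,
        )
        # An empty response simply means no message arrived within the wait time.
        for message in response.get("Messages", []):
            handler(message["Body"])
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
            handled += 1
    return handled
```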
Benefits of Long Polling:
Advantage | Why It Matters |
---|---|
✅ Reduces API calls | Fewer empty responses |
✅ Lowers cost | You pay per API call |
✅ Improves latency | Messages are delivered as soon as available |
Ways to Enable Long Polling:
- Per request – set WaitTimeSeconds in the ReceiveMessage API call
- Default setting – configure the queue's default wait time (the ReceiveMessageWaitTimeSeconds queue attribute)
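The queue-level default can be set through the `ReceiveMessageWaitTimeSeconds` queue attribute. A minimal sketch, assuming `sqs` is a Boto3 SQS client (the function name is hypothetical):

```python
def enable_long_polling_by_default(sqs, queue_url, wait_seconds=20):
    """Make every ReceiveMessage call on this queue long-poll unless overridden."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("wait_seconds must be between 0 and 20")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        # Attribute values are passed as strings.
        Attributes={"ReceiveMessageWaitTimeSeconds": str(wait_seconds)},
    )
```

A `WaitTimeSeconds` value passed on an individual request takes precedence over this default.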
Summary:
Feature | Long Polling |
---|---|
Wait time | Up to 20 seconds |
Cost-effective? | ✅ Yes (fewer empty responses) |
Better than short polling? | ✅ Yes, for most use cases |
Integrating AWS SQS with Auto Scaling Group (ASG)
Combining Amazon SQS with an Auto Scaling Group (ASG) lets you dynamically scale EC2 instances based on the number of messages in a queue — ideal for decoupled, event-driven architectures like job processing, image rendering, or order handling.
Why Integrate SQS with ASG?
- Automatically add EC2 instances when message load increases.
- Reduce costs by scaling in when the queue is empty.
- Ideal for consumer-based architecture (e.g., worker nodes pulling tasks from a queue).
Architecture Flow:
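Messages accumulate in the SQS queue
⬇
CloudWatch alarm on queue depth triggers an ASG scaling policy
⬇
ASG adds or removes EC2 instances
⬇
Each instance polls and processes messages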
Steps to Integrate SQS with ASG:
1. Create SQS Queue
- Standard or FIFO, depending on use case.
- Set proper permissions and visibility timeout.
2. Create a Launch Template (preferred) or Launch Configuration (legacy)
- Use an EC2 AMI configured with:
  - A script or service that pulls messages from SQS.
  - The required runtime (Java, Python, etc.).
- Attach an IAM instance profile that grants the SQS permissions the consumer needs.
3. Create Auto Scaling Group
- Attach the launch template
- Set min, max, and desired instance count
4. Create CloudWatch Alarms on SQS Metrics
Use the ApproximateNumberOfMessagesVisible metric:
- Create Alarm #1: High queue depth → scale out
- Create Alarm #2: Low/zero queue depth → scale in
5. Attach Alarms to ASG Policies
- Scale out when messages exceed a threshold (e.g., >50)
- Scale in when messages drop below a threshold (e.g., <10)
Example CloudWatch Metric Rule:
Metric: ApproximateNumberOfMessagesVisible
Threshold: > 50
Action: Add 1 EC2 instance
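The rule above could be created programmatically along these lines. The sketch assumes `cloudwatch` is a Boto3 CloudWatch client and `policy_arn` is the ARN of an existing ASG scaling policy that adds one instance; the function name, alarm name, 60-second period, and two evaluation periods are illustrative assumptions.

```python
def create_scale_out_alarm(cloudwatch, queue_name, policy_arn, threshold=50):
    """Alarm when the visible backlog stays above `threshold`, then run the scaling policy."""
    cloudwatch.put_metric_alarm(
        AlarmName=f"{queue_name}-backlog-high",
        Namespace="AWS/SQS",
        MetricName="ApproximateNumberOfMessagesVisible",
        Dimensions=[{"Name": "QueueName", "Value": queue_name}],
        Statistic="Average",
        Period=60,  # seconds per evaluation window
        EvaluationPeriods=2,  # require two consecutive breaches to avoid flapping
        Threshold=threshold,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[policy_arn],  # the scale-out policy attached to the ASG
    )
```

A matching scale-in alarm would use `LessThanThreshold` with the lower threshold and point at the scale-in policy.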
IAM Role Permissions Needed:
- EC2 Instance: sqs:ReceiveMessage, sqs:DeleteMessage, sqs:GetQueueAttributes
- ASG Role: CloudWatch and EC2 Auto Scaling permissions
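As an illustration, the EC2 instance permissions listed above could be expressed as an IAM policy document like the one built below. The queue ARN is an example value and the function name is hypothetical; scope the `Resource` to your own queue.

```python
import json


def consumer_policy(queue_arn):
    """Build a least-privilege policy document for an SQS consumer instance."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            }
        ],
    }


# Example ARN only; replace with the ARN of your queue.
print(json.dumps(consumer_policy("arn:aws:sqs:us-east-1:123456789012:my-queue"), indent=2))
```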
Best Practices:
- Use long polling in your EC2 message consumers to reduce API costs.
- Set the Visibility Timeout based on processing time.
- Use Dead Letter Queue (DLQ) for failed message handling.
- Enable detailed monitoring on the EC2 instances and group metrics on the ASG for faster response.
Summary:
Component | Role |
---|---|
SQS | Message buffer |
ASG | Dynamically adjusts EC2 capacity |
CloudWatch | Triggers scaling policies based on queue size |
EC2 Instances | Consume messages and process them asynchronously |
Using Amazon SQS as a Buffer for Database Writes
Amazon SQS can act as a reliable buffer between high-traffic producers (like APIs or applications) and back-end databases, helping to:
- Smooth out spikes in write traffic
- Prevent DB overload
- Increase system reliability and decoupling
Architecture Flow:
[Producer: API or application]
⬇
Send message to SQS
⬇
[Worker/Consumer Service]
⬇
Read message → Validate/Transform → Write to DB
Why Use SQS Before DB Writes?
Problem | Solution via SQS |
---|---|
High write bursts to DB | SQS absorbs traffic and queues it |
Risk of data loss on failure | Durable, fault-tolerant queueing |
Tight coupling of services | Decouples producers and DB writers |
Scaling complexity | Easily scale consumers dynamically |
Example Scenario: Order Processing System
- User places order → API sends order data to SQS.
- Worker service polls SQS, validates the message.
- Writes data into RDS, DynamoDB, or any target DB.
This setup ensures:
- No dropped orders even if DB is slow/down temporarily
- You can retry failed writes
- Enables asynchronous processing
Key Components to Configure:
Component | Recommendation |
---|---|
SQS Queue | Standard queue (or FIFO if strict order is needed) |
Visibility Timeout | Set based on DB write time (e.g., 30–60s) |
Dead Letter Queue (DLQ) | For failed writes after max retries |
Worker/Consumer | Use long polling and exponential backoff |
Tips for Reliable Implementation:
- Use batch processing (up to 10 messages per batch) to optimize DB writes.
- Make DB writes idempotent to handle duplicate messages.
- Store a message ID or unique key in the DB to prevent double entries.
- Monitor queue length and scale consumers with Auto Scaling or Lambda.
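The batching and idempotency tips can be combined in a small worker-side sketch. It uses SQLite from the Python standard library as a stand-in for the target database, and the `orders` table, its columns, and the message body format are hypothetical; the idea is only that the message ID acts as a unique key so a redelivered message does not create a second row.

```python
import json
import sqlite3


def init_db(conn):
    """Create the example table; message_id is the deduplication key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "message_id TEXT PRIMARY KEY, order_id INTEGER, total REAL)"
    )


def write_batch(conn, messages):
    """Write one received batch (up to 10 messages) in a single transaction."""
    rows = []
    for message in messages:
        body = json.loads(message["Body"])
        rows.append((message["MessageId"], body["order_id"], body["total"]))
    with conn:  # commits on success, rolls back if any insert raises
        # INSERT OR IGNORE skips rows whose message_id already exists,
        # which makes reprocessing a duplicate delivery harmless.
        conn.executemany(
            "INSERT OR IGNORE INTO orders (message_id, order_id, total) VALUES (?, ?, ?)",
            rows,
        )
```

The worker would delete the messages from SQS only after `write_batch` returns, so a failed write leaves them in the queue for a retry.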
Summary:
Benefit | Description |
---|---|
✅ Scalability | Buffers high-throughput writes without overloading the DB |
✅ Reliability | Messages persisted across failures |
✅ Decoupling | Independent scaling of producers/consumers |
✅ Retry Handling | DLQs for failed message writes |