What is AWS SQS (Amazon Simple Queue Service)?

AWS SQS (Simple Queue Service) is a fully managed message queuing service that enables decoupling and asynchronous communication between microservices, distributed systems, or serverless applications.


Key Features:

| Feature | Description |
|---|---|
| Fully Managed | No infrastructure to manage; AWS handles scaling and reliability |
| Message Queues | Stores and transmits messages between components |
| Decoupling | Producers and consumers don't need each other to be available at the same time |
| At-least-once Delivery | On standard queues, each message is delivered at least once, so duplicates are possible |
| Durable | Messages are stored redundantly across multiple Availability Zones |


Use Case Example:

Imagine an e-commerce app:

  • User places an order → Message sent to SQS
  • Order processing system polls SQS → Processes order asynchronously

This helps ensure:

  • Loose coupling
  • Scalability
  • Reliability even if processing is delayed

Types of SQS Queues:

| Queue Type | Description |
|---|---|
| Standard Queue | Default; high throughput, at-least-once delivery, best-effort ordering, may deliver duplicates |
| FIFO Queue | Preserves first-in-first-out order within a message group and supports exactly-once processing |

Sample Workflow (Using Java or Boto3):

  1. Send message:

    SendMessage API → Queue

  2. Receive message:

    ReceiveMessage API → Consumer polls queue

  3. Delete message after processing:

    DeleteMessage API
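
As a rough sketch, the three steps above could look like this in Python. It assumes `sqs` is a Boto3 SQS client (for example, one created with `boto3.client("sqs")`); the helper names `send_order` and `process_next` and the JSON message format are made up for illustration.

```python
import json


def send_order(sqs, queue_url, order):
    """Step 1: put one order on the queue as a JSON body; returns the message ID."""
    response = sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(order))
    return response["MessageId"]


def process_next(sqs, queue_url, handler):
    """Steps 2 and 3: receive at most one message, handle it, then delete it."""
    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
    messages = response.get("Messages", [])  # key is absent when nothing is returned
    if not messages:
        return False
    message = messages[0]
    handler(json.loads(message["Body"]))
    # Delete only after the handler returns, so a failure leaves the message queued.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```

Passing the client in as a parameter keeps the functions easy to test with a stand-in object.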

Security & Management:

  • Access Control via IAM
  • Message Encryption with KMS
  • Dead Letter Queues for failed messages
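
For the Dead Letter Queue item, a minimal sketch of wiring a DLQ to a source queue is shown here. It assumes `sqs` is a Boto3 SQS client and that the DLQ already exists; the function name and the default of 5 receives are illustrative choices, not recommendations from AWS.

```python
import json


def attach_dead_letter_queue(sqs, source_queue_url, dlq_arn, max_receive_count=5):
    """Send messages to the DLQ once they have been received max_receive_count times."""
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # Queue attributes are passed as strings, so the policy is JSON-encoded.
    sqs.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes={"RedrivePolicy": json.dumps(redrive_policy)},
    )
    return redrive_policy
```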

Summary:

| Feature | Description |
|---|---|
| Service | Managed message queuing |
| Use Case | Microservices, async processing |
| Types | Standard & FIFO queues |
| Integration | Works with Lambda, EC2, ECS, etc. |


What is Visibility Timeout in AWS SQS?

Visibility Timeout is the period during which a message stays hidden from other consumers after it's retrieved from the queue.
It prevents other consumers from processing the same message while it’s being worked on.


How It Works:

  1. A consumer calls ReceiveMessage → gets a message from the SQS queue.
  2. The message becomes invisible for the duration of the visibility timeout.
  3. If the consumer processes the message and deletes it, the message is gone for good.
  4. If the consumer fails to delete it (e.g., it crashes), the message becomes visible again once the timeout expires, and another consumer can process it.
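
To make the four steps concrete, here is a toy in-memory model of the behavior. It is only an illustration of the semantics, not how SQS is implemented; the `TinyQueue` class is invented for this post, and it uses a simple integer ID where SQS would hand out a receipt handle.

```python
class TinyQueue:
    """Toy in-memory model of visibility timeout; not how SQS is implemented."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time it becomes visible again]
        self._next_id = 0

    def send(self, body):
        self._messages[self._next_id] = [body, 0]
        self._next_id += 1

    def receive(self, now):
        """Return the first visible message and hide it for the timeout period."""
        for message_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                entry[1] = now + self.visibility_timeout  # hide from other consumers
                return message_id, body
        return None

    def delete(self, message_id):
        """Remove the message permanently (the consumer finished its work)."""
        self._messages.pop(message_id, None)
```

A consumer that receives at time 0 and never deletes will see the same message offered again at time 30; once `delete` is called, it never comes back.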

Default & Range:

| Setting | Value |
|---|---|
| Default timeout | 30 seconds |
| Range | 0 seconds to 12 hours (43,200 seconds) |

Real-Life Analogy:

Think of it like taking a test paper from a pile. You get exclusive access for a time. If you submit it, it’s done. If not, someone else gets a chance after your time runs out.


Best Practices:

  • Set the visibility timeout somewhat longer than your typical processing time, so messages are not redelivered while they are still being worked on.
  • Use ChangeMessageVisibility API to extend timeout if processing takes longer.
  • Combine with Dead Letter Queues (DLQ) for failed retries.
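
One way to apply the second practice is to reset the timeout before each stage of a long job. The sketch below assumes `sqs` is a Boto3 SQS client and `message` is one entry from a `receive_message` response; `process_in_steps` and the 60-second extension are hypothetical choices.

```python
def process_in_steps(sqs, queue_url, message, steps, extension_seconds=60):
    """Run each step, resetting the visibility timeout before every one."""
    receipt_handle = message["ReceiptHandle"]
    for step in steps:
        # Restart the timer: the message stays hidden for extension_seconds from now.
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=extension_seconds,
        )
        step(message["Body"])
    # All steps finished, so remove the message to prevent reprocessing.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
```

If any step raises, the delete is skipped and the message reappears once the last extension runs out.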

Example:

# Step 1: Receive message (starts 30s visibility timeout)
ReceiveMessage

# Step 2: Process message

# Step 3: Delete if done
DeleteMessage

# If not deleted → message reappears after timeout


Important Note:

Visibility timeout is not a permanent lock; it only hides the message temporarily.
You must still explicitly delete the message after processing to prevent reprocessing.

What is SQS Long Polling?

SQS Long Polling is a feature that reduces cost and improves efficiency by waiting for messages to arrive in the queue instead of returning immediately if the queue is empty.


Key Difference:

| Polling Type | Behavior |
|---|---|
| Short Polling | Checks for messages and returns immediately, even if the queue is empty |
| Long Polling | Waits up to 20 seconds for messages to arrive before returning |


How It Works:

  • You set WaitTimeSeconds (1–20 seconds; 0 means short polling) in the ReceiveMessage API call.
  • If the queue is empty, SQS waits up to that time for a message.
  • If a message arrives during that time → it is returned immediately.

Example (Using AWS CLI):

aws sqs receive-message \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --wait-time-seconds 20
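
The same request from Python might look like the sketch below. It assumes `sqs` is a Boto3 SQS client; `receive_batch` is a hypothetical helper, and the range check simply mirrors the 0 to 20 second limit described above.

```python
def receive_batch(sqs, queue_url, wait_seconds=20, max_messages=10):
    """Long-poll the queue and return whatever messages arrive (possibly none)."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("WaitTimeSeconds must be between 0 and 20")
    response = sqs.receive_message(
        QueueUrl=queue_url,
        WaitTimeSeconds=wait_seconds,      # how long SQS may hold the request open
        MaxNumberOfMessages=max_messages,  # SQS returns at most 10 per call
    )
    return response.get("Messages", [])    # key is absent on an empty response
```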


Benefits of Long Polling:

| Advantage | Why It Matters |
|---|---|
| Reduces API calls | Fewer empty responses |
| Lowers cost | SQS bills per request, so fewer empty receives cost less |
| Improves latency | Messages are returned as soon as they become available |


Ways to Enable Long Polling:

  1. Per request – set WaitTimeSeconds in ReceiveMessage API call
  2. Default setting – set the queue's ReceiveMessageWaitTimeSeconds attribute so every receive uses long polling unless the request overrides it
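
For the second option, a minimal sketch of setting the queue-level default is shown here, again assuming `sqs` is a Boto3 SQS client; the function name is made up for illustration.

```python
def enable_long_polling_by_default(sqs, queue_url, wait_seconds=20):
    """Make every ReceiveMessage on this queue wait up to wait_seconds by default."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("wait time must be between 0 and 20 seconds")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        # Attribute values are strings, even for numeric settings.
        Attributes={"ReceiveMessageWaitTimeSeconds": str(wait_seconds)},
    )
```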

Summary:

| Feature | Long Polling |
|---|---|
| Wait time | Up to 20 seconds |
| Cost-effective? | ✅ Yes (fewer empty responses) |
| Better than short polling? | ✅ Yes, for most use cases |


Integrating AWS SQS with Auto Scaling Group (ASG)

Combining Amazon SQS with an Auto Scaling Group (ASG) lets you dynamically scale EC2 instances based on the number of messages in a queue — ideal for decoupled, event-driven architectures like job processing, image rendering, or order handling.


Why Integrate SQS with ASG?

  • Automatically add EC2 instances when message load increases.
  • Reduce costs by scaling in when the queue is empty.
  • Ideal for consumer-based architecture (e.g., worker nodes pulling tasks from a queue).

Architecture Flow:

[SQS Queue] → [Auto Scaling Group with EC2 instances]
                     ⬇
         Each instance polls and processes messages


Steps to Integrate SQS with ASG:

1. Create SQS Queue

  • Standard or FIFO, depending on use case.
  • Set proper permissions and visibility timeout.

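2. Create a Launch Template (or a legacy Launch Configuration)

  • Use an EC2 AMI configured with:
    • A script or service that pulls messages from SQS.
    • The runtime it needs (Java, Python, etc.).
  • Attach an IAM instance profile that grants the SQS permissions listed under "IAM Role Permissions Needed".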

3. Create Auto Scaling Group

  • Attach the launch template
  • Set min, max, and desired instance count

4. Create CloudWatch Alarms on SQS Metrics

Use the ApproximateNumberOfMessagesVisible metric:

  • Create Alarm #1: High queue depth → scale out
  • Create Alarm #2: Low/zero queue depth → scale in

5. Attach Alarms to ASG Policies

  • Scale out when messages exceed a threshold (e.g., >50)
  • Scale in when messages drop below a threshold (e.g., <10)

Example CloudWatch Metric Rule:

Alarm Name: SQS-ScaleOut
Metric: ApproximateNumberOfMessagesVisible
Threshold: > 50
Action: Add 1 EC2 instance
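
The rule above could be expressed in Python roughly as follows. This is a sketch under a few assumptions: `cloudwatch` is a Boto3 CloudWatch client, `policy_arn` is the ARN of an existing ASG scaling policy that adds one instance, and the statistic, period, and evaluation count are illustrative values rather than recommendations.

```python
def scale_out_alarm_params(queue_name, policy_arn, threshold=50):
    """Build the arguments for a scale-out alarm on queue depth."""
    return {
        "AlarmName": "SQS-ScaleOut",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",   # assumption: average depth over each period
        "Period": 60,             # seconds per data point (assumption)
        "EvaluationPeriods": 2,   # require two breaching periods (assumption)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],  # the scaling policy decides how many instances to add
    }


def create_scale_out_alarm(cloudwatch, queue_name, policy_arn, threshold=50):
    """Create (or update) the alarm from the parameters built above."""
    cloudwatch.put_metric_alarm(**scale_out_alarm_params(queue_name, policy_arn, threshold))
```

A matching scale-in alarm would use a lower threshold with `LessThanThreshold` and point at a policy that removes an instance.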


IAM Role Permissions Needed:

  • EC2 Instance: sqs:ReceiveMessage, sqs:DeleteMessage, sqs:GetQueueAttributes
  • ASG Role: CloudWatch and EC2 Auto Scaling permissions
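
As an illustration, the EC2 instance permissions above could be written as an IAM policy document like the one below. `consumer_policy` is a hypothetical helper and the queue ARN is whatever your queue uses; scoping the policy to a single queue is a design choice, not something the list above requires.

```python
import json


def consumer_policy(queue_arn):
    """Build an IAM policy document allowing a worker to consume from one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,  # limit access to this queue only
            }
        ],
    }


def consumer_policy_json(queue_arn):
    """Serialize the policy for use in the IAM console, CLI, or an SDK call."""
    return json.dumps(consumer_policy(queue_arn), indent=2)
```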

Best Practices:

  • Use long polling in your EC2 message consumers to reduce API costs.
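  • Set the Visibility Timeout longer than the expected processing time per message.
  • Use a Dead Letter Queue (DLQ) to capture messages that repeatedly fail.
  • Enable detailed monitoring on the EC2 instances for faster metric updates; SQS publishes its queue metrics to CloudWatch automatically.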

Summary:

| Component | Role |
|---|---|
| SQS | Message buffer |
| ASG | Dynamically adjusts EC2 capacity |
| CloudWatch | Triggers scaling policies based on queue size |
| EC2 Instances | Consume messages and process them asynchronously |

 

Using Amazon SQS as a Buffer for Database Writes

Amazon SQS can act as a reliable buffer between high-traffic producers (like APIs or applications) and back-end databases, helping to:

  • Smooth out spikes in write traffic
  • Prevent DB overload
  • Increase system reliability and decoupling

Architecture Flow:

[Producer App/API]
      ⬇
 Send message to SQS
      ⬇
[Worker/Consumer Service]
      ⬇
 Read message → Validate/Transform → Write to DB


Why Use SQS Before DB Writes?

| Problem | Solution via SQS |
|---|---|
| High write bursts to DB | SQS absorbs traffic and queues it |
| Risk of data loss on failure | Durable, fault-tolerant queueing |
| Tight coupling of services | Decouples producers and DB writers |
| Scaling complexity | Easily scale consumers dynamically |


Example Scenario: Order Processing System

  1. User places order → API sends order data to SQS.
  2. Worker service polls SQS, validates the message.
  3. Writes data into RDS, DynamoDB, or any target DB.

This setup ensures:

  • Orders are not dropped if the DB is temporarily slow or down; they wait in the queue for up to the retention period (4 days by default, 14 days maximum)
  • You can retry failed writes
  • Enables asynchronous processing
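
A worker for this scenario might look like the sketch below. It assumes `sqs` is a Boto3 SQS client and uses Python's built-in `sqlite3` as a stand-in for RDS, DynamoDB, or another target DB; the `orders` table, its columns, and the validation rule are hypothetical, and duplicate handling is left out to keep the example short.

```python
import json
import sqlite3


def drain_once(sqs, queue_url, db):
    """Receive one batch, write valid orders to the DB, delete what was written."""
    response = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    written = 0
    for message in response.get("Messages", []):
        order = json.loads(message["Body"])
        if "order_id" not in order or "total" not in order:
            continue  # invalid: leave it queued; a DLQ can capture it after repeated receives
        try:
            with db:  # commits on success, rolls back on error
                db.execute(
                    "INSERT INTO orders (order_id, total) VALUES (?, ?)",
                    (order["order_id"], order["total"]),
                )
        except sqlite3.Error:
            continue  # DB trouble: the message reappears after the visibility timeout
        # Delete only after the write has been committed.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        written += 1
    return written
```

Because the delete happens only after a successful commit, a slow or failing DB results in a retry rather than a lost order.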

Key Components to Configure:

| Component | Recommendation |
|---|---|
| SQS Queue | Standard queue (or FIFO if strict order is needed) |
| Visibility Timeout | Set based on DB write time (e.g., 30–60s) |
| Dead Letter Queue (DLQ) | For failed writes after max retries |
| Worker/Consumer | Use long polling and exponential backoff |


Tips for Reliable Implementation:

  • Use batch processing (up to 10 messages per batch) to optimize DB writes.
  • Make DB writes idempotent to handle duplicate messages.
  • Store a message ID or unique key in the DB to prevent double entries.
  • Monitor queue length and scale consumers with Auto Scaling or Lambda.
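
The idempotency tips can be sketched with a unique key and an insert that ignores repeats. The example uses `sqlite3` as a stand-in database; the `processed` table and `write_once` helper are hypothetical. Note that a redelivered message keeps its SQS message ID, but a producer that sends the same order twice gets two different IDs, so a business key such as an order ID may be the safer choice.

```python
import sqlite3


def write_once(db, message_id, payload):
    """Insert keyed by message ID; returns True only for the first delivery."""
    with db:  # commit on success, roll back on error
        cursor = db.execute(
            "INSERT OR IGNORE INTO processed (message_id, payload) VALUES (?, ?)",
            (message_id, payload),
        )
    # rowcount is 1 when a row was inserted and 0 when the key already existed.
    return cursor.rowcount == 1
```

With this in place, processing a duplicate is harmless: the second write is a no-op and the message can still be deleted.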

Summary:

| Benefit | Description |
|---|---|
| ✅ Scalability | Buffers high write throughput without overloading the DB |
| ✅ Reliability | Messages persist across failures |
| ✅ Decoupling | Producers and consumers scale independently |
| ✅ Retry Handling | DLQs capture messages whose writes keep failing |