
AWS Database Services

Overview of available AWS DB Services, their types, use cases, pros and cons.

Hitesh Sahu

Mon Sep 29 2025

Database Service

DB by Use Cases

| Database Type | Use Cases | AWS Service |
| --- | --- | --- |
| Relational | Traditional applications, ERP, CRM, e-commerce | Amazon Aurora, Amazon RDS, Amazon Redshift |
| Key-value | High-traffic web apps, e-commerce systems, gaming applications | Amazon DynamoDB |
| In-memory | Caching, session management, gaming leaderboards, geospatial applications | Amazon ElastiCache, Amazon MemoryDB for Redis |
| Document | Content management, catalogs, user profiles | Amazon DocumentDB (with MongoDB compatibility) |
| Wide-column | High-scale industrial apps for equipment maintenance, fleet management, and route optimization | Amazon Keyspaces |
| Graph | Fraud detection, social networking, recommendation engines | Amazon Neptune |
| Time series | IoT applications, DevOps, industrial telemetry | Amazon Timestream |
| Ledger | Systems of record, supply chain, registrations, banking transactions | Amazon Quantum Ledger Database (QLDB) |

Relational Database for OLTP (Online Transaction Processing)

Data stored as Table -> Record -> Field (value + primary key)

1. Amazon RDS (Relational Database Service)

Managed relational databases in the AWS Cloud that support SQL

  • Managed DB: automatic provisioning & patching with a monitoring dashboard
  • Available in the free tier
  • Scales vertically & horizontally
  • Backed by EBS (gp2, io1)
  • Data is encrypted at rest & in transit
  • Disaster recovery using Multi-AZ replicas
    • Continuous backup to S3
    • Transaction logs backed up every 5 minutes
    • Point-in-time recovery: restore to within the last 5 minutes
    • Daily full backup of the DB, retained up to 35 days
  • Limitations
    • Can't SSH into the DB instance
    • Does not support an active-active configuration with cross-Region support

RDS DB Type

Supported RDBMS(Relational DBMS):

| DB | Port | IAM |
| --- | --- | --- |
| Aurora | 5432 (if PostgreSQL-compatible) or 3306 (if MySQL-compatible) | |
| PostgreSQL | 5432 | |
| MySQL | 3306 | |
| MariaDB | 3306 (same as MySQL) | |
| Oracle RDS | 1521 | Does not support IAM |
| MS SQL Server | 1433 | |

Read Replica

Asynchronous Replication of DB to read replicas to scale read workload

  • Up to 15 read replicas per DB instance
  • Supported: within AZ, cross-AZ, cross-Region
  • Cross-AZ replication traffic is free, but cross-Region replication is paid
  • Each replica can be promoted to be the main DB
  • Read replicas support only SELECT (read) statements
  • Each replica adds a new endpoint with its own DNS name
  • The application must reference the replica DNS names individually to balance the read load
  • Use Case: Run Analytics, Reporting on Read Replica

Multi AZ

Synchronous Replication of DB for Disaster Recovery

  • Increases availability via automatic failover
  • Supports multi-AZ deployments
  • The standby database is promoted to the main DB in case of disaster
  • The standby DB does not serve any reads/writes
  • Keeps the same connection string regardless of which database is active, so the application's SQL connection string does not need to change

Encryption

The master & read replicas can be encrypted using AWS KMS (AES-256)

  • Encryption must be enabled while creating master DB
  • Snapshot of Encrypted RDS is Encrypted
  • Snapshot of Un-encrypted RDS is un-encrypted
  • Un-encrypted RDS can be encrypted: DB => snapshot =>copy snapshot as encrypted => restore DB from snapshot
  • Oracle & SQL Server can be encrypted using Transparent Data Encryption (TDE)
  • In-flight encryption supported using SSL
  • Force SSL using
    • PostgreSQL: parameter group setting rds.force_ssl=1
    • MySQL: SQL command: GRANT USAGE ON *.* TO 'mysqluser'@'%' REQUIRE SSL;

Security

  • Deploy the DB in a private subnet inside a VPC, behind a security group; the DB must not be internet facing
  • Use IAM policies for access management
  • IAM authentication for MySQL & PostgreSQL (see the sketch below)
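A minimal boto3 sketch of IAM database authentication; the endpoint, port, and user name below are hypothetical, and the instance is assumed to have IAM DB authentication enabled. The generated token is then used as the DB password over SSL.

    import boto3

    # Hypothetical endpoint/user; the token is short-lived and replaces the DB password.
    rds = boto3.client("rds", region_name="us-east-1")
    token = rds.generate_db_auth_token(
        DBHostname="mydb.abc123.us-east-1.rds.amazonaws.com",
        Port=3306,
        DBUsername="app_user",
    )
    # Pass `token` as the password to the MySQL/PostgreSQL driver over an SSL connection.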

Autoscaling

Scales storage automatically to meet unpredictable DB demand

  • Triggers when less than 10% of storage is left for more than 5 minutes
  • Set a Maximum Storage Threshold to avoid scaling indefinitely

Amazon Aurora

Fast & cost-effective RDBMS built for the cloud

  • Has the benefits of RDS but is not in the free tier
  • Self-heals corrupted data
  • Supports cross-Region replication
  • 6 copies of the data across 3 AZs
    • 4/6 needed for writes
    • 3/6 needed for reads
  • MySQL & PostgreSQL compatible
  • Fast: 5x MySQL & 3x PostgreSQL throughput
  • Costs ~20% more but is more efficient
  • Storage scales automatically in 10GB increments up to 128TB

Aurora cluster: one Primary DB instance with up to 15 Read Only replicas.

  • Failover completes in under 30 seconds with the help of the 15 replicas

Writer Endpoint

Connects to the primary DB instance for write operations

Reader Endpoint

Connects to the read replicas for connection load balancing

  • The reader endpoint automatically load-balances connections among all the Aurora Replicas.

  • Custom reader endpoints can also be defined.

  • DB instances in a multi-master cluster must be in the same AWS Region
  • Cross-Region replicas can't be enabled from multi-master clusters.

Aurora Global Databases

Allows you to have an Aurora replica in another AWS Region, with up to 5 secondary Regions.

Security

  • Encryption at rest using KMS
  • Encryption in flight using SSL
  • Authentication using IAM
  • Hide behind Security Group

NoSQL Database

1. Amazon DynamoDB

Serverless, fully managed, highly available DB with replication across 3 AZs

  • NoSQL DB with key-value pairs
  • Single-digit millisecond response time
  • High performance: automatic scaling up to 10 trillion requests per day
  • Integrated with IAM
  • Low cost & auto-scaling capability
  • Offers active-active cross-Region support
  • Multi-AZ by default
  • DynamoDB Streams + Lambda -> stream all changes in the DB
  • Backup & restore
  • Data can only be sorted by the primary key & indexes
  • Stored as <key, attributes> items, where key = partition key + sort key

Data Support

  • Max Size 400KB
  • Scalar: Null, Binary, Boolean, String, Number
  • Document Type: List , Map
  • Set Type: Binary Set, String Set, Number Set

DynamoDB Table

Local Secondary Index(LSI)

Alternative sort key for the table, using the same partition key

  • Lets you query the table on an attribute that is NOT part of the primary key (e.g. with a >= predicate) while keeping the same partition key
  • Must be defined at table creation
  • Data: Boolean, String, Number
  • Limit: 5 per table
  • Can contain one or more attributes
  • Use case: server-side filtering of data based on attributes
  • Throttling: throttling of an LSI does NOT throttle the main table
    • Uses the same R/WCU as the main table

Global Secondary Index(GSI)

Alternate primary key: a HASH (partition) key plus an optional sort key

  • Lets you efficiently query the table on an attribute that is NOT part of the primary key
  • Speeds up queries on non-key attributes that would otherwise have to be filtered/sorted client side
  • Effectively creates a secondary table with the chosen attribute(s) as its key (see the query sketch below)
  • Data: Boolean, String, Number
  • Need to provision WCU & RCU for the index
  • Throttling: throttling of a GSI throttles the main table
    • Provision the GSI's R/WCU carefully in advance so it does not bottleneck the main table
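As a rough illustration of querying through a GSI (the table and index names below are invented), the only difference from a normal query is the IndexName parameter:

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("Orders")                    # hypothetical table

    # Query the GSI instead of the base table's primary key
    resp = table.query(
        IndexName="CustomerEmailIndex",                 # hypothetical GSI
        KeyConditionExpression=Key("customer_email").eq("alice@example.com"),
    )
    for item in resp["Items"]:
        print(item)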

Capacity Units

RCU & WCU are spread evenly across the table's partitions

  • A hash of the partition key determines which partition a record is stored in

1. Write Capacity Unit (WCU):

1 WCU = 1 write/sec of an item up to 1KB

WCU = NumberOfItems * ceil(ItemSize in KB / 1KB)
  • 6 items/sec of 4.5KB = 6 * 5 = 30 WCU <-- round 4.5KB up to 5KB
  • 10 items/sec of 2KB = 10 * 2 = 20 WCU
  • 120 items/min of 2KB = (120/60) * 2 = 4 WCU
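A small sketch of the WCU arithmetic above (the item size is rounded up to the next whole KB before multiplying):

    import math

    def wcu(items_per_sec: float, item_size_kb: float) -> float:
        """1 WCU = one write/sec of an item up to 1KB."""
        return items_per_sec * math.ceil(item_size_kb)

    print(wcu(6, 4.5))        # 30.0  (4.5KB rounds up to 5KB)
    print(wcu(10, 2))         # 20.0
    print(wcu(120 / 60, 2))   # 4.0   (120 items/min = 2 items/sec)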

Write Modes

  • Standard(Default)

Some writes may fail while others succeed

  • Transactional

    Writes succeed across all tables or none of them do

Write Types

  • Concurrent write: two updates at the same time result in the last write overwriting the first
  • Conditional write: update data only if a condition is met
  • Atomic write: both updates are applied (e.g. numeric increments)
  • Batch write: update multiple items at the same time

2. Read Capacity Unit (RCU):

1 RCU = 1 strongly consistent read/sec, or 2 eventually consistent reads/sec, for an item up to 4KB

Read Modes

  • Eventually Consistent Read(Default)

    May return stale data when reading right after a write (replication lag across read replicas)

  • Strongly Consistent Read

    Returns the latest data, even when reading right after a write

    • The ConsistentRead parameter must be set to true in: GetItem, Query, Scan, BatchGetItem
    • Consumes 2x the RCU per read
    • Higher latency & more costly
  • Transactional

    Read data from multiple Table and get Transactional View

     RCU_StronglyConsistent = NumberOfItems * ceil(ItemSize in KB / 4KB)

     RCU_EventuallyConsistent = (NumberOfItems / 2) * ceil(ItemSize in KB / 4KB)
  • 10 items/sec SC of 4KB = 10 * (4/4) = 10 RCU
  • 16 items/sec EC of 12KB = (16/2) * (12/4) = 24 RCU
  • 10 items/sec SC of 6KB = 10 * (8/4) = 20 RCU <-- round 6KB up to 8KB
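The same arithmetic as a sketch for RCU (the item size is rounded up to the next 4KB unit):

    import math

    def rcu_strong(items_per_sec: float, item_size_kb: float) -> float:
        """Strongly consistent: 1 RCU = one read/sec of an item up to 4KB."""
        return items_per_sec * math.ceil(item_size_kb / 4)

    def rcu_eventual(items_per_sec: float, item_size_kb: float) -> float:
        """Eventually consistent: 1 RCU = two reads/sec of an item up to 4KB."""
        return (items_per_sec / 2) * math.ceil(item_size_kb / 4)

    print(rcu_strong(10, 4))      # 10.0
    print(rcu_eventual(16, 12))   # 24.0
    print(rcu_strong(10, 6))      # 20.0  (6KB rounds up to 8KB = 2 units)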

Read/Write Capacity Mode

Control Table Capacity

1. Provisioned Mode(Default)

  • Capacity (reads/writes per second) must be planned in advance
  • Pay for provisioned R/W Capacity Unit
  • Autoscaling can be enabled to increase capacity
  • Cost can be estimated.

Burst Capacity: temporarily allows throughput above the provisioned RCU/WCU

ProvisionedThroughputExceededException: thrown when R/WCU exceed the provisioned capacity (and burst capacity is exhausted)

  • Reason:
    • Hot Partition: Same Partition is used frequently
    • Hot Key
    • Very Large Item
  • Solutions
    • Exponential backoff
    • Distribute the partition key as much as possible
    • Use DynamoDB Accelerator (DAX) to reduce the read load

2. On-Demand Mode

  • No capacity planning needed
  • Scales automatically for unpredictable loads
  • No R/WCU limits, but 2-3x more expensive
  • Pay for what you use, in terms of Read Request Units (RRU) & Write Request Units (WRU)
  • Use case: unknown or unpredictable workloads

Dynamo DB API:

| Operation | API | Usage |
| --- | --- | --- |
| WRITE | PutItem | Put or fully replace an item |
| | UpdateItem | Update or edit item attributes |
| | Conditional Write | Write/update/delete only if a condition is met, otherwise return an error |
| READ | GetItem | Read 1 item by primary key (HASH or HASH+RANGE); supports strongly or eventually consistent reads |
| | ProjectionExpression | Return only certain attributes |
| | Query | Return items for a key, with Limit (number of items) or up to 1MB of data |
| | KeyConditionExpression | Query on the partition key with = and on the sort key with (=, <, <=, >, >=, BeginsWith, Between) |
| | FilterExpression | Additional filter on non-key attributes, applied after the query and before data is returned |
| | Scan | Return the entire table up to 1MB per call; consumes lots of RCU; use Limit to cap items and parallel scans to increase throughput; can pair with FilterExpression & ProjectionExpression |
| DELETE | DeleteItem | Delete an item with optional conditions |
| | DeleteTable | Drop the table |
| BATCH | BatchOperation | Parallel, low-latency operations on multiple items; part of a batch can fail |
| | BatchWriteItem | Up to 25 PutItem/DeleteItem per call: 16MB total (400KB/item) |
| | BatchGetItem | Up to 100 GetItem: 16MB, in parallel, from 1 or more tables at a time |
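A hedged boto3 sketch of the main read APIs above (table and attribute names are invented): GetItem by key, and Query with KeyConditionExpression, FilterExpression, and ProjectionExpression.

    import boto3
    from boto3.dynamodb.conditions import Key, Attr

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("Users")                        # hypothetical table

    # GetItem: read a single item by its primary key (strongly consistent here)
    user = table.get_item(Key={"user_id": "u-123"}, ConsistentRead=True).get("Item")

    # Query: items for one partition key, filtered and projected before being returned
    resp = table.query(
        KeyConditionExpression=Key("user_id").eq("u-123") & Key("created_at").gte("2025-01-01"),
        FilterExpression=Attr("order_status").eq("active"),   # applied after the read, still consumes RCU
        ProjectionExpression="user_id, created_at, order_status",
        Limit=25,
    )
    items = resp["Items"]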

Dynamo DB CLI

  • --projection-expression: return only a subset of attributes
  • --filter-expression: filter data before it is returned
  • --page-size: retrieve results in smaller pages (default 1000 items)
  • --max-items: max number of items to show in the CLI (returns a NextToken)
  • --starting-token: retrieve the next set of items starting from the last NextToken

DynamoDB Optimistic Locking

Write strategy: update/delete an item only if it has not been changed since it was read

  • Each item can have an attribute that works as a version number to avoid overwrites (see the sketch below)
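A minimal sketch of optimistic locking with a conditional write (table and attribute names are hypothetical): the update only succeeds if the version read earlier is still current.

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("Products")    # hypothetical table

    def update_price(product_id: str, new_price: int, expected_version: int) -> bool:
        """Write only if the item's version has not changed since it was read."""
        try:
            table.update_item(
                Key={"product_id": product_id},
                UpdateExpression="SET price = :p, version = :new_v",
                ConditionExpression="version = :expected_v",
                ExpressionAttributeValues={
                    ":p": new_price,
                    ":new_v": expected_version + 1,
                    ":expected_v": expected_version,
                },
            )
            return True
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False    # someone else updated the item first: re-read and retry
            raise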

DynamoDB Streams

Ordered stream of item-level changes (create/update/delete) in a DB table

  • A list of all changes over time in DynamoDB
  • Retained for 24 hours
  • Made of shards provisioned by AWS
  • Not retroactive: only modifications made after the stream is enabled appear in it
  • Can choose the information written to the stream
    • KEYS_ONLY: only the key attributes of the modified item
    • NEW_IMAGE: the entire item after the change
    • OLD_IMAGE: the entire item before the change
    • NEW_AND_OLD_IMAGES: both the new & old item images

Use Case:

  • Kinesis Data Streams: for persisting the data
  • Lambda: invoked synchronously via an Event Source Mapping; must have IAM permission to poll the DynamoDB stream (see the sketch below)
  • Kinesis Client Library
  • ElasticSearch: indexing
  • Email for data processing
  • Global Table
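A minimal sketch of a Lambda handler attached to a DynamoDB stream via an event source mapping; it only prints the change type and images, and assumes the stream is configured with NEW_AND_OLD_IMAGES.

    def handler(event, context):
        """Invoked with a batch of stream records (INSERT / MODIFY / REMOVE)."""
        for record in event["Records"]:
            change = record["dynamodb"]
            if record["eventName"] == "INSERT":
                print("new item:", change.get("NewImage"))
            elif record["eventName"] == "MODIFY":
                print("before:", change.get("OldImage"), "after:", change.get("NewImage"))
            elif record["eventName"] == "REMOVE":
                print("deleted item:", change.get("OldImage"))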

DynamoDB Global Tables

Active-active replication of a table across multiple Regions to provide low latency

  • Replicates the table in multiple Regions so data is accessible close to users at low latency
  • Users can read/write from/to the table closest to them
  • DynamoDB Streams must be enabled to allow replication

DynamoDB Local

Offline local DB for development & testing

Clean Up

  • Scan + Delete-> Costly & Slow
  • Drop & Recreate Table-> Fast & More efficient

Copy/ BackUp

  • Data Pipeline -> external AWS service that uses EMR to copy the data
  • Backup & restore DB: no additional service required
  • Scan + BatchWrite -> slow & costly

Security

  • IAM: Access Control
  • VPC: hide behind a private VPC or use a VPC Endpoint
  • KMS: encryption at rest
  • Identity federation: temporary access via Amazon Cognito so users can modify only their own data
  • Leading keys condition: fine-grained access control restricting users to items with their own partition key

DynamoDB Transactions

Coordinated, all-or-nothing operations: add/update/delete items across multiple tables, or none of the changes are applied

  • Provide: Atomicity, Consistency,Isolation, Durability(ACID)

  • ReadMode: Eventual Consistent, Strongly Consistent, Transactional

  • WriteMode: Standard, Transactional

  • Consumes 2x the R/WCU: performs 2 operations per item (prepare & commit)

    WCU =  WCU * 2
    RCU =  RCU * 2
    
  • API: up to 25 items or 4MB of data

    • TransactGetItems: 1 or more GetItems per operation
    • TransactWriteItems: 1 or more PutItem, DeleteItem, UpdateItem per operation
  • Use case: Banks, Orders, Games
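A sketch of TransactWriteItems using the low-level client (table, key, and attribute names are made up): both updates succeed together or neither is applied.

    import boto3

    client = boto3.client("dynamodb")   # transactions use the low-level client

    client.transact_write_items(
        TransactItems=[
            {
                "Update": {
                    "TableName": "Accounts",
                    "Key": {"account_id": {"S": "A"}},
                    "UpdateExpression": "SET balance = balance - :amt",
                    "ConditionExpression": "balance >= :amt",   # fail the whole transaction if overdrawn
                    "ExpressionAttributeValues": {":amt": {"N": "100"}},
                }
            },
            {
                "Update": {
                    "TableName": "Accounts",
                    "Key": {"account_id": {"S": "B"}},
                    "UpdateExpression": "SET balance = balance + :amt",
                    "ExpressionAttributeValues": {":amt": {"N": "100"}},
                }
            },
        ]
    )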

Time To Live (TTL)

Automatically delete items after their TTL has expired

  • Does not consume WCU
  • The TTL attribute must be a number (Unix epoch timestamp)
  • Deletion happens within ~48 hours: expired items may still appear in queries, so filter them out client side
  • Items are also automatically deleted from GSIs & LSIs
  • Deleted items appear in DynamoDB Streams and can be recovered from there
  • Use case: session data, regulatory requirements
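A small sketch of writing an item with a TTL attribute (table and attribute names are hypothetical; TTL is assumed to already be enabled on the expire_at attribute):

    import time
    import boto3

    table = boto3.resource("dynamodb").Table("Sessions")    # hypothetical table

    expire_at = int(time.time()) + 3600     # Unix epoch timestamp, one hour from now
    table.put_item(Item={
        "session_id": "s-42",
        "user_id": "u-123",
        "expire_at": expire_at,             # DynamoDB deletes the item some time after this
    })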

Session State

DynamoDB vs ElastiCache vs EFS for storing session state:

| DynamoDB | ElastiCache | EFS |
| --- | --- | --- |
| Serverless, auto scaling | In-memory | Shared file system attached to EC2 |
| Key/value | Key/value | File system |

Amazon ElasticSearch (ES)/ OpenSearch

Search & index any field in a DB, including partial matches

  • Used with Big Data or NoSQL DBs
  • Based on the ElasticSearch project
  • Runs on servers (not serverless)

ES Access Policy

IP-based policy to allow access from a limited CIDR block of IPs

  • Allows unsigned requests (e.g. curl, Kibana)
  • Clients in a private subnet can access it using SigV4-signed requests
  • Public clients can search via a proxy server in a public subnet

Kibana does not support IAM Authentication

Use one of the following to authenticate access to ES:

  • SAML
  • HTTP Basic Authentication
  • Cognito (using AD)

ES Production Setup

  • VPC across 3 AZs
  • 3 master nodes: 1 per AZ, 2 on standby and 1 active
  • 6 data nodes: 2 per AZ/master node, all connected to the active master node

2. Amazon DocumentDB:

MongoDB Database service

  • Aurora Version for MongoDB
  • Same features as Aurora

3. Amazon Neptune:

Graph database service.

  • Highly available: 3 AZs & up to 15 read replicas
  • Can hold billions of relations
  • Point in Time Recovery, Multi AZ Clustering

In-Memory Databases

1. Amazon DynamoDB Accelerator (DAX):

Fully managed in-memory cache, used only for DynamoDB.

  • 10x read performance for DynamoDB
  • 5-minute default TTL for cached data
  • Up to 10 nodes per cluster
  • Multi-AZ: at least 3 nodes recommended, spread across AZs
  • No application code changes needed, it just adds a caching layer
  • Microsecond latency
  • Highly scalable & available
  • Does not offer an active-active cross-Region configuration
  • Solves the hot key problem: repeated reads of the same key are served from the cache instead of consuming RCU
  • Support IAM authentication

2. Amazon ElastiCache:

Service that adds a caching layer to improve the read times of common requests.

  • In-memory managed cache that can sit in front of any DB type
  • Can be used along with DAX to store aggregated results
  • Faster retrieval of common results
  • Requires significant application code changes
  • Supports Redis clustering and multi-AZ
  • Backup, snapshot, point-in-time restore
  • Monitoring through CloudWatch
  • Pay per hour based on the node (EC2) type & storage used
  • Does not support IAM-based authentication

| Redis | Memcached |
| --- | --- |
| Multi-AZ with auto-failover | Multi-node partitioning of data (sharding) |
| Highly available | No high availability |
| Backup & restore | No backup & restore |
| Data durability | Non-persistent |
| Can also work as a DB | High-performance, multi-threaded distributed cache |
| Redis AUTH for authentication, encryption at rest | SASL-based authentication |
| SSL for encryption in transit | |
| HIPAA compliant | |

Caching Patterns

  1. Lazy loading: data is loaded into the cache only on a cache miss (see the sketch below)
  2. Write-through: data is added/updated in the cache whenever it is written to the DB
  3. Session store: keep session data in the cache
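A rough sketch of the lazy loading pattern against a Redis-compatible ElastiCache endpoint (the endpoint, key names, and 5-minute TTL are illustrative):

    import json
    import redis

    cache = redis.Redis(host="my-cache.abc123.cache.amazonaws.com", port=6379)

    def get_user(user_id: str) -> dict:
        """Lazy loading: check the cache first, fall back to the database on a miss."""
        cached = cache.get(f"user:{user_id}")
        if cached is not None:
            return json.loads(cached)                            # cache hit
        user = load_user_from_db(user_id)                        # cache miss -> read the DB
        cache.setex(f"user:{user_id}", 300, json.dumps(user))    # populate cache with a 5-minute TTL
        return user

    def load_user_from_db(user_id: str) -> dict:
        return {"user_id": user_id, "name": "example"}           # placeholder for the real RDS/DynamoDB read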



Big Data & Business Analytics / Intelligence: OLAP (Online Analytical Processing)

AWS Glue

Managed Extract, Transform & Load (ETL) service to transform data

  • Serverless service to transform data for analytics


Glue Data Catalog

Metadata catalog of datasets created by Glue; provides column/schema info to analytics services


1. Amazon Redshift

Data analysis service for data warehousing, OLAP

  • Works well with huge volumes & a wide variety of data
  • 10x the performance of other data warehouse & analytics solutions
  • Based on PostgreSQL
  • Data is stored in columns (columnar storage)
  • Scales to petabytes & highly available
  • Data is typically loaded in batches (e.g. every hour)
  • Massively Parallel Processing (MPP) query execution with a SQL interface
  • Integrates with BI tools to create dashboards
  • Load from S3, DynamoDB, DMS
  • Supports up to 125 nodes, each node holding up to 128TB
  • Nodes are either a leader node or compute (query) nodes
  • Redshift Enhanced VPC Routing: COPY/UNLOAD traffic to S3 goes through the VPC instead of the public internet
  • No multi-AZ mode; snapshots allow restoring into another AZ

Loading Data to Redshift:

  • Amazon Kinesis Data Firehose: puts data into S3 & loads it into the cluster
  • Route Data through Internet or enhanced VPC routing
  • Use JDBC Connector

RedShift Spectrum:

Efficiently query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables.

2. Amazon Elastic Map Reduce (EMR)

Provisions a cluster of EC2 instances

  • For cluster-based analysis of Big Data
  • Can be used to create
    • Hadoop clusters for Big Data analysis
    • Apache Spark
    • HBase, Presto, Flink
  • Auto scaling & spot instances
  • Usage: Big Data, ML, indexing

3. Amazon ATHENA

Serverless Query service to perform Analytics against S3 Objects

  • Use SQL to query files in S3
  • Works on top of CSV, JSON, ORC, Avro, Parquet
  • Athena queries & analyzes the data; reports can then be built with Amazon QuickSight
  • Price: $5 per TB of data scanned
  • Pay per query / per TB of data scanned: compressing the data (or using columnar formats) reduces cost
  • Usage: analytics, BI, reporting, analysis, VPC Flow Logs
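A hedged boto3 sketch of running an Athena query (the database, table, and bucket names are invented); results land in the S3 output location and can be fetched afterwards with get_query_results:

    import boto3

    athena = boto3.client("athena")

    resp = athena.start_query_execution(
        QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
        QueryExecutionContext={"Database": "weblogs"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print(resp["QueryExecutionId"])   # poll get_query_execution, then get_query_results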

Amazon QuickSight

Serverless, ML-powered Business Intelligence service to create interactive dashboards

  • Gives insights into data through rich dashboards
  • Integrates on top of Aurora, Athena, Redshift, S3 ...
  • Usage: business insights & analytics, visualization

Ledger

1. Amazon QLDB (Quantum Ledger Database)

Ledger database service: review a complete history of all the changes made to your application data.

  • Centrally managed DB
  • Immutable: write once, read many
  • Fully managed, serverless, highly available
  • Holds the sequence of modifications using cryptographic hashing
  • 2-3x better performance
  • SQL support
  • Usage: financial transactions

2. Amazon Managed Blockchain:

Service to create and manage blockchain networks with open-source frameworks (a distributed ledger system)

  • Decentralized DB
  • Public blockchain networks with support for Ethereum

AWS DATABASE MIGRATION SERVICE (DMS)

Lift & shift: migrate databases securely & easily between a source & a target DB.

  • The source DB can remain operational, reducing downtime

  • Supports multi-AZ migration

  • An EC2 instance performs the data migration

  • Source:

    • On-premises: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, DB2, SAP
    • AWS: RDS, S3
    • Azure SQL Server
  • Target:

    • On-premises: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, DB2, SAP
    • DynamoDB, RDS, Redshift, S3, ElasticSearch, Kinesis, Document DB
  • DMS supports specifying Amazon S3 as the source, and streaming services like Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK) as the target.

Schema Conversion Tool(SCT)

Converts a database schema from one engine to another

  • Source DB -> SCT + DMS -> Target DB


Supports:

- Homogeneous Migration

Between the same source & target engine type, e.g. SQL to SQL

  • Does not need SCT

- Heterogeneous Migration

Between different source & target engine types, e.g. SQL to NoSQL

  • Needs SCT

Use cases

  • Develop & test DB migrations
  • Consolidate: combine several DBs into one
  • Continuous replication: ongoing copies

Pricing

  • Pay for the replication instance
    • Pay as you go
    • Reserved instances
  • Pay for storage & I/O
  • Pay for data transfer between Regions

AWS DATASYNC

Incremental, scheduled sync of large amounts of data from on-premises servers to S3 (including Glacier), FSx for Windows, or EFS

  • Scheduled replication: replication can run at regular intervals (hourly, weekly)
  • Uses the DataSync agent to connect to on-premises systems or EFS
  • Moves data using the NFS or SMB protocols
  • Can sync EFS across Regions
  • Possible to throttle bandwidth to reduce network usage
  • Copy an unencrypted EFS to an encrypted EFS, or move data between EFS file systems
  • Uses CopyObject APIs to copy objects between S3 buckets


AWS BACKUP

Fully managed service to automate the backup of AWS resources in the cloud

Automates/manages cross-Region backups of AWS resources:

  • EC2, Storage Gateway
  • RDS, Aurora, DynamoDB,
  • EFS, FSX, EBS


  • On-demand or automatic scheduled backups based on a Backup Plan (policy)

    • Frequency: can be daily, weekly, or monthly
    • Transition to cold storage
    • Retention period
    • Resources can be added to a backup plan by tag
  • Support Cross Region/Account backup

  • Serverless: No need to create script to run backup

  • Support Point in Time recovery (PITR)

CLOUDENDURE

Continuous replication of data from corporate servers to the cloud

  • Quickly migrate to the cloud in case the business servers fail