AWS Database Services
Overview of available AWS DB Services, their types, use cases, pros and cons.

Hitesh Sahu
Database Service
DB by Use Cases
Database Type | Use Cases | AWS Service |
---|---|---|
Relational | Traditional applications, ERP, CRM, e-commerce | Amazon Aurora, Amazon RDS, Amazon Redshift |
Key-value | High-traffic web apps, e-commerce systems, gaming applications | Amazon DynamoDB |
In-memory | Caching, session management, gaming leaderboards, geospatial applications | Amazon ElastiCache, Amazon MemoryDB for Redis |
Document | Content management, catalogs, user profiles | Amazon DocumentDB (with MongoDB compatibility) |
Wide-column | High-scale industrial apps for equipment maintenance, fleet management, and route optimization | Amazon Keyspaces |
Graph | Fraud detection, social networking, recommendation engines | Amazon Neptune |
Time series | IoT applications, DevOps, industrial telemetry | Amazon Timestream |
Ledger | Systems of record, supply chain, registrations, banking transactions | Amazon Quantum Ledger Database (QLDB) |
Relational Database for OLTP (Online Transaction Processing)
Data stored as Table -> Record -> Field (Value + Primary Key)
1. Amazon RDS (RELATIONAL DATABASE SERVICE)
Managed relational databases in the AWS Cloud that support SQL
- Managed DB: automatic provisioning & patching with a monitoring dashboard
- Available in the free tier
- Scales vertically & horizontally
- Backed by EBS (gp2, io1)
- Data is encrypted at rest & in transit (send/receive)
- Disaster recovery using a Multi-AZ replica
- Continuous backup to S3
- Transaction logs backed up every 5 minutes
- Point-in-Time Recovery: restore to any point up to the last 5 minutes
- Daily full backup of the DB, retained for up to 35 days
- Limitations
  - Can't SSH into the DB instance
  - Does not support an active-active configuration with cross-Region support
Supported RDBMS(Relational DBMS):
DB | Port | IAM Authentication |
---|---|---|
Aurora | 5432 (if PostgreSQL compatible) or 3306 (if MySQL compatible) | Supported |
PostgreSQL | 5432 | Supported |
MySQL | 3306 | Supported |
MariaDB | 3306 (same as MySQL) | |
Oracle RDS | 1521 | Does not support IAM |
MSSQL Server | 1433 | |
Read Replica
Asynchronous replication of the DB to read replicas to scale the read workload
- Up to 15 read replicas of the DB
- Supported: within AZ, cross-AZ, cross-Region
- Cross-AZ replication traffic is free, but cross-Region replication is paid
- Each replica can be promoted to a standalone main DB
- Read replicas only support SELECT (read) statements
- Each replica adds a new endpoint with its own DNS name
- The application must be changed to reference each DNS name individually to balance the read load
- Use Case: Run Analytics, Reporting on Read Replica
Multi AZ
Synchronous replication of the DB for disaster recovery
- Increases availability through automatic failover
- Supports multiple AZs
- The standby database is promoted to main DB in case of disaster
- The standby DB does not serve any reads/writes
- Keeps the same connection string regardless of which database is active, so the application does not need to change its SQL connection string
Encryption
The master & read replicas can be encrypted using AWS KMS (AES-256)
- Encryption must be enabled when creating the master DB
- A snapshot of an encrypted RDS instance is encrypted
- A snapshot of an un-encrypted RDS instance is un-encrypted
- An un-encrypted RDS instance can be encrypted: DB => snapshot => copy snapshot as encrypted => restore DB from snapshot
- Oracle & SQL Server can be encrypted using Transparent Data Encryption (TDE)
- In-flight encryption supported using SSL
- Force SSL using
  - PostgreSQL: parameter group: rds.force_ssl=1
  - MySQL: SQL command: GRANT USAGE ON *.* TO 'mysqluser'@'%' REQUIRE SSL;
Security
- Deploy the DB in a private VPC subnet behind a Security Group; the DB must not be internet-facing
- Use IAM policies for access management
- IAM authentication for MySQL & PostgreSQL (see the sketch below)
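A minimal boto3 sketch of requesting a temporary IAM auth token; the endpoint, DB user, and region below are placeholders:

```python
# Sketch: temporary credentials for RDS IAM authentication (MySQL/PostgreSQL).
import boto3

rds = boto3.client("rds", region_name="us-east-1")

token = rds.generate_db_auth_token(
    DBHostname="mydb.123456789012.us-east-1.rds.amazonaws.com",  # hypothetical endpoint
    Port=3306,
    DBUsername="app_user",  # hypothetical DB user created for IAM authentication
)

# The token is then used as the password when opening an SSL connection
# with a regular MySQL/PostgreSQL client; it is valid for 15 minutes.
```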
Autoscaling
Scales storage automatically to meet unpredictable DB demand
- Triggers when less than 10% of storage is left for more than 5 minutes
- Set a Maximum Storage Threshold to avoid scaling infinitely
Amazon AURORA:
Fast & cost-effective RDBMS built for the cloud
- Has the benefits of RDS but is not in the free tier
- Self-heals corrupted data
- Supports cross-Region replication
- 6 copies of the data across 3 AZs: 4/6 needed for writes, 3/6 needed for reads
- MySQL & PostgreSQL compatible
- Fast: 5x MySQL & 3x PostgreSQL performance
- Costs ~20% more but is more efficient
- Storage scales automatically in increments of 10 GB, up to 128 TB
Aurora cluster: one primary DB instance with up to 15 read-only replicas
- Failover happens in under 30 seconds with the help of the replicas
Writer Endpoint
Connects to the main DB for write operations
Reader Endpoint:
Connects to the read replicas for connection load balancing
- The reader endpoint automatically load-balances among all the Aurora Replicas
- Custom reader endpoints can also be defined
- DB instances in a multi-master cluster must be in the same AWS Region
- Can't enable cross-Region replicas from multi-master clusters.
Aurora Global Databases
allow you to have Aurora Replicas in other AWS Regions, with up to 5 secondary Regions.
Security
- Encryption at rest using KMS
- Encryption in flight using SSL
- Authentication using IAM
- Hide behind Security Group
NoSQL Database
1. Amazon DYNAMODB
Serverless, fully managed, highly available DB with replication across 3 AZs
- NoSQL DB with key-value pairs
- Single-digit latency: millisecond response time
- High performance: automatic scaling up to 10 trillion requests per day
- Integrated with IAM
- Low cost & auto-scaling capability
- Offers active-active cross-Region support
- Multi-AZ by default
- DynamoDB Streams + Lambda: stream all changes in the DB
- Backup & restore
- Items can only be sorted by the primary key & indexes
- Data is stored as items: <key, attributes> pairs, where Key = Partition Key + Sort Key
Data Support
- Max item size: 400 KB
- Scalar: Null, Binary, Boolean, String, Number
- Document Type: List , Map
- Set Type: Binary Set, String Set, Number Set
Local Secondary Index(LSI)
Alternative sort key for the table, used along with the partition key
- Lets you query the table on an attribute that is NOT part of the primary key, while keeping the same partition key (e.g. with a >= predicate on the LSI sort key)
- Must be defined at table creation
- Key data types: String, Number, Binary
- Limit: 5 per table
- Can project one or more attributes
- Use case: server-side filtering of data based on attributes
- Throttling: throttling of an LSI DOES NOT throttle the main table
- Uses the same RCU/WCU as the main table
Global Secondary Index(GSI)
Alternate primary key (HASH or HASH+RANGE) with its own partition key and optional sort key
- Lets you efficiently query the table on an attribute that is NOT part of the primary key (see the sketch after this list)
- Speeds up queries on non-key attributes that would otherwise have to be filtered & sorted client-side
- Effectively creates a secondary "table" with the chosen attribute(s) as its key
- Key data types: String, Number, Binary
- Requires its own provisioned WCU & RCU
- Throttling: throttling of a GSI DOES throttle the main table
- Provision the GSI's RCU/WCU carefully in advance so it does not bottleneck the main table
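A small boto3 sketch of querying a GSI instead of the base table's key; the table, index, and attribute names are purely illustrative:

```python
# Sketch: querying a hypothetical GSI ("GameTitleIndex") on a "GameScores" table.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("GameScores")

resp = table.query(
    IndexName="GameTitleIndex",  # GSI whose partition key is GameTitle, sort key TopScore
    KeyConditionExpression=Key("GameTitle").eq("Meteor Blasters") & Key("TopScore").gte(1000),
)
print(resp["Items"])
```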
Capacity Units
RCU & WCU are spread evenly across the partitions of the DB
- A hash of the partition key determines which partition a record lands in
1. Write Capacity Unit (WCU):
1 write/sec for an item up to 1 KB of data
WCU = NumberOfItems * roundUp(ItemSize in KB / 1 KB)
- 6 items/sec of 4.5 KB = 6 * 5 = 30 WCU <-- round up 4.5 KB to 5 KB
- 10 items/sec of 2 KB = 10 * 2 = 20 WCU
- 120 items/min of 2 KB = (120/60) * 2 = 4 WCU
Write Modes
- Standard (default)
  Some writes may fail independently of others
- Transactional
  The write succeeds across all tables or none of them
Write Types
- Concurrent write: updating the same data at the same time can result in overwrites
- Conditional write: update data only if a condition is met
- Atomic write: both concurrent updates are applied (e.g. two increments both take effect)
- Batch write: update multiple items at the same time
2. Read Capacity Unit (RCU):
1 strongly consistent read/sec, or 2 eventually consistent reads/sec, for an item up to 4 KB
Read Modes
- Eventually Consistent Read (default)
  May return stale data when reading right after a write, because of the distributed replicas
- Strongly Consistent Read
  Returns the latest data when reading right after a write
  - The ConsistentRead parameter must be set to true in: GetItem, Query, Scan, BatchGetItem
  - Consumes twice the RCU of an eventually consistent read
  - Higher latency & more costly
- Transactional
  Read data from multiple tables within one transactional view
RCU_StronglyConsistent = NumberOfItems * roundUp(ItemSize in KB / 4 KB)
RCU_EventuallyConsistent = (NumberOfItems / 2) * roundUp(ItemSize in KB / 4 KB)
- 10 items/sec (SC) of 4 KB = 10 * (4/4) = 10 RCU
- 16 items/sec (EC) of 12 KB = (16/2) * (12/4) = 24 RCU
- 10 items/sec (SC) of 6 KB = 10 * (8/4) = 20 RCU <-- round up 6 KB to 8 KB
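A tiny sketch that reproduces the WCU/RCU arithmetic above (item sizes always round up to the next 1 KB / 4 KB step); the function names are just for illustration:

```python
# Sketch: reproducing the WCU/RCU arithmetic above.
import math

def wcu(items_per_sec: float, item_size_kb: float) -> float:
    # 1 WCU = one 1 KB write per second
    return items_per_sec * math.ceil(item_size_kb / 1)

def rcu(items_per_sec: float, item_size_kb: float, strongly_consistent: bool = True) -> float:
    # 1 RCU = one 4 KB strongly consistent read per second; eventual consistency halves it
    units = items_per_sec * math.ceil(item_size_kb / 4)
    return units if strongly_consistent else units / 2

print(wcu(6, 4.5))                             # 30  (4.5 KB rounds up to 5 KB)
print(wcu(120 / 60, 2))                        # 4
print(rcu(10, 4))                              # 10
print(rcu(16, 12, strongly_consistent=False))  # 24
print(rcu(10, 6))                              # 20  (6 KB rounds up to 8 KB)
```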
Read/Write Capacity Mode
Control Table Capacity
1. Provisioned Mode(Default)
- Need to plan capacity (reads/writes per second) beforehand
- Pay for the provisioned R/W capacity units
- Auto scaling can be enabled to increase capacity
- Cost can be estimated
Burst Capacity: temporarily absorbs throughput above the provisioned RCU/WCU
ProvisionedThroughputExceededException: thrown when R/WCU exceed the provisioned capacity (and burst capacity is exhausted)
- Reasons:
  - Hot partition: the same partition is used too frequently
  - Hot key
  - Very large items
- Solutions
  - Exponential backoff
  - Distribute the partition key better
  - Use DynamoDB Accelerator (DAX) to reduce the read load
2. On-Demand Mode
- No capacity planning
- Scales automatically for unpredictable load
- Unlimited R/W-CU but 2-3x more expensive
- Pay for what you use in terms of Read Request Units (RRU) & Write Request Units (WRU)
- Use case: unknown or unpredictable workloads
DynamoDB API:
API | Usage |
---|---|
WRITE | |
PutItem | Put or fully replace an item |
UpdateItem | Update or edit item attributes |
Conditional Write | Write/Update/Delete only if a condition is met, otherwise return an error |
READ | |
GetItem | Read one item by primary key (HASH or HASH+RANGE). Supports both Strongly Consistent and Eventually Consistent modes |
ProjectionExpression | Return only certain attributes |
Query | Return items based on a key, with Limit (number of items) or up to 1 MB of data |
KeyConditionExpression | Query based on the Partition Key with the = operator, or the Sort Key with (=, <, <=, >, >=, BeginsWith, Between) |
FilterExpression | Additional filter on non-key attributes, applied after the Query before data is returned |
Scan | Return the entire table, up to 1 MB of data per call. Consumes lots of RCU. Use Limit to cap the number of items and Parallel Scan to increase throughput. Can be paired with both FilterExpression & ProjectionExpression |
DELETE | |
DeleteItem | Delete an item, with optional conditions |
DeleteTable | Drop the table |
BATCH | |
BatchOperation | Parallel, low-latency operations on multiple items. Part of a batch can fail |
BatchWriteItem | Up to 25 PutItem and DeleteItem per call: 16 MB total (400 KB/item) |
BatchGetItem | Up to 100 GetItem: 16 MB, retrieved in parallel from one or more tables at a time |
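A boto3 sketch of the main item-level calls above; the "Users" table and its attributes are illustrative only:

```python
# Sketch: PutItem, GetItem, Query, DeleteItem with boto3.
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("Users")

# PutItem: put or fully replace an item
table.put_item(Item={"user_id": "u1", "name": "Alice", "age": 30})

# GetItem: read one item by primary key, here with a strongly consistent read
item = table.get_item(Key={"user_id": "u1"}, ConsistentRead=True).get("Item")

# Query: KeyConditionExpression on the key, FilterExpression on non-key attributes
resp = table.query(
    KeyConditionExpression=Key("user_id").eq("u1"),
    FilterExpression=Attr("age").gte(21),
    ProjectionExpression="user_id, age",
    Limit=10,
)

# DeleteItem: delete with an optional condition
table.delete_item(Key={"user_id": "u1"}, ConditionExpression=Attr("age").lt(100))
```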
Dynamo DB CLI
- --projection-expression : subset of attributes to return
- --filter-expression : filter items before they are returned
- --page-size : fetch the data in smaller API pages (default 1000 items); the full result set is still returned
- --max-items : max number of items to show in the CLI (returns a NextToken)
- --starting-token : retrieve the next set of items starting from the last NextToken
Dynamo DB Optimistic Locking
A strategy for optimizing writes: update/delete an item only if it has not been changed since it was read.
- Each item can have an attribute that works as a version ID to avoid overwriting concurrent changes (see the sketch below)
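A boto3 sketch of this pattern with a hypothetical "version" attribute on a "Products" table:

```python
# Sketch: optimistic locking via a conditional update on a version attribute.
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Products")

item = table.get_item(Key={"product_id": "p1"})["Item"]
current_version = item["version"]

try:
    table.update_item(
        Key={"product_id": "p1"},
        UpdateExpression="SET price = :p, version = :new",
        ConditionExpression="version = :old",   # fails if someone else wrote in between
        ExpressionAttributeValues={
            ":p": 42,
            ":old": current_version,
            ":new": current_version + 1,
        },
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        pass  # re-read the item and retry the update
    else:
        raise
```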
Dynamo DB Stream
Ordered stream of item-level changes (create/update/delete) in a DB table
- List of all changes over time in DynamoDB
- Retained for 24 hours
- Made of shards, provisioned automatically by AWS
- Not retroactive: only new modifications appear in the stream
- You can choose what information is written to the stream
  - KEYS_ONLY: only the key attributes of the modified item
  - NEW_IMAGE: the entire item after the change
  - OLD_IMAGE: the entire item before the change
  - NEW_AND_OLD_IMAGES: both the new & old item images
Use Cases:
- Kinesis Data Streams: for persisting the data
- Lambda: invoked synchronously; must have IAM permissions to poll the DynamoDB stream via an Event Source Mapping (see the sketch below)
- Kinesis Client Library
- ElasticSearch/OpenSearch: indexing
- Reacting to changes (e.g. sending emails), data processing
- Global Tables
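A minimal sketch of a Lambda handler wired to a DynamoDB Stream through an Event Source Mapping (assuming the stream view type is NEW_AND_OLD_IMAGES):

```python
# Sketch: processing DynamoDB Stream records in Lambda.
def handler(event, context):
    for record in event["Records"]:
        change = record["dynamodb"]              # keys/images arrive in DynamoDB JSON format
        if record["eventName"] == "INSERT":
            print("created:", change["NewImage"])
        elif record["eventName"] == "MODIFY":
            print("updated keys:", change["Keys"])
        elif record["eventName"] == "REMOVE":
            print("deleted:", change["OldImage"])
```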
Dynamo DB Global Table:
Active-active replication of a table across multiple Regions to provide low latency
- Replicates the table in multiple Regions so the data is accessible everywhere at low latency
- Users can read/write to the table replica closest to them
- DynamoDB Streams must be enabled to allow the replication
Dynamo DB Local Table:
Offline Local DB for testing
Clean Up
- Scan + DeleteItem -> costly & slow
- Drop & recreate the table -> fast & more efficient
Copy / Backup
- AWS Data Pipeline -> external AWS service that uses EMR to copy the data
- Backup & restore of the DB: no additional service required
- Scan + BatchWriteItem -> slow & costly
Security
- IAM: Access Control
- VPC: hide behind a private VPC or use a VPC Endpoint
- KMS: encryption at rest
- Identity Federation: temporary access using Amazon Cognito so users can modify only their own data
- Leading Keys condition: fine-grained access control, limiting a user to items matching their own partition key
Dynamo DB Transaction
Coordinated all-or-nothing operations to add/update/delete items across one or multiple tables: either all succeed or none do
- Provides Atomicity, Consistency, Isolation, Durability (ACID)
- Read modes: Eventually Consistent, Strongly Consistent, Transactional
- Write modes: Standard, Transactional
- Consumes 2x R/WCU: performs 2 operations per item (prepare & commit)
  WCU = WCU * 2, RCU = RCU * 2
- API: up to 25 items or 4 MB of data per transaction
  - TransactGetItems: one or more GetItem per operation
  - TransactWriteItems: one or more PutItem, DeleteItem, UpdateItem per operation
- Use case: banks, orders, games (see the sketch below)
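A boto3 sketch of an all-or-nothing write across two illustrative tables ("Accounts", "Orders"); names and values are placeholders:

```python
# Sketch: TransactWriteItems across two tables; either both succeed or neither does.
import boto3

client = boto3.client("dynamodb")

client.transact_write_items(
    TransactItems=[
        {
            "Update": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "A"}},
                "UpdateExpression": "SET balance = balance - :amt",
                "ConditionExpression": "balance >= :amt",  # no overdraft, else the whole transaction fails
                "ExpressionAttributeValues": {":amt": {"N": "100"}},
            }
        },
        {
            "Put": {
                "TableName": "Orders",
                "Item": {"order_id": {"S": "o-1"}, "amount": {"N": "100"}},
            }
        },
    ]
)
```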
Expiry Time / Time To Live (TTL)
Automatically deletes items after their TTL timestamp has expired
- Does not consume WCU
- The TTL must be a number stored as a Unix epoch timestamp
- Deleted within 48 hours: expired items might still appear in queries, so filter them out client-side
- Items are also auto-deleted from GSIs & LSIs
- Deleted items can be recovered using DynamoDB Streams
- Use case: session data, regulatory requirements (see the sketch below)
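A small sketch of writing a session item that DynamoDB will expire automatically; it assumes TTL is enabled on the table with "expire_at" as the (hypothetical) TTL attribute:

```python
# Sketch: item with a TTL timestamp (Unix epoch seconds).
import time
import boto3

table = boto3.resource("dynamodb").Table("Sessions")

table.put_item(Item={
    "session_id": "s-123",
    "user_id": "u1",
    "expire_at": int(time.time()) + 3600,   # one hour from now
})
```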
Session State
DynamoDB | ElastiCache | EFS |
---|---|---|
Serverless, auto scaling | In-memory | Shared file system attached to EC2 |
Key/Value | Key/Value | File system |
Amazon ElasticSearch (ES)/ OpenSearch
Search & index any field, including partial matches, in a DB
- Used with Big Data or NoSQL DBs
- Based on the Elasticsearch project
- Runs on servers (not serverless)
ES Access Policy
IP-based policy to allow access from a limited CIDR block of IPs
- Allows unsigned requests using curl or Kibana
- Clients in a private subnet can access it using SigV4-signed requests
- Public clients can search via a proxy server in a public subnet
Kibana does not support IAM authentication; use one of the following to authenticate access to ES:
- SAML
- HTTP basic authentication
- Cognito (using AD)
ES Production Setup
- VPC across 3 AZs
- 3 master nodes: 1 per AZ; 1 active, 2 on standby
- 6 data nodes: 2 per AZ/master node, all connected to the active master node
2. Amazon DocumentDB:
MongoDB-compatible database service
- The "Aurora version" for MongoDB
- Same operational features as Aurora
3. Amazon Neptune:
Graph database service.
- Highly available across 3 AZs with up to 15 read replicas
- Can hold billions of relations
- Point-in-time recovery, Multi-AZ clustering
In-Memory Databases
1. Amazon DynamoDB Accelerator (DAX):
Fully managed in-memory cache used only for DynamoDB.
- 10x read performance for DynamoDB
- 5-minute TTL for cached data (default)
- Up to 10 nodes per cluster
- Multi-AZ: at least 3 nodes recommended, spread across AZs
- No need to modify application code; it only adds a caching layer
- Microsecond latency
- Highly scalable & available
- Does not offer an active-active cross-Region configuration
- Solves the Hot Key problem: reduces repeated reads of the same key that would exceed the RCU
- Supports IAM authentication
2. Amazon ElastiCache:
Service to add a caching layer that improves the read times of common requests.
- In-memory managed cache that can sit in front of any DB type
- Can be used along with DAX to store aggregated results
- Faster retrieval of common results
- Requires significant application code changes
- Supports clustering (Redis) and Multi-AZ
- Backup, snapshots, point-in-time restore
- Monitoring through CloudWatch
- Pay per hour based on the node (EC2) type & storage used
- Does not support IAM-based authentication
REDIS | MEMCACHED |
---|---|
Multi-AZ with auto failover | Multi-node partitioning of data (sharding) |
Highly available | No high availability |
Backup & restore | No backup & restore |
Data durability | Non-persistent |
Can also work as a DB | High-performance, multi-threaded distributed cache |
Redis AUTH for authentication, encryption at rest | SASL-based authentication |
SSL for encryption in transit | |
HIPAA compliant | |
Caching pattern
- Lazy loading: data is loaded into the cache only on a cache miss (see the sketch below)
- Write through: data is added to the cache whenever it is added/updated in the DB
- Session store: keep user session data in the cache
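A sketch of the lazy-loading pattern with redis-py; the endpoint, key scheme, and db_query() helper are placeholders:

```python
# Sketch: lazy loading (cache-aside) against an ElastiCache Redis endpoint.
import json
import redis

cache = redis.Redis(host="my-cluster.xxxxxx.cache.amazonaws.com", port=6379)

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:                      # cache hit
        return json.loads(cached)
    user = db_query(user_id)                    # cache miss: load from the database (placeholder)
    cache.setex(key, 300, json.dumps(user))     # populate the cache with a 5-minute TTL
    return user
```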
Big Data & Business Analytics / Intelligence, OLAP (Online Analytical Processing)
AWS Glue
Managed Extract, Transform & Load (ETL) service to transform data
- Serverless service to prepare & transform data for analytics
Glue Data Catalog
Metadata catalog of the datasets discovered by Glue; provides column/schema information to analytics services
1. Amazon Redshift
Data analysis service for data warehousing, OLAP
- Works well with huge volumes & a wide variety of data
- 10x the performance of other data warehouses for analytics
- Based on PostgreSQL
- Data is stored in columnar format
- Scales to petabytes & is highly available
- Data is typically loaded in batches (e.g. every hour)
- Massively Parallel Processing (MPP) query execution with a SQL interface
- Integrates with BI tools to create dashboards
- Loads data from S3, DynamoDB, DMS
- Supports up to 128 nodes; each node can handle up to 128 TB
- Nodes are either a leader node or compute nodes
- Redshift Enhanced VPC Routing: COPY/UNLOAD traffic goes through the VPC instead of the public internet
- No Multi-AZ mode; snapshots allow restoring into another AZ
Loading Data to Redshift:
- Amazon Kinesis Data Firehose: writes the data to S3, then issues a COPY into the cluster
- S3 COPY: data is routed through the internet or via Enhanced VPC Routing (see the sketch below)
- JDBC driver: insert data from an application
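A sketch of issuing a COPY from S3 into Redshift via the Data API; the cluster, database, user, bucket, and IAM role ARN are placeholders:

```python
# Sketch: COPY data from S3 into a Redshift table using the Redshift Data API.
import boto3

rsd = boto3.client("redshift-data")

rsd.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="""
        COPY sales
        FROM 's3://my-bucket/sales/2024/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """,
)
```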
RedShift Spectrum:
Efficiently query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables.
2. Amazon Elastic Map Reduce (EMR)
Provisions & manages clusters of EC2 instances
- For cluster-based analysis of big data
- Can be used to create
  - Hadoop clusters for big data analysis
  - Apache Spark
  - HBase, Presto, Flink
- Supports auto scaling & Spot Instances
- Usage: big data, ML, indexing
3. Amazon ATHENA
Serverless query service to perform analytics against S3 objects
- Uses SQL to query files in S3
- Works on top of CSV, JSON, ORC, Avro, Parquet
- Athena queries & analyzes the data; reports can be built on Amazon QuickSight
- Price: $5 per TB of data scanned
- Pay per query / per TB of data scanned: compressing the data saves cost
- Usage: analytics, BI, reporting, ad-hoc analysis, VPC Flow Logs (see the sketch below)
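A boto3 sketch of running a SQL query against files in S3 with Athena; the database, table, and result bucket are placeholders:

```python
# Sketch: start an Athena query and note where to collect the results.
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

query_id = resp["QueryExecutionId"]
# Poll get_query_execution(QueryExecutionId=query_id) until the state is SUCCEEDED,
# then fetch rows with get_query_results(QueryExecutionId=query_id).
```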
QuickSight
Serverless, ML-powered Business Intelligence service to create interactive dashboards
- Gives insights into the data through dashboards
- Integrates on top of Aurora, Athena, Redshift, S3, ...
- Usage: business insights & analytics, visualization
Ledger
1. Amazon QLDB (Quantum Ledger Database)
Ledger database service: review a complete history of all the changes that have been made to your application data.
- Centralized, managed DB
- Immutable journal: write once, read many
- Fully managed, serverless, highly available
- Holds the sequence of modifications using cryptographic hashing
- 2-3x better performance than common ledger frameworks
- SQL support
- Usage: financial transactions
2. Amazon Managed Blockchain:
Service to create and manage blockchain networks with open-source frameworks; a distributed ledger system
- Decentralized DB
- Supports public blockchain networks such as Ethereum
AWS DB MIGRATION SERVICE(DMS)
Lift & shift: migrate databases to AWS securely & easily between a source & target DB.
- The source DB can remain operational during migration, reducing downtime
- Supports Multi-AZ migration
- An EC2 replication instance performs the data migration
- Source:
  - On premises: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, DB2, SAP
  - AWS: RDS, S3
  - Azure SQL Server
- Target:
  - On premises: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, DB2, SAP
  - AWS: DynamoDB, RDS, Redshift, S3, ElasticSearch, Kinesis, DocumentDB
- DMS supports specifying Amazon S3 as the source and streaming services like Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK) as the target.
Schema Conversion Tool(SCT)
Converts a database schema from one engine to another
- Source DB -> SCT + DMS -> Target DB
Support:
- Homogeneous migration: same source & target engine type (e.g. SQL to SQL)
  - SCT not needed
- Heterogeneous migration: different source & target engine types (e.g. SQL to NoSQL)
  - SCT needed
Use cases
- Develop & test DB Migration
- Consolidate: combine several DB into one DB
- Continuous replication: ongoing copies
Pricing
- Pay per instance
  - Pay as you go
  - Reserved Instances
- Pay for storage & I/O
- Pay for data transfer between Regions
AWS DATA SYNC
Incremental, scheduled sync of large amounts of data from on-premises servers to S3 (including Glacier), FSx for Windows, or EFS
- Scheduled replication: replication can be scheduled at regular intervals (hourly, weekly)
- Uses a DataSync agent to connect to on-premises systems or EFS
- Moves data using the NFS or SMB protocols
- Can sync EFS across Regions
- Bandwidth can be limited to reduce network usage
- Copy un-encrypted EFS to encrypted EFS, or move data between EFS file systems
- The S3 CopyObject API can instead be used to copy objects between S3 buckets
AWS BACKUP
Fully managed service to automate the backup of resources in the AWS cloud.
Automate/manage cross-Region backups of AWS resources:
- EC2, Storage Gateway
- RDS, Aurora, DynamoDB
- EFS, FSx, EBS
- On-demand or automatic scheduled backups based on a Backup Plan (policy)
  - Frequency: can be daily, weekly, monthly
  - Transition to cold storage
  - Retention period
  - Resources with a given tag can be added to a backup plan
- Supports cross-Region/account backups
- Serverless: no need to write scripts to run backups
- Supports Point-in-Time Recovery (PITR)
CLOUDENDURE
Continuous replication of data from corporate servers to the cloud
- Quickly migrate or fail over to the cloud if the business servers fail