Chloe McAree (McAteer)
AWS Databases


This is part of a blog series giving a high-level overview of the different services examined on the AWS Solutions Architect Associate exam.

Relational Database Service (RDS)

  • Allows you to create and scale relational databases in the cloud

  • RDS runs on virtual machines, but you can’t log in to the OS or SSH in

  • AWS handles admin tasks for you like hardware provisioning, patching & backups.

  • RDS is not serverless (the one exception is Aurora Serverless)

  • Allows you to control network access to your database

  • Offers encryption at rest — done with KMS (data stored, automated backups, read replicas and snapshots all encrypted)

Supported AWS Relational Database Platforms

  • Aurora

  • PostgreSQL

  • MySQL

  • SQL Server

  • Oracle

  • MariaDB

RDS Main Features

Multi AZ Recovery

  • You have a primary and a secondary (standby) database. If you lose the primary database, AWS detects this and automatically updates the DNS to point at the secondary database.

  • Used for DISASTER RECOVERY; it doesn’t improve performance.
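The failover behaviour described above can be pictured with a small toy model. This is only a sketch of the idea, not how AWS implements it: the endpoint name, instance names and the `fail_over` method are all hypothetical. The point it shows is that the application keeps using the same endpoint while the instance behind it changes.

```python
from dataclasses import dataclass


@dataclass
class MultiAZDatabase:
    """Toy model of a Multi-AZ deployment: one endpoint, two instances."""
    endpoint: str   # DNS name the application connects to (hypothetical)
    primary: str    # instance currently receiving traffic
    standby: str    # copy kept in another AZ

    def resolve(self) -> str:
        """Return the instance the endpoint currently points at."""
        return self.primary

    def fail_over(self) -> None:
        """Simulate AWS detecting a primary failure and repointing DNS."""
        self.primary, self.standby = self.standby, self.primary


db = MultiAZDatabase(
    endpoint="mydb.example.internal",
    primary="instance-in-az-a",
    standby="instance-in-az-b",
)
before = db.resolve()
db.fail_over()
after = db.resolve()
print(db.endpoint, ":", before, "->", after)
```

The application never changes its connection string, which is why Multi-AZ helps with recovery but does nothing for read or write throughput.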

Read Replicas

  • Every time you write to the main database, the change is replicated to the read replica.

  • If you lose the primary database there is no automatic failover; you need to manually update the URL to point at the replica yourself


  • Used for scaling

  • Automatic backups must be turned on

  • Up to 5 read replicas of any database

  • It is possible to have read replicas of read replicas - but this can introduce latency.

  • Each read replica has its own DNS

  • Read replicas can have Multi-AZ enabled
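Because each read replica has its own DNS name, using replicas for scaling is something the application has to opt into. Below is a minimal sketch of one way to do that, assuming a hypothetical `ReplicaAwareRouter` class and made-up endpoint names. It treats any statement starting with `SELECT` as a read, which is a simplification.

```python
from itertools import cycle
from typing import Sequence


class ReplicaAwareRouter:
    """Sends writes to the primary and spreads reads across read replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._read_targets = cycle(list(replicas) or [primary])

    def endpoint_for(self, sql: str) -> str:
        """Pick an endpoint based on whether the statement only reads data."""
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._read_targets) if is_read else self.primary


router = ReplicaAwareRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
statements = ["SELECT 1", "INSERT INTO t VALUES (1)", "select 2", "SELECT 3"]
targets = [router.endpoint_for(sql) for sql in statements]
print(targets)
```

Reads rotate round-robin across the replicas while every write still goes to the primary.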

RDS Backups

Automated Backups

  • Allows you to recover your database to any point in time within the specified retention period (Max 35 days)

  • Takes daily snapshots and stores transaction logs

  • When recovering, AWS chooses the most recent daily backup and then applies the transaction logs up to the point in time you asked for

  • Enabled by default

  • Backup data is stored in S3

  • May experience latency when backup is being taken

  • Backups are deleted once you remove the original RDS instance
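Point-in-time recovery combines the two pieces mentioned above: a daily snapshot as the starting point, plus the transaction logs recorded after it. The sketch below illustrates that selection logic only. The function name `plan_restore`, its parameters and the sample timestamps are assumptions made for the example, not an AWS API.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

MAX_RETENTION_DAYS = 35  # maximum retention period for automated backups


def plan_restore(snapshots, log_times, target, now, retention_days):
    """Return (base_snapshot, logs_to_replay) for a point-in-time restore.

    snapshots and log_times are sorted lists of datetimes (hypothetical data).
    """
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention must be between 1 and 35 days")
    if not now - timedelta(days=retention_days) <= target <= now:
        raise ValueError("target is outside the retention window")
    idx = bisect_right(snapshots, target)
    if idx == 0:
        raise ValueError("no snapshot taken before the target time")
    base = snapshots[idx - 1]  # most recent snapshot at or before the target
    logs = [t for t in log_times if base < t <= target]
    return base, logs


now = datetime(2024, 1, 10, 12, 0)
snapshots = [datetime(2024, 1, day, 3, 0) for day in (8, 9, 10)]
log_times = [datetime(2024, 1, 9, hour, 0) for hour in (6, 12, 18)]
target = datetime(2024, 1, 9, 13, 0)

base, logs = plan_restore(snapshots, log_times, target, now, retention_days=7)
print("restore from", base, "then replay", len(logs), "log entries")
```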

Database Snapshot

  • User-initiated; you must take them manually yourself

  • Stored until you explicitly delete them. They persist even after you delete the original RDS instance, which is not the case with automated backups.

Data Warehousing

  • Creates a central place for data and information to be analysed

  • Can consolidate data from multiple sources

  • Used with business intelligence tools, typically by business analysts and data scientists/engineers.

  • Used to pull very large, complex datasets, usually so that management can run queries on the data

  • Redshift is AWS’s data warehouse solution


Redshift

  • Powerful data warehouse that can combine and query exabytes of data.

  • Can work with structured or semi-structured data

  • Can save query results directly back into your S3 data lake

  • Can be single node or multi node

  • Has column compression: columns are compressed instead of rows, because the values within a column tend to be similar.

  • Automated backups are enabled by default with a 1-day retention period (maximum 35 days)

  • Only Redshift can delete these automated snapshots, you can’t delete them manually.

  • Pricing — compute node hours, backups and data transfer

  • Encrypted in transit using SSL

  • Encrypted at rest using KMS or HSM

  • A cluster is only available in one AZ

  • Snapshots can be restored to a new AZ
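To see why compressing by column works better than compressing by row, here is a small illustration using run-length encoding. Run-length encoding is just one simple scheme chosen for the example, and the sample rows are made up. It is not a description of how Redshift chooses or applies its encodings.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical rows of (order_id, country, status).
rows = [
    (1, "UK", "shipped"),
    (2, "UK", "shipped"),
    (3, "UK", "pending"),
    (4, "IE", "pending"),
    (5, "IE", "pending"),
]

# Row-oriented layout: neighbouring values come from different columns.
row_layout = [value for row in rows for value in row]
# Column-oriented layout: each column's values sit next to each other.
country_column = [row[1] for row in rows]
status_column = [row[2] for row in rows]

encoded_country = run_length_encode(country_column)
encoded_status = run_length_encode(status_column)
encoded_rows = run_length_encode(row_layout)
print(encoded_country, encoded_status, len(encoded_rows))
```

The two columns shrink to two runs each, while the row layout does not compress at all because no two neighbouring values are equal.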


ElastiCache

  • Allows you to deploy, operate & scale in-memory data stores in the cloud.

  • Improves the performance of web applications, as it allows you to retrieve data fast from memory with high throughput and low latency.

  • Fully managed: AWS handles hardware provisioning, software patching, setup, etc.

  • Scalable

  • There are two types of in-memory caching engines:

  1. Memcached: designed for simplicity, so it is used when you need the simplest model possible.

  2. Redis: works for a wide range of use cases and has Multi-AZ support. You can also back up and restore Redis.

Services capable of caching

  • CloudFront

  • API Gateway

  • ElastiCache

  • DynamoDB Accelerator (DAX)

Caching is a balancing act between up-to-date accurate information and latency.

The further up in your architecture you cache, the better, e.g. at the CloudFront level rather than waiting until the DB level.
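The balancing act between freshness and latency is easiest to see with a time-to-live (TTL). The sketch below is a generic cache-aside helper and is not tied to ElastiCache, CloudFront or any AWS API. The class name `TTLCache`, the fake clock and the `database` dict are all assumptions made for the example.

```python
import time


class TTLCache:
    """Minimal cache-aside helper: serve from memory until an entry expires."""

    def __init__(self, ttl_seconds, loader, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.loader = loader    # function that reads from the slow source
        self.clock = clock      # injectable so the example is deterministic
        self._store = {}        # key -> (value, expiry time)

    def get(self, key):
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and now < entry[1]:
            return entry[0]                 # cache hit, possibly stale
        value = self.loader(key)            # cache miss: go to the source
        self._store[key] = (value, now + self.ttl)
        return value


database = {"price": 10}    # stands in for the slow backing database
fake_time = [0.0]           # manually advanced clock for the demo
cache = TTLCache(60, loader=database.__getitem__, clock=lambda: fake_time[0])

first = cache.get("price")  # miss: loaded from the database
database["price"] = 12      # the source of truth changes
stale = cache.get("price")  # hit: fast, but returns the old value
fake_time[0] = 61.0         # the TTL has now passed
fresh = cache.get("price")  # miss again: reloads the new value
print(first, stale, fresh)
```

A longer TTL means fewer trips to the database but a longer window in which readers can see out-of-date data.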


DynamoDB

  • Fast, flexible NoSQL database

  • Allows for storage of large text and binary values, but there is a 400 KB limit on item size

  • Delivers single digit millisecond latency at any scale

  • Fully managed serverless database — no servers to provision, patch, or manage.

  • Stored on SSD Storage

  • Spread across 3 geographically distinct datacenters

  • DynamoDB supports eventually consistent and strongly consistent reads. (eventual consistency is default)

  • Streams → time ordered sequence of item level modifications in a table (stored up to 24 hours)

Eventual Consistency (best read performance) → Data becomes consistent across all copies, usually within a second. This means a response might not reflect the results of a just-completed write operation, but if you repeat the read request shortly afterwards it should return the updated data.

Strong Consistency → Returns the latest data. Results should reflect all writes that received a successful response prior to that read!
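The difference between the two read modes can be sketched with a toy table that has one up-to-date copy and one copy that lags behind. This is a deliberate simplification for illustration and says nothing about DynamoDB's internals. The class and method names are hypothetical, and the `consistent_read` flag is only named to echo the `ConsistentRead` request parameter.

```python
class ReplicatedTable:
    """Toy table with a leader copy and one replica that lags behind."""

    def __init__(self):
        self._leader = {}
        self._replica = {}
        self._pending = []  # writes not yet applied to the replica

    def put(self, key, value):
        """A write succeeds once the leader has it; the replica catches up later."""
        self._leader[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Apply all outstanding writes to the replica."""
        for key, value in self._pending:
            self._replica[key] = value
        self._pending.clear()

    def get(self, key, consistent_read=False):
        """Strong reads use the leader; eventual reads may hit the lagging replica."""
        source = self._leader if consistent_read else self._replica
        return source.get(key)


table = ReplicatedTable()
table.put("user", "v1")
table.replicate()
table.put("user", "v2")                           # write acknowledged
eventual = table.get("user")                      # may still see the old value
strong = table.get("user", consistent_read=True)  # always sees the new value
table.replicate()                                 # replication catches up
later = table.get("user")
print(eventual, strong, later)
```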

Global Tables

  • Fully managed, multi-active & multi-region database

  • Replicate your DynamoDB tables across selected regions

  • Used for globally distributed apps

  • Based on DynamoDB streams

  • Can be used for Disaster Recovery or high availability

Security in DynamoDB

  • Encryption at rest using KMS

  • Can use Site-to-Site VPN, Direct Connect, and IAM policies and roles

  • Can implement fine-grained access control

  • Can monitor with CloudWatch and CloudTrail

DynamoDB Accelerator (DAX)

  • Managed, highly available in memory cache for DynamoDB

  • Has up to 10 times performance improvement

  • Request time reduced to microseconds

  • DAX manages all in-memory acceleration, so you don’t need to manage things like cache invalidations

  • Compatible with DynamoDB API calls


Aurora

  • MySQL- and PostgreSQL-compatible relational database.

  • Provides up to 5x better performance than MySQL

  • Provides up to 3x better performance than PostgreSQL

  • Distributed, fault-tolerant, self-healing storage system

  • 2 copies of your data are kept in each Availability Zone (AZ), with a minimum of 3 AZs, giving 6 copies in total.

  • Can handle the loss of up to 2 copies without affecting write ability.

  • Can handle the loss of up to 3 copies of data without affecting read ability.

  • Automated backups always enabled — doesn’t impact performance.
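The fault-tolerance figures above amount to simple arithmetic: with 6 copies, writes need 4 of them to be available and reads need 3. A minimal sketch of that rule follows. The constant and function names are made up for the example.

```python
TOTAL_COPIES = 6    # 2 copies in each of 3 AZs
WRITE_QUORUM = 4    # writes survive the loss of up to 2 copies
READ_QUORUM = 3     # reads survive the loss of up to 3 copies


def aurora_availability(copies_lost):
    """Return (can_write, can_read) after losing the given number of copies."""
    if not 0 <= copies_lost <= TOTAL_COPIES:
        raise ValueError("copies_lost must be between 0 and 6")
    remaining = TOTAL_COPIES - copies_lost
    return remaining >= WRITE_QUORUM, remaining >= READ_QUORUM


for lost in range(5):
    can_write, can_read = aurora_availability(lost)
    print(f"lost={lost} write={can_write} read={can_read}")
```

Losing a whole AZ removes 2 copies, so both reads and writes carry on. Losing an AZ plus one more copy still leaves the data readable.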

Aurora Serverless

  • On demand autoscaling configuration of Aurora

  • Automatically starts up, shuts down, and scales based on app needs

  • Used for simple, cost-effective handling of infrequently used, intermittent or unpredictable workloads

  • You only pay for the capacity used while the database is active.

Database migration service (DMS)

  • Transfers a database to another one (on-premises, in the cloud, or a mix of both)

  • Runs replication software

  • Source stays functioning the whole time during the migration

Types of migrations

  1. Supports Homogeneous Migrations: identical engines, e.g. Oracle to Oracle

  2. Supports Heterogeneous Migrations: different engines, e.g. SQL Server to Aurora. If you do this you will need to use the Schema Conversion Tool (SCT)
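The distinction between the two migration types can be written down as a tiny decision helper. This is a sketch under a strong simplifying assumption: two engines count as identical only when their names match exactly. Real compatibility between engines is more nuanced, and `plan_migration` is a hypothetical function, not part of DMS.

```python
def plan_migration(source_engine, target_engine):
    """Classify a migration and say whether schema conversion is required."""
    same_engine = source_engine.strip().lower() == target_engine.strip().lower()
    return {
        "type": "homogeneous" if same_engine else "heterogeneous",
        "needs_schema_conversion": not same_engine,
    }


same = plan_migration("Oracle", "oracle")
different = plan_migration("SQL Server", "Aurora")
print(same)
print(different)
```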

Elastic MapReduce (EMR)

  • Big data platform for processing large amounts of data

  • Run petabyte scale analysis

  • Up to 3x faster than standard Apache Spark

  • Makes it easy to set up, operate, & scale big data environments

  • Workloads run on clusters of EC2 instances called nodes

  • Different software components are installed in each node

  • Data is stored on S3 by default

  • Can configure replication to S3 at 5-minute intervals, but only when the cluster is created!

Node Types

  1. Master Node → Manages cluster, tracks subtasks and monitors health.

  2. Core Node → Has software components to run tasks & store data.

  3. Task Node → Has software components, only runs tasks, can’t store data.
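For a feel of the kind of work a cluster splits across its nodes, here is a single-process sketch of the map, shuffle and reduce steps that give the service its name, using a word count. It is purely illustrative: the function names and input are made up, and on a real cluster the chunks would be processed in parallel by tasks running on core and task nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


chunks = ["big data on EMR", "big clusters process big data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```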