AWS Databases

This is part of a blog series giving a high-level overview of the different services examined on the AWS Solutions Architect Associate exam. To view the whole series, click here.

Relational Database Service (RDS)

Allows you to create and scale relational databases in the cloud
RDS runs on virtual machines (can’t log in to the OS or SSH in)
AWS handles admin tasks for you like hardware provisioning, patching & backups.
RDS is not serverless (the one exception is Aurora Serverless)
Allows you to control network access to your database
Offers encryption at rest — done with KMS (data stored, automated backups, read replicas and snapshots all encrypted)

Supported AWS Relational Database Platforms

Aurora
PostgreSQL
MySQL
SQL Server
Oracle
MariaDB

RDS Main Features

Multi AZ Recovery

You have a primary and a secondary (standby) database. If you lose the primary, AWS detects the failure and automatically updates the DNS to point at the secondary.
Used for DISASTER RECOVERY, it doesn’t improve performance.

Read Replicas

Every time you write to the main database, the change is replicated to the read replica.
If you lose the primary database there is no automatic failover; you need to **manually update the URL** to point at the replica yourself.
IMPROVES PERFORMANCE
Used for scaling
Automatic backups must be turned on
Up to 5 read replicas of any database
It is possible to have read replicas of read replicas - but this can introduce latency.
Each read replica has its own DNS endpoint
Read replicas can be Multi-AZ
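
To make the scaling idea concrete, here is a minimal sketch of how an application might split its traffic: writes go to the primary endpoint, and reads are spread round-robin across the read replica endpoints. The `EndpointRouter` class and the hostnames are hypothetical, and the check for what counts as a read is deliberately naive.

```python
from itertools import cycle


class EndpointRouter:
    """Route writes to the primary and spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._reads = cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._reads) if is_read else self.primary


# Hypothetical endpoints; each read replica has its own DNS name.
router = EndpointRouter(
    primary="mydb.example.eu-west-1.rds.amazonaws.com",
    replicas=[
        "mydb-replica-1.example.eu-west-1.rds.amazonaws.com",
        "mydb-replica-2.example.eu-west-1.rds.amazonaws.com",
    ],
)

print(router.endpoint_for("SELECT * FROM orders"))           # first replica
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))  # primary
```

Because there is no automatic failover to a read replica, losing the primary in this sketch would mean changing the `primary` value by hand.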

RDS Backups

Automated Backups

Allows you to recover your database to any point in time within the specified retention period (Max 35 days)
Takes daily snapshots and stores transaction logs
When recovering, AWS chooses the most recent daily backup and then applies the transaction logs up to the point in time you asked for
Enabled by default
Backup data is stored in S3
May experience latency when backup is being taken
Backups are deleted once you remove the original RDS instances
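
As a rough illustration of the retention period, the sketch below checks whether a given point in time still falls inside the automated-backup window. It is a simplified model: the function name `can_restore_to` is made up for this example, and it ignores the short lag before the latest restorable time.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35  # upper limit for RDS automated backups


def can_restore_to(target, now, retention_days):
    """Return True if `target` falls inside the automated-backup window."""
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention must be between 1 and 35 days")
    earliest = now - timedelta(days=retention_days)
    return earliest <= target <= now


now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(days=3), now, retention_days=7))   # True
print(can_restore_to(now - timedelta(days=10), now, retention_days=7))  # False
```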

Database Snapshot

User-initiated; you must take them manually
Stored until you explicitly delete them, even after you delete the original RDS instance they are still persisted. However this is not the case with automated backups.

Data Warehousing

Creates a central place for data and information to be analysed
Can consolidate data from multiple sources
Used with business intelligence tools, typically by business analysts and data scientists/engineers.
Used to pull very large, complex datasets, usually so that management can run queries on the data
Redshift is AWS’s data warehouse solution

Redshift

Powerful data warehouse, that can combine/query exabytes of data.
Can work with structured or semi-structured data
Can save query results directly back into your S3 data lake
Can be single node or multi node
Has columnar compression: it compresses columns instead of rows, because values in the same column tend to be similar.
Automated backups are enabled by default with a one-day retention period (maximum 35 days)
Only Redshift can delete these automated snapshots, you can’t delete them manually.
Pricing — compute node hours, backups and data transfer
Encrypted in transit using SSL
Encrypted at rest using KMS or HSM
Only available in one AZ
Can restore to a new AZ
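
The point about compressing columns rather than rows can be seen with a toy example. The sketch below uses simple run-length encoding on made-up data; it only illustrates why similar values stored together compress well and is not meant to describe how Redshift itself encodes data.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Made-up rows: (country, status)
rows = [("IE", "paid"), ("IE", "paid"), ("IE", "paid"), ("UK", "paid"), ("UK", "paid")]

# Row-oriented layout interleaves the columns, so neighbouring values rarely repeat.
row_layout = [value for row in rows for value in row]
# Column-oriented layout stores each column together, so repeats sit side by side.
column_layout = [row[0] for row in rows] + [row[1] for row in rows]

print(len(run_length_encode(row_layout)))  # 10 runs: nothing compresses
print(run_length_encode(column_layout))    # [('IE', 3), ('UK', 2), ('paid', 5)]
```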

ElastiCache

Allows you to deploy, operate & scale in-memory data stores in the cloud.
Improves the performance of web applications, as it allows you to retrieve data fast from memory with high throughput and low latency.
Fully managed: AWS handles hardware provisioning, software patching, setup, etc.
Scalable
There are two types of in-memory caching engines:

Memcached: designed for simplicity, so use it when you need the simplest model possible.
Redis: works for a wide range of use cases and supports Multi-AZ. You can also back up and restore Redis.

Services capable of caching

CloudFront
API Gateway
ElastiCache
DynamoDB Accelerator (DAX)

Caching is a balancing act between up-to-date accurate information and latency.

The further up in your architecture you cache, the better, e.g. at the CloudFront level rather than waiting until the database level.
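
The trade-off between freshness and latency is usually controlled with a time-to-live (TTL). Here is a minimal cache-aside sketch in plain Python, assuming a hypothetical `load_from_db` function as the slow source of truth; a real setup would use ElastiCache or a similar service rather than a dictionary.

```python
import time


class TTLCache:
    """Tiny in-memory cache-aside helper with a time-to-live per entry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, load):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # cache hit: fast, but possibly stale
        value = load(key)  # cache miss or expired: go to the source of truth
        self._store[key] = (self.clock() + self.ttl, value)
        return value


calls = []


def load_from_db(key):
    # Stand-in for a real database query.
    calls.append(key)
    return f"row-for-{key}"


cache = TTLCache(ttl_seconds=60)
cache.get("user:1", load_from_db)
cache.get("user:1", load_from_db)
print(len(calls))  # 1: the second read was served from memory
```

A longer TTL means fewer trips to the database but a higher chance of serving out-of-date data.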

DynamoDB

Fast flexible NoSQL database
Allows for storage of large text and binary values, but each item is limited to 400 KB
Delivers single digit millisecond latency at any scale
Fully managed serverless database — no servers to provision, patch, or manage.
Stored on SSD Storage
Spread across 3 geographically distinct datacenters
DynamoDB supports eventually consistent and strongly consistent reads. (eventual consistency is default)
Streams → time ordered sequence of item level modifications in a table (stored up to 24 hours)

Eventual Consistency (best read performance) → Consistency across all copies of the data is usually reached within a second. A response might not reflect the results of a just-completed write, but if you repeat the read request after a short time it should return the updated data.

Strong Consistency → Returns the latest data. Results should reflect all writes that received a successful response prior to that read!
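
In the DynamoDB API, the choice between the two is made per read with the `ConsistentRead` parameter. The sketch below only builds the request arguments; the helper name `get_item_request`, the table name and the key are hypothetical.

```python
def get_item_request(table, key, strongly_consistent=False):
    """Build keyword arguments for a DynamoDB GetItem call."""
    return {
        "TableName": table,
        "Key": key,
        # False (the default) = eventually consistent; True = strongly consistent.
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical table and key, in the low-level attribute-value format.
request = get_item_request("Orders", {"order_id": {"S": "1001"}}, strongly_consistent=True)
print(request["ConsistentRead"])  # True

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("dynamodb").get_item(**request)
```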

Global Tables

Fully managed, multi-active & multi-region database
Replicate your DynamoDB tables across selected regions
Used for globally distributed apps
Based on DynamoDB streams
Can be used for Disaster Recovery or high availability

Security in DynamoDB

Encryption at rest using KMS
Can use Site-to-Site VPN, Direct Connect, and IAM policies and roles
Can implement fine-grained access control
Can monitor with CloudWatch and CloudTrail

DynamoDB Accelerator (DAX)

Managed, highly available in memory cache for DynamoDB
Has up to 10 times performance improvement
Request time reduced to microseconds
DAX manages all in-memory acceleration, so you don’t need to manage things like cache invalidation
Compatible with DynamoDB API calls

Aurora

MySQL- and PostgreSQL-compatible relational database.
Provides up to 5x better performance than MySQL
Provides up to 3x better performance than PostgreSQL
Distributed, fault-tolerant, self-healing storage system
2 copies of your data are kept in each Availability Zone (AZ), across a minimum of 3 AZs, giving 6 copies in total.
Can handle the loss of up to 2 copies without affecting write ability.
Can handle the loss of up to 3 copies of data without affecting read ability.
Automated backups always enabled — doesn’t impact performance.
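
The copy counts above amount to a quorum rule: with 6 copies, writes need 4 healthy copies and reads need 3. The following sketch just encodes that arithmetic; the function name `availability` is made up for illustration.

```python
TOTAL_COPIES = 6   # 2 copies in each of 3 Availability Zones
WRITE_QUORUM = 4   # writes survive the loss of up to 2 copies
READ_QUORUM = 3    # reads survive the loss of up to 3 copies


def availability(lost_copies):
    """Return which operations are still possible after losing some copies."""
    if not 0 <= lost_copies <= TOTAL_COPIES:
        raise ValueError("lost_copies must be between 0 and 6")
    healthy = TOTAL_COPIES - lost_copies
    return {"write": healthy >= WRITE_QUORUM, "read": healthy >= READ_QUORUM}


for lost in (2, 3, 4):
    print(lost, availability(lost))
# 2 {'write': True, 'read': True}
# 3 {'write': False, 'read': True}
# 4 {'write': False, 'read': False}
```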

Aurora Serverless

On demand autoscaling configuration of Aurora
Automatically starts up, shuts down, and scales based on app needs
Used for simple, cost effective infrequently used, intermittent or unpredictable workloads
You only pay for the database capacity you use while it is running.

Database Migration Service (DMS)

Transfers a database to another database (on-premises, in the cloud, or between the two)
Runs replication software
Source stays functioning the whole time during the migration

Types of migrations

Supports Homogeneous Migrations: identical engines, e.g. Oracle to Oracle
Supports Heterogeneous Migrations: different engines, e.g. SQL Server to Aurora. For these you will need to use the Schema Conversion Tool (SCT)
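
The distinction can be written down as a small decision helper. This is only a sketch with a hypothetical function name, and it compares engine names as plain strings, so compatible engines with different names (for example MySQL and Aurora MySQL) would be treated as different.

```python
def migration_plan(source_engine, target_engine):
    """Classify a migration and say whether schema conversion is needed."""
    same_engine = source_engine.strip().lower() == target_engine.strip().lower()
    return {
        "type": "homogeneous" if same_engine else "heterogeneous",
        "needs_schema_conversion": not same_engine,
    }


print(migration_plan("Oracle", "Oracle"))
print(migration_plan("SQL Server", "Aurora"))
```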

Elastic MapReduce (EMR)

Big data platform for processing large amounts of data
Run petabyte scale analysis
Up to 3x faster than standard Apache Spark
Makes it easy to set up, operate, & scale big data environments
Workloads run on clusters of EC2 instances called nodes
Different software components are installed in each node
Data is stored on S3 by default
Can configure replication to S3 at 5-minute intervals, but only when the cluster is created!

Node Types

Master Node → Manages cluster, tracks subtasks and monitors health.
Core Node → Has software components to run tasks & store data.
Task Node → Has software components, only runs tasks, can’t store data.