Chloe McAree (McAteer)
AWS Databases

This is part of a blog series giving a high-level overview of the different services examined on the AWS Solutions Architect Associate exam; to view the whole series click here.

Relational Database Service (RDS)

  • Allows you to create and scale relational databases in the cloud

  • RDS runs on virtual machines (can’t log in to the OS or SSH in)

  • AWS handles admin tasks for you like hardware provisioning, patching & backups.

  • RDS is not serverless (with one exception: Aurora Serverless)

  • Allows you to control network access to your database

  • Offers encryption at rest using KMS (the stored data, automated backups, read replicas and snapshots are all encrypted)

Supported AWS Relational Database Platforms

  • Aurora

  • PostgreSQL

  • MySQL

  • SQL Server

  • Oracle

  • MariaDB

RDS Main Features

Multi-AZ

  • You have a primary and a standby database in different AZs; if you lose the primary, AWS detects the failure and automatically updates the DNS to point at the standby.

  • Used for DISASTER RECOVERY; it doesn't improve performance.
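
As a rough illustration, here is a minimal boto3 sketch of provisioning a Multi-AZ instance with encryption at rest; the instance identifier, credentials and sizing are placeholder values, not anything prescribed by AWS.

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Create a MySQL instance with a synchronous standby in a second AZ.
# MultiAZ=True is what gives you the automatic DNS failover described above;
# StorageEncrypted=True enables KMS encryption at rest.
rds.create_db_instance(
    DBInstanceIdentifier="orders-db",        # hypothetical name
    Engine="mysql",
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=20,                     # GiB
    MasterUsername="admin",
    MasterUserPassword="change-me-please",   # use Secrets Manager in real life
    MultiAZ=True,
    StorageEncrypted=True,
    BackupRetentionPeriod=7,                 # days of automated backups
)
```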

Read Replicas

  • Every time you write to the primary database, the change is asynchronously replicated to the read replica.

  • If you lose the primary database there is no automatic failover; you need to **manually update the connection URL** to point at a replica yourself

  • IMPROVES PERFORMANCE

  • Used for scaling

  • Automated backups must be turned on

  • Up to 5 read replicas of any database

  • It is possible to have read replicas of read replicas - but this can introduce latency.

  • Each read replica has its own DNS endpoint

  • Read replicas can themselves be Multi-AZ
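
A read replica can be added to an existing instance in a single call. This is a hedged boto3 sketch; the source and replica identifiers are made up, and automated backups are assumed to already be enabled on the source instance.

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Automated backups must already be enabled on the source instance.
# Each replica gets its own DNS endpoint once it becomes available.
response = rds.create_db_instance_read_replica(
    DBInstanceIdentifier="orders-db-replica-1",   # hypothetical name
    SourceDBInstanceIdentifier="orders-db",
    DBInstanceClass="db.t3.medium",
)
print(response["DBInstance"]["DBInstanceIdentifier"])
```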

RDS Backups

Automated Backups

  • Allows you to recover your database to any point in time within the specified retention period (Max 35 days)

  • Takes daily snapshots and stores transaction logs

  • When recovering, AWS takes the most recent daily backup and then applies transaction logs up to your chosen point in time

  • Enabled by default

  • Backup data is stored in S3

  • You may experience elevated latency while a backup is being taken

  • Backups are deleted once you remove the original RDS instance

Database Snapshot

  • User-initiated; you must take them manually

  • Stored until you explicitly delete them; they persist even after you delete the original RDS instance, unlike automated backups.
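
To make the two backup types concrete, here is a small boto3 sketch; the snapshot and instance identifiers are hypothetical.

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Manual snapshot: kept until you delete it, even if the instance is gone.
rds.create_db_snapshot(
    DBSnapshotIdentifier="orders-db-before-migration",   # hypothetical name
    DBInstanceIdentifier="orders-db",
)

# Point-in-time restore from automated backups creates a brand new instance.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="orders-db",
    TargetDBInstanceIdentifier="orders-db-restored",
    UseLatestRestorableTime=True,   # or pass RestoreTime=datetime(...)
)
```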

Data Warehousing

  • Creates a central place for data and information to be analysed

  • Can consolidate data from multiple sources

  • Used with business intelligence tools, typically by business analysts, data scientists and data engineers.

  • Used to query very large, complex datasets, often by management for reporting and analysis

  • Redshift is AWS's data warehouse solution

Redshift

  • Powerful data warehouse that can combine/query exabytes of data.

  • Can work with structured or semi-structured data

  • Can save query results directly back into your S3 data lake

  • Can be single node or multi node

  • Uses column compression — data is compressed by column rather than by row, which works well because the values in a column are similar.

  • A one-day backup retention period is enabled by default (max 35 days)

  • Only Redshift can delete these automated snapshots; you can't delete them manually.

  • Pricing — compute node hours, backups and data transfer

  • Encrypted in transit using SSL

  • Encrypted at rest using KMS or HSM

  • Only available in one AZ

  • Can restore to a new AZ
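
For reference, a minimal boto3 sketch of creating a small multi-node, encrypted cluster with a longer automated snapshot retention period; the cluster name, node type and credentials are placeholders.

```python
import boto3

redshift = boto3.client("redshift", region_name="eu-west-1")

# Multi-node cluster with encryption at rest and a 7-day automated
# snapshot retention period (the default is 1 day, the maximum 35).
redshift.create_cluster(
    ClusterIdentifier="analytics-warehouse",   # hypothetical name
    NodeType="ra3.xlplus",
    ClusterType="multi-node",
    NumberOfNodes=2,
    DBName="analytics",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",
    Encrypted=True,
    AutomatedSnapshotRetentionPeriod=7,
)
```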

ElastiCache

  • Allows you to deploy, operate & scale in-memory data stores in the cloud.

  • Improves the performance of web applications, as it allows you to retrieve data fast from memory with high throughput and low latency.

  • Fully managed hardware provisioning, software patching, setup etc.

  • Scalable

  • There are two types of in-memory caching engines:

  1. Memcached — designed for simplicity, so use it when you need the simplest model possible.

  2. Redis — works for a wide range of use cases and supports Multi-AZ. You can also take backups of and restore Redis.

Services capable of caching

  • CloudFront

  • API Gateway

  • ElastiCache

  • DynamoDB Accelerator (DAX)

Caching is a balancing act between up-to-date accurate information and latency.

The further up in your architecture you cache, the better, e.g. at the CloudFront level rather than waiting until the database level.
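
A common way to apply this with ElastiCache is a cache-aside read, sketched below with the redis-py client; the endpoint is a placeholder and `load_product_from_db` is a hypothetical stand-in for a real database query. The TTL is the knob that trades freshness against latency and database load.

```python
import json
import redis  # redis-py client

# Placeholder endpoint; use your ElastiCache Redis endpoint here.
cache = redis.Redis(host="my-redis-endpoint.example.com", port=6379)

def load_product_from_db(product_id: str) -> dict:
    # Placeholder for a real RDS/DynamoDB query.
    return {"id": product_id, "name": "example"}

def get_product(product_id: str) -> dict:
    """Cache-aside read: try ElastiCache first, fall back to the database."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)             # fast in-memory hit

    product = load_product_from_db(product_id)
    # A short TTL keeps data reasonably fresh; a longer TTL means fewer DB hits.
    cache.setex(key, 60, json.dumps(product))
    return product
```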

DynamoDB

  • Fast flexible NoSQL database

  • Allows storage of large text and binary data, but there is a 400KB item size limit

  • Delivers single digit millisecond latency at any scale

  • Fully managed serverless database — no servers to provision, patch, or manage.

  • Stored on SSD Storage

  • Spread across 3 geographically distinct data centres

  • DynamoDB supports eventually consistent and strongly consistent reads (eventual consistency is the default)

  • Streams → a time-ordered sequence of item-level modifications in a table (stored for up to 24 hours)

Eventual Consistency (best read performance) → Consistency across all copies of the data is usually reached within a second, meaning the response might not reflect the results of a just-completed write operation, but if you repeat the read request it should return the updated data.

Strong Consistency → Returns the latest data. Results should reflect all writes that received a successful response prior to that read!
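
In code the difference is a single flag. A small boto3 sketch, assuming a hypothetical `Orders` table keyed on `orderId`:

```python
import boto3

table = boto3.resource("dynamodb", region_name="eu-west-1").Table("Orders")

# Default read: eventually consistent (cheapest, fastest).
eventual = table.get_item(Key={"orderId": "1234"})

# Strongly consistent read: reflects all successful writes made before it,
# at the cost of higher latency and double the read capacity consumption.
strong = table.get_item(Key={"orderId": "1234"}, ConsistentRead=True)

print(eventual.get("Item"), strong.get("Item"))
```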

Global Tables

  • Fully managed, multi-active & multi-region database

  • Replicate your DynamoDB tables across selected regions

  • Used for globally distributed apps

  • Based on DynamoDB streams

  • Can be used for Disaster Recovery or high availability
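
A hedged sketch using the original (2017) global tables API in boto3; it assumes an `Orders` table already exists in both regions with streams enabled. Newer table versions add replicas via `update_table` instead.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="eu-west-1")

# Assumes an "Orders" table already exists in both regions with DynamoDB
# Streams enabled (new and old images); global tables are built on streams.
dynamodb.create_global_table(
    GlobalTableName="Orders",
    ReplicationGroup=[
        {"RegionName": "eu-west-1"},
        {"RegionName": "us-east-1"},
    ],
)
```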

Security in DynamoDB

  • Encryption at rest using KMS

  • Can use site-to-site VPN, Direct Connect, and IAM policies and roles

  • Can implement fine-grained access control

  • Can monitor with CloudWatch and CloudTrail

DynamoDB Accelerator (DAX)

  • Managed, highly available in-memory cache for DynamoDB

  • Offers up to a 10x performance improvement

  • Request time reduced to microseconds

  • DAX manages all in-memory acceleration, so you don't need to manage things like cache invalidation

  • Compatible with DynamoDB API calls
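
Because DAX speaks the DynamoDB API, switching to it is mostly a client swap. A rough sketch using the `amazondax` Python package (my reading of its resource interface); the cluster endpoint and table name are placeholders.

```python
import boto3
from amazondax import AmazonDaxClient  # pip install amazondax

# Plain DynamoDB resource for comparison.
ddb_table = boto3.resource("dynamodb", region_name="eu-west-1").Table("Orders")

# DAX-backed resource: same table interface, different client.
# The endpoint URL below is a placeholder for your cluster's endpoint.
dax = AmazonDaxClient.resource(endpoint_url="daxs://my-dax-cluster.example.com:9111")
dax_table = dax.Table("Orders")

# Identical call; on a cache hit DAX serves it from memory in microseconds.
item = dax_table.get_item(Key={"orderId": "1234"}).get("Item")
print(item)
```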

Aurora

  • MySQL- & PostgreSQL-compatible relational database.

  • Provides 5x better performance than MySQL

  • Provides 3x better performance than PostgreSQL

  • Distributed, fault-tolerant, self-healing storage system

  • 2 copies of your data are kept in each Availability Zone (AZ) — a minimum of 3 AZs and 6 copies.

  • Can handle the loss of up to 2 copies without affecting write ability.

  • Can handle the loss of up to 3 copies of data without affecting read ability.

  • Automated backups always enabled — doesn’t impact performance.

Aurora Serverless

  • On demand autoscaling configuration of Aurora

  • Automatically starts up, shuts down, and scales based on app needs

  • A simple, cost-effective option for infrequently used, intermittent or unpredictable workloads

  • You only pay for the database capacity you actually use, billed per second.
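
A minimal boto3 sketch of an Aurora Serverless (v1-style) cluster; the identifier, credentials and capacity range are illustrative only.

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Capacity scales between MinCapacity and MaxCapacity (in ACUs) and the
# cluster can pause entirely when idle, so you only pay while it is in use.
rds.create_db_cluster(
    DBClusterIdentifier="reporting-cluster",   # hypothetical name
    Engine="aurora-mysql",
    EngineMode="serverless",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",
    ScalingConfiguration={
        "MinCapacity": 2,
        "MaxCapacity": 8,
        "AutoPause": True,
        "SecondsUntilAutoPause": 300,
    },
)
```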

Database Migration Service (DMS)

  • Migrates one database to another (on-premises to cloud, cloud to cloud, or cloud to on-premises)

  • Runs replication software

  • The source database stays fully functional during the migration

Types of migrations

  1. Supports Homogeneous Migrations — identical engines, e.g. Oracle to Oracle

  2. Supports Heterogeneous Migrations — different engines, e.g. SQL Server to Aurora. For these you will need to use the Schema Conversion Tool (SCT)
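
A hedged boto3 sketch of kicking off a full-load-plus-CDC task, assuming the source/target endpoints and a replication instance already exist; all ARNs below are placeholders.

```python
import json

import boto3

dms = boto3.client("dms", region_name="eu-west-1")

# "full-load-and-cdc" copies the existing data and then keeps replicating
# changes, which is why the source stays live during the migration.
dms.create_replication_task(
    ReplicationTaskIdentifier="sqlserver-to-aurora",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```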

Elastic MapReduce (EMR)

  • Big data platform for processing large amounts of data

  • Run petabyte scale analysis

  • Its runtime for Apache Spark can be up to 3x faster than standard Apache Spark

  • Makes it easy to set up, operate, & scale big data environments

  • Workloads run on clusters of EC2 instances called nodes

  • Different software components are installed in each node

  • Data is stored on S3 by default

  • Can configure replication to S3 at 5-minute intervals — but only when the cluster is created!

Node Types

  1. Master Node → Manages cluster, tracks subtasks and monitors health.

  2. Core Node → Has software components to run tasks & store data.

  3. Task Node → Has software components, only runs tasks and can't store data.
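
To tie the node types together, a rough boto3 sketch of launching a cluster with one master group, one core group and one spot task group; the cluster name, instance types and the log bucket are placeholders.

```python
import boto3

emr = boto3.client("emr", region_name="eu-west-1")

# One master, two core nodes (storage + tasks) and two spot task nodes (tasks only).
emr.run_job_flow(
    Name="log-analysis",                 # hypothetical cluster name
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}],
    LogUri="s3://my-emr-logs-bucket/",   # placeholder bucket
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    Instances={
        "KeepJobFlowAliveWhenNoSteps": True,
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core",   "InstanceRole": "CORE",   "InstanceType": "m5.xlarge", "InstanceCount": 2},
            {"Name": "task",   "InstanceRole": "TASK",   "InstanceType": "m5.xlarge", "InstanceCount": 2, "Market": "SPOT"},
        ],
    },
)
```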