Purpose Built Databases

This section describes the evolution of databases and AWS’s purpose-built database offerings designed for specific use cases and workloads.

The evolution of database technology has paralleled the evolution of application architecture, focusing on splitting applications to improve scaling and resiliency.

| Opportunity | Database evolution | AWS examples |
| --- | --- | --- |
| Improve on hierarchical databases’ limited ability to define relationships among data | Relational | Amazon RDS |
| Improve performance by separating read-heavy reporting from the application’s transactional database | Data warehouse/OLAP | Amazon Redshift |
| Analyze more varied types of data being generated in large amounts on the internet | Non-relational | Amazon DynamoDB |
| Take advantage of cloud computing’s freedom to scale data stores and the ease of connecting microservices to purpose-built data stores | Purpose-built: document, wide-column, in-memory, graph, timeseries, ledger | Fully managed database services |

Cloud computing gave organizations the freedom to scale data stores based on actual usage. Coupled with the move toward microservices, this flexibility made it attractive to connect different application components to different databases rather than relying on a single multipurpose one.

Amazon Redshift is a fully managed, cloud-based data warehousing service designed to handle petabyte-scale analytics workloads.

  • Data Warehouse: Enterprise-class relational database query and management system optimized for reporting and analytics
  • Columnar Storage: Achieves efficient storage and optimum query performance through massively parallel processing and columnar data storage
  • Compression: Very efficient, targeted data compression encoding schemes
  • Managed Service: Handles all work of setting up, operating, and scaling data warehouse
  • Serverless Option: Amazon Redshift Serverless adjusts capacity in seconds for unpredictable workloads
  • OLAP Applications: Suitable for storing and analyzing massive amounts of data quickly and efficiently
  • Machine Learning: Automatically create, train, and deploy ML models for financial and demand forecasts
  • Data Sharing: Securely share data among accounts, organizations, and partners
  • Developer Productivity: Simplified data access without configuring drivers and managing database connections

Data warehouses require data organized in a tabular format so that SQL can be used to query it; analytic queries typically read large amounts of data to surface relationships and trends, as sketched below.
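
For example, the Amazon Redshift Data API lets applications run analytic SQL without configuring drivers or managing connections. The following is a minimal sketch using boto3; the workgroup, database, and sales table names are assumptions, not real resources.

redshift-query.py
import time
import boto3

# The Redshift Data API runs SQL asynchronously, so no JDBC/ODBC driver is needed.
client = boto3.client("redshift-data", region_name="us-east-1")

# Typical OLAP query: scan a large fact table and aggregate by a dimension.
response = client.execute_statement(
    WorkgroupName="analytics-wg",   # assumed Redshift Serverless workgroup name
    Database="dev",
    Sql="""
        SELECT product_category, SUM(sale_amount) AS total_sales
        FROM sales
        GROUP BY product_category
        ORDER BY total_sales DESC
    """,
)

# Poll until the statement finishes, then fetch the result set.
statement_id = response["Id"]
while client.describe_statement(Id=statement_id)["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)
for row in client.get_statement_result(Id=statement_id)["Records"]:
    print(row)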

AWS Fully Managed Purpose-Built Database Options

Amazon DocumentDB

Document Database: Fast, scalable document database service with MongoDB compatibility

Amazon Keyspaces

Wide-Column: Highly available managed Apache Cassandra-compatible database service

Amazon MemoryDB

In-Memory: Redis-compatible, durable, in-memory database service for ultra-fast performance

Amazon Neptune

Graph Database: Fast, reliable, fully managed graph database service for highly connected datasets

Amazon Timestream

Timeseries Database: Scalable, fully managed, fast timeseries database service for IoT and operational applications

Amazon QLDB

Ledger Database: Fully managed ledger database that tracks each application data change with complete and verifiable history

All of these databases are fully managed cloud services that help limit the time and cost of experimenting with and maintaining different types of databases.

When choosing a purpose-built database, analyze your workload requirements to see whether they match the database’s capabilities.

The concept of purpose-built databases aligns with the performance efficiency pillar of the AWS Well-Architected Framework: selecting the right tool for the job.

Document Databases: Amazon DocumentDB

  • Require flexible schema for fast, iterative development
  • Need to store data that has different attributes and data values
  • Fit use cases such as online customer profiles where different users provide different types of information
  • Document Data Model: Uses JSON-like documents stored as field-value pairs
  • Flexible Schema: Each document can have different structure
  • Query Capability: Query on any attribute with flexible indexing and powerful ad hoc queries (see the sketch at the end of this section)

Example simple document:

customer-profile.json
{
  "LName": "Rivera",
  "FName": "Martha",
  "DOB": "1992-11-16"
}
  • MongoDB-compatible, JSON document database
  • Great for complex documents that are dynamic and may require ad hoc querying, indexing, and aggregations
  • Native integrations with AWS Database Migration Service (AWS DMS) for migration with virtually no downtime
  • IAM integration for access control to management operations
  • Content Management System (CMS): Collect and aggregate content from variety of sources with flexible schema
  • Customer Profiles: Store user-generated content including images, comments, and videos
  • Real-time Big Data: Store and manage operational data from any source while concurrently feeding to business intelligence engines
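
Because Amazon DocumentDB is MongoDB-compatible, standard MongoDB drivers such as pymongo work against it. The following minimal sketch assumes a hypothetical cluster endpoint and collection name; the TLS and credential settings a real cluster requires are omitted.

documentdb-profiles.py
from pymongo import MongoClient

# Hypothetical DocumentDB cluster endpoint (real clusters require TLS and credentials).
client = MongoClient("mongodb://docdb-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017")
profiles = client["appdb"]["customer_profiles"]

# Flexible schema: documents in the same collection can have different shapes.
profiles.insert_one({"LName": "Rivera", "FName": "Martha", "DOB": "1992-11-16"})
profiles.insert_one({"LName": "Souza", "FName": "Ana", "interests": ["cycling", "photography"]})

# Index any attribute, then run an ad hoc query against it.
profiles.create_index("LName")
for doc in profiles.find({"LName": "Rivera"}):
    print(doc)
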
Wide-Column Databases: Amazon Keyspaces

  • Fast querying of high volumes of data
  • Scalability and consistent performance on heavy write loads
  • Read/write balanced or write-heavy workloads
  • Wide Column Data Model: Data stored in flexible columns, permits data to evolve over time
  • Partitioned Storage: Data partitioned across distributed database systems
  • Two-Dimensional Key-Value: Extends basic key-value data model with additional dimension
  • CQL Support: Can use Cassandra Query Language (CQL) on your data (see the sketch at the end of this section)

Wide-column databases work well when read/write is balanced or for heavy write operations, providing massive scalability and consistent performance.

  • Scalable, highly available, and managed Apache Cassandra-compatible database service
  • Performance, elasticity, and enterprise features for business-critical Cassandra workloads at scale
  • Pay only for resources used with automatic scaling up and down based on application traffic
  • Continuous backups of tables containing hundreds of terabytes of data with no performance impact
  • Point-in-time recovery for the preceding 35 days
  • Industrial Equipment Maintenance: Process data at high speeds for applications requiring single-digit-millisecond latency
  • Trade Monitoring: High-speed data processing for financial applications
  • Fleet Management: Route optimization and fleet management applications
  • Migration: Move existing Cassandra workloads to the cloud
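
Because Amazon Keyspaces speaks CQL, the open source Cassandra Python driver (cassandra-driver) can be used with it. The following minimal sketch assumes a hypothetical keyspace and table and omits the TLS and SigV4 authentication settings that Keyspaces requires.

keyspaces-telemetry.py
from cassandra.cluster import Cluster

# Placeholder endpoint; Keyspaces connections require TLS and SigV4 or service credentials.
cluster = Cluster(["cassandra.us-east-1.amazonaws.com"], port=9142)
session = cluster.connect()

# Two-dimensional key-value model: partition key (device_id) plus clustering column (event_time).
session.execute("""
    CREATE TABLE IF NOT EXISTS fleet.telemetry (
        device_id text,
        event_time timestamp,
        speed_kmh double,
        PRIMARY KEY (device_id, event_time)
    )
""")
session.execute(
    "INSERT INTO fleet.telemetry (device_id, event_time, speed_kmh) VALUES (%s, toTimestamp(now()), %s)",
    ("truck-42", 87.5),
)
for row in session.execute("SELECT * FROM fleet.telemetry WHERE device_id = %s", ("truck-42",)):
    print(row)
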
In-Memory Databases: Amazon MemoryDB

  • Latency-sensitive workloads requiring extremely low response times
  • High request rates (up to hundreds of millions of requests per second)
  • High data throughput needs (gigabytes per second of read access)
  • High durability requirements with no data loss
  • In-Memory Database: Relies primarily on memory for data storage
  • Minimal Response Time: Eliminates need to access disks for fastest possible performance
  • Memory-First Architecture: Entire dataset stored in memory for immediate access
  • Redis Compatibility: Compatible with open source Redis, supporting the same data types, parameters, and commands (see the sketch at the end of this section)
  • Data Durability: Leverages distributed transactional log to provide both in-memory speed and data durability
  • Consistency and Recoverability: Provides data consistency and recoverability alongside in-memory performance
  • Fully Managed: Primary database solution without separately managing cache or durable database
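
Because MemoryDB is Redis-compatible, existing Redis clients such as redis-py work without changes. The following minimal sketch assumes a hypothetical cluster endpoint.

memorydb-example.py
import redis

# Hypothetical MemoryDB cluster endpoint; MemoryDB enforces TLS connections.
r = redis.Redis(host="clustercfg.my-memorydb.xxxx.memorydb.us-east-1.amazonaws.com", port=6379, ssl=True)

# Standard Redis data types and commands: strings, hashes, sorted sets, and so on.
r.set("session:1234", "martha")
r.hset("profile:1234", mapping={"LName": "Rivera", "FName": "Martha"})
r.zadd("leaderboard", {"martha": 1520, "ana": 1310})

print(r.get("session:1234"))
print(r.zrevrange("leaderboard", 0, 2, withscores=True))
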
Graph Databases: Amazon Neptune

  • Find connections or paths in data
  • Combine data with complex relationships across data silos
  • Navigate highly connected datasets and filter results based on certain variables
  • Answer questions about relationships themselves rather than just business processes
  • Graph Data Model: Stores data and the relationships among data items as equally important
  • Nodes and Edges Structure: Quickly create and navigate relationships between data
  • Relationship-First: Optimized for storing and querying relationships rather than traditional JOIN operations
  • High Performance: Graph database engine that efficiently stores and navigates graph data
  • Scale-Up Architecture: In-memory-optimized architecture for fast query evaluation over large graphs
  • Multiple Query Languages: Supports Apache TinkerPop Gremlin, W3C SPARQL, and openCypher (see the sketch at the end of this section)
  • High Availability: Support for up to 15 read replicas, hundreds of thousands of queries per second
  • Recommendation Engines: Product recommendations based on user behavior and preferences
  • Fraud Detection: Identify suspicious patterns and connections in financial transactions
  • Knowledge Graphs: Organize and navigate complex information relationships
  • Drug Discovery: Model complex molecular and biological relationships
  • Social Networking: Navigate and analyze social connections and interactions
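
One way to query Neptune is through its Gremlin endpoint with the open source gremlinpython driver. The following minimal sketch assumes a hypothetical cluster endpoint and builds a tiny social graph.

neptune-gremlin.py
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# Hypothetical Neptune endpoint; Gremlin traversals run over a WebSocket connection.
conn = DriverRemoteConnection("wss://my-neptune.cluster-xxxx.us-east-1.neptune.amazonaws.com:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Vertices (nodes) and edges (relationships) are both first-class data.
martha = g.addV("person").property("name", "Martha").next()
ana = g.addV("person").property("name", "Ana").next()
g.V(martha).addE("follows").to(ana).next()

# Relationship-first query: who does Martha follow?
print(g.V().has("person", "name", "Martha").out("follows").values("name").toList())
conn.close()
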
Timeseries Databases: Amazon Timestream

  • Identify patterns and trends over time
  • Determine value or performance over time for data-driven business decisions
  • Rely on efficient data processing and analytics of time-sequenced data
  • Require ease of data management for timeseries workloads
  • Timeseries Data Model: Sequence of data points recorded over time interval for measuring events that change over time
  • Time-Sequenced Storage: Collect, store, and process data sequenced by time
  • Temporal Analytics: Built for analyzing how values change over temporal sequences
  • Serverless Database: Simplifies data access and provides durable, secure way to derive insights
  • Transparent Access: Query engine transparently accesses and combines data across storage tiers
  • Built-in Functions: SQL-based analysis with built-in timeseries functions for smoothing, approximation, and interpolation (see the sketch at the end of this section)
  • Advanced Analytics: Supports advanced aggregates, window functions, and complex data types
  • Multi-AZ Durability: Automatically replicates data across different Availability Zones
  • IoT Applications: Analyze timeseries data generated by IoT applications using built-in analytic functions to identify trends and patterns
  • Operational Metrics: Collect and analyze operational metrics to monitor health and usage
  • Web Traffic Analysis: Store and process web traffic data for applications with real-time analysis for performance improvements
  • Financial Analysis: Track measures and values over a sequence of timestamps for cyclical trend analysis
  • Call Center Analytics: Collect call volume data over timestamps to scale business processes
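
Writes and queries both go through boto3. The following minimal sketch records a single IoT measurement and then aggregates recent values; the database, table, and measure names are assumptions.

timestream-iot.py
import time
import boto3

write_client = boto3.client("timestream-write", region_name="us-east-1")
query_client = boto3.client("timestream-query", region_name="us-east-1")

# Each record is a timestamped measurement tagged with dimensions (here, the device ID).
write_client.write_records(
    DatabaseName="iot_db",         # assumed database name
    TableName="sensor_readings",   # assumed table name
    Records=[{
        "Dimensions": [{"Name": "device_id", "Value": "sensor-7"}],
        "MeasureName": "temperature_c",
        "MeasureValue": "21.4",
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),  # milliseconds since the epoch
    }],
)

# SQL with built-in timeseries support; ago(1h) limits the scan to the last hour.
result = query_client.query(QueryString="""
    SELECT device_id, AVG(measure_value::double) AS avg_temp
    FROM "iot_db"."sensor_readings"
    WHERE measure_name = 'temperature_c' AND time > ago(1h)
    GROUP BY device_id
""")
print(result["Rows"])
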
Ledger Databases: Amazon QLDB

  • Maintain accurate history of application data
  • Track history of financial transactions requiring audit trails
  • Verify data lineage of claims with cryptographic verification
  • Meet audit and compliance requirements with verifiable data
  • Ledger Database: Provides immutable and verifiable history of all changes to application data
  • Cryptographic Verification: Uses a hash-chained journal to ensure data is immutable and cryptographically verifiable
  • Audit-Focused: Designed specifically for maintaining accurate, traceable data history
  • Transparent and Immutable: Provides transparent, immutable, and cryptographically verifiable transaction log
  • Built-in Data Integrity: Trust integrity of data with built-in cryptographic verification enabling third-party validation
  • Consistent Event Store: Track and maintain sequenced history of every application data change using immutable journal
  • ACID Transactions: Fully serializable ACID transactions, plus real-time streaming of changes to Amazon Kinesis
  • SQL-Based Queries: Query data using a SQL-based language called PartiQL with document database flexibility (see the sketch at the end of this section)
  • Financial Transactions: Creating complete and accurate record of all financial transactions (credit and debit)
  • Supply Chain: Recording history of each transaction and providing details of every batch manufactured, shipped, stored, and sold
  • Insurance Claims: Tracking claim over lifetime and cryptographically verifying data integrity for resilience against data entry errors and manipulation
  • Compliance: Maintaining verifiable audit trails for regulatory compliance requirements

Ledger databases are often slower and not ideal for complex or quick reads and writes, but provide unmatched data integrity and verification capabilities.
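
Applications work with QLDB through the Amazon QLDB driver (pyqldb), issuing PartiQL statements inside transactions. The following minimal sketch assumes a hypothetical ledger named vehicle-registry and an existing Claims table.

qldb-claims.py
from pyqldb.driver.qldb_driver import QldbDriver

# Assumed ledger name; every committed change is appended to the immutable, hash-chained journal.
driver = QldbDriver(ledger_name="vehicle-registry")

def record_claim(txn):
    # Both statements commit atomically in a single ACID transaction.
    txn.execute_statement("INSERT INTO Claims ?", {"ClaimId": "C-1001", "Status": "OPEN"})
    return txn.execute_statement("SELECT * FROM Claims WHERE ClaimId = ?", "C-1001")

for doc in driver.execute_lambda(record_claim):
    print(doc)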

Purpose-built databases enable organizations to select optimal database solutions that match specific workload requirements, data models, and use cases rather than forcing all applications to use a single multipurpose database solution.