This section describes the evolution of databases and AWS’s purpose-built database offerings designed for specific use cases and workloads.
The evolution of database technology has paralleled the evolution of application architecture, which has increasingly split applications into smaller components to improve scaling and resiliency.
Each stage of database evolution addressed a new opportunity:
Relational (Amazon RDS) : Improve on hierarchical databases' limited ability to define relationships among data
Data warehouse/OLAP (Amazon Redshift) : Improve performance by separating read-heavy reporting from the application's transactional database
Non-relational (Amazon DynamoDB) : Analyze the more varied types of data being generated in large amounts on the internet
Purpose-built, fully managed services (document, wide-column, in-memory, graph, timeseries, ledger) : Take advantage of cloud computing's freedom to scale data stores and the ease of connecting microservices to purpose-built data stores
Cloud computing gave organizations the freedom to scale data stores based on actual usage. Coupled with the move toward microservices, this flexibility made it attractive to connect different application components to different databases rather than relying on a single multipurpose one.
Amazon Redshift is a fully managed, cloud-based data warehousing service designed to handle petabyte-scale analytics workloads.
Data Warehouse : Enterprise-class relational database query and management system optimized for reporting and analytics
Columnar Storage : Achieves efficient storage and optimum query performance through massively parallel processing and columnar data storage
Compression : Very efficient, targeted data compression encoding schemes
Managed Service : Handles all work of setting up, operating, and scaling data warehouse
Serverless Option : Amazon Redshift Serverless adjusts capacity in seconds for unpredictable workloads
OLAP Applications : Suitable for storing and analyzing massive amounts of data quickly and efficiently
Machine Learning : Automatically create, train, and deploy ML models for financial and demand forecasts
Data Sharing : Securely share data among accounts, organizations, and partners
Developer Productivity : Simplified data access without configuring drivers and managing database connections
Data warehouses require data to be organized in a tabular format so that SQL can be used to query it; analytic queries typically read large amounts of data to surface relationships and trends.
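The simplified data access noted above refers to the Amazon Redshift Data API, which runs SQL over HTTPS without drivers or persistent connections. Below is a minimal sketch using boto3; the workgroup, database, and table names are hypothetical:

    import time
    import boto3

    # Submit SQL through the Redshift Data API: no JDBC/ODBC driver or
    # long-lived connection to manage.
    client = boto3.client("redshift-data")
    run = client.execute_statement(
        WorkgroupName="analytics",   # hypothetical Redshift Serverless workgroup
        Database="dev",
        Sql="SELECT region, SUM(revenue) AS total FROM sales GROUP BY region;",
    )

    # The API is asynchronous: poll until the statement completes, then
    # fetch the result set.
    while True:
        desc = client.describe_statement(Id=run["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(0.5)
    if desc["Status"] == "FINISHED":
        rows = client.get_statement_result(Id=run["Id"])["Records"]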
Amazon DocumentDB
Document Database : Fast, scalable document database service with MongoDB compatibility
Amazon Keyspaces
Wide-Column : Highly available managed Apache Cassandra-compatible database service
Amazon MemoryDB
In-Memory : Redis-compatible, durable, in-memory database service for ultra-fast performance
Amazon Neptune
Graph Database : Fast, reliable, fully managed graph database service for highly connected datasets
Amazon Timestream
Timeseries Database : Scalable, fully managed, fast timeseries database service for IoT and operational applications
Amazon QLDB
Ledger Database : Fully managed ledger database that tracks each application data change with complete and verifiable history
All of these databases are fully managed cloud services, which helps limit the time and cost of experimenting with and maintaining different types of databases.
When choosing a purpose-built database, consider these four key elements:
Analyze your workload requirements to see if they match the database’s capabilities
Understand the characteristics of the data model you would need to use with the database
Familiarize yourself with key features and configuration options to optimize performance
Review common use cases to find reference architectures and examples
The concept of purpose-built databases aligns with the performance efficiency pillar of the AWS Well-Architected Framework: selecting the right tool for the job.
Consider a document database when your applications:
Require a flexible schema for fast, iterative development
Need to store data that has different attributes and data values
A typical example is an organization with online customer profiles, where different users provide different types of information.
Document Data Model : Uses JSON-like documents stored as field-value pairs
Flexible Schema : Each document can have different structure
Query Capability : Query on any attribute with flexible indexing and powerful ad hoc queries
Example simple document (an illustrative sketch; the field names are hypothetical):
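    {
      "customer_id": "c-1017",
      "name": "Ana Souza",
      "newsletter_opt_in": true,
      "interests": ["running", "cycling"],
      "address": { "city": "Seattle", "postal_code": "98101" }
    }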
MongoDB-compatible, JSON document database
Great for complex documents that are dynamic and may require ad hoc querying, indexing, and aggregations
Native integrations with AWS Database Migration Service (AWS DMS) for migration with virtually no downtime
IAM integration for access control to management operations
Content Management System (CMS) : Collect and aggregate content from a variety of sources with a flexible schema
Customer Profiles : Store user-generated content including images, comments, and videos
Real-time Big Data : Store and manage operational data from any source while concurrently feeding to business intelligence engines
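As a sketch of the flexible-schema model behind the customer-profile case above, the snippet below stores and queries profiles with the open source pymongo driver, which works against Amazon DocumentDB's MongoDB-compatible API. The endpoint, credentials, and field names are hypothetical; DocumentDB requires TLS and does not support retryable writes:

    from pymongo import MongoClient

    # Hypothetical cluster endpoint and credentials. global-bundle.pem is the
    # Amazon-provided CA bundle; retryWrites must be disabled for DocumentDB.
    client = MongoClient(
        "mongodb://appuser:secret@mydocdb.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017",
        tls=True, tlsCAFile="global-bundle.pem", retryWrites=False)
    profiles = client["appdb"]["customer_profiles"]

    # Documents in the same collection may carry different attributes.
    profiles.insert_one({"name": "Ana", "interests": ["running"], "premium": True})
    profiles.insert_one({"name": "Ben", "company": "Example Corp"})

    # Query on any attribute, even one that only some documents have.
    for doc in profiles.find({"premium": True}):
        print(doc["name"])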
Consider a wide-column database when your applications need:
Fast querying of high volumes of data
Scalability and consistent performance under heavy write loads
Support for read/write-balanced or write-heavy workloads
Wide Column Data Model : Data is stored in flexible columns, permitting the schema to evolve over time
Partitioned Storage : Data partitioned across distributed database systems
Two-Dimensional Key-Value : Extends basic key-value data model with additional dimension
CQL Support : Can use Cassandra Query Language (CQL) on your data
Wide-column databases work well when reads and writes are balanced or for write-heavy workloads, providing massive scalability and consistent performance.
Scalable, highly available, and managed Apache Cassandra-compatible database service
Performance, elasticity, and enterprise features for business-critical Cassandra workloads at scale
Pay only for resources used with automatic scaling up and down based on application traffic
Continuous backups of tables holding hundreds of terabytes of data, with no performance impact
Point-in-time recovery for preceding 35 days
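Because Keyspaces speaks CQL, existing Cassandra drivers work against it. Below is a minimal sketch with the open source Python cassandra-driver, assuming hypothetical service-specific credentials and an existing fleet.telemetry table; Keyspaces requires TLS on port 9142 and LOCAL_QUORUM consistency for writes:

    import ssl
    from datetime import datetime, timezone
    from cassandra import ConsistencyLevel
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # TLS is mandatory; sf-class2-root.crt is the Amazon-provided certificate.
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ssl_context.load_verify_locations("sf-class2-root.crt")

    # Hypothetical service-specific credentials for the us-east-1 endpoint.
    cluster = Cluster(
        ["cassandra.us-east-1.amazonaws.com"], port=9142,
        ssl_context=ssl_context,
        auth_provider=PlainTextAuthProvider("svc-user", "svc-password"))
    session = cluster.connect()

    # Writes to Keyspaces must use LOCAL_QUORUM consistency.
    insert = SimpleStatement(
        "INSERT INTO fleet.telemetry (vehicle_id, reading_time, speed_kmh) "
        "VALUES (%s, %s, %s)",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    session.execute(insert, ("truck-42", datetime.now(timezone.utc), 87.5))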
Industrial Equipment Maintenance : Process data at high speeds for applications requiring single-digit-millisecond latency
Trade Monitoring : High-speed data processing for financial applications
Fleet Management : Route optimization and fleet management applications
Migration : Move existing Cassandra workloads to the cloud
Consider an in-memory database for:
Latency-sensitive workloads requiring extremely low response times
High request rates (up to hundreds of millions of requests per second)
High data throughput needs (GB per second read data access)
High durability requirements with no data loss
In-Memory Database : Relies primarily on memory for data storage
Minimal Response Time : Eliminates need to access disks for fastest possible performance
Memory-First Architecture : Entire dataset stored in memory for immediate access
Redis Compatibility : Compatible with open source Redis, supporting same data types, parameters, and commands
Data Durability : Leverages distributed transactional log to provide both in-memory speed and data durability
Consistency and Recoverability : Provides data consistency and recoverability alongside in-memory performance
Fully Managed : Works as a primary database, with no need to separately manage a cache and a durable database
Tip: Industry applications include:
Retail : Customer profiles requiring immediate access
Gaming : Leaderboards with real-time updates
Banking : User transactions requiring instant processing
Caching : Scenarios needing multiple requests or highly dynamic data without storage latency
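As a sketch of the leaderboard case above, the snippet below uses the open source redis-py client, since MemoryDB is Redis-compatible; the cluster endpoint is hypothetical, and MemoryDB clusters require TLS:

    from redis.cluster import RedisCluster

    # MemoryDB runs in cluster mode with TLS; the endpoint is hypothetical.
    r = RedisCluster(
        host="clustercfg.demo.xxxxxx.memorydb.us-east-1.amazonaws.com",
        port=6379, ssl=True)

    # A leaderboard as a Redis sorted set: standard commands work unchanged,
    # while MemoryDB persists each write to its distributed transactional log.
    r.zadd("leaderboard", {"player:1": 1500, "player:2": 2200})
    top10 = r.zrevrange("leaderboard", 0, 9, withscores=True)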
Consider a graph database when your applications need to:
Find connections or paths in data
Combine data with complex relationships across data silos
Navigate highly connected datasets and filter results based on certain variables
Answer questions about relationships themselves rather than just business processes
Graph Data Model : Stores data and relationships of that data to other data as equally important
Nodes and Edges Structure : Quickly create and navigate relationships between data
Relationship-First : Optimized for storing and querying relationships rather than traditional JOIN operations
High Performance : Graph database engine that efficiently stores and navigates graph data
Scale-Up Architecture : In-memory-optimized architecture for fast query evaluation over large graphs
Multiple Query Languages : Supports Apache TinkerPop Gremlin, W3C SPARQL, and openCypher
High Availability : Supports up to 15 read replicas, serving hundreds of thousands of queries per second
Recommendation Engines : Product recommendations based on user behavior and preferences
Fraud Detection : Identify suspicious patterns and connections in financial transactions
Knowledge Graphs : Organize and navigate complex information relationships
Drug Discovery : Model complex molecular and biological relationships
Social Networking : Navigate and analyze social connections and interactions
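In a graph database, a recommendation lookup like the one above is a short traversal rather than a chain of JOINs. Below is a minimal sketch with the open source gremlinpython driver, assuming a hypothetical Neptune endpoint and a graph with person and product vertices:

    from gremlin_python.driver import client

    # Gremlin queries go over WebSocket to port 8182; endpoint is hypothetical.
    gremlin = client.Client(
        "wss://mygraph.cluster-xxxx.us-east-1.neptune.amazonaws.com:8182/gremlin",
        "g")

    # Products bought by people Ana knows: the relationships (edges) are
    # traversed directly instead of being reconstructed with JOINs.
    titles = gremlin.submit(
        "g.V().has('person','name','Ana')"
        ".out('knows').out('purchased').values('title').dedup()"
    ).all().result()
    gremlin.close()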
Consider a timeseries database when your applications:
Identify patterns and trends over time
Determine value or performance over time for data-driven business decisions
Rely on efficient data processing and analytics of time-sequenced data
Require ease of data management for timeseries workloads
Timeseries Data Model : Sequence of data points recorded over time interval for measuring events that change over time
Time-Sequenced Storage : Collect, store, and process data sequenced by time
Temporal Analytics : Built for analyzing how values change over temporal sequences
Serverless Database : Simplifies data access and provides a durable, secure way to derive insights
Transparent Access : Query engine transparently accesses and combines data across storage tiers
Built-in Functions : SQL-based analysis with built-in timeseries functions for smoothing, approximation, and interpolation
Advanced Analytics : Supports advanced aggregates, window functions, and complex data types
Multi-AZ Durability : Automatically replicates data across different Availability Zones
IoT Applications : Analyze timeseries data generated by IoT applications using built-in analytic functions to identify trends and patterns
Operational Metrics : Collect and analyze operational metrics to monitor health and usage
Web Traffic Analysis : Store and process web traffic data for applications with real-time analysis for performance improvements
Financial Analysis : Track measures and values over a sequence of timestamps for cyclical trend analysis
Call Center Analytics : Collect call volume data over timestamps to scale business processes
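As a sketch of the IoT metrics case above, the snippet below writes one measurement and then queries it with built-in timeseries SQL functions (bin and ago) through boto3; the database, table, and dimension names are hypothetical:

    import time
    import boto3

    write = boto3.client("timestream-write")
    query = boto3.client("timestream-query")

    # One CPU reading for one device; Time is milliseconds since the epoch.
    write.write_records(
        DatabaseName="iot", TableName="metrics",
        Records=[{
            "Dimensions": [{"Name": "device_id", "Value": "sensor-7"}],
            "MeasureName": "cpu_utilization",
            "MeasureValue": "41.5",
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),
        }])

    # Five-minute averages over the last hour using built-in functions.
    result = query.query(QueryString="""
        SELECT device_id, bin(time, 5m) AS t,
               avg(measure_value::double) AS avg_cpu
        FROM "iot"."metrics"
        WHERE measure_name = 'cpu_utilization' AND time > ago(1h)
        GROUP BY device_id, bin(time, 5m)
        ORDER BY t
    """)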
Consider a ledger database when your applications need to:
Maintain an accurate history of application data
Track history of financial transactions requiring audit trails
Verify data lineage of claims with cryptographic verification
Meet audit and compliance requirements with verifiable data
Ledger Database : Provides immutable and verifiable history of all changes to application data
Cryptographic Verification : Uses different methods to ensure data is immutable and cryptographically verifiable
Audit-Focused : Designed specifically for maintaining accurate, traceable data history
Transparent and Immutable : Provides transparent, immutable, and cryptographically verifiable transaction log
Built-in Data Integrity : Trust integrity of data with built-in cryptographic verification enabling third-party validation
Consistent Event Store : Track and maintain sequenced history of every application data change using immutable journal
ACID Transactions : Supports ACID transactions, along with real-time streaming of changes to Amazon Kinesis
SQL-Based Queries : Query data using PartiQL, a SQL-based query language that offers document database flexibility
Financial Transactions : Creating complete and accurate record of all financial transactions (credit and debit)
Supply Chain : Recording history of each transaction and providing details of every batch manufactured, shipped, stored, and sold
Insurance Claims : Tracking claim over lifetime and cryptographically verifying data integrity for resilience against data entry errors and manipulation
Compliance : Maintaining verifiable audit trails for regulatory compliance requirements
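As a sketch of the insurance-claims case above, the snippet below uses the open source pyqldb driver; the ledger and table names are hypothetical, and the table is assumed to exist. Every committed change is appended to the immutable journal, and the history() function exposes each revision:

    from pyqldb.driver.qldb_driver import QldbDriver

    driver = QldbDriver(ledger_name="claims-ledger")  # hypothetical ledger

    # Insert and update run as ACID transactions; both revisions are
    # retained in the journal.
    driver.execute_lambda(lambda txn: txn.execute_statement(
        "INSERT INTO Claims ?", {"claimId": "c-1001", "status": "OPEN"}))
    driver.execute_lambda(lambda txn: txn.execute_statement(
        "UPDATE Claims SET status = 'PAID' WHERE claimId = 'c-1001'"))

    # history() returns every revision of each document, with metadata.
    revisions = driver.execute_lambda(
        lambda txn: list(txn.execute_statement("SELECT * FROM history(Claims)")))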
Ledger databases are often slower and not ideal for complex or quick reads and writes, but provide unmatched data integrity and verification capabilities.
Purpose-built databases enable organizations to select optimal database solutions that match specific workload requirements, data models, and use cases rather than forcing all applications to use a single multipurpose database solution.