Purpose Built Databases

This section describes the evolution of databases and AWS’s purpose-built database offerings designed for specific use cases and workloads.

The evolution of database technology has paralleled the evolution of application architecture, focusing on splitting applications to improve scaling and resiliency.

| Opportunity | Database evolution | AWS examples |
| --- | --- | --- |
| Improve on hierarchical databases’ limited ability to define relationships among data | Relational | Amazon RDS |
| Improve performance by separating read-heavy reporting from the application’s transactional database | Data warehouse/OLAP | Amazon Redshift |
| Analyze more varied types of data being generated in large amounts on the internet | Non-relational | Amazon DynamoDB |
| Take advantage of cloud computing’s freedom to scale data stores and the ease of connecting microservices to purpose-built data stores | Purpose-built: document, wide-column, in-memory, graph, timeseries, ledger | Fully managed database services |

Cloud computing gave organizations the freedom to scale data stores based on actual usage. Coupled with the move toward microservices, this flexibility made it attractive to connect different application components to different databases rather than relying on a single multipurpose one.

Amazon Redshift is a fully managed, cloud-based data warehousing service designed to handle petabyte-scale analytics workloads.

  • Data Warehouse: Enterprise-class relational database query and management system optimized for reporting and analytics
  • Columnar Storage: Achieves efficient storage and optimum query performance through massively parallel processing and columnar data storage
  • Compression: Very efficient, targeted data compression encoding schemes
  • Managed Service: Handles all work of setting up, operating, and scaling data warehouse
  • Serverless Option: Amazon Redshift Serverless adjusts capacity in seconds for unpredictable workloads
  • OLAP Applications: Suitable for storing and analyzing massive amounts of data quickly and efficiently
  • Machine Learning: Automatically create, train, and deploy ML models for financial and demand forecasts
  • Data Sharing: Securely share data among accounts, organizations, and partners
  • Developer Productivity: Simplified data access without configuring drivers and managing database connections

Data warehouses require data organized in a tabular format so that SQL can be used to query it; analytic queries typically read large amounts of data to surface relationships and trends, as sketched below.
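
For example, the Amazon Redshift Data API lets applications run analytic SQL without configuring drivers or managing connections. The following is a minimal sketch using boto3; the workgroup, database, and sales table names are assumptions, not real resources.

redshift-query.py
import time
import boto3

# The Redshift Data API runs SQL asynchronously, so no JDBC/ODBC driver is needed.
client = boto3.client("redshift-data", region_name="us-east-1")

# Typical OLAP query: scan a large fact table and aggregate by a dimension.
response = client.execute_statement(
    WorkgroupName="analytics-wg",   # assumed Redshift Serverless workgroup name
    Database="dev",
    Sql="""
        SELECT product_category, SUM(sale_amount) AS total_sales
        FROM sales
        GROUP BY product_category
        ORDER BY total_sales DESC
    """,
)

# Poll until the statement finishes, then fetch the result set.
statement_id = response["Id"]
while client.describe_statement(Id=statement_id)["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)
for row in client.get_statement_result(Id=statement_id)["Records"]:
    print(row)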

AWS Fully Managed Purpose-Built Database Options

Amazon DocumentDB

Document Database: Fast, scalable document database service with MongoDB compatibility

Amazon Keyspaces

Wide-Column: Highly available managed Apache Cassandra-compatible database service

Amazon MemoryDB

In-Memory: Redis-compatible, durable, in-memory database service for ultra-fast performance

Amazon Neptune

Graph Database: Fast, reliable, fully managed graph database service for highly connected datasets

Amazon Timestream

Timeseries Database: Scalable, fully managed, fast timeseries database service for IoT and operational applications

Amazon QLDB

Ledger Database: Fully managed ledger database that tracks each application data change with complete and verifiable history

All of these databases are fully managed cloud services that help limit the time and cost of experimenting with and maintaining different types of databases.

When choosing a purpose-built database, analyze your workload requirements to see whether they match the database’s capabilities.

The concept of purpose-built databases aligns with the performance efficiency pillar of the AWS Well-Architected Framework: selecting the right tool for the job.

Document Databases: Amazon DocumentDB

  • Require flexible schema for fast, iterative development
  • Need to store data that has different attributes and data values
  • Fit use cases such as online customer profiles where different users provide different types of information
  • Document Data Model: Uses JSON-like documents stored as field-value pairs
  • Flexible Schema: Each document can have different structure
  • Query Capability: Query on any attribute with flexible indexing and powerful ad hoc queries (see the sketch at the end of this section)

Example simple document:

customer-profile.json
{
  "LName": "Rivera",
  "FName": "Martha",
  "DOB": "1992-11-16"
}
  • MongoDB-compatible, JSON document database
  • Great for complex documents that are dynamic and may require ad hoc querying, indexing, and aggregations
  • Native integrations with AWS Database Migration Service (AWS DMS) for migration with virtually no downtime
  • IAM integration for access control to management operations
  • Content Management System (CMS): Collect and aggregate content from variety of sources with flexible schema
  • Customer Profiles: Store user-generated content including images, comments, and videos
  • Real-time Big Data: Store and manage operational data from any source while concurrently feeding to business intelligence engines
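
Because Amazon DocumentDB is MongoDB-compatible, standard MongoDB drivers such as pymongo work against it. The following minimal sketch assumes a hypothetical cluster endpoint and collection name; the TLS and credential settings a real cluster requires are omitted.

documentdb-profiles.py
from pymongo import MongoClient

# Hypothetical DocumentDB cluster endpoint (real clusters require TLS and credentials).
client = MongoClient("mongodb://docdb-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017")
profiles = client["appdb"]["customer_profiles"]

# Flexible schema: documents in the same collection can have different shapes.
profiles.insert_one({"LName": "Rivera", "FName": "Martha", "DOB": "1992-11-16"})
profiles.insert_one({"LName": "Souza", "FName": "Ana", "interests": ["cycling", "photography"]})

# Index any attribute, then run an ad hoc query against it.
profiles.create_index("LName")
for doc in profiles.find({"LName": "Rivera"}):
    print(doc)
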
Wide-Column Databases: Amazon Keyspaces

  • Fast querying of high volumes of data
  • Scalability and consistent performance on heavy write loads
  • Read/write balanced or write-heavy workloads
  • Wide Column Data Model: Data stored in flexible columns, permits data to evolve over time
  • Partitioned Storage: Data partitioned across distributed database systems
  • Two-Dimensional Key-Value: Extends basic key-value data model with additional dimension
  • CQL Support: Can use Cassandra Query Language (CQL) on your data (see the sketch at the end of this section)

Wide-column databases work well when read/write is balanced or for heavy write operations, providing massive scalability and consistent performance.

  • Scalable, highly available, and managed Apache Cassandra-compatible database service
  • Performance, elasticity, and enterprise features for business-critical Cassandra workloads at scale
  • Pay only for resources used with automatic scaling up and down based on application traffic
  • Continuous backups of tables containing hundreds of terabytes of data with no performance impact
  • Point-in-time recovery for the preceding 35 days
  • Industrial Equipment Maintenance: Process data at high speeds for applications requiring single-digit-millisecond latency
  • Trade Monitoring: High-speed data processing for financial applications
  • Fleet Management: Route optimization and fleet management applications
  • Migration: Move existing Cassandra workloads to the cloud
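
Because Amazon Keyspaces speaks CQL, the open source Cassandra Python driver (cassandra-driver) can be used with it. The following minimal sketch assumes a hypothetical keyspace and table and omits the TLS and SigV4 authentication settings that Keyspaces requires.

keyspaces-telemetry.py
from cassandra.cluster import Cluster

# Placeholder endpoint; Keyspaces connections require TLS and SigV4 or service credentials.
cluster = Cluster(["cassandra.us-east-1.amazonaws.com"], port=9142)
session = cluster.connect()

# Two-dimensional key-value model: partition key (device_id) plus clustering column (event_time).
session.execute("""
    CREATE TABLE IF NOT EXISTS fleet.telemetry (
        device_id text,
        event_time timestamp,
        speed_kmh double,
        PRIMARY KEY (device_id, event_time)
    )
""")
session.execute(
    "INSERT INTO fleet.telemetry (device_id, event_time, speed_kmh) VALUES (%s, toTimestamp(now()), %s)",
    ("truck-42", 87.5),
)
for row in session.execute("SELECT * FROM fleet.telemetry WHERE device_id = %s", ("truck-42",)):
    print(row)
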
In-Memory Databases: Amazon MemoryDB

  • Latency-sensitive workloads requiring extremely low response times
  • High request rates (up to hundreds of millions of requests per second)
  • High data throughput needs (gigabytes per second of read access)
  • High durability requirements with no data loss
  • In-Memory Database: Relies primarily on memory for data storage
  • Minimal Response Time: Eliminates need to access disks for fastest possible performance
  • Memory-First Architecture: Entire dataset stored in memory for immediate access
  • Redis Compatibility: Compatible with open source Redis, supporting the same data types, parameters, and commands (see the sketch at the end of this section)
  • Data Durability: Leverages distributed transactional log to provide both in-memory speed and data durability
  • Consistency and Recoverability: Provides data consistency and recoverability alongside in-memory performance
  • Fully Managed: Primary database solution without separately managing cache or durable database
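
Because MemoryDB is Redis-compatible, existing Redis clients such as redis-py work without changes. The following minimal sketch assumes a hypothetical cluster endpoint.

memorydb-example.py
import redis

# Hypothetical MemoryDB cluster endpoint; MemoryDB enforces TLS connections.
r = redis.Redis(host="clustercfg.my-memorydb.xxxx.memorydb.us-east-1.amazonaws.com", port=6379, ssl=True)

# Standard Redis data types and commands: strings, hashes, sorted sets, and so on.
r.set("session:1234", "martha")
r.hset("profile:1234", mapping={"LName": "Rivera", "FName": "Martha"})
r.zadd("leaderboard", {"martha": 1520, "ana": 1310})

print(r.get("session:1234"))
print(r.zrevrange("leaderboard", 0, 2, withscores=True))
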
Graph Databases: Amazon Neptune

  • Find connections or paths in data
  • Combine data with complex relationships across data silos
  • Navigate highly connected datasets and filter results based on certain variables
  • Answer questions about relationships themselves rather than just business processes
  • Graph Data Model: Stores data and the relationships among data items as equally important
  • Nodes and Edges Structure: Quickly create and navigate relationships between data
  • Relationship-First: Optimized for storing and querying relationships rather than traditional JOIN operations
  • High Performance: Graph database engine that efficiently stores and navigates graph data
  • Scale-Up Architecture: In-memory-optimized architecture for fast query evaluation over large graphs
  • Multiple Query Languages: Supports Apache TinkerPop Gremlin, W3C SPARQL, and openCypher (see the sketch at the end of this section)
  • High Availability: Support for up to 15 read replicas, hundreds of thousands of queries per second
  • Recommendation Engines: Product recommendations based on user behavior and preferences
  • Fraud Detection: Identify suspicious patterns and connections in financial transactions
  • Knowledge Graphs: Organize and navigate complex information relationships
  • Drug Discovery: Model complex molecular and biological relationships
  • Social Networking: Navigate and analyze social connections and interactions
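
One way to query Neptune is through its Gremlin endpoint with the open source gremlinpython driver. The following minimal sketch assumes a hypothetical cluster endpoint and builds a tiny social graph.

neptune-gremlin.py
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# Hypothetical Neptune endpoint; Gremlin traversals run over a WebSocket connection.
conn = DriverRemoteConnection("wss://my-neptune.cluster-xxxx.us-east-1.neptune.amazonaws.com:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Vertices (nodes) and edges (relationships) are both first-class data.
martha = g.addV("person").property("name", "Martha").next()
ana = g.addV("person").property("name", "Ana").next()
g.V(martha).addE("follows").to(ana).next()

# Relationship-first query: who does Martha follow?
print(g.V().has("person", "name", "Martha").out("follows").values("name").toList())
conn.close()
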
Timeseries Databases: Amazon Timestream

  • Identify patterns and trends over time
  • Determine value or performance over time for data-driven business decisions
  • Rely on efficient data processing and analytics of time-sequenced data
  • Require ease of data management for timeseries workloads
  • Timeseries Data Model: Sequence of data points recorded over time interval for measuring events that change over time
  • Time-Sequenced Storage: Collect, store, and process data sequenced by time
  • Temporal Analytics: Built for analyzing how values change over temporal sequences
  • Serverless Database: Simplifies data access and provides durable, secure way to derive insights
  • Transparent Access: Query engine transparently accesses and combines data across storage tiers
  • Built-in Functions: SQL-based analysis with built-in timeseries functions for smoothing, approximation, and interpolation (see the sketch at the end of this section)
  • Advanced Analytics: Supports advanced aggregates, window functions, and complex data types
  • Multi-AZ Durability: Automatically replicates data across different Availability Zones
  • IoT Applications: Analyze timeseries data generated by IoT applications using built-in analytic functions to identify trends and patterns
  • Operational Metrics: Collect and analyze operational metrics to monitor health and usage
  • Web Traffic Analysis: Store and process web traffic data for applications with real-time analysis for performance improvements
  • Financial Analysis: Track measures and values over a sequence of timestamps for cyclical trend analysis
  • Call Center Analytics: Collect call volume data over timestamps to scale business processes
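
Writes and queries both go through boto3. The following minimal sketch records a single IoT measurement and then aggregates recent values; the database, table, and measure names are assumptions.

timestream-iot.py
import time
import boto3

write_client = boto3.client("timestream-write", region_name="us-east-1")
query_client = boto3.client("timestream-query", region_name="us-east-1")

# Each record is a timestamped measurement tagged with dimensions (here, the device ID).
write_client.write_records(
    DatabaseName="iot_db",         # assumed database name
    TableName="sensor_readings",   # assumed table name
    Records=[{
        "Dimensions": [{"Name": "device_id", "Value": "sensor-7"}],
        "MeasureName": "temperature_c",
        "MeasureValue": "21.4",
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),  # milliseconds since the epoch
    }],
)

# SQL with built-in timeseries support; ago(1h) limits the scan to the last hour.
result = query_client.query(QueryString="""
    SELECT device_id, AVG(measure_value::double) AS avg_temp
    FROM "iot_db"."sensor_readings"
    WHERE measure_name = 'temperature_c' AND time > ago(1h)
    GROUP BY device_id
""")
print(result["Rows"])
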
Ledger Databases: Amazon QLDB

  • Maintain accurate history of application data
  • Track history of financial transactions requiring audit trails
  • Verify data lineage of claims with cryptographic verification
  • Meet audit and compliance requirements with verifiable data
  • Ledger Database: Provides immutable and verifiable history of all changes to application data
  • Cryptographic Verification: Uses a hash-chained journal to ensure data is immutable and cryptographically verifiable
  • Audit-Focused: Designed specifically for maintaining accurate, traceable data history
  • Transparent and Immutable: Provides transparent, immutable, and cryptographically verifiable transaction log
  • Built-in Data Integrity: Trust integrity of data with built-in cryptographic verification enabling third-party validation
  • Consistent Event Store: Track and maintain sequenced history of every application data change using immutable journal
  • ACID Transactions: Fully serializable ACID transactions, plus real-time streaming of changes to Amazon Kinesis
  • SQL-Based Queries: Query data using a SQL-based language called PartiQL with document database flexibility (see the sketch at the end of this section)
  • Financial Transactions: Creating complete and accurate record of all financial transactions (credit and debit)
  • Supply Chain: Recording history of each transaction and providing details of every batch manufactured, shipped, stored, and sold
  • Insurance Claims: Tracking claim over lifetime and cryptographically verifying data integrity for resilience against data entry errors and manipulation
  • Compliance: Maintaining verifiable audit trails for regulatory compliance requirements

Ledger databases are often slower and not ideal for complex or quick reads and writes, but provide unmatched data integrity and verification capabilities.
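
Applications work with QLDB through the Amazon QLDB driver (pyqldb), issuing PartiQL statements inside transactions. The following minimal sketch assumes a hypothetical ledger named vehicle-registry and an existing Claims table.

qldb-claims.py
from pyqldb.driver.qldb_driver import QldbDriver

# Assumed ledger name; every committed change is appended to the immutable, hash-chained journal.
driver = QldbDriver(ledger_name="vehicle-registry")

def record_claim(txn):
    # Both statements commit atomically in a single ACID transaction.
    txn.execute_statement("INSERT INTO Claims ?", {"ClaimId": "C-1001", "Status": "OPEN"})
    return txn.execute_statement("SELECT * FROM Claims WHERE ClaimId = ?", "C-1001")

for doc in driver.execute_lambda(record_claim):
    print(doc)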

Purpose-built databases enable organizations to select optimal database solutions that match specific workload requirements, data models, and use cases rather than forcing all applications to use a single multipurpose database solution.