Pablo Rodriguez

Database Considerations

When selecting a database as an architect, you need to consider several key factors that will inform your decision-making process.

  • Throughput requirements: How much throughput is needed? Will it scale?
      • Traditional on-premises databases can suffer unpredictable performance impacts during scaling and might require downtime.
      • Underprovisioning can cause applications to stop working.
      • Overprovisioning procures unnecessary resources, increasing upfront costs and violating cost-optimization principles.
      • Choose a database solution that handles the throughput needed at launch and can scale up later.
  • Storage requirements: How much storage does your workload need?
      • Database size: gigabytes, terabytes, or petabytes?
      • Different database architectures support different maximum data capacities.
      • Some designs are ideal for traditional applications, while others are built for caching or session management.
  • Data model: Is the data relational, structured or semi-structured, highly connected, or time series?
  • Data access patterns: How do you need to access your data?
  • Latency requirements: Do you need low-latency access to your data?
  • Data record size: Is there a particular data record size you have in mind?
  • Data durability: Assurance that your data will not be lost. Critical business data should use database solutions that store multiple redundant copies across geographically separated physical locations.
  • Data availability: Your ability to access your data when you want to.
  • Data residency: Consider regulatory obligations such as regional data privacy laws.
  • Cost: Balance business needs against cost considerations.
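These factors can be captured as a structured checklist before comparing candidate databases. The sketch below is purely illustrative; the class, field names, and the petabyte threshold are assumptions for demonstration, not any AWS API.

```python
from dataclasses import dataclass

# Hypothetical requirements checklist; field names are illustrative only.
@dataclass
class DatabaseRequirements:
    peak_throughput_rps: int       # requests per second expected at launch
    storage_estimate_gb: int       # expected data volume in gigabytes
    data_model: str                # "relational", "key-value", "document", ...
    max_read_latency_ms: float     # latency target for reads
    multi_region_redundancy: bool  # durability / residency needs

    def needs_petabyte_scale(self) -> bool:
        # 1 PB expressed in GB; assumed cutoff for this sketch
        return self.storage_estimate_gb >= 1_000_000

reqs = DatabaseRequirements(
    peak_throughput_rps=5_000,
    storage_estimate_gb=2_000,
    data_model="relational",
    max_read_latency_ms=10.0,
    multi_region_redundancy=True,
)
print(reqs.needs_petabyte_scale())  # False for a 2 TB estimate
```

Writing the requirements down this way makes it easier to compare them against the capacity and latency limits of each candidate database architecture.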

Relational Databases

Structure: Tabular form of columns and rows

Schema: Strict schema rules

Benefits:

  • Ease of use
  • Data integrity
  • Reduced data storage
  • Common language (SQL)

Use Cases:

  • Migrating an on-premises relational workload
  • Online transactional processing
  • Well-defined schema structure that doesn’t change often

Optimization: Optimized for structured data stored in tables; supports complex ad hoc queries through joins

ACID Compliance: Transactions are atomic, consistent, isolated, and durable
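Atomicity, the "A" in ACID, can be shown with a minimal sketch using Python's built-in sqlite3 module; any ACID-compliant relational engine behaves similarly. The table, account names, and balances are illustrative.

```python
import sqlite3

# In-memory database with a constraint that forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'bob'")
except sqlite3.IntegrityError:
    pass  # the CHECK constraint fired, so neither update is applied

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50} -- unchanged, the transfer was atomic
```

Because the overdraft violates the constraint, the whole transfer rolls back: either both updates succeed or neither does.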

Non-Relational Databases

Non-relational (NoSQL) databases are purpose-built for specific data models, including key-value, graph, document, in-memory, and search. They can store structured, semi-structured, and unstructured data with flexible schemas, where each object can have a different structure.
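The flexible-schema idea can be sketched with plain Python dicts standing in for documents in a document store; no specific database product is assumed, and the field names are made up for illustration.

```python
# Two documents in the same "collection" with different fields.
orders = [
    {"order_id": "o-1", "customer": "alice", "total": 42.50},
    {"order_id": "o-2", "customer": "bob", "total": 19.99,
     "gift_wrap": True, "note": "Happy birthday!"},  # extra fields, no schema change
]

# Queries must tolerate missing fields, e.g. via dict.get with a default.
gift_wrapped = [o["order_id"] for o in orders if o.get("gift_wrap", False)]
print(gift_wrapped)  # ['o-2']
```

Contrast this with a relational table, where adding the gift_wrap attribute would require altering the schema for every row.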

Database Capacity Planning

Database capacity planning involves considering current and future capacity when selecting and updating database resources. The goal is to adjust and optimize database resources based on usage patterns and forecasts.

  1. Analyze current storage capacity
  2. Predict capacity requirements
  3. Determine if horizontal scaling, vertical scaling, or a combination is needed
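Step 2 can be as simple as extrapolating recent growth. This sketch uses a naive linear forecast over assumed monthly storage measurements; the numbers and function name are illustrative, and a real forecast would use monitoring data and a more robust model.

```python
def forecast_storage_gb(history_gb: list[float], months_ahead: int) -> float:
    """Extrapolate storage use from the average month-over-month growth."""
    deltas = [b - a for a, b in zip(history_gb, history_gb[1:])]
    avg_growth = sum(deltas) / len(deltas)
    return history_gb[-1] + avg_growth * months_ahead

history = [120.0, 150.0, 185.0, 230.0]  # GB used over the last four months
projected = forecast_storage_gb(history, months_ahead=6)
print(projected)  # 450.0 -- deltas 30, 35, 45 average ~36.7 GB/month
```

Comparing the projection against the provisioned ceiling indicates whether step 3, choosing a scaling approach, is needed and how soon.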

Vertical Scaling

Expanding the resources of an existing server to increase capacity:

  • Upgrading memory, storage, or processing power
  • Complex and time-consuming process
  • Usually requires database downtime

Horizontal Scaling

Increasing the number of servers that the database runs on:

  • Decreases load on each server
  • Compute capacity can be added while the database is running
  • Usually happens without downtime
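A common way horizontal scaling decreases per-server load is hash-based sharding: each key deterministically routes to one server. A minimal sketch, with made-up node names; real systems typically use consistent hashing so that adding a shard remaps fewer keys than this naive modulo scheme.

```python
import hashlib

def shard_for(key: str, shards: list[str]) -> str:
    """Route a key to one shard via a deterministic hash (naive modulo scheme)."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]

shards = ["db-node-1", "db-node-2", "db-node-3"]  # illustrative node names
placement = {k: shard_for(k, shards) for k in ["user:1", "user:2", "user:3"]}

# The same key always routes to the same node, so reads find their data.
assert shard_for("user:1", shards) == placement["user:1"]
```

Because routing is stateless, new keys spread across all nodes, and adding a node increases total capacity without taking the database offline.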

With AWS managed database services, your responsibility decreases significantly compared to hosting databases on-premises or on EC2 instances.

Task                      | On-Premises | Amazon EC2 | Managed AWS Database Service
--------------------------|-------------|------------|-----------------------------
Power, HVAC, Network      | You         | AWS        | AWS
Rack and Stack            | You         | AWS        | AWS
Server Maintenance        | You         | AWS        | AWS
OS Installation           | You         | You        | AWS
OS Patches                | You         | You        | AWS
Database Installation     | You         | You        | AWS
Database Patches          | You         | You        | AWS
Database Backups          | You         | You        | AWS
High Availability         | You         | You        | AWS
Scaling                   | You         | You        | AWS
Application Optimization  | You         | You        | You

With managed AWS database services, you are responsible only for optimizing your queries and ensuring the database layer works efficiently for your application. These solutions provide high availability, scalability, and database backups as built-in configurable options.

Database capacity planning is a continuous process: you monitor to determine whether the existing infrastructure can sustain the anticipated workload, and evaluate the cost dynamics of scaling up the infrastructure when needed.
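That monitoring loop often reduces to a headroom check: flag the database when projected peak load approaches a capacity ceiling. A minimal sketch; the function name, the 80% utilization threshold, and the numbers are assumptions for illustration.

```python
def needs_scaling(peak_rps: float, capacity_rps: float, headroom: float = 0.8) -> bool:
    """True when peak load exceeds the target utilization threshold (assumed 80%)."""
    return peak_rps > capacity_rps * headroom

print(needs_scaling(peak_rps=900, capacity_rps=1000))  # True: above 80% utilization
print(needs_scaling(peak_rps=500, capacity_rps=1000))  # False: comfortable headroom
```

When the check fires, the scaling decision loops back to the earlier question of vertical versus horizontal scaling and its cost trade-offs.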