Well Architected Framework
Applying the AWS Well-Architected Framework Principles to Data Pipelines
AWS Well-Architected Data Analytics Lens
Purpose and Benefits
The Data Analytics Lens is a collection of customer-proven best practices for designing well-architected analytics workloads, drawing on insights from real-world case studies. It gives IT architects and developers a way to evaluate analytics workloads without needing to become subject matter experts.
Review Process
- Define the Workload: Identify the set of components that together deliver business value (marketing websites, e-commerce platforms, mobile backends, analytics platforms)
- Evaluate Against Pillar Design Principles: Prioritize the pillars by importance and identify the most important design principles for each
- Implement Best Practices: Proceed with implementation and continue regular evaluations to add more best practices over time
The tools and techniques covered in this module align to AWS Well-Architected Framework pillars. Continual evaluation ensures solutions remain the best fit for analytics workloads.
Security Pillar
Best Practice: Control Access to Workload Infrastructure
Implementation: “Implement policies of least privilege for source and downstream systems”
The security pillar encompasses the protection of data, systems, and assets, using cloud technologies to improve security. Analytics environments change as data processing and distribution requirements evolve.
Key Principles
- Least Privilege Access: Give systems only enough access to perform their jobs
- Permission Boundaries: System actions on data should determine permissions
- Role-Based Access: Identify minimum privileges each user requires
- Granular Controls: Grant only necessary permissions (e.g., read-only table access for business analysts)
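The read-only example above can be sketched as an IAM policy document. This is a minimal illustration, assuming a hypothetical Glue Data Catalog table ARN; the account ID, database, and table names are placeholders, not values from the source material.

```python
import json

# Hypothetical table ARN for illustration only.
ANALYTICS_TABLE_ARN = "arn:aws:glue:us-east-1:123456789012:table/sales_db/orders"

def read_only_table_policy(table_arn: str) -> dict:
    """Build an IAM policy document granting read-only catalog access to one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AnalystReadOnly",
                "Effect": "Allow",
                # Read-only actions: no create, update, or delete permissions.
                "Action": ["glue:GetTable", "glue:GetPartitions"],
                "Resource": table_arn,
            }
        ],
    }

policy = read_only_table_policy(ANALYTICS_TABLE_ARN)
print(json.dumps(policy, indent=2))
```

Attaching a policy like this to a business-analyst role keeps permissions scoped to what the role actually does with the data, which is the essence of the permission-boundary principle above.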
Implementation Considerations
- Analytics environments change frequently with evolving requirements
- Ensure environment accessibility with minimum necessary permissions
- Regularly review and update access controls as requirements change
Performance Efficiency Pillar
Best Practice: Choose the Best-Performing Compute Solution
Implementation: “Identify analytics solutions that best suit your technical challenges”
The performance efficiency pillar focuses on efficient use of resources to meet requirements as demand changes and technologies evolve.
AWS Analytics Service Selection
- Amazon Redshift: Data warehousing for structured analytics
- Amazon Kinesis: Streaming data processing for real-time analytics
- Amazon QuickSight: Data visualization and business intelligence
- Purpose-Built Services: Each service is designed to overcome specific challenges
Selection Criteria
- Business Requirements: Match tools to business and technical requirements
- Use Case Fit: Identify right tool for specific jobs
- Technical Challenges: Choose services that address particular analytical challenges
- Example: Café clickstream visualization using QuickSight and Athena
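For the café clickstream example, QuickSight visuals typically sit on top of an Athena query. The sketch below composes such a query in Python; the table and column names (`cafe_db.clickstream`, `page`) are assumptions for illustration, not details from the source.

```python
# Hypothetical Athena table name for the café clickstream data.
CLICKSTREAM_TABLE = "cafe_db.clickstream"

def top_pages_query(table: str, limit: int = 10) -> str:
    """Return an Athena SQL query ranking pages by click count."""
    return (
        f"SELECT page, COUNT(*) AS clicks "
        f"FROM {table} "
        f"GROUP BY page "
        f"ORDER BY clicks DESC "
        f"LIMIT {limit}"
    )

sql = top_pages_query(CLICKSTREAM_TABLE)
print(sql)
```

In practice this query string would be submitted through the Athena console or API, and the result set used as a QuickSight dataset, matching the tool to the job: Athena for ad hoc SQL over S3, QuickSight for the visualization layer.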
Cost Optimization Pillar
Best Practice: Manage Cost Over Time
Implementation: Two key approaches for cost management
Remove Unused Data and Infrastructure
- Data Retention: Delete data past its retention period to reduce storage costs
- Metadata Catalog: Identify data outside retention periods
- Automation: Use Amazon S3 lifecycle configurations for automatic data expiration
- Regular Cleanup: Implement standardized process to identify and remove unused resources
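The automation step above can be expressed as an S3 lifecycle configuration. This is a sketch assuming a 365-day retention period and a `raw/` prefix, both hypothetical; the dictionary matches the shape boto3's `put_bucket_lifecycle_configuration` accepts.

```python
# Assumed retention period; substitute your organization's policy.
RETENTION_DAYS = 365

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "expire-raw-data-past-retention",
            # Hypothetical prefix scoping the rule to raw pipeline data.
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            # Objects older than the retention period are deleted automatically.
            "Expiration": {"Days": RETENTION_DAYS},
        }
    ]
}

# With boto3, this would be applied as:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-analytics-bucket",
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

Pairing a rule like this with a metadata catalog that flags data outside its retention period turns cleanup from a manual audit into a standing process.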
Reduce Infrastructure Overprovisioning
- Utilization Monitoring: Track resource utilization changes over time
- Data Movement: Move infrequently used data from data warehouse to data lake
- Query Optimization: Use Redshift Spectrum to query S3 data without movement
- In-Place Querying: Use Athena to query data at rest in Amazon S3
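The utilization-monitoring and data-movement steps above amount to a tiering decision. The sketch below is a toy version of that decision, assuming hypothetical table names and a 90-day threshold: tables not queried within the threshold become candidates to move from the warehouse to the data lake, where Redshift Spectrum or Athena can still query them in place.

```python
# Hypothetical days-since-last-query figures, as utilization monitoring
# might report them. Table names are placeholders.
LAST_ACCESS_DAYS = {"orders": 2, "clickstream_2019": 400, "inventory": 30}
ARCHIVE_THRESHOLD_DAYS = 90  # assumed policy, not from the source

def data_lake_candidates(last_access: dict, threshold: int) -> list:
    """Return tables whose last access exceeds the archival threshold."""
    return sorted(t for t, days in last_access.items() if days > threshold)

print(data_lake_candidates(LAST_ACCESS_DAYS, ARCHIVE_THRESHOLD_DAYS))
# → ['clickstream_2019']
```

The design point is that moving cold data out of the warehouse reduces provisioned capacity without losing queryability, since the data remains reachable from S3.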
Cost-Performance Balance
Consider data retention periods when making storage decisions, balancing cost efficiency with query performance requirements.
Reliability Pillar
Best Practice: Design Resilience for Analytics Workloads
Implementation: “Understand the business requirements of analytics and ETL jobs”
The reliability pillar encompasses a workload's ability to perform its intended functions correctly and consistently when expected.
Data Movement Patterns
Extract → Transform → Load
- Transforms data before loading into target
- Good for structured data with known requirements
Extract → Load → Transform
- Loads data into central repository first
- Transforms data when needed for analysis
Extract → Transform → Load → Transform
- Hybrid approach combining both patterns
- Initial transform for entry quality criteria
- Later transforms for specific analysis needs
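The three patterns above can be contrasted with a toy sketch. Everything here is a stand-in: the records, the "transform" (type normalization as a sample entry-quality rule), and the plain-list "stores" are illustrative assumptions, not real services.

```python
def extract():
    """Pull raw records from a source system (stubbed)."""
    return [{"item": "latte", "price": "3.50"}]

def transform(records):
    """Entry-quality transform: normalize price strings to floats."""
    return [{**r, "price": float(r["price"])} for r in records]

def load(records, store):
    """Write records into a target store (a plain list here)."""
    store.extend(records)
    return store

# ETL: transform before loading into the target (structured, known requirements).
warehouse = load(transform(extract()), [])

# ELT: load raw data into a central repository first, transform when needed.
data_lake = load(extract(), [])
analysis_view = transform(data_lake)

# ETLT would apply an initial entry-quality transform before the load,
# then further analysis-specific transforms after it.
```

Both paths arrive at the same analytical shape; the reliability question is *where* a transform failure surfaces, before the load (ETL) or at analysis time (ELT), which is why the business requirements drive the pattern choice.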
Business Requirements Alignment
Understanding business requirements helps determine appropriate patterns for moving data from source systems to target data stores, ensuring reliability matches business needs.
Implementation Strategy
Continuous Improvement Process
- Regular Evaluation: Continually assess whether solutions fit analytics workloads
- Best Practice Implementation: Add as many best practices as possible over time
- Pillar Prioritization: Order pillars by importance for specific use cases
- Iterative Refinement: Refine and improve systems over entire lifecycle
Framework Integration
The AWS Well-Architected Framework supplies foundational questions that help you understand whether an architecture aligns with cloud best practices and evaluate trade-offs in workload design, operation, and maintenance.
Practical Application
The tools and techniques covered in this module align with the Well-Architected Framework pillars, providing practical implementation guidance for data engineering patterns.
Applying the AWS Well-Architected Framework to data pipelines requires continuous evaluation across security, performance efficiency, cost optimization, and reliability pillars. Implementation focuses on least privilege access, appropriate tool selection, cost management, and resilient data movement patterns.