Analysis and Visualization
Analysis Tool Selection by Role
Different roles require different analysis capabilities and tools:
Business Analyst
- Need: Design interactive data dashboards with charts and graphs
- Sharing: Share dashboards with business stakeholders
- Tool: Amazon QuickSight for business intelligence and visualization
Data Scientist/Engineer
- Need: Use SQL queries to interactively search through customer activity files in the data lake
- Purpose: Solve one-off customer problems with flexible data exploration
- Tool: Amazon Athena for serverless SQL querying
DevOps Engineer
- Need: Build real-time application monitoring dashboards
- Focus: Monitor system performance and detect issues proactively
- Tool: Amazon OpenSearch Dashboards for real-time monitoring
Amazon QuickSight
Core Capabilities
- Function: Business intelligence (BI) tool for scalable analytics
- Scale: Support hundreds of thousands of users
- Performance: Fast, responsive visualizations using SPICE in-memory engine
Dashboard Features
- Data Dashboard: Collection of charts, graphs, and insights, like a digital newspaper for data
- Interactivity: Ability to interact with dashboard elements
- AutoGraph: Automatically chooses appropriate graph/chart type for selected data
- Natural Language: QuickSight Q search bar supports natural language queries with visualization results
Sharing and Distribution
- Publishing: Share dashboards by publishing or sharing links
- Embedding: Embed dashboards in customer web or mobile applications
- Scheduled Reports: Send dashboard reports via email (Enterprise edition)
Data Source Connectivity
- AWS Sources: Amazon RDS, Aurora, Amazon Redshift, Athena, Amazon S3
- File Uploads: Excel spreadsheets, flat files (CSV, TSV, CLF, ELF)
- On-Premises: SQL Server, MySQL, PostgreSQL databases
- SaaS Applications: Salesforce and other SaaS platforms
Amazon Athena
Core Capabilities
- Function: Data query engine and schema metadata store
- Query Language: Standard SQL for structured and unstructured data
- Architecture: Built on the open-source Trino and Presto engines
- Pricing: Pay-per-query model, serverless infrastructure
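To make the pay-per-query model concrete, here is a minimal sketch of a cost estimate based on the amount of data a query scans. The price of USD 5 per TB and the 10 MB billing minimum per query are assumptions for illustration; actual rates vary by Region and change over time. `estimate_query_cost` is a hypothetical helper, not part of any AWS SDK.

```python
# Assumed list price per TB scanned; check current pricing for your Region.
PRICE_PER_TB_USD = 5.00
# Assumed billing floor per query.
MIN_BILLED_BYTES = 10 * 1024**2
BYTES_PER_TB = 1024**4


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query that scans `bytes_scanned` bytes."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    full_scan = 1024**4            # scanning 1 TB of raw CSV
    pruned_scan = full_scan // 10  # same query when only a tenth of the data is read
    print(f"full scan:   ${estimate_query_cost(full_scan):.2f}")
    print(f"pruned scan: ${estimate_query_cost(pruned_scan):.2f}")
```

Because the charge follows bytes scanned rather than query count alone, columnar formats and partitioning that reduce scanned data also reduce cost.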
Supported Data Formats
- CSV, JSON, Apache ORC, Apache Parquet, Apache Avro
- Direct analysis of data in Amazon S3 using standard SQL
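As an illustration of querying S3 data with standard SQL, the sketch below assembles the request parameters that the boto3 Athena client's `start_query_execution` call accepts. The table `customer_activity`, its columns, the database `example_datalake`, and the results bucket are all hypothetical names chosen for this example. The code only builds and prints the request, so it runs without AWS credentials.

```python
import json


def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for the Athena StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table and columns, for illustration only.
sql = """
SELECT customer_id, COUNT(*) AS events
FROM customer_activity
WHERE event_date = DATE '2024-01-15'
GROUP BY customer_id
ORDER BY events DESC
LIMIT 10
""".strip()

request = build_athena_request(
    sql,
    database="example_datalake",                    # assumed Data Catalog database
    output_location="s3://example-athena-results/",  # assumed query result bucket
)
print(json.dumps(request, indent=2))

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("athena").start_query_execution(**request)
```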
Integration Features
- Data Catalog: Integrates with AWS Glue Data Catalog for metadata management
- Persistent Metadata: Central metadata store available throughout AWS account
- Federated Queries: Connect multiple data sources for unified querying
Data Source Connectors
- AWS Sources: Amazon RDS, DynamoDB, Amazon MSK, OpenSearch Service, Amazon Redshift
- On-Premises: SAP HANA, Db2 databases
- Other Clouds: Azure Data Lake Storage
- Connectivity: JDBC and ODBC drivers for BI tools like QuickSight
Athena for Apache Spark
Athena also offers Apache Spark support for parallel processing of big data workloads, similar to ETL jobs in AWS Glue and Amazon EMR.
Amazon OpenSearch Service
Core Capabilities
- Function: Managed service, with a serverless option, for OpenSearch use cases
- Use Cases: Interactive log analytics, real-time application monitoring, website searches
- Architecture: Data storage with indexing for search and analysis
OpenSearch Dashboards
- Included: Every OpenSearch Service domain includes an installation of OpenSearch Dashboards
- Visualization: Donut charts, area charts, event timelines, success/failure metrics
- Real-Time: Live application monitoring capabilities
Advanced Features
- Alerting: Set up notifications when data exceeds thresholds
- Anomaly Detection: ML-powered automatic outlier detection in streaming data
- Combined Monitoring: Pair anomaly detection with alerting for immediate notification
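The idea of pairing anomaly detection with alerting can be sketched in a few lines. OpenSearch's anomaly detection is based on the Random Cut Forest algorithm; the example below swaps in a much simpler rolling z-score as a stand-in, purely to show how a detector's output feeds an alert. The function names, the window size, the threshold, and the sample latencies are all assumptions for illustration.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, z_threshold=3.0):
    """Return indexes whose value is far from the mean of the preceding window.

    Simplified rolling z-score, not the Random Cut Forest used by OpenSearch.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as an outlier.
            if values[i] != mu:
                anomalies.append(i)
        elif abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


def build_alerts(values, anomalies, metric="latency_ms"):
    """Turn detected anomalies into notification messages."""
    return [f"ALERT: {metric}={values[i]} at sample {i}" for i in anomalies]


# Hypothetical request latencies with one obvious spike.
latencies = [100, 102, 98, 101, 99, 100, 450, 101, 99]
alerts = build_alerts(latencies, detect_anomalies(latencies))
for message in alerts:
    print(message)
```

In the managed service, the detector and the alerting monitor are configured in OpenSearch Dashboards rather than written by hand; the sketch only mirrors the flow from detection to notification.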
High Availability
- Multi-AZ with Standby: Deployment option for business-critical workloads
- Resilience: Protection against infrastructure failures, node drops, AZ failures
- Management: Simplified cluster configuration with enforced best practices
Use Case Example: Café Clickstream Analysis
Requirements
A data analyst builds a dashboard report on website user clickstream activity and shares it through a URL link.
Implementation Steps
- Configuration: Data analyst uses QuickSight to configure security permissions for Athena and the café clickstream S3 bucket
- Query Processing: Athena runs the queries that QuickSight generates, using the Data Catalog for table metadata, and saves the results in the query result bucket
- Publishing: Data analyst publishes dashboard and sends URL to café owners
- Access: Café owners view dashboard using provided URL link
Architecture Benefits
- Separation of Concerns: Athena handles querying, QuickSight handles visualization
- Scalability: Serverless components scale automatically with usage
- Cost Efficiency: Pay-per-query model for occasional analysis needs
Security and Access Control
Principle of Least Privilege
- Implementation: Give only enough access for systems to perform required jobs
- User Permissions: Identify minimum privileges each user requires
- Example: Business analyst reading Amazon Redshift table gets only read permission for specific table
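The Redshift example above can be sketched as a single read-only grant. The helper below generates a `GRANT SELECT` statement for one table and one user; the schema, table, and user names are hypothetical, and `grant_read_only` is an illustrative helper rather than part of any library. Identifiers are validated because they are interpolated directly into SQL.

```python
import re

# Accept only plain SQL identifiers, since names are placed directly into the statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def grant_read_only(schema: str, table: str, user: str) -> str:
    """Return a GRANT statement giving `user` read access to exactly one table."""
    for name in (schema, table, user):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"GRANT SELECT ON TABLE {schema}.{table} TO {user};"


# Hypothetical names for illustration.
print(grant_read_only("sales", "orders", "business_analyst"))
```

Granting `SELECT` on a single table, rather than on all tables in the schema, keeps the analyst's access scoped to what the job requires. In Redshift the user typically also needs `USAGE` on the schema before the table grant takes effect.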
Service-Specific Security
- QuickSight: Dashboard sharing permissions and data source access controls
- Athena: Query permissions and data source access through IAM
- OpenSearch: Domain access controls and index-level permissions
Tool Selection Guidelines
Choose analysis and visualization tools based on:
- User Role: Technical expertise and specific responsibilities
- Use Case: Real-time monitoring vs batch reporting vs ad-hoc analysis
- Data Sources: Integration requirements with existing data infrastructure
- Scalability: Number of users and query frequency
- Cost Model: Pay-per-query vs fixed infrastructure costs
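As a rough illustration of the use-case criterion, the sketch below maps the three use cases described in this section to the tool suggested for each. The `UseCase` enum and `suggest_tool` function are hypothetical names; a real decision would also weigh the other criteria in the list, such as data sources, scale, and cost model.

```python
from enum import Enum


class UseCase(Enum):
    BI_DASHBOARD = "bi_dashboard"                # shared dashboards for stakeholders
    AD_HOC_SQL = "ad_hoc_sql"                    # one-off SQL exploration of the data lake
    REALTIME_MONITORING = "realtime_monitoring"  # live application monitoring


# Mapping taken from the role descriptions in this section.
TOOL_BY_USE_CASE = {
    UseCase.BI_DASHBOARD: "Amazon QuickSight",
    UseCase.AD_HOC_SQL: "Amazon Athena",
    UseCase.REALTIME_MONITORING: "Amazon OpenSearch Service (OpenSearch Dashboards)",
}


def suggest_tool(use_case: UseCase) -> str:
    """Return the tool this section recommends for the given use case."""
    return TOOL_BY_USE_CASE[use_case]


for case in UseCase:
    print(f"{case.value}: {suggest_tool(case)}")
```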
QuickSight provides business intelligence dashboards, Athena enables serverless SQL querying, and OpenSearch Service offers real-time monitoring capabilities. Each tool serves different analysis needs while integrating with the broader AWS data ecosystem.