Pablo Rodriguez

Analysis and Visualization

Different roles require different analysis capabilities and tools:

  • Need: Design interactive data dashboards with charts and graphs
  • Sharing: Share dashboards with business stakeholders
  • Tool: Amazon QuickSight for business intelligence and visualization
  • Need: Use SQL queries to interactively search through customer activity files in data lake
  • Purpose: Solve one-off customer problems with flexible data exploration
  • Tool: Amazon Athena for serverless SQL querying
  • Need: Build real-time application monitoring dashboards
  • Focus: Monitor system performance and detect issues proactively
  • Tool: Amazon OpenSearch Dashboards for real-time monitoring
  • Function: Business intelligence (BI) tool for scalable analytics
  • Scale: Support hundreds of thousands of users
  • Performance: Fast, responsive visualizations using SPICE in-memory engine
  • Data Dashboard: Collection of charts, graphs, and insights like a digital data newspaper
  • Interactivity: Ability to interact with dashboard elements
  • AutoGraph: Automatically chooses appropriate graph/chart type for selected data
  • Natural Language: QuickSight Q search bar supports natural language queries with visualization results
  • Publishing: Share dashboards by publishing or sharing links
  • Embedding: Embed dashboards in customer web or mobile applications
  • Scheduled Reports: Send dashboard reports via email (Enterprise edition)
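
The embedding feature can be sketched with the QuickSight `GenerateEmbedUrlForRegisteredUser` API, which boto3 exposes as `generate_embed_url_for_registered_user`. This is a minimal sketch, not a production integration: the helper name `dashboard_embed_url`, the stub client, the account ID, the user ARN, and the dashboard ID are all hypothetical, and a real call would pass `boto3.client("quicksight")` with valid credentials.

```python
def dashboard_embed_url(client, account_id, user_arn, dashboard_id, minutes=60):
    """Request a short-lived embed URL for one dashboard (hypothetical helper)."""
    # The API accepts session lifetimes between 15 and 600 minutes.
    if not 15 <= minutes <= 600:
        raise ValueError("session lifetime must be between 15 and 600 minutes")
    response = client.generate_embed_url_for_registered_user(
        AwsAccountId=account_id,
        UserArn=user_arn,
        SessionLifetimeInMinutes=minutes,
        ExperienceConfiguration={"Dashboard": {"InitialDashboardId": dashboard_id}},
    )
    return response["EmbedUrl"]


class StubQuickSightClient:
    """Offline stand-in for boto3.client("quicksight"); returns a fake URL."""

    def __init__(self):
        self.last_request = None

    def generate_embed_url_for_registered_user(self, **kwargs):
        self.last_request = kwargs
        dashboard = kwargs["ExperienceConfiguration"]["Dashboard"]["InitialDashboardId"]
        return {"EmbedUrl": f"https://example.invalid/embed/{dashboard}", "Status": 200}


stub = StubQuickSightClient()
url = dashboard_embed_url(
    stub,
    account_id="111122223333",  # placeholder account ID
    user_arn="arn:aws:quicksight:us-east-1:111122223333:user/default/analyst",
    dashboard_id="cafe-dashboard",  # placeholder dashboard ID
)
print(url)
```

The web or mobile application would then load the returned URL in an iframe or through the QuickSight embedding SDK.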

AWS Sources: Amazon RDS, Aurora, Amazon Redshift, Athena, Amazon S3

File Uploads: Excel spreadsheets, flat files (CSV, TSV, CLF, ELF)

On-Premises: SQL Server, MySQL, PostgreSQL databases

SaaS Applications: Salesforce and other SaaS platforms

  • Function: Data query engine and schema metadata store
  • Query Language: Standard SQL for structured and unstructured data
  • Architecture: Built on the open-source Trino and Presto engines
  • Pricing: Pay-per-query model, serverless infrastructure
  • Data Formats: CSV, JSON, Apache ORC, Apache Parquet, Apache Avro
  • Direct analysis of data in Amazon S3 using standard SQL
  • Data Catalog: Integrates with AWS Glue Data Catalog for metadata management
  • Persistent Metadata: Central metadata store available throughout AWS account
  • Federated Queries: Connect multiple data sources for unified querying
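
Querying S3 data through Athena typically follows a start-then-poll pattern: submit the SQL, then check the execution state until it finishes. The sketch below uses the boto3 Athena calls `start_query_execution` and `get_query_execution`. The helper name, the stub client, the database `cafe_db`, the table `clickstream`, and the bucket name are hypothetical. In real use, `client` would be `boto3.client("athena")`.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query and poll until Athena reports a terminal state."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)


class StubAthenaClient:
    """Offline stand-in for boto3.client("athena") so the sketch runs anywhere."""

    def __init__(self, states):
        self._states = iter(states)
        self.started = None

    def start_query_execution(self, **kwargs):
        self.started = kwargs
        return {"QueryExecutionId": "stub-query-id"}

    def get_query_execution(self, QueryExecutionId):
        return {"QueryExecution": {"Status": {"State": next(self._states)}}}


stub = StubAthenaClient(["QUEUED", "RUNNING", "SUCCEEDED"])
query_id, state = run_athena_query(
    stub,
    sql="SELECT page, COUNT(*) AS views FROM clickstream GROUP BY page",
    database="cafe_db",  # hypothetical Data Catalog database
    output_location="s3://example-query-results/",  # hypothetical bucket
    poll_seconds=0,
)
print(query_id, state)
```

Once the state is `SUCCEEDED`, the rows can be fetched with `get_query_results` or read directly from the output location.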

AWS Sources: Amazon RDS, DynamoDB, Amazon MSK, OpenSearch Service, Amazon Redshift

On-Premises: SAP HANA, Db2 databases

Other Clouds: Azure Data Lake Storage

Connectivity: JDBC and ODBC drivers for BI tools like QuickSight

Athena for Apache Spark: An additional feature that provides Apache Spark capabilities for parallel processing of big data workloads, similar to ETL jobs in AWS Glue and Amazon EMR.

  • Function: Managed service for OpenSearch use cases, with a serverless option
  • Use Cases: Interactive log analytics, real-time application monitoring, website searches
  • Architecture: Data storage with indexing for search and analysis
  • Included: Every OpenSearch Service domain includes dashboard installation
  • Visualization: Donut charts, area charts, event timelines, success/failure metrics
  • Real-Time: Live application monitoring capabilities
  • Alerting: Set up notifications when data exceeds thresholds
  • Anomaly Detection: ML-powered automatic outlier detection in streaming data
  • Combined Monitoring: Pair anomaly detection with alerting for immediate notification
  • Multi-AZ with Standby: Deployment option for business-critical workloads
  • Resilience: Protection against infrastructure failures, node drops, AZ failures
  • Management: Simplified cluster configuration with enforced best practices
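
To make the difference between threshold alerting and anomaly detection concrete, here is a small offline sketch. It is an illustration only: OpenSearch anomaly detection is based on the Random Cut Forest algorithm, whereas this example swaps in a much simpler trailing-window z-score. The function names and the latency values are made up for the example.

```python
from statistics import mean, stdev


def threshold_alerts(values, limit):
    """Return indexes of points that exceed a fixed limit (alerting style)."""
    return [i for i, value in enumerate(values) if value > limit]


def zscore_outliers(values, window=5, z_limit=3.0):
    """Return indexes of points far from the mean of the preceding window.

    A simplified stand-in for ML-based anomaly detection, not the
    algorithm OpenSearch uses.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged


# Hypothetical response times in milliseconds, with one spike at index 6.
latencies_ms = [100, 102, 98, 101, 99, 100, 250, 101]

over_limit = threshold_alerts(latencies_ms, limit=200)
outliers = zscore_outliers(latencies_ms)
print(over_limit, outliers)  # [6] [6]
```

A fixed threshold needs a limit chosen in advance, while the outlier check adapts to recent behavior. Pairing the two, as the list above suggests, means a notification fires as soon as an unusual point appears.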

Use Case Example: Café Clickstream Analysis


A data analyst builds a dashboard report on website user clickstream activity and shares it via a URL link.

  1. Configuration: Data analyst uses QuickSight to configure security permissions for Athena and café clickstream S3 bucket
  2. Query Processing: Athena runs the queries that QuickSight generates, using the Data Catalog for table metadata, and saves the results in the query result bucket
  3. Publishing: Data analyst publishes dashboard and sends URL to café owners
  4. Access: Café owners view dashboard using provided URL link
  • Separation of Concerns: Athena handles querying, QuickSight handles visualization
  • Scalability: Serverless components scale automatically with usage
  • Cost Efficiency: Pay-per-query model for occasional analysis needs
  • Implementation: Give only enough access for systems to perform required jobs
  • User Permissions: Identify minimum privileges each user requires
  • Example: Business analyst reading Amazon Redshift table gets only read permission for specific table
  • QuickSight: Dashboard sharing permissions and data source access controls
  • Athena: Query permissions and data source access through IAM
  • OpenSearch: Domain access controls and index-level permissions
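
As an illustration of giving only enough access, the sketch below builds an IAM policy document that allows read-only access to a single S3 prefix. The bucket name, prefix, and helper names are hypothetical. A real Athena user would also need statements for Athena, the AWS Glue Data Catalog, and the query result bucket, which are omitted here to keep the example short.

```python
import json


def s3_prefix_read_policy(bucket, prefix):
    """Build an IAM policy granting read-only access to one S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                # Restrict listing to keys under the given prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


def allowed_actions(policy):
    """List every action the policy allows, to review it for excess access."""
    return sorted(
        action
        for statement in policy["Statement"]
        if statement["Effect"] == "Allow"
        for action in statement["Action"]
    )


policy = s3_prefix_read_policy("example-cafe-clickstream", "logs/")
print(json.dumps(policy, indent=2))
print(allowed_actions(policy))
```

The policy contains no write actions and no wildcard actions, so the holder can read the clickstream files but cannot change or delete them.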

Choose analysis and visualization tools based on:

  • User Role: Technical expertise and specific responsibilities
  • Use Case: Real-time monitoring vs batch reporting vs ad-hoc analysis
  • Data Sources: Integration requirements with existing data infrastructure
  • Scalability: Number of users and query frequency
  • Cost Model: Pay-per-query vs fixed infrastructure costs
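
The use-case criterion can be expressed as a simple lookup. The mapping below restates the need-to-tool pairings from the start of this page. The enum and function names are hypothetical, and a real decision would also weigh the other criteria in the list.

```python
from enum import Enum


class UseCase(Enum):
    DASHBOARDS = "interactive BI dashboards for business stakeholders"
    AD_HOC_SQL = "ad-hoc SQL exploration of files in the data lake"
    REAL_TIME_MONITORING = "real-time application monitoring"


# Pairings taken from the role descriptions in these notes.
TOOL_FOR_USE_CASE = {
    UseCase.DASHBOARDS: "Amazon QuickSight",
    UseCase.AD_HOC_SQL: "Amazon Athena",
    UseCase.REAL_TIME_MONITORING: "Amazon OpenSearch Dashboards",
}


def recommend_tool(use_case):
    """Return the tool these notes pair with the given use case."""
    return TOOL_FOR_USE_CASE[use_case]


for case in UseCase:
    print(f"{case.value}: {recommend_tool(case)}")
```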

QuickSight provides business intelligence dashboards, Athena enables serverless SQL querying, and OpenSearch Service offers real-time monitoring capabilities. Each tool serves different analysis needs while integrating with the broader AWS data ecosystem.