Orchestrating Microservices with AWS Step Functions
Challenges with Microservice Applications
Dependencies and Coordination
- Microservice Dependencies: Handling dependencies between thousands of microservices
- Sequential/Parallel Chaining: Coordinating microservices in sequential or parallel patterns
- Scaling Complexity: Managing application growth and component increases
- Data Flow: Passing data between microservices and maintaining state
Error Scenarios
- Retry Logic: Handling retries after timeouts or errors
- Function Failures: Code exceptions, timeouts, memory issues, runtime errors
- Repeated Processing: Code must handle the same event repeatedly without unwanted side effects (idempotency)
- Resource Management: Managing resources and database writes during retry scenarios
Monitoring and Troubleshooting
- Application Visibility: Need for troubleshooting errors and tracking performance
- State Management: Tracking current workflow steps and storing inter-step data
- Automatic Scaling: Coordination layer must scale with changing workloads
AWS Step Functions
Core Capabilities
- Serverless Orchestration: Manages workflows between multiple AWS services
- State Machines: Workflows containing series of event-driven states (steps)
- Visual Workflows: Coordinate distributed applications and microservices components
- State Management: Manages workflow state, checkpoints, and restarts
- Error Handling: Built-in capability for handling errors and retries
Data Processing Features
- Input/Output: Pass input data into state machine and receive result data
- State Data Transfer: Pass data from one state to the next
- Data Filtering: States can filter data or add result data to input
- Data Manipulation: Transform and process data between workflow steps
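As an illustration of the data-flow features above, the sketch below defines a Task state whose result is added to its input through `ResultPath`, followed by a simplified emulation of that merge. The function names `AnalyzeStock` and `ExecuteTrade` are hypothetical, and `apply_result_path` only handles one-level paths such as `$.analysis`; it is not how the service itself is implemented.

```python
from typing import Any, Dict

# Task state (Amazon States Language as a Python dict). The whole state input
# is passed to the function, and the result is stored under "$.analysis".
analyze_state: Dict[str, Any] = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {"FunctionName": "AnalyzeStock", "Payload.$": "$"},  # hypothetical function
    "ResultPath": "$.analysis",
    "Next": "ExecuteTrade",  # hypothetical next state
}


def apply_result_path(state_input: Dict[str, Any], result: Any, result_path: str) -> Any:
    """Simplified emulation of ResultPath for "$" and one-level "$.key" paths."""
    if result_path == "$":
        return result  # the result replaces the input entirely
    if not result_path.startswith("$."):
        raise ValueError("only '$' and '$.key' paths are supported in this sketch")
    merged = dict(state_input)  # copy so the original input is not mutated
    merged[result_path[2:]] = result
    return merged
```

With input `{"ticker": "ABC"}` and result `{"buy": True}`, the merged output keeps `ticker` and gains an `analysis` key, which is the "add result data to input" behavior described above.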
Integration and Nesting
- Service Integration: Invoke from API Gateway, EventBridge, Lambda, other state machines
- Nested State Machines: Reduce complexity and reuse common processes
- Graphical Console: Visualize application components as series of steps
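A nested state machine is typically started from a Task state in the parent workflow. The sketch below shows one way this might look; the child state machine ARN, account ID, and the `orderId` field are placeholders, not values from this document.

```python
import json
from typing import Any, Dict

# Parent-workflow Task state that starts a child state machine and waits for
# it to finish (the ".sync:2" suffix) before moving on.
run_child_state: Dict[str, Any] = {
    "Type": "Task",
    "Resource": "arn:aws:states:::states:startExecution.sync:2",
    "Parameters": {
        # Placeholder ARN: replace with the ARN of a real child state machine.
        "StateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:ChildWorkflow",
        "Input": {"orderId.$": "$.orderId"},  # hypothetical field passed to the child
    },
    "End": True,
}

asl_fragment = json.dumps(run_child_state, indent=2)
```

Keeping a common process in its own child workflow lets several parents reuse it without duplicating its states.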
Workflow Types Comparison
Standard Workflows
Characteristics:
- Long-running workflows (up to 1 year)
- Exactly-once execution model
- Full execution history in console
- Asynchronous processing only
- State persisted on every transition
Pricing: Charged based on number of state transitions
Use Cases:
- Long-running, auditable workflows
- Customer order fulfillment (multi-day processes)
- Visual debugging and execution history needs
Express Workflows
Characteristics:
- Short-running (up to 5 minutes)
- Synchronous: At-most-once execution
- Asynchronous: At-least-once execution
- Results in CloudWatch Logs
- State is not persisted on every transition
Pricing: Charged based on number of requests and duration
Types:
- Synchronous: Wait for completion, return result (microservice orchestration)
- Asynchronous: Return confirmation, poll logs for results (messaging, data processing)
Use Cases:
- High-event-rate workloads
- Streaming data processing
- IoT data ingestion
Development Tip: Developers often debug workflows as Standard Workflows, which provide a visual representation and execution history, then copy them to Express Workflows for production if that type suits the workload better.
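The comparison above can be condensed into a rule of thumb. The helper below is a hypothetical sketch based only on the limits listed in this section (5-minute Express limit, exactly-once execution for Standard); real decisions would also weigh pricing and event rate.

```python
EXPRESS_MAX_SECONDS = 5 * 60  # Express Workflows run for up to 5 minutes


def pick_workflow_type(max_duration_seconds: int,
                       needs_exactly_once: bool,
                       caller_waits_for_result: bool) -> str:
    """Suggest a workflow type from the characteristics listed above."""
    if needs_exactly_once or max_duration_seconds > EXPRESS_MAX_SECONDS:
        # Long-running or exactly-once work only fits Standard Workflows.
        return "STANDARD"
    if caller_waits_for_result:
        return "EXPRESS (synchronous)"
    return "EXPRESS (asynchronous)"
```

For example, a multi-day order fulfillment process maps to `STANDARD`, while a short request/response orchestration maps to a synchronous Express Workflow.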
Step Functions Use Cases
Microservice Orchestrations
- Standard Workflows: Long-running with AWS Fargate integration for containerized applications
- Synchronous Express: Short-duration, high-volume workflows requiring immediate response
- Asynchronous Express: Short-duration workflows without immediate response requirements
- API Gateway Integration: Direct initiation of Synchronous Express Workflows
Data Processing
- Scalability: Manage millions of concurrent executions with horizontal scaling
- Fault Tolerance: Reliable workflows with automatic error handling
- Parallel Processing: Use Parallel state type for concurrent execution
- Dynamic Parallelism: Map state type for iterating over data objects
- S3 Integration: Map state can iterate over objects in S3 buckets
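To make the Map state concrete, here is a minimal sketch of an inline Map state that runs one Task per element of a JSON array. The `$.records` path, the `ProcessRecord` function name, and the concurrency limit are illustrative assumptions.

```python
import json
from typing import Any, Dict

# Map state (Amazon States Language as a Python dict): iterate over the array
# found at "$.records" and run the nested workflow once per element.
map_state: Dict[str, Any] = {
    "Type": "Map",
    "ItemsPath": "$.records",  # assumed location of the input array
    "MaxConcurrency": 10,      # illustrative cap on parallel iterations
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "INLINE"},
        "StartAt": "ProcessRecord",
        "States": {
            "ProcessRecord": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": "ProcessRecord", "Payload.$": "$"},  # hypothetical function
                "End": True,
            }
        },
    },
    "End": True,
}

asl_fragment = json.dumps(map_state, indent=2)
```

A Parallel state differs in that its branches are fixed in the definition, whereas the Map state's iteration count depends on the input data.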
Machine Learning
- End-to-End Workflows: Orchestrate complete ML workflows on Amazon SageMaker
- Data Preprocessing: Data enrichment, feature engineering, data validation
- Model Operations: Post-processing and model evaluation
- Workflow Coordination: Manage complex ML pipeline dependencies
Security Automation
- Routine Operations: Software upgrades, patching, security updates
- Infrastructure Management: Automated infrastructure selection and deployment
- Data Synchronization: Automated data routing and synchronization
- Support Automation: Automated support ticket routing
- Error Management: Automatic retry with exponential backoff for error handling
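Retry with exponential backoff is configured through a `Retry` block on a state. The sketch below shows one such policy and a small helper that computes the wait before each retry; the specific numbers are illustrative, not recommendations.

```python
from typing import Any, Dict, List

# Retry policy as it would appear inside a Task state's "Retry" array.
retry_policy: Dict[str, Any] = {
    "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
    "IntervalSeconds": 2,  # wait before the first retry
    "MaxAttempts": 4,      # maximum number of retries
    "BackoffRate": 2.0,    # multiplier applied to the wait after each retry
}


def retry_delays(policy: Dict[str, Any]) -> List[float]:
    """Return the wait in seconds before each retry, in order."""
    return [
        policy["IntervalSeconds"] * policy["BackoffRate"] ** attempt
        for attempt in range(policy["MaxAttempts"])
    ]
```

With the values above the waits double on each retry: 2, 4, 8, then 16 seconds.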
Workflow State Coordination
Step Functions manages application logic through built-in coordination patterns:
Sequential and Parallel Execution
- Sequential Tasks: Run tasks one after another
- Parallel Tasks: Execute multiple tasks simultaneously
- Branching Logic: Select tasks based on input data
- Error Handling: Built-in try-catch-finally logic
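The parallel and try-catch patterns above can be combined in a single state. The following sketch defines a Parallel state with two branches and a `Catch` clause that routes any error to a cleanup state. All state names are hypothetical, and the branches use Pass states as placeholders for real work.

```python
from typing import Any, Dict

# Parallel state: both branches run concurrently; if either fails, control
# moves to "Cleanup" with the error details stored under "$.error".
parallel_state: Dict[str, Any] = {
    "Type": "Parallel",
    "Branches": [
        {
            "StartAt": "CheckInventory",
            "States": {"CheckInventory": {"Type": "Pass", "End": True}},
        },
        {
            "StartAt": "CheckCredit",
            "States": {"CheckCredit": {"Type": "Pass", "End": True}},
        },
    ],
    "Catch": [
        {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "Cleanup"}
    ],
    "Next": "ConfirmOrder",
}
```

Here `Catch` plays the role of the "catch" block, and the state it points to (`Cleanup` in this sketch) is where recovery code would run.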
Advanced Coordination
- Data Record Processing: Process data records in parallel
- Retry Logic: Automatically retry failed tasks
- Timeout Management: Handle tasks taking seconds or months
- Recovery: Graceful recovery with cleanup and recovery code
Benefit: Removes repeated coordination code from microservices and functions, centralizing workflow logic.
State Machine State Types
Work States
- Task: Integrates with AWS services or calls HTTP endpoints
- Activity: Performs tasks hosted anywhere with HTTP connection capability
- Pass: Passes or filters input data to next state without performing work
- Wait: Delays workflow for specified time (relative or absolute)
- Callback Option: State can wait for callback with task token
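The Wait state and the callback option can be sketched as follows. The topic ARN, account ID, timeout, and state names are placeholders chosen for illustration.

```python
from typing import Any, Dict

# Wait state: pause for a relative duration before moving on.
wait_state: Dict[str, Any] = {"Type": "Wait", "Seconds": 300, "Next": "RequestApproval"}

# Task state using the callback pattern: the workflow pauses until an external
# system returns the task token (or the timeout elapses).
approval_state: Dict[str, Any] = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sns:publish.waitForTaskToken",
    "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:approvals",  # placeholder ARN
        "Message": {
            "TaskToken.$": "$$.Task.Token",  # token read from the context object
            "Input.$": "$",
        },
    },
    "TimeoutSeconds": 86400,  # illustrative: give up after one day
    "Next": "CheckApproval",
}
```

The external system resumes the workflow by passing the token back through the Step Functions `SendTaskSuccess` or `SendTaskFailure` API.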
Transition States
- Choice: Adds conditional logic to control flow to next state
  - Comparison operators for input variables
  - Example: Compare if input variable greater/less than 50
- Parallel: Adds branches of nested state machines inside a state machine
- Map: Separates workflow for each data record in dataset running in parallel
  - Supports JSON arrays, S3 object lists, CSV files
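The "greater/less than 50" comparison mentioned for the Choice state could be written as below. The state names and the `$.value` path are hypothetical, and `next_state` is a toy evaluator covering only the two numeric operators used here, included to show how the rules are read in order.

```python
from typing import Any, Dict

choice_state: Dict[str, Any] = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.value", "NumericGreaterThan": 50, "Next": "HighValue"},
        {"Variable": "$.value", "NumericLessThan": 50, "Next": "LowValue"},
    ],
    "Default": "ExactlyFifty",  # used when no rule matches
}


def next_state(state: Dict[str, Any], data: Dict[str, Any]) -> str:
    """Toy evaluator: first matching rule wins, otherwise fall back to Default."""
    for rule in state["Choices"]:
        value = data[rule["Variable"][2:]]  # supports one-level "$.name" paths only
        if "NumericGreaterThan" in rule and value > rule["NumericGreaterThan"]:
            return rule["Next"]
        if "NumericLessThan" in rule and value < rule["NumericLessThan"]:
            return rule["Next"]
    return state["Default"]
```

Each rule carries its own `Next`, which is why a Choice state can branch to several different states.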
Stop States
- Succeed: Stops the state machine and marks the execution as successful
- Fail: Stops the state machine and marks the execution as failed
- End Field: Task, Pass, Wait, Parallel, and Map states can set "End": true to make that state the last one in the execution
Amazon States Language
State machines are defined using Amazon States Language, a JSON-based structured language:
{
  "StartAt": "1 Task state",
  "Comment": "Sample state machine",
  "States": {
    "1 Task state": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "ALambdaFunction"
      },
      "Next": "2 Success state"
    },
    "2 Success state": {
      "Type": "Succeed"
    }
  }
}
Key Components
- StartAt: Specifies the first state to run
- States: Object containing every state in the state machine, keyed by state name
- Task State: Work state that invokes a Lambda function
- Next: Name of the next state to run
- Succeed State: Stop state that ends the state machine successfully
State Machine Structure
- Uses key-value pairs to specify fields and values
- Always specifies which state to run first
- Contains a nested JSON object for each state
- Each state has a name and a Type field
- Next Field: Passes control to another state (Choice states have one Next per rule, plus an optional Default)
- End Field: Marks a state as the last one ("End": true); alternatively, pass control to a Succeed or Fail state
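The structural rules above can be checked mechanically. The following is a simplified sketch of such a check, not a replacement for the service's own validation: it only looks at top-level states and does not descend into Parallel branches or Map processors.

```python
from typing import Any, Dict, List

TERMINAL_TYPES = {"Succeed", "Fail"}


def validate(definition: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems: List[str] = []
    states = definition.get("States", {})
    if definition.get("StartAt") not in states:
        problems.append("StartAt does not name a defined state")
    for name, state in states.items():
        # Collect every transition target: Choice rules, Next, and Default.
        targets = [rule["Next"] for rule in state.get("Choices", [])]
        for field in ("Next", "Default"):
            if field in state:
                targets.append(state[field])
        for target in targets:
            if target not in states:
                problems.append(f"{name}: transition target '{target}' is not defined")
        is_terminal = state.get("Type") in TERMINAL_TYPES or state.get("End") is True
        if not targets and not is_terminal:
            problems.append(f"{name}: has no Next field and is not terminal")
    return problems


# The sample definition from this section, as a Python dict.
sample = {
    "StartAt": "1 Task state",
    "Comment": "Sample state machine",
    "States": {
        "1 Task state": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "ALambdaFunction"},
            "Next": "2 Success state",
        },
        "2 Success state": {"Type": "Succeed"},
    },
}
```

Running `validate(sample)` returns an empty list, while a definition whose `Next` names a missing state produces one message per broken transition.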
Example: Stock Trade Workflow
A Step Functions Standard Workflow state machine example for stock trading:
- Stock Analysis: Lambda function returns company ID, stock count, price, and trade amount
- Trade Recommendation Check: Choice state checks for a recommended trade
  - If no recommendation → Succeed state (end)
  - If recommendation exists → Continue to approval process
- Human Approval Required: Choice state checks trade amount (e.g., >$100)
  - If exceeds limit → Request human approval task
  - If within limit → Continue to trade execution
- Approval Process:
  - Amazon SNS sends email with task token to approver
  - Callback returns approval/denial response
- Trade Execution: Based on approval decision
  - Approved: Execute trade transaction via Lambda
  - Not Approved: "Trade not approved" Fail state (end)
- Transaction Verification: Check transaction success
  - Successful: "Trade successful" Succeed state (end)
  - Unsuccessful: "Trade unsuccessful" Fail state (end)
This workflow demonstrates real-world business process automation with human approval integration, error handling, and multiple decision points.
AWS Step Functions provides serverless orchestration for managing workflows between multiple AWS services. State machines are collections of event-driven states defined in Amazon States Language, with states grouped into work, transition, and stop categories. Task states can invoke AWS services or request activities hosted on any compute service with HTTP connection capability.