Pablo Rodriguez


Orchestrating Microservices with AWS Step Functions

  • Microservice Dependencies: Handling dependencies between thousands of microservices
  • Sequential/Parallel Chaining: Coordinating microservices in sequential or parallel patterns
  • Scaling Complexity: Managing application growth and component increases
  • Data Flow: Passing data between microservices and maintaining state
  • Retry Logic: Handling retries after timeouts or errors
  • Function Failures: Code exceptions, timeouts, memory issues, runtime errors
  • Repeated Processing: Code must handle the same event more than once without unwanted side effects (idempotency)
  • Resource Management: Managing resources and database writes during retry scenarios
  • Application Visibility: Need for troubleshooting errors and tracking performance
  • State Management: Tracking current workflow steps and storing inter-step data
  • Automatic Scaling: Coordination layer must scale with changing workloads
  • Serverless Orchestration: Manages workflows between multiple AWS services
  • State Machines: Workflows containing series of event-driven states (steps)
  • Visual Workflows: Coordinate distributed applications and microservices components
  • State Management: Manages workflow state, checkpoints, and restarts
  • Error Handling: Built-in capability for handling errors and retries
  • Input/Output: Pass input data into state machine and receive result data
  • State Data Transfer: Pass data from one state to the next
  • Data Filtering: States can filter data or add result data to input
  • Data Manipulation: Transform and process data between workflow steps
  • Service Integration: Start executions from API Gateway, EventBridge, Lambda, or other state machines
  • Nested State Machines: Reduce complexity and reuse common processes
  • Graphical Console: Visualize application components as series of steps
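The data-flow items above (passing state data forward and adding result data to the input) are controlled in Amazon States Language by fields such as ResultPath. The Python sketch below models only three ResultPath cases: "$" (the result replaces the input, which is the default), a single top-level key such as "$.check" (the result is added to a copy of the input), and null (the result is discarded). The helper name apply_result_path is hypothetical, and real ASL paths support far more than this.

```python
import copy
from typing import Any, Dict, Optional


def apply_result_path(state_input: Dict[str, Any], result: Any,
                      result_path: Optional[str]) -> Any:
    """Simplified model of how ResultPath combines a state's input and result.

    Hypothetical helper: handles only "$", a single top-level key ("$.name"),
    and None (ASL "ResultPath": null). Nested paths are rejected.
    """
    if result_path is None:
        # "ResultPath": null -> discard the result, pass the input through
        return copy.deepcopy(state_input)
    if result_path == "$":
        # Default behavior: the result becomes the state's entire output
        return result
    key = result_path[2:] if result_path.startswith("$.") else ""
    if key and "." not in key and "[" not in key:
        # Add the result under a new key, keeping the original input fields
        output = copy.deepcopy(state_input)
        output[key] = result
        return output
    raise ValueError(f"ResultPath not supported by this sketch: {result_path!r}")


order = {"orderId": "A-100", "quantity": 3}
print(apply_result_path(order, {"approved": True}, "$.check"))
# {'orderId': 'A-100', 'quantity': 3, 'check': {'approved': True}}
```

Copying the input before adding the key keeps the sketch free of side effects on the caller's data.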

Standard Workflow characteristics:

  • Long-running workflows (up to 1 year)
  • Exactly-once execution model
  • Full execution history in console
  • Asynchronous processing only
  • State persisted on every transition

Pricing: Charged based on number of state transitions
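Because Standard Workflows are billed per state transition, a rough estimate is executions × transitions per execution × the per-transition rate. The sketch below does that arithmetic. The rate of 0.025 per 1,000 transitions and the 8 transitions per execution are placeholder assumptions, not quoted AWS prices, and the helper name is hypothetical.

```python
def standard_workflow_cost(executions: int, transitions_per_execution: int,
                           price_per_1000_transitions: float,
                           free_transitions: int = 0) -> float:
    """Estimate Standard Workflow cost from the number of state transitions.

    The price and free allowance are inputs: take them from the current
    AWS pricing page for your Region rather than from this sketch.
    """
    total_transitions = executions * transitions_per_execution
    billable = max(0, total_transitions - free_transitions)
    return billable / 1000 * price_per_1000_transitions


# Hypothetical workload: 100,000 executions of an 8-transition workflow,
# priced at an assumed 0.025 (currency units) per 1,000 transitions.
print(standard_workflow_cost(100_000, 8, 0.025))  # 20.0
```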

Use Cases:

  • Long-running, auditable workflows
  • Customer order fulfillment (multi-day processes)
  • Visual debugging and execution history needs

Development Tip: Developers often debug a workflow as a Standard Workflow, for its visual representation and execution history, then copy it to an Express Workflow for production if the workload suits Express better.

  • Standard Workflows: Long-running with AWS Fargate integration for containerized applications
  • Synchronous Express: Short-duration, high-volume workflows requiring immediate response
  • Asynchronous Express: Short-duration workflows without immediate response requirements
  • API Gateway Integration: Direct initiation of Synchronous Express Workflows
  • Scalability: Manage millions of concurrent executions with horizontal scaling
  • Fault Tolerance: Reliable workflows with automatic error handling
  • Parallel Processing: Use Parallel state type for concurrent execution
  • Dynamic Parallelism: Map state type for iterating over data objects
  • S3 Integration: Map state can iterate over objects in S3 buckets
  • End-to-End Workflows: Orchestrate complete ML workflows on Amazon SageMaker
  • Data Preprocessing: Data enrichment, feature engineering, data validation
  • Model Operations: Post-processing and model evaluation
  • Workflow Coordination: Manage complex ML pipeline dependencies
  • Routine Operations: Software upgrades, patching, security updates
  • Infrastructure Management: Automated infrastructure selection and deployment
  • Data Synchronization: Automated data routing and synchronization
  • Support Automation: Automated support ticket routing
  • Error Management: Automatic retry with exponential backoff for error handling
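Automatic retry with exponential backoff is configured with a Retry field on a state. In Amazon States Language, IntervalSeconds sets the wait before the first retry, BackoffRate multiplies that wait on each later attempt, and MaxAttempts caps the number of retries. The Python sketch below holds one such retrier as a dict and computes the resulting waits; it ignores MaxDelaySeconds and jitter, the chosen error names are just an example, and retry_delays is a hypothetical helper.

```python
import json
from typing import Any, Dict, List

# A Retry block as it would appear inside a Task state (ASL field names)
retrier: Dict[str, Any] = {
    "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
    "IntervalSeconds": 1,
    "BackoffRate": 2.0,
    "MaxAttempts": 3,
}


def retry_delays(interval_seconds: float, backoff_rate: float,
                 max_attempts: int) -> List[float]:
    """Wait before retry n (0-based): interval_seconds * backoff_rate ** n."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


print(json.dumps({"Retry": [retrier]}, indent=2))
print(retry_delays(retrier["IntervalSeconds"], retrier["BackoffRate"],
                   retrier["MaxAttempts"]))
# [1.0, 2.0, 4.0]
```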

Step Functions manages application logic through built-in coordination patterns:

  • Sequential Tasks: Run tasks one after another
  • Parallel Tasks: Execute multiple tasks simultaneously
  • Branching Logic: Select tasks based on input data
  • Error Handling: Built-in try-catch-finally logic
  • Data Record Processing: Process data records in parallel
  • Retry Logic: Automatically retry failed tasks
  • Timeout Management: Handle tasks taking seconds or months
  • Recovery: Graceful recovery with cleanup and recovery code

Benefit: Removes repeated coordination code from microservices and functions by centralizing workflow logic in the state machine.
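The try-catch style error handling above maps to a Catch field on a Task state: catchers are checked in order, and the first one whose ErrorEquals list matches the error sends the execution to its Next state, such as a cleanup step. The Python sketch below shows such a state as a dict plus a simplified matcher. The state names, the function name, and the PaymentDeclined error are hypothetical, and the matcher ignores the few terminal errors that States.ALL does not catch.

```python
from typing import Any, Dict, List, Optional

# Hypothetical state and function names; the field layout follows ASL.
process_order: Dict[str, Any] = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {"FunctionName": "ProcessOrderFunction"},
    "Catch": [
        # Specific error first: route to a customer notification
        {"ErrorEquals": ["PaymentDeclined"], "Next": "NotifyCustomer"},
        # Catch-all last: keep the error details at $.error and clean up
        {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "Cleanup"},
    ],
    "Next": "ShipOrder",
}


def select_catcher(catchers: List[Dict[str, Any]], error_name: str) -> Optional[str]:
    """Return the Next state of the first catcher whose ErrorEquals matches.

    Simplified: exact names match, and "States.ALL" matches any error.
    None means no catcher matched, so the execution would fail.
    """
    for catcher in catchers:
        names = catcher["ErrorEquals"]
        if error_name in names or "States.ALL" in names:
            return catcher["Next"]
    return None


print(select_catcher(process_order["Catch"], "PaymentDeclined"))  # NotifyCustomer
print(select_catcher(process_order["Catch"], "States.Timeout"))   # Cleanup
```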

  • Task: Integrates with AWS services or calls HTTP endpoints
  • Activity: A Task state can also hand work to an activity, a worker hosted anywhere that polls Step Functions for tasks over HTTP
  • Pass: Passes or filters input data to next state without performing work
  • Wait: Delays workflow for specified time (relative or absolute)
  • Callback Option: A Task state can pause until an external process returns its task token (the .waitForTaskToken pattern)
  • Choice: Adds conditional logic to control flow to next state
    • Comparison operators for input variables
    • Example: Compare if input variable greater/less than 50
  • Parallel: Adds branches of nested state machines inside a state machine
  • Map: Runs the same steps for each item in a dataset, processing the items in parallel
    • Supports JSON arrays, S3 object lists, CSV files
  • Succeed: Stops the state machine and marks the execution as successful
  • Fail: Stops state machine and marks execution as failed
  • End Field: Task, Pass, Wait, Parallel, and Map states can set "End": true to stop the state machine; Choice, Succeed, and Fail states cannot
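The Choice example above (comparing an input variable against 50) can be written in Amazon States Language as shown in the dict below. The Python sketch evaluates it with a tiny, hypothetical choose_next helper that understands only NumericGreaterThan and NumericLessThan on top-level fields. The Default branch receives the value 50, which neither rule matches; the state names are hypothetical.

```python
from typing import Any, Callable, Dict

# Choice state for the "greater/less than 50" example; state names are hypothetical.
check_value: Dict[str, Any] = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.value", "NumericGreaterThan": 50, "Next": "GreaterThan50"},
        {"Variable": "$.value", "NumericLessThan": 50, "Next": "LessThan50"},
    ],
    # No rule matches when value == 50, so Default is used
    "Default": "EqualTo50",
}

# The two ASL comparison operators this sketch understands
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "NumericGreaterThan": lambda a, b: a > b,
    "NumericLessThan": lambda a, b: a < b,
}


def choose_next(choice_state: Dict[str, Any], state_input: Dict[str, Any]) -> str:
    """Return the Next state of the first matching rule, else Default.

    Simplified: Variable must be a top-level path like "$.value".
    """
    for rule in choice_state["Choices"]:
        value = state_input[rule["Variable"][2:]]
        for op_name, compare in OPERATORS.items():
            if op_name in rule and compare(value, rule[op_name]):
                return rule["Next"]
    return choice_state["Default"]


for sample in ({"value": 70}, {"value": 10}, {"value": 50}):
    print(sample["value"], "->", choose_next(check_value, sample))
```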

State machines are defined using Amazon States Language (JSON-based structured language):

state_machine.json

```json
{
  "StartAt": "1 Task state",
  "Comment": "Sample state machine",
  "States": {
    "1 Task state": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "ALambdaFunction"
      },
      "Next": "2 Success state"
    },
    "2 Success state": {
      "Type": "Succeed"
    }
  }
}
```
  • StartAt: Specifies the first state to run
  • States: List of all states in state machine
  • Task State: A work state that invokes the Lambda function ALambdaFunction
  • Next: Pointer to the next state to run
  • Success State: A stop state (Type "Succeed") that ends the execution successfully
  • Uses key-value pairs to specify fields and values
  • Always specifies which state to run first
  • Contains nested JSON documents for each state
  • Each state has name identifier and Type field
  • Next Field: Passes control to another state (a Choice state has one Next per rule, plus an optional Default)
  • End Field: A workflow ends either at a state that sets "End": true or at a Succeed or Fail state
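The structural rules listed above can be checked mechanically. The Python sketch below is a hypothetical linter (check_definition is not an AWS API) that covers only a few of them: StartAt must name a defined state, each state other than Choice, Succeed, and Fail needs exactly one of Next or "End": true, and every transition target must exist. AWS runs its own, far more complete validation when a state machine is created.

```python
from typing import Any, Dict, List

TERMINAL_TYPES = {"Succeed", "Fail"}


def check_definition(definition: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems found in an ASL definition."""
    problems: List[str] = []
    states = definition.get("States", {})
    if definition.get("StartAt") not in states:
        problems.append(f"StartAt {definition.get('StartAt')!r} is not a defined state")
    for name, state in states.items():
        state_type = state.get("Type")
        if state_type is None:
            problems.append(f"{name}: missing Type")
            continue
        targets = []
        if state_type == "Choice":
            # Each rule carries its own Next; Default is the fallback
            targets = [rule.get("Next") for rule in state.get("Choices", [])]
            if "Default" in state:
                targets.append(state["Default"])
        elif state_type not in TERMINAL_TYPES:
            has_next = "Next" in state
            has_end = state.get("End") is True
            if has_next == has_end:
                problems.append(f"{name}: needs exactly one of Next or End: true")
            if has_next:
                targets.append(state["Next"])
        for target in targets:
            if target not in states:
                problems.append(f"{name}: Next target {target!r} is not a defined state")
    return problems


# The sample state machine from state_machine.json, as a Python dict
sample_definition = {
    "StartAt": "1 Task state",
    "Comment": "Sample state machine",
    "States": {
        "1 Task state": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "ALambdaFunction"},
            "Next": "2 Success state",
        },
        "2 Success state": {"Type": "Succeed"},
    },
}
print(check_definition(sample_definition))  # []
```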

A Step Functions Standard Workflow state machine example for stock trading:

  1. Stock Analysis: Lambda function returns company ID, stock count, price, and trade amount
  2. Trade Recommendation Check: Choice state checks for recommended trade
    • If no recommendation → Success state (end)
    • If recommendation exists → Continue to approval process
  3. Human Approval Required: Choice state checks whether the trade amount exceeds a limit (for example, $100)
    • If exceeds limit → Request human approval task
    • If within limit → Continue to trade execution
  4. Approval Process:
    • Amazon SNS sends email with task token to approver
    • Callback returns approval/denial response
  5. Trade Execution: Based on approval decision
    • Approved: Execute trade transaction via Lambda
    • Not Approved: End in the "Trade not approved" Fail state
  6. Transaction Verification: Check whether the transaction succeeded
    • Successful: End in the "Trade successful" state
    • Unsuccessful: End in the "Trade unsuccessful" Fail state

This workflow demonstrates real-world business process automation with human approval integration, error handling, and multiple decision points.
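The human-approval step in this workflow follows the callback pattern. The sketch below writes two of its states as Python dicts: a Choice state that requests approval when the trade amount exceeds 100, and a Task state that publishes to Amazon SNS with the .waitForTaskToken suffix, passing the task token from the context object ($$.Task.Token). The execution pauses at that state until the approver's response calls SendTaskSuccess or SendTaskFailure with the token. The state names, the trade_amount field, and the topic ARN are hypothetical, and the remaining states of the workflow are omitted.

```python
import json
from typing import Any, Dict

# Hypothetical state names, input field, and topic ARN; the ASL structure is the point.
approval_states: Dict[str, Any] = {
    "Human Approval Required?": {
        "Type": "Choice",
        "Choices": [
            {
                "Variable": "$.trade_amount",
                "NumericGreaterThan": 100,
                "Next": "Request Human Approval",
            }
        ],
        # Trades of 100 or less skip approval
        "Default": "Execute Trade",
    },
    "Request Human Approval": {
        "Type": "Task",
        # .waitForTaskToken: pause until SendTaskSuccess/SendTaskFailure returns the token
        "Resource": "arn:aws:states:::sns:publish.waitForTaskToken",
        "Parameters": {
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:TradeApprovals",
            "Message": {
                "Input.$": "$",                  # the trade details
                "TaskToken.$": "$$.Task.Token",  # token the approver must send back
            },
        },
        "Next": "Approved?",
    },
}

print(json.dumps(approval_states, indent=2))
```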

AWS Step Functions provides serverless orchestration for managing workflows between multiple AWS services. State machines are collections of event-driven states defined in Amazon States Language, with states grouped into work, transition, and stop categories. Task states can invoke AWS services or request activities hosted on any compute service with HTTP connection capability.