Cluster Deployment Plugin
The Cluster plugin provides multi-node clustering capabilities for Monibuca, supporting Origin-Edge architecture, automatic node discovery, automatic stream relay, intelligent load balancing, and failover. Inter-node communication is based on the QUIC protocol for low-latency and highly reliable transport.
Enabling the Plugin

    features = ["cluster"]

Architecture Overview

    ┌─────────────┐     QUIC     ┌─────────────┐
    │   Origin    │◄────────────►│   Edge-1    │
    │  (node-1)   │              └─────────────┘
    │             │     QUIC     ┌─────────────┐
    │ Push ingest │◄────────────►│   Edge-2    │
    └─────────────┘              └─────────────┘
                        QUIC     ┌─────────────┐
                   ◄────────────►│   Edge-3    │
                                 └─────────────┘

- Origin Node: Receives push streams, serves as the stream source
- Edge Node: Receives pull requests; automatically pulls from the Origin when a stream is not available locally
- Each node can act as both Origin and Edge — the role is determined by the actual location of the stream
Configuration

    cluster:
      sync:
        enabled: true
        server_id: "node-1"            # Unique node identifier
        address: "192.168.1.10:8001"   # This node's communication address
        seed_servers:                  # Seed server list
          - "192.168.1.10:8001"
          - "192.168.1.11:8001"
          - "192.168.1.12:8001"
        heartbeat_interval: 5          # Heartbeat interval (seconds)
        sync_interval: 30              # Full sync interval (seconds)
        heartbeat_ttl: 15              # Heartbeat timeout (seconds)
        suspect_threshold: 2           # Missed heartbeats before marking as suspect
        offline_threshold: 3           # Missed heartbeats before marking as offline

      routing:
        cpu_threshold: 85.0            # CPU usage threshold for triggering redirect (%)
        bandwidth_threshold: 8000.0    # Bandwidth threshold for triggering redirect (Mbps)
        subscriber_threshold: 2000     # Subscriber count threshold for triggering redirect

      allocation:
        ticket_ttl: 30                 # Allocation ticket TTL (seconds)
        candidate_limit: 3             # Number of candidate nodes per allocation

      relay:
        establish_timeout: 10          # Relay connection establishment timeout (seconds)
        release_delay: 10              # Idle relay release delay (seconds)
        max_retry_attempts: 3          # Maximum retry attempts
        retry_delay: 2                 # Retry interval (seconds)
        health_check_interval: 5       # Health check interval (seconds)

      role: []                         # Role configuration (optional)

Key Configuration Details
sync: Node Synchronization
| Field | Description |
|---|---|
| `server_id` | Unique node identifier in the cluster; use meaningful names like `origin-1` or `edge-beijing-1` |
| `address` | This node’s gRPC communication address; other nodes connect via this address |
| `seed_servers` | List of seed server addresses; new nodes discover other cluster members through seed nodes |
| `heartbeat_interval` | Interval, in seconds, at which heartbeats are sent to probe node liveness |
| `suspect_threshold` | After this many consecutive missed heartbeat responses, the node is marked as Suspect |
| `offline_threshold` | After this many consecutive missed heartbeat responses, the node is marked as Offline, triggering session invalidation and relay cleanup |
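
To make the threshold arithmetic concrete, here is a minimal Python sketch of how a node's status could be derived from its count of consecutive missed heartbeats. The function names and status strings are illustrative assumptions, not Monibuca's API; only the field names and default values come from the sample configuration.

```python
def classify_node(missed_heartbeats: int,
                  suspect_threshold: int = 2,
                  offline_threshold: int = 3) -> str:
    """Map a count of consecutive missed heartbeats to a node status.

    The status strings are hypothetical labels chosen for this sketch.
    """
    if missed_heartbeats >= offline_threshold:
        return "offline"
    if missed_heartbeats >= suspect_threshold:
        return "suspect"
    return "online"


def detection_delay(threshold: int, heartbeat_interval: int = 5) -> int:
    """Approximate seconds of silence before a threshold is reached."""
    return threshold * heartbeat_interval
```

With the sample values, a silent node would be treated as suspect after roughly 10 seconds and as offline after roughly 15 seconds, which lines up with `heartbeat_ttl: 15`.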
routing: Routing Strategy
When a node’s load exceeds any of the following thresholds, the RedirectManager redirects new subscription requests to less-loaded nodes:
- CPU threshold: Redirect is triggered when node CPU usage exceeds `cpu_threshold`
- Bandwidth threshold: Redirect is triggered when node egress bandwidth exceeds `bandwidth_threshold`
- Subscriber threshold: Redirect is triggered when the subscriber count on a single node exceeds `subscriber_threshold`
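
The three threshold checks can be sketched as a single predicate. This is a hedged illustration in Python: `NodeLoad`, `RoutingConfig`, and `should_redirect` are hypothetical names, and the sketch assumes that exceeding any one threshold is enough to trigger a redirect.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """Current load of a node (hypothetical structure for this sketch)."""
    cpu_percent: float
    bandwidth_mbps: float
    subscribers: int


@dataclass
class RoutingConfig:
    """Mirrors the `routing` fields from the sample configuration."""
    cpu_threshold: float = 85.0
    bandwidth_threshold: float = 8000.0
    subscriber_threshold: int = 2000


def should_redirect(load: NodeLoad, cfg: RoutingConfig) -> bool:
    """Return True when any single threshold is exceeded."""
    return (
        load.cpu_percent > cfg.cpu_threshold
        or load.bandwidth_mbps > cfg.bandwidth_threshold
        or load.subscribers > cfg.subscriber_threshold
    )
```

For example, a node at 90% CPU would redirect new subscribers even if its bandwidth and subscriber count are well below their limits.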
relay: Stream Relay
When an Edge node pulls a stream from an Origin node, a connection is established through the Relay mechanism. Configuration options control connection timeout, retry strategy, and health check frequency.
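
As an illustration of the retry strategy, the following Python sketch wraps a connection attempt in a bounded retry loop. `establish_with_retry` and the `connect` callback are hypothetical, and the sketch assumes that `max_retry_attempts` counts retries after one initial attempt; the actual plugin may count attempts differently.

```python
import time
from typing import Callable


def establish_with_retry(connect: Callable[[], bool],
                         max_retry_attempts: int = 3,
                         retry_delay: float = 2.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call connect() once, then retry up to max_retry_attempts times.

    Waits retry_delay seconds between attempts and returns True as soon
    as connect() succeeds, or False once all attempts are exhausted.
    """
    for attempt in range(1 + max_retry_attempts):
        if connect():
            return True
        # Only wait if another attempt will follow.
        if attempt < max_retry_attempts:
            sleep(retry_delay)
    return False
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.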
Core Components
The Cluster plugin consists of the following managers working together:
| Component | Responsibility |
|---|---|
| SyncManager | Node discovery, heartbeat detection, state synchronization |
| TransportManager | QUIC transport layer management, maintaining inter-node connections |
| SessionRegistry | Stream session registry, tracking which node publishes which stream |
| RelayManager | Stream relay management, establishing and maintaining cross-node stream transport |
| AllocationManager | Resource allocation, selecting the optimal node for new requests |
| RedirectManager | Load balancing redirect decisions |
| ClusterHooks | Event hooks, responding to stream publish/unpublish/subscribe/unsubscribe events |
| ClusterGrpcServer | gRPC server, handling inter-node communication requests |
Workflow
Stream Publishing
- A publisher pushes a stream to the Origin node
- ClusterHooks listens for `StreamEvent::Created` and registers stream information in the SessionRegistry
- SyncManager synchronizes stream information to other cluster nodes via heartbeats
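
The registration step can be pictured with a small Python sketch. The class below is a simplified stand-in for the SessionRegistry, not its real interface: the method names, the `"created"` and `"removed"` event strings, and the stream path format are all assumptions made for illustration.

```python
from typing import Dict, Optional


class SessionRegistry:
    """Tracks which node publishes which stream (simplified stand-in)."""

    def __init__(self) -> None:
        self._origin_by_stream: Dict[str, str] = {}

    def register(self, stream_path: str, server_id: str) -> None:
        self._origin_by_stream[stream_path] = server_id

    def unregister(self, stream_path: str) -> None:
        self._origin_by_stream.pop(stream_path, None)

    def origin_of(self, stream_path: str) -> Optional[str]:
        return self._origin_by_stream.get(stream_path)


def on_stream_event(registry: SessionRegistry, event: str,
                    stream_path: str, local_server_id: str) -> None:
    """Hook sketch: record the local node as origin when a stream appears."""
    if event == "created":      # stands in for StreamEvent::Created
        registry.register(stream_path, local_server_id)
    elif event == "removed":    # hypothetical name for the unpublish event
        registry.unregister(stream_path)
```

In this sketch the registry is a plain in-memory map; propagating its contents to other nodes is the SyncManager's job and is not modeled here.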
Stream Subscription (Edge Pull-from-Origin)
- A pull request arrives at an Edge node
- The Edge does not find the stream locally and queries the SessionRegistry to determine the Origin node
- RelayManager establishes a Relay connection to the Origin via QUIC
- Stream data from the Origin is transmitted to the Edge via Relay, and the Edge distributes it to clients
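
The lookup decision in the steps above can be sketched as a small function. This is an assumption-laden Python illustration: `resolve_stream`, its return labels, and the use of a plain dictionary in place of the SessionRegistry are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def resolve_stream(stream_path: str,
                   local_streams: Set[str],
                   origin_by_stream: Dict[str, str],
                   local_server_id: str) -> Tuple[str, Optional[str]]:
    """Decide how an Edge node would serve a pull request.

    Returns ("local", this node) when the stream is already present,
    ("relay", origin node) when it must be pulled from another node,
    or ("not_found", None) when no node is known to publish it.
    """
    if stream_path in local_streams:
        return ("local", local_server_id)
    origin = origin_by_stream.get(stream_path)
    if origin is None:
        return ("not_found", None)
    return ("relay", origin)
```

The same function also reflects why any node can act as Origin or Edge: the outcome depends only on where the stream currently lives.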
Node Failure
- SyncManager detects a node’s heartbeat timeout
- After reaching `suspect_threshold`, the node is marked as suspect; after reaching `offline_threshold`, it is marked as offline
- SessionRegistry invalidates all stream sessions for that node
- RelayManager cleans up all Relay connections for that node
- When clients make new requests, AllocationManager assigns an available node
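
The cleanup performed when a node goes offline can be sketched as follows. The Python function below is a hypothetical illustration that models sessions and relays as dictionaries keyed by stream path and relay ID; the real SessionRegistry and RelayManager data structures are not described in this document.

```python
from typing import Dict, List, Tuple


def handle_node_offline(server_id: str,
                        sessions: Dict[str, str],
                        relays: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Drop every session and relay tied to an offline node.

    sessions maps stream path -> publishing node; relays maps a
    relay ID -> peer node. Both are mutated in place. Returns the
    removed stream paths and relay IDs, each sorted.
    """
    dead_streams = sorted(s for s, owner in sessions.items() if owner == server_id)
    for stream in dead_streams:
        del sessions[stream]

    dead_relays = sorted(r for r, peer in relays.items() if peer == server_id)
    for relay in dead_relays:
        del relays[relay]

    return dead_streams, dead_relays
```

After this cleanup, a later request for an affected stream would no longer resolve to the failed node and would go through allocation again.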
Deployment Examples
Minimal Cluster (1 Origin + 2 Edge)
Origin node configuration (node-1):

    cluster:
      sync:
        enabled: true
        server_id: "origin-1"
        address: "10.0.0.1:8001"
        seed_servers: ["10.0.0.1:8001"]

Edge node configuration (node-2):

    cluster:
      sync:
        enabled: true
        server_id: "edge-1"
        address: "10.0.0.2:8001"
        seed_servers: ["10.0.0.1:8001"]

Edge node configuration (node-3):

    cluster:
      sync:
        enabled: true
        server_id: "edge-2"
        address: "10.0.0.3:8001"
        seed_servers: ["10.0.0.1:8001"]

For more detailed cluster architecture design, refer to Cluster Architecture.