
Cluster Deployment Plugin

The Cluster plugin provides multi-node clustering capabilities for Monibuca, supporting Origin-Edge architecture, automatic node discovery, automatic stream relay, intelligent load balancing, and failover. Inter-node communication is based on the QUIC protocol for low-latency and highly reliable transport.

Cargo.toml
features = ["cluster"]
┌─────────────┐     QUIC     ┌─────────────┐
│   Origin    │◄────────────►│   Edge-1    │
│  (node-1)   │              └─────────────┘
│             │     QUIC     ┌─────────────┐
│ Push ingest │◄────────────►│   Edge-2    │
└─────────────┘              └─────────────┘
                    QUIC     ┌─────────────┐
               ◄────────────►│   Edge-3    │
                             └─────────────┘
  • Origin Node: Receives push streams, serves as the stream source
  • Edge Node: Receives pull requests; automatically pulls from the Origin when a stream is not available locally
  • Each node can act as both Origin and Edge — the role is determined by the actual location of the stream
cluster:
  sync:
    enabled: true
    server_id: "node-1"             # Unique node identifier
    address: "192.168.1.10:8001"    # This node's communication address
    seed_servers:                   # Seed server list
      - "192.168.1.10:8001"
      - "192.168.1.11:8001"
      - "192.168.1.12:8001"
    heartbeat_interval: 5           # Heartbeat interval (seconds)
    sync_interval: 30               # Full sync interval (seconds)
    heartbeat_ttl: 15               # Heartbeat timeout (seconds)
    suspect_threshold: 2            # Missed heartbeats before marking as suspect
    offline_threshold: 3            # Missed heartbeats before marking as offline
  routing:
    cpu_threshold: 85.0             # CPU usage threshold for triggering redirect (%)
    bandwidth_threshold: 8000.0     # Bandwidth threshold for triggering redirect (Mbps)
    subscriber_threshold: 2000      # Subscriber count threshold for triggering redirect
  allocation:
    ticket_ttl: 30                  # Allocation ticket TTL (seconds)
    candidate_limit: 3              # Number of candidate nodes per allocation
  relay:
    establish_timeout: 10           # Relay connection establishment timeout (seconds)
    release_delay: 10               # Idle relay release delay (seconds)
    max_retry_attempts: 3           # Maximum retry attempts
    retry_delay: 2                  # Retry interval (seconds)
    health_check_interval: 5        # Health check interval (seconds)
  role: []                          # Role configuration (optional)

| Field | Description |
| --- | --- |
| server_id | Unique node identifier in the cluster; use meaningful names like origin-1 or edge-beijing-1 |
| address | This node’s gRPC communication address; other nodes connect via this address |
| seed_servers | List of seed server addresses; new nodes discover other cluster members through seed nodes |
| heartbeat_interval | Heartbeat send interval for probing node liveness |
| suspect_threshold | After this many consecutive missed heartbeat responses, the node is marked as Suspect |
| offline_threshold | After this many consecutive missed heartbeat responses, the node is marked as Offline, triggering session invalidation and relay cleanup |
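
The relationship between the two thresholds can be illustrated with a small sketch. It assumes a node is classified purely from its count of consecutive missed heartbeats; the `NodeState` enum, the `Online` variant name, and the `classify` function are hypothetical names for illustration, not the plugin's API.

```rust
/// Liveness states. `Suspect` and `Offline` come from the field table;
/// `Online` is an assumed name for the healthy state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeState {
    Online,
    Suspect,
    Offline,
}

/// Maps a count of consecutive missed heartbeats to a liveness state.
/// The parameter names mirror the config keys; the function is a sketch.
pub fn classify(missed: u32, suspect_threshold: u32, offline_threshold: u32) -> NodeState {
    if missed >= offline_threshold {
        NodeState::Offline
    } else if missed >= suspect_threshold {
        NodeState::Suspect
    } else {
        NodeState::Online
    }
}
```

With the sample values above (suspect_threshold: 2, offline_threshold: 3), a node that misses two heartbeats in a row would be classified as suspect, and a third miss would move it to offline.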

When a node’s load exceeds any configured threshold, the RedirectManager redirects new subscription requests to less-loaded nodes:

  • CPU threshold: Redirect is triggered when node CPU usage exceeds cpu_threshold
  • Bandwidth threshold: Redirect is triggered when node egress bandwidth exceeds bandwidth_threshold
  • Subscriber threshold: Redirect is triggered when the subscriber count on a single node exceeds subscriber_threshold
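
The three checks above can be read as a single predicate. The following is a minimal sketch, assuming a redirect is triggered as soon as any one metric strictly exceeds its threshold; the struct and function names are hypothetical, and only the threshold field names are taken from the configuration.

```rust
/// Thresholds from the `routing` section of the configuration.
pub struct RoutingThresholds {
    pub cpu_threshold: f64,        // percent
    pub bandwidth_threshold: f64,  // Mbps
    pub subscriber_threshold: u32, // subscriber count
}

/// Hypothetical snapshot of a node's current load.
pub struct NodeLoad {
    pub cpu_percent: f64,
    pub egress_mbps: f64,
    pub subscribers: u32,
}

/// Returns true when any single metric exceeds its threshold.
/// Assumption: "exceeds" means strictly greater than.
pub fn should_redirect(load: &NodeLoad, t: &RoutingThresholds) -> bool {
    load.cpu_percent > t.cpu_threshold
        || load.egress_mbps > t.bandwidth_threshold
        || load.subscribers > t.subscriber_threshold
}
```

How the redirect target is chosen among the less-loaded nodes is not specified here, so the sketch stops at the trigger decision.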

When an Edge node pulls a stream from an Origin node, a connection is established through the Relay mechanism. Configuration options control connection timeout, retry strategy, and health check frequency.
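
As an illustration of the retry strategy, the sketch below wraps a connection attempt in a bounded retry loop. It assumes one initial attempt followed by at most `max_retry_attempts` retries, with a fixed `retry_delay` between them; the `RelayRetryPolicy` type, the `establish_with_retry` function, and the closure-based connect step are hypothetical, and `establish_timeout` is left to the connect step itself.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry settings mirroring the `relay` config keys (hypothetical type).
pub struct RelayRetryPolicy {
    pub max_retry_attempts: u32,
    pub retry_delay: Duration,
}

/// Calls `connect` until it succeeds or the retry budget is spent.
/// `connect` receives the zero-based attempt number.
pub fn establish_with_retry<T, E>(
    policy: &RelayRetryPolicy,
    mut connect: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match connect(attempt) {
            Ok(conn) => return Ok(conn),
            Err(err) => {
                // Give up once all retries have been used; return the last error.
                if attempt >= policy.max_retry_attempts {
                    return Err(err);
                }
                attempt += 1;
                sleep(policy.retry_delay);
            }
        }
    }
}
```

Under this reading, max_retry_attempts: 3 allows up to four connection attempts in total before the relay is reported as failed.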

The Cluster plugin consists of the following managers working together:

| Component | Responsibility |
| --- | --- |
| SyncManager | Node discovery, heartbeat detection, state synchronization |
| TransportManager | QUIC transport layer management, maintaining inter-node connections |
| SessionRegistry | Stream session registry, tracking which node publishes which stream |
| RelayManager | Stream relay management, establishing and maintaining cross-node stream transport |
| AllocationManager | Resource allocation, selecting the optimal node for new requests |
| RedirectManager | Load balancing redirect decisions |
| ClusterHooks | Event hooks, responding to stream publish/unpublish/subscribe/unsubscribe events |
| ClusterGrpcServer | gRPC server, handling inter-node communication requests |

Stream Publishing

  1. A publisher pushes a stream to the Origin node
  2. ClusterHooks listens for StreamEvent::Created and registers stream information in the SessionRegistry
  3. SyncManager synchronizes stream information to other cluster nodes via heartbeats
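
The registration step can be sketched as an event handler that updates a map from stream path to publishing node. Only `StreamEvent::Created` and the `SessionRegistry` name appear in the text; the `Removed` variant, the `stream_path` field, and the `apply` and `owner_of` methods are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Stream lifecycle events. `Created` is named in the steps above;
/// `Removed` and the field names are assumptions for this sketch.
pub enum StreamEvent {
    Created { stream_path: String },
    Removed { stream_path: String },
}

/// Hypothetical registry: stream path -> server_id of the publishing node.
#[derive(Default)]
pub struct SessionRegistry {
    owners: HashMap<String, String>,
}

impl SessionRegistry {
    /// Applies an event observed on the node identified by `server_id`.
    pub fn apply(&mut self, server_id: &str, event: StreamEvent) {
        match event {
            StreamEvent::Created { stream_path } => {
                self.owners.insert(stream_path, server_id.to_string());
            }
            StreamEvent::Removed { stream_path } => {
                self.owners.remove(&stream_path);
            }
        }
    }

    /// Returns the server_id of the node publishing `stream_path`, if any.
    pub fn owner_of(&self, stream_path: &str) -> Option<&str> {
        self.owners.get(stream_path).map(String::as_str)
    }
}
```

Propagating these entries to other nodes (step 3) is the SyncManager's job and is not modelled here.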

Stream Subscription (Edge Pull-from-Origin)

  1. A pull request arrives at an Edge node
  2. The Edge does not find the stream locally and queries the SessionRegistry to determine the Origin node
  3. RelayManager establishes a Relay connection to the Origin via QUIC
  4. Stream data from the Origin is transmitted to the Edge via Relay, and the Edge distributes it to clients
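
The lookup in steps 1 and 2 amounts to a two-stage resolution: check local streams first, then consult the registry. A minimal sketch follows, assuming the registry can be viewed as a map from stream path to the Origin's server_id; `StreamSource` and `resolve_source` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Where an Edge node should read a requested stream from (sketch type).
#[derive(Debug, PartialEq, Eq)]
pub enum StreamSource<'a> {
    /// The stream is already published or relayed on this node.
    Local,
    /// The stream lives on another node; relay from this server_id.
    RelayFrom(&'a str),
    /// No node in the cluster is known to publish the stream.
    NotFound,
}

/// Resolves a pull request: local streams win, otherwise ask the registry.
pub fn resolve_source<'a>(
    stream_path: &str,
    local_streams: &HashSet<String>,
    registry: &'a HashMap<String, String>,
) -> StreamSource<'a> {
    if local_streams.contains(stream_path) {
        return StreamSource::Local;
    }
    match registry.get(stream_path) {
        Some(origin) => StreamSource::RelayFrom(origin.as_str()),
        None => StreamSource::NotFound,
    }
}
```

A `RelayFrom` result corresponds to step 3, where the RelayManager would open the QUIC relay to the returned node.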

Failover

  1. SyncManager detects a node’s heartbeat timeout
  2. After reaching suspect_threshold, the node is marked as suspect; after reaching offline_threshold, it is marked as offline
  3. SessionRegistry invalidates all stream sessions for that node
  4. RelayManager cleans up all Relay connections for that node
  5. When clients make new requests, AllocationManager assigns an available node
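
Step 3 can be pictured as removing every registry entry owned by the offline node. The sketch below assumes sessions are stored as a map from stream path to owning server_id; `invalidate_node` is a hypothetical helper, and relay cleanup (step 4) would act on the returned stream paths.

```rust
use std::collections::HashMap;

/// Removes every session owned by `offline_node` and returns the
/// affected stream paths in sorted order.
pub fn invalidate_node(sessions: &mut HashMap<String, String>, offline_node: &str) -> Vec<String> {
    // Collect first, then remove, so the map is not mutated while iterating.
    let mut removed: Vec<String> = sessions
        .iter()
        .filter(|(_, owner)| owner.as_str() == offline_node)
        .map(|(path, _)| path.clone())
        .collect();
    for path in &removed {
        sessions.remove(path);
    }
    removed.sort();
    removed
}
```

Sessions owned by other nodes are left untouched, so streams published elsewhere remain resolvable during the failover.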

Origin node configuration (node-1):

cluster:
  sync:
    enabled: true
    server_id: "origin-1"
    address: "10.0.0.1:8001"
    seed_servers: ["10.0.0.1:8001"]

Edge node configuration (node-2):

cluster:
  sync:
    enabled: true
    server_id: "edge-1"
    address: "10.0.0.2:8001"
    seed_servers: ["10.0.0.1:8001"]

Edge node configuration (node-3):

cluster:
  sync:
    enabled: true
    server_id: "edge-2"
    address: "10.0.0.3:8001"
    seed_servers: ["10.0.0.1:8001"]

For more detailed cluster architecture design, refer to Cluster Architecture.