
Cluster Deployment Plugin

The Cluster plugin provides multi-node clustering capabilities for Monibuca, supporting Origin-Edge architecture, automatic node discovery, automatic stream relay, intelligent load balancing, and failover. Inter-node communication is based on the QUIC protocol for low-latency and highly reliable transport.

To enable the plugin, set enable: true in config.yaml. The plugin is included in the official binaries.

┌─────────────┐     QUIC     ┌─────────────┐
│   Origin    │◄────────────►│   Edge-1    │
│  (node-1)   │              └─────────────┘
│             │     QUIC     ┌─────────────┐
│ Push ingest │◄────────────►│   Edge-2    │
└─────────────┘              └─────────────┘
                    QUIC     ┌─────────────┐
               ◄────────────►│   Edge-3    │
                             └─────────────┘

  • Origin Node: Receives push streams, serves as the stream source
  • Edge Node: Receives pull requests; automatically pulls from the Origin when a stream is not available locally
  • Each node can act as both Origin and Edge; the role is determined by the actual location of the stream

cluster:
  sync:
    enabled: true
    server_id: "node-1"            # Unique node identifier
    address: "192.168.1.10:8180"   # HTTP control-plane address (same as global.http.listenaddr)
    seed_servers:                  # Seed list (HTTP addresses, not gRPC :50051)
      - "192.168.1.10:8180"
      - "192.168.1.11:8180"
      - "192.168.1.12:8180"
    heartbeat_interval: 5          # Heartbeat interval (seconds)
    sync_interval: 30              # Full sync interval (seconds)
    heartbeat_ttl: 15              # Heartbeat timeout (seconds)
    suspect_threshold: 2           # Missed heartbeats before marking as suspect
    offline_threshold: 3           # Missed heartbeats before marking as offline
  routing:
    cpu_threshold: 85.0            # CPU usage threshold for triggering redirect (%)
    bandwidth_threshold: 8000.0    # Bandwidth threshold for triggering redirect (Mbps)
    subscriber_threshold: 2000     # Subscriber count threshold for triggering redirect
  allocation:
    ticket_ttl: 30                 # Allocation ticket TTL (seconds)
    candidate_limit: 3             # Number of candidate nodes per allocation
  relay:
    establish_timeout: 10          # Relay connection establishment timeout (seconds)
    release_delay: 10              # Idle relay release delay (seconds)
    max_retry_attempts: 3          # Maximum retry attempts
    retry_delay: 2                 # Retry interval (seconds)
    health_check_interval: 5       # Health check interval (seconds)
  role: []                         # Role configuration (optional)

| Field | Description |
| --- | --- |
| server_id | Unique node identifier in the cluster; use meaningful names like origin-1 or edge-beijing-1 |
| address | This node’s HTTP control-plane address (host:port, usually global.http.listenaddr) for /cluster/api/heartbeat |
| seed_servers | Seed nodes’ HTTP control addresses; do not use global.tcp.listenaddr (gRPC, e.g. :50051) |
| heartbeat_interval | Heartbeat send interval for probing node liveness |
| suspect_threshold | After this many consecutive missed heartbeat responses, the node is marked as Suspect |
| offline_threshold | After this many consecutive missed heartbeat responses, the node is marked as Offline, triggering session invalidation and relay cleanup |
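
The two threshold fields can be read as a small state machine over the count of consecutive missed heartbeats. The following is a minimal sketch, assuming the sample values suspect_threshold: 2 and offline_threshold: 3; the function name and the "online" label are illustrative and not taken from the plugin.

```python
def classify_node(missed_heartbeats: int,
                  suspect_threshold: int = 2,
                  offline_threshold: int = 3) -> str:
    """Map a count of consecutive missed heartbeats to a node state.

    Hypothetical helper: the state names mirror the documentation,
    but the plugin's internal representation may differ.
    """
    if missed_heartbeats >= offline_threshold:
        return "offline"
    if missed_heartbeats >= suspect_threshold:
        return "suspect"
    return "online"
```

With the sample values, a node becomes suspect after two missed heartbeats and offline after three.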

When a node’s load exceeds any of the following thresholds, the Redirect Manager redirects new subscription requests to less loaded nodes:

  • CPU threshold: Redirect is triggered when node CPU usage exceeds cpu_threshold
  • Bandwidth threshold: Redirect is triggered when node egress bandwidth exceeds bandwidth_threshold
  • Subscriber threshold: Redirect is triggered when the subscriber count on a single node exceeds subscriber_threshold
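
The three checks above could be combined as in the sketch below. It assumes that exceeding any single threshold is enough to trigger a redirect and uses the default values from the sample configuration; NodeLoad and should_redirect are hypothetical names, not part of the plugin.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """Hypothetical snapshot of a node's current load."""
    cpu_percent: float
    egress_mbps: float
    subscribers: int


def should_redirect(load: NodeLoad,
                    cpu_threshold: float = 85.0,
                    bandwidth_threshold: float = 8000.0,
                    subscriber_threshold: int = 2000) -> bool:
    """Return True when any one of the three thresholds is exceeded."""
    return (load.cpu_percent > cpu_threshold
            or load.egress_mbps > bandwidth_threshold
            or load.subscribers > subscriber_threshold)
```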

When an Edge node pulls a stream from an Origin node, a connection is established through the Relay mechanism. Configuration options control connection timeout, retry strategy, and health check frequency.
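
As an illustration of how max_retry_attempts and retry_delay might interact, here is a minimal sketch. It assumes one initial attempt followed by up to max_retry_attempts retries with a fixed delay between them; the plugin's actual retry policy may differ, and connect is a hypothetical callable that stands in for the QUIC connection attempt.

```python
import time
from typing import Callable


def establish_with_retry(connect: Callable[[], bool],
                         max_retry_attempts: int = 3,
                         retry_delay: float = 2.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call connect() once, then retry up to max_retry_attempts times.

    Returns True as soon as connect() succeeds, False if every attempt fails.
    """
    for attempt in range(max_retry_attempts + 1):
        if connect():
            return True
        # Wait before the next attempt, but not after the final one.
        if attempt < max_retry_attempts:
            sleep(retry_delay)
    return False
```

The sleep parameter is injectable so the loop can be exercised without real delays.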

The Cluster plugin consists of the following managers working together:

| Component | Responsibility |
| --- | --- |
| SyncManager | Node discovery, heartbeat detection, state synchronization |
| TransportManager | QUIC transport layer management, maintaining inter-node connections |
| SessionRegistry | Stream session registry, tracking which node publishes which stream |
| RelayManager | Stream relay management, establishing and maintaining cross-node stream transport |
| AllocationManager | Resource allocation, selecting the optimal node for new requests |
| RedirectManager | Load balancing redirect decisions |
| ClusterHooks | Event hooks, responding to stream publish/unpublish/subscribe/unsubscribe events |
| Cluster HTTP API | Control-plane REST (/cluster/api/*): heartbeat, nodes, sessions |

Stream Publishing

  1. A publisher pushes a stream to the Origin node
  2. ClusterHooks listens for StreamEvent::Created and registers stream information in the SessionRegistry
  3. SyncManager synchronizes stream information to other cluster nodes via heartbeats

Stream Subscription (Edge Pull-from-Origin)

  1. A pull request arrives at an Edge node
  2. The Edge does not find the stream locally and queries the SessionRegistry to determine the Origin node
  3. RelayManager establishes a Relay connection to the Origin via QUIC
  4. Stream data from the Origin is transmitted to the Edge via Relay, and the Edge distributes it to clients
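
The lookup in steps 1 to 3 can be sketched as follows. This is an illustrative model only: the registry is represented as a plain dictionary from stream path to publishing node, and resolve_stream is a hypothetical name.

```python
from typing import Dict, Optional, Set, Tuple


def resolve_stream(stream_path: str,
                   local_streams: Set[str],
                   session_registry: Dict[str, str],
                   ) -> Tuple[str, Optional[str]]:
    """Decide how an Edge serves a pull request.

    Returns ("local", None) when the stream is already on this node,
    ("relay", origin_id) when it must be pulled from another node,
    or ("not_found", None) when no node publishes it.
    """
    if stream_path in local_streams:
        return ("local", None)
    origin_id = session_registry.get(stream_path)
    if origin_id is None:
        return ("not_found", None)
    return ("relay", origin_id)
```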

Failover

  1. SyncManager detects a node’s heartbeat timeout
  2. After reaching suspect_threshold, the node is marked as suspect; after reaching offline_threshold, it is marked as offline
  3. SessionRegistry invalidates all stream sessions for that node
  4. RelayManager cleans up all Relay connections for that node
  5. When clients make new requests, AllocationManager assigns an available node
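
The cleanup in steps 3 and 4 might look like the sketch below. It assumes sessions and relays are tracked in dictionaries keyed by stream path and relay ID respectively; these structures and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def handle_node_offline(node_id: str,
                        sessions: Dict[str, str],
                        relays: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Drop sessions published by node_id and relays sourced from it.

    sessions maps stream_path -> publishing node ID.
    relays maps relay ID -> source node ID.
    Returns the removed stream paths and relay IDs, both sorted.
    """
    dead_streams = sorted(p for p, owner in sessions.items() if owner == node_id)
    dead_relays = sorted(r for r, source in relays.items() if source == node_id)
    for path in dead_streams:
        del sessions[path]
    for relay_id in dead_relays:
        del relays[relay_id]
    return dead_streams, dead_relays
```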

Origin node configuration (node-1):

cluster:
  sync:
    enabled: true
    server_id: "origin-1"
    address: "10.0.0.1:8180"
    seed_servers: ["10.0.0.1:8180"]

Edge node configuration (node-2):

cluster:
  sync:
    enabled: true
    server_id: "edge-1"
    address: "10.0.0.2:8180"
    seed_servers: ["10.0.0.1:8180"]

Edge node configuration (node-3):

cluster:
  sync:
    enabled: true
    server_id: "edge-2"
    address: "10.0.0.3:8180"
    seed_servers: ["10.0.0.1:8180"]

For more detailed cluster architecture design, refer to Cluster Architecture.

Default base URL: http://localhost:8180, route prefix: /cluster/api/

Standard success response:

{
  "code": 0,
  "message": "success",
  "data": {}
}
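
A client can unwrap this envelope before using the payload. The sketch below assumes that any code other than 0 signals a failure, which the documentation does not state explicitly; ClusterApiError and unwrap are hypothetical names.

```python
import json
from typing import Any


class ClusterApiError(Exception):
    """Raised when the response envelope does not report success."""


def unwrap(body: str) -> Any:
    """Parse a response body and return its data field.

    Assumption: code == 0 means success; anything else is an error.
    """
    envelope = json.loads(body)
    if envelope.get("code") != 0:
        raise ClusterApiError(
            f"code={envelope.get('code')} message={envelope.get('message')}")
    return envelope.get("data")
```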

GET /cluster/api/status/local

Response (example):

{
  "code": 0,
  "message": "success",
  "data": {
    "server_id": "node-1",
    "address": "192.168.1.10:8180",
    "has_sync": true,
    "node_count": 3,
    "session_count": 25,
    "relay_session_count": 2,
    "active_relay_sessions": 1
  }
}

Optional sync.internal_api_key: when set, all /cluster/api/* requests must include header X-Cluster-Api-Key (or Authorization: Bearer <key>).
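
For example, a status request that attaches the key only when one is configured could be built as follows. This is a sketch using Python's standard urllib; status_request is a hypothetical helper, and the request is constructed but not sent.

```python
import urllib.request
from typing import Optional


def status_request(base_url: str = "http://localhost:8180",
                   api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET /cluster/api/status/local request.

    The X-Cluster-Api-Key header is added only when api_key is set.
    """
    headers = {}
    if api_key:
        headers["X-Cluster-Api-Key"] = api_key
    return urllib.request.Request(
        base_url + "/cluster/api/status/local",
        headers=headers,
        method="GET",
    )
```

Passing the result to urllib.request.urlopen would perform the call against a running node.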

GET /cluster/api/servers

Returns all visible node summaries.

POST /cluster/api/heartbeat
Content-Type: application/json

Request body:

{
  "summary": {
    "server_id": "node-2",
    "address": "192.168.1.11:8180"
  }
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| summary.server_id | string | Yes | Node ID |
| summary.address | string | Yes | Node address |
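
A heartbeat request with this body might be assembled as shown below. The sketch assumes the seed is reachable over plain HTTP at the host:port listed in seed_servers; heartbeat_request is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request


def heartbeat_request(seed: str, server_id: str, address: str) -> urllib.request.Request:
    """Build a POST /cluster/api/heartbeat request for one seed node.

    seed is a host:port string; the http:// scheme is an assumption.
    """
    payload = {"summary": {"server_id": server_id, "address": address}}
    return urllib.request.Request(
        f"http://{seed}/cluster/api/heartbeat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```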

Response data (v6):

{
  "summaries": [
    { "server_id": "origin-1", "address": "192.168.1.10:8180", "quic_addr": "..." }
  ]
}

Edge nodes should:

  • Merge summaries and sessions from the response
  • Periodically GET /cluster/api/servers and GET /cluster/api/sessions from seeds on sync_interval
  • Push session updates via POST /cluster/api/sessions/sync after publish/unpublish
  • Push subscribe/unsubscribe deltas via POST /cluster/api/sessions/participation (action: subscribe or unsubscribe)
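
Merging the summaries from a heartbeat response into a local node catalog could be sketched as follows. It assumes the catalog is keyed by server_id and that fields from the newest summary overwrite older values; the plugin's actual merge rules are not documented here, and merge_summaries is a hypothetical name.

```python
from typing import Dict, Iterable


def merge_summaries(catalog: Dict[str, dict],
                    summaries: Iterable[dict]) -> Dict[str, dict]:
    """Merge node summaries into catalog in place, keyed by server_id.

    Fields in an incoming summary replace the stored ones; fields that
    the summary omits are kept.
    """
    for summary in summaries:
        server_id = summary["server_id"]
        catalog[server_id] = {**catalog.get(server_id, {}), **summary}
    return catalog
```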

POST /cluster/api/servers/sync
Content-Type: application/json

Request body:

{
  "summaries": [
    { "server_id": "node-1", "address": "192.168.1.10:8180" },
    { "server_id": "node-2", "address": "192.168.1.11:8180" }
  ]
}

GET /cluster/api/sessions

Returns cluster stream sessions (sorted by stream_path).

POST /cluster/api/sessions/participation
Content-Type: application/json

Request body:

{
  "stream_path": "live/test",
  "node_id": "edge-1",
  "action": "subscribe"
}

action is subscribe or unsubscribe. Remote node_id values update subscribe_nodes on the receiver. When node_id equals the local server_id and the stream is published on a remote origin, the receiver starts (or reuses) a relay session via attach_or_ensure_relay_session.

Participation events are fanned out to all known peer HTTP addresses from the node catalog (not only seed_servers).
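
The receiver-side handling of a participation event can be modeled roughly as below. This is a sketch under two assumptions: subscribe_nodes is a mapping from stream path to a set of node IDs, and a relay is only started on subscribe. It returns the origin to relay from instead of calling attach_or_ensure_relay_session; apply_participation and its parameters are hypothetical.

```python
from typing import Dict, Optional, Set


def apply_participation(event: dict,
                        local_id: str,
                        subscribe_nodes: Dict[str, Set[str]],
                        publishers: Dict[str, str]) -> Optional[str]:
    """Apply one participation event.

    Returns the origin node ID a relay should be attached to, or None.
    publishers maps stream_path -> publishing node ID.
    """
    path, node_id, action = event["stream_path"], event["node_id"], event["action"]
    if action not in ("subscribe", "unsubscribe"):
        raise ValueError(f"unknown action: {action}")
    if node_id != local_id:
        # Remote node: only track it in subscribe_nodes.
        nodes = subscribe_nodes.setdefault(path, set())
        if action == "subscribe":
            nodes.add(node_id)
        else:
            nodes.discard(node_id)
        return None
    # Local node: a relay is needed if the stream lives on a remote origin.
    origin = publishers.get(path)
    if action == "subscribe" and origin is not None and origin != local_id:
        return origin
    return None
```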

POST /cluster/api/relay/pull
Content-Type: application/json

Request body (Edge → Origin):

{
  "request_id": "uuid",
  "stream_path": "live/test",
  "requestor_id": "edge-1",
  "quic_addr": "192.168.1.11:44944",
  "session_id": "uuid",
  "timestamp": 1710000000000
}

Response data is PullStreamResponse: when accepted is true, Origin accepts the pull and starts QUIC relay; quic_addr is Origin’s QUIC endpoint. Origin also sends FLV sequence headers as a metadata frame before media.

GET /cluster/api/relay/sessions

Returns relay sessions managed on this node (state, subscriber_count, frame_count, source_quic_addr, etc.) for the Admin cluster Relay tab.

POST /cluster/api/allocate/publish
Content-Type: application/json

POST /cluster/api/allocate/play
Content-Type: application/json

Both endpoints use the same request body:

{
  "stream_path": "live/test",
  "protocol": "webrtc",
  "limit": 3
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| stream_path | string | Yes | Stream path |
| protocol | string | No | Protocol name, defaults to unknown |
| limit | int | No | Candidate node limit |
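
A client-side helper for building this body might omit the optional fields when they are not set and reject an empty stream_path before the server does. The helper name allocate_body is hypothetical.

```python
from typing import Optional


def allocate_body(stream_path: str,
                  protocol: Optional[str] = None,
                  limit: Optional[int] = None) -> dict:
    """Build the shared request body for the allocate endpoints.

    Optional fields are left out so the server can apply its defaults.
    """
    if not stream_path:
        raise ValueError("stream_path is required")
    body = {"stream_path": stream_path}
    if protocol is not None:
        body["protocol"] = protocol
    if limit is not None:
        body["limit"] = limit
    return body
```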

GET /cluster/api/redirect/decision?stream_path=live/test&protocol=webrtc

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| stream_path | string | Yes | Target stream path |
| protocol | string | No | Protocol type |
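
The query string can be built with Python's standard urllib.parse, as in this sketch. It assumes the default base URL; keeping "/" unescaped in stream_path matches the example URL above, and redirect_decision_url is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode


def redirect_decision_url(stream_path: str,
                          protocol: Optional[str] = None,
                          base_url: str = "http://localhost:8180") -> str:
    """Build the redirect decision URL, omitting protocol when unset."""
    params = {"stream_path": stream_path}
    if protocol is not None:
        params["protocol"] = protocol
    # safe="/" leaves the slash in the stream path unescaped.
    query = urlencode(params, safe="/")
    return f"{base_url}/cluster/api/redirect/decision?{query}"
```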

| HTTP Status | Scenario |
| --- | --- |
| 400 | Invalid body or missing stream_path |
| 404 | Route not found |
| 401 | Missing or invalid X-Cluster-Api-Key when internal_api_key is configured |
| 503 | Allocation/redirect manager unavailable |