Cluster Architecture
The Monibuca V6 cluster solution is based on the Origin-Edge model. Nodes communicate over the QUIC protocol for low latency and high reliability, and the cluster supports automatic node discovery, intelligent stream forwarding, and load balancing.
Architecture Overview
```
              ┌──────────────┐
              │    DNS/LB    │  ← User Entry Point
              └──────┬───────┘
                     │
     ┌───────────────┼───────────────┐
     │               │               │
┌────▼───────┐ ┌─────▼──────┐ ┌─────▼──────┐
│  Edge-BJ   │ │  Edge-SH   │ │  Edge-GZ   │
│ (Beijing)  │ │ (Shanghai) │ │ (Guangzhou)│
└────┬───────┘ └─────┬──────┘ └─────┬──────┘
     │               │              │
     │          QUIC Relay          │
     │               │              │
┌────▼───────────────▼──────────────▼─────────┐
│               Origin Cluster                │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │ Origin-1 │  │ Origin-2 │  │ Origin-3 │   │
│  └──────────┘  └──────────┘  └──────────┘   │
└─────────────────────────────────────────────┘
```

Role Definitions
| Role | Responsibility |
|---|---|
| Origin | Receives direct ingest from publishers; serves as the stream source. Provides stream data to Edge nodes |
| Edge | Receives pull requests from viewers. Automatically pulls from Origin when a stream is not available locally |
QUIC Communication
Cluster nodes communicate using the QUIC protocol, leveraging its inherent advantages:
Why QUIC
| Feature | TCP | QUIC |
|---|---|---|
| Connection establishment | 1-3 RTT (including TLS) | 0-1 RTT |
| Head-of-line blocking | Entire connection blocked | Only individual stream blocked |
| Multiplexing | Requires HTTP/2 | Native support |
| Connection migration | Not supported | Supported (survives IP changes) |
| Flow control | Connection-level only | Per stream and per connection |
Communication Layers
```
┌─────────────────────────────┐
│      Application Layer      │
│  (gRPC: Node Sync/Control)  │
├─────────────────────────────┤
│         Relay Layer         │
│ (Audio/Video Data Transfer) │
├─────────────────────────────┤
│       QUIC Transport        │
│   (quinn: Connection Mgmt)  │
├─────────────────────────────┤
│           TLS 1.3           │
│  (Self-signed/Auto Certs)   │
└─────────────────────────────┘
```

- gRPC Layer: Node discovery, heartbeat detection, stream session registration, Catalog synchronization
- Relay Layer: High-speed audio/video frame data transfer, transmitted directly over QUIC streams
- Transport Layer: QUIC connection management based on the quinn library, with automatic certificate generation
Node Discovery and Health Checks
Seed Node Discovery
When a new node starts, it connects to the seed nodes configured in seed_servers to obtain cluster topology information:
```
New Node  → Seed Node: "I am edge-3, please tell me who is in the cluster"
Seed Node → New Node:  [origin-1, edge-1, edge-2, ...]
New Node  → Each Node: Establish QUIC connections
```

Heartbeat Mechanism
```
┌────────┐  heartbeat (every 5s)  ┌────────┐
│ Node A │ ─────────────────────► │ Node B │
│        │ ◄───────────────────── │        │
│        │     heartbeat_ack      │        │
└────────┘                        └────────┘
```

Each heartbeat carries node summary information:
- CPU usage
- Memory usage
- Bandwidth usage
- Current stream count
- Subscriber count
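The summary carried in each heartbeat can be pictured as a small serializable record. The sketch below is illustrative only; the field names and JSON encoding are assumptions, not Monibuca's actual wire format:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class NodeSummary:
    """Per-heartbeat load summary (hypothetical field names)."""
    server_id: str
    cpu_percent: float       # CPU usage
    mem_percent: float       # Memory usage
    bandwidth_mbps: float    # Bandwidth usage
    stream_count: int        # Current stream count
    subscriber_count: int    # Subscriber count

summary = NodeSummary("edge-bj-1", 42.5, 61.0, 1250.0, 18, 430)
payload = json.dumps(asdict(summary))  # attached to the heartbeat message
```

The receiver feeds these figures into failure detection and load balancing, so the summary stays deliberately small.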
Failure Detection
Three-level failure detection based on heartbeat timeout:
| State | Condition | Behavior |
|---|---|---|
| Healthy | Heartbeat normal | Participates in cluster normally |
| Suspect | suspect_threshold consecutive missed responses | Marked as suspect, weight reduced |
| Offline | offline_threshold consecutive missed responses | Marked as offline, sessions cleaned up |
When a node is marked as Offline, the following actions are triggered:
- SessionRegistry clears all stream registrations for that node
- RelayManager disconnects all Relay connections to that node
- AllocationManager stops assigning requests to that node
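The three-level transition can be sketched as a counter of consecutive missed heartbeats compared against the two thresholds. This is a minimal model of the behavior described above, not Monibuca's implementation; the class and method names are hypothetical:

```python
class FailureDetector:
    """Healthy / Suspect / Offline classification driven by
    consecutive missed heartbeats (sketch only)."""

    def __init__(self, suspect_threshold=3, offline_threshold=6):
        self.suspect_threshold = suspect_threshold
        self.offline_threshold = offline_threshold
        self.missed = {}  # server_id -> consecutive missed heartbeats

    def heartbeat_received(self, server_id):
        self.missed[server_id] = 0  # any heartbeat resets the counter

    def heartbeat_missed(self, server_id):
        self.missed[server_id] = self.missed.get(server_id, 0) + 1

    def state(self, server_id):
        n = self.missed.get(server_id, 0)
        if n >= self.offline_threshold:
            return "Offline"   # sessions cleaned up, relays disconnected
        if n >= self.suspect_threshold:
            return "Suspect"   # weight reduced
        return "Healthy"
```

Because a single successful heartbeat resets the counter, a node only degrades after an unbroken run of misses, which avoids flapping on isolated packet loss.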
Stream Relay
Pull-from-Origin Flow
```
Viewer → Edge: Request live/camera01
Edge: Stream not available locally, query SessionRegistry
SessionRegistry → Edge: Stream is on Origin-1
Edge → Origin-1: Establish QUIC Relay connection
Origin-1 → Edge: Transfer audio/video data via QUIC
Edge → Viewer: Distribute to local subscribers
```

Relay Lifecycle
- Establishment: Edge detects no local stream and initiates a Relay request to Origin via QUIC
- Transfer: RingBuffer data from Origin is transmitted to Edge over QUIC streams
- Health Check: Relay connection status is periodically checked (every health_check_interval seconds)
- Release: After the last subscriber on the Edge leaves, the Relay is released after waiting release_delay seconds
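The delayed release in the last step can be modeled as an idle timer that is cancelled if a subscriber returns within release_delay. The sketch below injects the clock so the logic is testable without real sleeps; it is a simplified model, not Monibuca's RelayManager:

```python
class RelayRelease:
    """Delayed Relay teardown: release only after the stream has been
    subscriber-free for release_delay seconds (sketch only)."""

    def __init__(self, release_delay=30.0):
        self.release_delay = release_delay
        self.subscribers = 0
        self.idle_since = None  # timestamp when the count hit zero

    def subscriber_joined(self):
        self.subscribers += 1
        self.idle_since = None  # a returning viewer cancels the release

    def subscriber_left(self, now):
        self.subscribers -= 1
        if self.subscribers == 0:
            self.idle_since = now  # start the release countdown

    def should_release(self, now):
        return (self.idle_since is not None
                and now - self.idle_since >= self.release_delay)
```

Keeping the Relay alive through the delay window means a viewer who refreshes the page does not trigger a fresh pull from Origin.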
Failure Recovery
When a Relay connection is broken:
- RelayManager detects the connection anomaly
- Waits retry_delay seconds before retrying
- Retries up to max_retry_attempts times
- If the Origin node is offline, queries SessionRegistry for a new Origin
- After all retries fail, the stream on the Edge is marked as unavailable
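The recovery loop above can be sketched as a bounded retry with a registry lookup between attempts. The callables and names here are hypothetical stand-ins for the RelayManager and SessionRegistry interactions; the sleep is injected so no real delay occurs in tests:

```python
def recover_relay(connect, find_origin, origin,
                  max_retry_attempts=3, retry_delay=2.0, sleep=None):
    """Retry a broken Relay up to max_retry_attempts times, re-querying
    the registry for the current Origin before each attempt (sketch).
    Returns the Origin we reconnected to, or None if all retries fail
    and the stream should be marked unavailable on the Edge."""
    for _ in range(max_retry_attempts):
        if sleep:
            sleep(retry_delay)            # wait retry_delay seconds
        target = find_origin(origin) or origin  # registry may name a new Origin
        if connect(target):
            return target                 # relay re-established
    return None                           # mark stream unavailable
```

Re-querying the registry on every attempt is what lets the Edge fail over transparently when the original Origin has gone offline and the stream has been re-ingested elsewhere.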
Load Balancing
Allocation Strategy
AllocationManager selects the optimal node for new pull requests, considering:
- Node health status: Only selects nodes in Healthy state
- Load metrics: CPU usage, bandwidth usage, subscriber count
- Proximity: Prefers nodes in the same region as the request source
- Stream availability: Prefers nodes that already hold the target stream (to avoid pulling from origin)
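The four criteria can be combined into a simple scoring function: filter to Healthy nodes, reward stream availability and proximity, and penalize load. The weights and field names below are illustrative assumptions; the real AllocationManager's policy may differ:

```python
def pick_node(nodes, stream, region):
    """Pick the best node for a pull request (sketch with made-up weights)."""
    def score(n):
        s = 0.0
        if stream in n["streams"]:
            s += 100.0                 # already holds the stream: no origin pull
        if n["region"] == region:
            s += 50.0                  # proximity to the request source
        s -= n["cpu"]                  # load metrics push the score down
        s -= n["subscribers"] / 100.0
        return s

    healthy = [n for n in nodes if n["state"] == "Healthy"]  # health first
    return max(healthy, key=score, default=None)
```

Note that stream availability dominates proximity here: reusing an existing Relay on a farther node is usually cheaper than opening a new pull from Origin on a nearby one.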
Redirect Mechanism
When an Edge node is overloaded, RedirectManager automatically redirects new requests to other nodes:
```
Viewer → Edge-BJ: Request live/camera01
Edge-BJ: CPU usage 90% > threshold 85%
Edge-BJ → Viewer: HTTP 302 → Edge-SH
Viewer → Edge-SH: Request live/camera01
```

Redirect threshold configuration:
```yaml
cluster:
  routing:
    cpu_threshold: 85.0          # CPU usage threshold (%)
    bandwidth_threshold: 8000.0  # Bandwidth threshold (Mbps)
    subscriber_threshold: 2000   # Subscriber count threshold
```

Deployment Modes
Single Origin + Multiple Edges
The most common deployment mode, suitable for small to medium scale:
```yaml
# Origin configuration
cluster:
  sync:
    server_id: "origin-1"
    address: "10.0.1.1:8001"
    seed_servers: ["10.0.1.1:8001"]
```
```yaml
# Edge configuration
cluster:
  sync:
    server_id: "edge-bj-1"
    address: "10.0.2.1:8001"
    seed_servers: ["10.0.1.1:8001"]  # Points to Origin
```

Multiple Origins + Multiple Edges
Large-scale deployment with multiple Origins sharing the ingest load:
```yaml
# Origin-1
cluster:
  sync:
    server_id: "origin-1"
    address: "10.0.1.1:8001"
    seed_servers: ["10.0.1.1:8001", "10.0.1.2:8001"]
```
```yaml
# Origin-2
cluster:
  sync:
    server_id: "origin-2"
    address: "10.0.1.2:8001"
    seed_servers: ["10.0.1.1:8001", "10.0.1.2:8001"]
```
```yaml
# Edge
cluster:
  sync:
    server_id: "edge-1"
    address: "10.0.2.1:8001"
    seed_servers: ["10.0.1.1:8001", "10.0.1.2:8001"]
```

Cascaded Edges
Edge nodes can also serve as upstream for other Edges, forming a multi-tier cascade:
```
Ingest → Origin → Edge-L1 (Regional Hub) → Edge-L2 (City Node) → Viewer
```

Suitable for large-scale nationwide distribution scenarios.
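Assuming a cascaded Edge is configured the same way as an ordinary Edge, by pointing seed_servers at its upstream tier rather than at the Origin, a second-level city node might look like this (the server ID and addresses are hypothetical):

```yaml
# Edge-L2 (city node), pulling from regional hub Edge-L1 instead of Origin
cluster:
  sync:
    server_id: "edge-l2-city"
    address: "10.0.3.1:8001"
    seed_servers: ["10.0.2.1:8001"]  # address of the upstream Edge-L1
```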
Operations Commands
View Cluster Status
```
GET /cluster/status
```

View Node List
```
GET /cluster/nodes
```

View Stream Distribution
```
GET /cluster/sessions
```

Manually Trigger Rebalancing
```
POST /cluster/rebalance
```