
# Performance Tuning

This document describes the key performance parameters of Monibuca V6 and how to tune them, helping you achieve optimal performance across different deployment scales.

## Dispatcher Workers

`dispatcher_workers` controls the number of stream dispatch threads and is the most critical parameter affecting throughput and latency.

```yaml
# config.yaml
dispatcher_workers: 0 # 0 = auto (CPU core count)
```

| Value | Behavior | Use Case |
| --- | --- | --- |
| 0 | Auto-detect the CPU core count and create an equal number of threads | Recommended default, suitable for most scenarios |
| 1 | Single-threaded dispatch | Low-concurrency, latency-first scenarios |
| N | Exactly N dispatch threads | Precise control over resource allocation |

- Small scale (< 100 streams): `dispatcher_workers: 0` is sufficient
- Medium scale (100-1000 streams): set to 50%-75% of the CPU core count, leaving headroom for other tasks
- Large scale (> 1000 streams): set to the CPU core count so dispatch doesn't become a bottleneck
- Ultra-low latency: set to 1 to avoid thread-switching overhead (at the cost of throughput)

```yaml
# 8-core CPU, 1000 streams scenario
dispatcher_workers: 6
```
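
Conceptually, the dispatcher is a fixed-size pool of worker threads draining a shared frame queue. The Rust sketch below is only a minimal illustration of that shape; `Frame`, `spawn_dispatchers`, and the single mutex-guarded `mpsc` receiver are assumptions made for brevity, not Monibuca's actual dispatch code.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for a demuxed media frame.
struct Frame {
    stream_id: u64,
    payload: Vec<u8>,
}

// Spawn `workers` dispatch threads draining one shared queue.
fn spawn_dispatchers(workers: usize) -> mpsc::Sender<Frame> {
    let (tx, rx) = mpsc::channel::<Frame>();
    // std's mpsc receiver is single-consumer, so this sketch shares it
    // behind a mutex; a real dispatcher would use a multi-consumer queue.
    let rx = Arc::new(Mutex::new(rx));
    for worker_id in 0..workers {
        let rx = Arc::clone(&rx);
        thread::spawn(move || loop {
            let frame = match rx.lock().unwrap().recv() {
                Ok(frame) => frame,
                Err(_) => break, // all senders dropped: shut down
            };
            // Real work would fan the frame out to the stream's subscribers.
            println!(
                "worker {worker_id}: stream {} ({} bytes)",
                frame.stream_id,
                frame.payload.len()
            );
        });
    }
    tx
}

fn main() {
    // `dispatcher_workers: 0` maps naturally to the detected core count.
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(4);
    let tx = spawn_dispatchers(workers);
    tx.send(Frame { stream_id: 1, payload: vec![0; 1024] }).unwrap();
    thread::sleep(Duration::from_millis(100)); // let a worker drain the queue
}
```

More workers raise throughput under load but add scheduling overhead, which is why the guidance above caps the pool at the core count.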

## RingBuffer Sizing

The RingBuffer is the core data structure behind Monibuca's zero-copy distribution. Each Track of each stream has its own independent RingBuffer.

```yaml
# Global default configuration
publish:
  ring_size: 256 # RingBuffer capacity (in frames)
```

| Capacity | Memory (est. per Track) | Use Case |
| --- | --- | --- |
| 64 | ~2 MB | Low-latency live streaming, tight memory budgets |
| 128 | ~4 MB | General live streaming |
| 256 | ~8 MB | Recommended default, balancing latency and fault tolerance |
| 512 | ~16 MB | High-bitrate streams, many subscribers per stream |
| 1024 | ~32 MB | Ultra-high-concurrency subscriptions |

Total memory estimation formula:

Memory ≈ Stream Count × Tracks per Stream × ring_size × Average Frame Size

For example: 100 streams × 2 Tracks × 256 frames × 16 KB per frame ≈ 800 MB.
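
To make the structure concrete, here is a heavily simplified Rust sketch of the idea: a fixed-capacity ring where the writer overwrites the oldest slot and subscribers share frame payloads by reference rather than by copy. The names and layout are illustrative assumptions; the engine's real RingBuffer is more involved.

```rust
use std::sync::Arc;

// Illustrative only: a fixed-capacity frame ring. The writer overwrites
// the oldest slot; each subscriber tracks its own read cursor.
struct RingBuffer {
    slots: Vec<Option<Arc<Vec<u8>>>>, // Arc enables zero-copy fan-out
    head: usize,                      // total frames written so far
}

impl RingBuffer {
    fn new(ring_size: usize) -> Self {
        Self { slots: vec![None; ring_size], head: 0 }
    }

    fn push(&mut self, frame: Vec<u8>) {
        let idx = self.head % self.slots.len();
        self.slots[idx] = Some(Arc::new(frame)); // oldest frame is dropped
        self.head += 1;
    }

    // A subscriber reads by cloning the Arc, never the payload itself.
    fn get(&self, cursor: usize) -> Option<Arc<Vec<u8>>> {
        if cursor >= self.head || self.head - cursor > self.slots.len() {
            return None; // ahead of the writer, or already overwritten
        }
        self.slots[cursor % self.slots.len()].clone()
    }
}

fn main() {
    let mut ring = RingBuffer::new(256); // matches ring_size: 256
    ring.push(vec![0u8; 16 * 1024]);     // one ~16 KB frame
    let shared = ring.get(0).unwrap();
    assert_eq!(shared.len(), 16 * 1024); // subscriber sees the same bytes
}
```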

## Subscriber Channel Size

Subscribers and dispatchers communicate through bounded channels.

```yaml
subscribe:
  channel_size: 64 # Channel capacity per subscriber
```

| Value | Description |
| --- | --- |
| 16 | Lowest latency, but prone to frame drops during network fluctuations |
| 32 | Lower latency, moderate buffering |
| 64 | Recommended default |
| 128 | Heavy buffering, suited to unstable network conditions |

When the channel is full, behavior depends on the configured strategy (sketched below):

- Drop old frames: preserves real-time playback but may cause visible jumps
- Block and wait: preserves completeness but may increase latency
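
Below is a std-library Rust sketch of the two policies using a bounded channel (capacity shrunk to 4 so the demo overflows); Monibuca's actual channel type and drop policy may differ.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // Bounded subscriber channel, as with `channel_size: 64` (4 here for demo).
    let (tx, rx) = sync_channel::<u64>(4);

    for frame_no in 0..8u64 {
        // Strategy 1: drop on full (real-time first). `try_send` never
        // blocks; a full channel surfaces as `Full` and the frame is lost.
        match tx.try_send(frame_no) {
            Ok(()) => {}
            Err(TrySendError::Full(dropped)) => {
                eprintln!("channel full, dropped frame {dropped}");
            }
            Err(TrySendError::Disconnected(_)) => break,
        }

        // Strategy 2 (alternative): completeness first. A plain
        // `tx.send(frame_no)` would block here until the subscriber
        // drains the channel, trading latency for completeness.
    }

    while let Ok(frame_no) = rx.try_recv() {
        println!("delivered frame {frame_no}");
    }
}
```

Note that this sketch drops the newest frame on overflow; dropping the oldest, as described above, needs cooperation from the consumer side, but the latency/completeness trade-off is the same.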

## File Descriptor Limits

Each network connection consumes a file descriptor. Large-scale scenarios require raising system limits.

```bash
# Check the current limit
ulimit -n

# Temporary modification
ulimit -n 1000000
```

For a permanent change, add to `/etc/security/limits.conf`:

```
* soft nofile 1000000
* hard nofile 1000000
```

Estimation formula:

Required fds ≈ (Publishers + Subscribers) × 1.2 + Base Overhead (~1000)

For example, 1,000 publishers with 10,000 subscribers need about (1,000 + 10,000) × 1.2 + 1,000 ≈ 14,200 descriptors.

## Kernel Network Tuning

Add the following to `/etc/sysctl.conf`:

```
# Increase TCP buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216

# Increase connection queues
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Recycle TIME-WAIT connections faster
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15

# Widen the local port range
net.ipv4.ip_local_port_range = 1024 65535

# Increase the network device receive queue
net.core.netdev_max_backlog = 65535
```

Apply the configuration:

```bash
sudo sysctl -p
```

## CPU Affinity

On multi-core servers with a NUMA architecture, binding the process to specific CPUs can reduce cross-NUMA-node memory access:

```bash
# Bind to CPUs 0-7
taskset -c 0-7 ./monibuca
```

## Memory Allocator

Monibuca uses the system allocator by default. In high-concurrency scenarios, jemalloc (loaded via `LD_PRELOAD`) can deliver better multi-threaded performance:

```bash
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so ./monibuca
```

## Benchmark Reference

The following data is based on a test environment (Intel Xeon, 8 cores, 32 GB RAM, 10 GbE network):

| Scenario | Streams | Total Subscribers | CPU Usage | Memory Usage | Latency (P99) |
| --- | --- | --- | --- | --- | --- |
| RTMP→FLV | 100 | 1,000 | 15% | 800 MB | < 500 ms |
| RTMP→FLV | 500 | 5,000 | 45% | 3.5 GB | < 800 ms |
| RTMP→HLS | 100 | 10,000 | 25% | 1.2 GB | 3-6 s |
| RTMP→WebRTC | 50 | 500 | 30% | 1.0 GB | < 200 ms |
| Mixed protocols | 200 | 2,000 | 35% | 2.0 GB | Protocol-dependent |

| Metric | Value |
| --- | --- |
| Max subscribers per stream | 10,000+ |
| Max bitrate per stream | 50 Mbps |
| Max concurrent streams | 10,000+ |
| Engine startup time | < 1 s |

## Monitoring

```http
GET /api/sysinfo
```

Returns real-time metrics including CPU, memory, and network I/O. Combined with the report plugin, metrics can be exported to Prometheus/Grafana.
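
For scripted polling, any HTTP client works. The Rust sketch below issues a raw GET against the endpoint; the listening address `127.0.0.1:8080` is an assumption here and should be adjusted to your deployment.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Assumed listening address; substitute your server's host and port.
    let mut stream = TcpStream::connect("127.0.0.1:8080")?;
    stream.write_all(
        b"GET /api/sysinfo HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n",
    )?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    // The body carries the CPU, memory, and network I/O metrics.
    println!("{response}");
    Ok(())
}
```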

Enable the debug plugin for more detailed internal state:

```toml
features = ["debug"]
```

```http
GET /debug/ringbuffer/{stream_path}
GET /debug/dispatcher
GET /debug/connections
```

Dynamically adjust the log level at runtime:

```http
PUT /config/global
Content-Type: application/json

{
  "log": { "level": "debug" }
}
```