Enhanced Restart Flow After New Voice Installation
Fixed issues causing delayed or incorrect restarts when adding new voices. The system now restarts cleanly and reliably after each addition.
Smarter Auto-Language Switching
Improved detection and switching logic for multilingual content, resulting in smoother transitions and fewer misclassifications.
Rehnuma Manager UI Upgrade
Polished interface for better usability and workflow.
Added support for registering other Piper-based English voices/models.
Significantly improved handling of single words and individual letters, making the system more reliable for TTS training, dictionary work, and educational use cases.
New Voice: Rehnuma Arfa (Female)
Added a natural-sounding female voice named Arfa to broaden the selection for users.
Miscellaneous
Various stability fixes, performance tuning, refactoring, and internal improvements.
Version 1.0.4
Release Date: 30 October 2025
Enabled ONNX Runtime graph optimizations
Set GraphOptimizationLevel to Level3 and enabled memory pattern reuse for inference sessions
Impact: 15–45% faster model inference; fewer allocations during repeated calls
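On the implementation side this is a small change on the session builder. A minimal sketch, assuming the ort Rust bindings for ONNX Runtime (builder method names vary across ort versions, and the model path here is illustrative):

```rust
use ort::{GraphOptimizationLevel, Session};

// Hypothetical loader; real voices come from the voice registry.
fn build_session() -> ort::Result<Session> {
    Session::builder()?
        // Level3 enables all basic, extended, and layout optimizations.
        .with_optimization_level(GraphOptimizationLevel::Level3)?
        // Reuse memory allocation patterns across repeated inference calls.
        .with_memory_pattern(true)?
        .commit_from_file("voice.onnx")
}
```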
Added phonemization result caching
Thread-safe cache keyed by voice and processed text (includes Arabic diacritized text)
Impact: 50–90% lower latency for repeated or similar utterances; lower CPU usage
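The cache itself needs nothing beyond the standard library. A std-only sketch of the idea, keyed on (voice, processed text), with hypothetical type and method names; the real phonemizer output type may differ:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Key combines the voice id and the fully processed input text
/// (for Arabic, the diacritized form), so the same sentence rendered
/// by two voices caches independently.
type Key = (String, String);

#[derive(Clone, Default)]
struct PhonemeCache {
    inner: Arc<Mutex<HashMap<Key, Vec<i64>>>>,
}

impl PhonemeCache {
    /// Return cached phoneme ids, or run `compute` once and store the result.
    fn get_or_insert_with<F>(&self, voice: &str, text: &str, compute: F) -> Vec<i64>
    where
        F: FnOnce() -> Vec<i64>,
    {
        let key = (voice.to_string(), text.to_string());
        let mut map = self.inner.lock().unwrap();
        map.entry(key).or_insert_with(compute).clone()
    }
}
```

On a hit the phonemizer closure is never invoked, which is where the latency and CPU savings come from.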
Fixed exponential growth in real-time streaming chunk size
Replaced multiplicative chunk size update with constant base chunk size per stream
Impact: predictable latency, reduced memory usage, prevents performance degradation over time
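The fix replaces a per-chunk multiplier with a fixed base size. A std-only sketch of the corrected chunking logic (the constant mirrors the chunk_size=64 default; function names are illustrative):

```rust
/// Fixed base chunk size per stream. Previously the size was multiplied
/// after each chunk, so per-chunk memory and latency grew unboundedly
/// over the life of a stream.
const BASE_CHUNK: usize = 64;

/// Split a buffer of `total_samples` into constant-size chunks,
/// with a final short chunk for the remainder.
fn chunk_sizes(total_samples: usize) -> Vec<usize> {
    let mut sizes = Vec::new();
    let mut remaining = total_samples;
    while remaining > 0 {
        let n = remaining.min(BASE_CHUNK);
        sizes.push(n);
        remaining -= n;
    }
    sizes
}
```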
Enabled GPU acceleration on Windows via DirectML
ONNX Runtime provider ordering: DirectML (preferred) → CPU fallback
Impact: large speedups on compatible GPUs without configuration changes
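The ordering is expressed as a prioritized execution-provider list on the session builder. A sketch, again assuming the ort bindings (provider registration APIs differ between ort versions):

```rust
use ort::execution_providers::{CPUExecutionProvider, DirectMLExecutionProvider};
use ort::Session;

// Hypothetical loader mirroring the DirectML-first ordering.
fn build_session() -> ort::Result<Session> {
    Session::builder()?
        // Providers are tried in order: DirectML when a compatible GPU
        // and runtime are present, otherwise silent fallback to CPU.
        .with_execution_providers([
            DirectMLExecutionProvider::default().build(),
            CPUExecutionProvider::default().build(),
        ])?
        .commit_from_file("voice.onnx")
}
```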
Tuned real-time streaming parameters
Updated defaults: chunk_size=64, chunk_padding=2
Impact: lower initial latency while maintaining stream smoothness
Increased gRPC channel buffer sizes
mpsc channel capacity raised from 512 to 1024 for both synthesis endpoints
Impact: reduced backpressure and blocking under bursty load; smoother streaming
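With tokio's bounded mpsc channels, the capacity is the single argument to `channel`. A sketch; the chunk type stands in for the generated protobuf response type:

```rust
use tokio::sync::mpsc;

/// Illustrative stand-in for the generated protobuf audio-chunk type.
struct AudioChunk {
    samples: Vec<f32>,
}

fn make_stream_channel() -> (mpsc::Sender<AudioChunk>, mpsc::Receiver<AudioChunk>) {
    // 1024 slots absorb bursts of synthesized chunks before the client
    // drains them; at the old capacity of 512, senders blocked under load.
    mpsc::channel(1024)
}
```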
Configured gRPC server for high concurrency
Set concurrency_limit_per_connection=1024, max_concurrent_streams=1000, tcp_keepalive=60s, tcp_nodelay=true
Impact: improved throughput and responsiveness under concurrent client load
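Assuming a tonic-based gRPC server, these settings map directly onto the transport builder; `svc` and `addr` are placeholders, and only the tuning calls are the point of the sketch:

```rust
use std::time::Duration;
use tonic::transport::Server;

// `svc` and `addr` stand in for the synthesis service and bind address.
Server::builder()
    .concurrency_limit_per_connection(1024)        // per-connection request cap
    .max_concurrent_streams(Some(1000))            // HTTP/2 stream limit
    .tcp_keepalive(Some(Duration::from_secs(60)))  // detect dead peers
    .tcp_nodelay(true)                             // disable Nagle for low latency
    .add_service(svc)
    .serve(addr)
    .await?;
```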
Right-sized synthesis thread pool
Reduced from num_cpus × 4 to min(num_cpus × 2, 16), with a floor of 2
Impact: less context switching, more stable latency, better CPU utilization
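The sizing rule is easy to state in code. A sketch with an illustrative function name; the real pool presumably obtains the logical CPU count from the num_cpus crate:

```rust
/// Synthesis worker count: twice the logical CPUs, capped at 16, with a
/// floor of 2. The old 4x-CPUs pool oversubscribed cores and caused
/// excess context switching under concurrent load.
fn synthesis_threads(num_cpus: usize) -> usize {
    (num_cpus * 2).min(16).max(2)
}
```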