Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing - Neil Dwyer, Gabber
An in-depth look at Gabber's experience deploying the Orpheus text-to-speech model to production, covering latency optimization, high-fidelity LoRA-based voice cloning, and a cost-effective inference stack using vLLM and a consistent hash ring for load balancing.
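
For readers unfamiliar with the load-balancing technique named above, here is a minimal sketch of a consistent hash ring. It is not Gabber's implementation: the class name `HashRing`, the `vnodes` parameter, the server names, and the choice of a voice ID as the routing key are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Illustrative consistent hash ring with virtual nodes (hypothetical API)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed at several points on the ring to even out load.
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        # Dropping a node only removes its own points; other points stay put.
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]


# Assumed usage: route each request by a voice/adapter ID to one inference server.
ring = HashRing(["gpu-0", "gpu-1", "gpu-2"])
server = ring.lookup("voice-1234")
```

The property that makes this attractive for routing is that the same key always maps to the same server, and removing a server only remaps the keys that server owned. If the key is a voice or LoRA identifier (an assumption here), repeat requests for a voice tend to land where its adapter is already loaded.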