Run Qwen3-TTS Web UI Locally: Complete Guide for Voice Cloning Without Coding

Qwen3-TTS comes with a built-in web interface—upload a recording to clone voices without writing any code. This guide covers Mac (M-series chips) and Windows (NVIDIA GPU) setups.

Alibaba’s Qwen3-TTS is something else—upload a recording of yourself, and it “learns” to speak like you. Or describe a “deep magnetic male voice” in text, and it generates one for you. Best of all, it comes with a web interface—just open your browser and click away, no code required.

I personally tested this on a Mac mini M4 (M-series) and marked all the pitfalls I encountered so you don’t have to fall into them.


First, Know Which Path to Take

If you have not installed Qwen3-TTS yet, follow the local installation (deployment) guide first:

https://lingflux.com/en/articles/ai/qwen3-tts-mac-mini-m4-complete-guide/

Don’t rush to copy commands yet. Check your computer specs first:

Your Computer | Which Path to Take
Mac, M1/M2/M3/M4 chips | Use mps acceleration, take the Mac route
Windows, with NVIDIA GPU | Use cuda acceleration, take the Windows route
No dedicated GPU, pure CPU | It works, just slower. Brew some tea while waiting
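
If you are not sure which row applies to you, PyTorch can report what it sees. The following is a minimal sketch, assuming PyTorch is already installed in the same environment; the function names pick_device and detect_device are made up for this example and are not part of Qwen3-TTS.

```python
import importlib.util


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Map hardware availability to the --device value used in this guide."""
    if cuda_available:
        return "cuda:0"  # first NVIDIA GPU (Windows route)
    if mps_available:
        return "mps"     # Apple M-series GPU (Mac route)
    return "cpu"         # no supported GPU found


def detect_device() -> str:
    """Ask PyTorch what it can see; fall back to cpu if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        torch.cuda.is_available(),
        bool(mps is not None and mps.is_available()),
    )
```

Calling print(detect_device()) in a Python session shows the value to pass to --device.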

Three Ways to Use It, Pick One

You choose different models at startup for different experiences. In short:

Voice Cloning → Upload your own recording, and it learns your voice
Model: Qwen/Qwen3-TTS-12Hz-1.7B-Base

Preset Voice → Choose from built-in voices, plus add instructions like “say it with a sad tone”
Model: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice

Custom Voice Design → Describe the voice you want in text, and it creates it for you
Model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign

The commands below use the Base model (voice cloning) as an example—just swap the model name to switch modes.
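
If you switch modes often, a small lookup table saves retyping the model names. This is only a sketch: the model names come from the list above, while the short mode keys ("clone", "preset", "design") and the helper model_for are invented for illustration.

```python
# Model names are from this guide; the short keys are hypothetical labels.
MODELS = {
    "clone": "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "preset": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "design": "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
}


def model_for(mode: str) -> str:
    """Return the model name for a mode, with a readable error for typos."""
    try:
        return MODELS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; choose from {sorted(MODELS)}"
        ) from None
```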


Step 1: Start the Interface

Mac (M-series Chips)

Open Terminal and paste this command:

qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-Base \
  --device mps \
  --dtype bfloat16 \
  --no-flash-attn

What these parameters mean:

  • --device mps: Uses the Apple chip’s GPU, which is much faster than pure CPU. If your Mac is an older model without an M-series chip, change this to cpu
  • --dtype bfloat16: Model precision format. M-series chips support it well, so use it as-is
  • --no-flash-attn: Don’t miss this one! FlashAttention is not supported on Mac, and without this flag startup will fail

Windows (NVIDIA GPU)

Open Command Prompt (CMD) and paste:

qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-Base ^
  --device cuda:0 ^
  --dtype bfloat16 ^
  --flash-attn

Parameter explanation:

  • --device cuda:0: Uses the first NVIDIA GPU (usually you only have one, so 0 is enough)
  • --dtype bfloat16: RTX 30-series and above support this, recommended
  • --flash-attn: Enables FlashAttention, which works under Windows + CUDA and speeds things up significantly. It requires the flash-attn package to be installed

Tip: Windows uses ^ (CMD) or a backtick (PowerShell) to continue a command on the next line, unlike \ on Mac. Don’t mix them up.


No GPU, Pure CPU?

qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-Base \
  --device cpu \
  --dtype float32

It works, but slowly. Generating a single sentence might take a few minutes, so be patient. The command above uses \ for line continuation (Mac/Linux); in Windows CMD, use ^ instead.
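
The three startup commands differ only in their flags, so they can be generated from the device name. Here is a rough sketch that mirrors the flag combinations shown in this guide and nothing more; demo_command is a hypothetical helper, not something shipped with Qwen3-TTS.

```python
import shlex


def demo_command(model: str, device: str) -> list[str]:
    """Build the qwen-tts-demo argument list using this guide's flag sets."""
    args = ["qwen-tts-demo", model, "--device", device]
    if device.startswith("cuda"):
        args += ["--dtype", "bfloat16", "--flash-attn"]
    elif device == "mps":
        args += ["--dtype", "bfloat16", "--no-flash-attn"]
    else:
        # Pure CPU: the guide uses float32 and no FlashAttention flag.
        args += ["--dtype", "float32"]
    return args


cmd = demo_command("Qwen/Qwen3-TTS-12Hz-1.7B-Base", "mps")
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) starts the interface without worrying about \ versus ^ line continuation.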


Step 2: Open Your Browser

After the command runs, you’ll see this line in the terminal:

Running on local URL: http://0.0.0.0:8000

Go directly to http://localhost:8000 in your browser, and the interface appears. From there, just click around.
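
If the page does not load, the model may still be loading, or the server may not have started. A quick way to tell is to check whether anything is listening on the port. This sketch uses only the Python standard library and assumes the port 8000 shown in the terminal output; port_open is a made-up helper name.

```python
import socket


def port_open(host: str = "localhost", port: int = 8000,
              timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If port_open() returns False, look at the terminal for an error message before refreshing the browser.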

Want to use it on other devices in your local network? The server listens on 0.0.0.0, meaning all network interfaces, so you only need to replace localhost with this computer’s IP address. To find your IP, run ifconfig | grep "inet " on Mac or ipconfig on Windows.
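
The same lookup can be done from Python on either platform. The sketch below uses a common standard-library trick: "connecting" a UDP socket sends no packets, but it makes the operating system choose the outgoing network interface, whose address can then be read back. The target 192.0.2.1 is a reserved documentation address chosen for illustration, and lan_ip is a hypothetical helper name.

```python
import socket


def lan_ip() -> str:
    """Best-effort guess at this machine's local network IP address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # No data is sent; this only selects the outgoing interface.
            s.connect(("192.0.2.1", 80))
            return s.getsockname()[0]
        except OSError:
            # No usable network route: fall back to loopback.
            return "127.0.0.1"


print(f"http://{lan_ip()}:8000")
```

On machines with several network adapters (VPN, virtual machines), the result may not be the address you want, so compare it with the ifconfig or ipconfig output.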


Don’t Panic When Errors Hit, Check This List

Mac shows FlashAttention error on startup

Ten to one you forgot --no-flash-attn. Add it and restart.


Windows says CUDA unavailable

Run this to check:

python -c "import torch; print(torch.cuda.is_available())"

If the output is True, PyTorch can see your GPU. If it is False, you most likely installed a CPU-only PyTorch build. Reinstall with CUDA support:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

cu121 corresponds to CUDA 12.1. Adjust based on your CUDA version—CUDA 11.8 would be cu118.
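
The naming rule is simply "cu" followed by the major and minor version digits. A small sketch of that rule follows; wheel_index_url is a made-up helper, and it only formats the URL. PyTorch publishes wheels for a limited set of CUDA versions, so a generated URL is not guaranteed to exist.

```python
def wheel_index_url(cuda_version: str) -> str:
    """Turn a CUDA version like '12.1' into the matching pip index URL."""
    # Raises ValueError if the version has no minor part, e.g. "12".
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


print(wheel_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
print(wheel_index_url("11.8"))  # https://download.pytorch.org/whl/cu118
```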


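Not enough VRAM, getting OOM (Out of Memory)

First make sure you are not running with --dtype float32, which needs about twice the memory of bfloat16. Switching from bfloat16 to float16 will not help on its own, since both use 16 bits per value. Closing other GPU-heavy programs or generating shorter text is more likely to free up room.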
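
To see how the dtype affects memory, multiply the parameter count by the bytes each value takes. The sketch below is back-of-the-envelope arithmetic only: it assumes roughly 1.7 billion parameters (read off the "1.7B" in the model name) and counts weights alone, ignoring activations and other runtime overhead, so real usage will be higher.

```python
# Bytes used to store one value in each precision format.
BYTES_PER_VALUE = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_gib(params: float, dtype: str) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return params * BYTES_PER_VALUE[dtype] / 2**30


for dtype in BYTES_PER_VALUE:
    print(f"{dtype:>8}: {weight_gib(1.7e9, dtype):.1f} GiB")
```

Under these assumptions, float32 comes out at about 6.3 GiB, while float16 and bfloat16 both come out at about 3.2 GiB.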


Model download slow or failing (China network)

Set up a mirror before running commands:

Mac / Linux:

export HF_ENDPOINT=https://hf-mirror.com

Windows:

set HF_ENDPOINT=https://hf-mirror.com
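
If you start the demo from a Python script rather than a terminal, the same variable can be passed to the child process. This is a minimal sketch with a hypothetical helper name, mirror_env; it copies the current environment and adds HF_ENDPOINT without changing anything for the rest of your system.

```python
import os


def mirror_env(endpoint: str = "https://hf-mirror.com") -> dict[str, str]:
    """Return a copy of the environment with HF_ENDPOINT set to the mirror."""
    env = dict(os.environ)
    env["HF_ENDPOINT"] = endpoint
    return env
```

Pass the result as env=mirror_env() to subprocess.run when launching qwen-tts-demo.
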

Don’t Want to Run Locally? Try Online First

Setting up the model and environment takes some work. You can try the official online demo for a few minutes first to confirm you’re interested before committing to a local setup.


Stuck on a step? Copy the complete error message from the terminal and paste it into a search engine or an AI assistant. Most likely it’ll be solved in a few minutes.