TTS (Text-to-Speech)

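NUWA provides a unified text-to-speech API that aggregates several mainstream models behind an OpenAI-style calling convention. It is suited to voice announcements, AI voice-over, digital humans, intelligent customer service, audio content, and similar scenarios.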

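Supported models

gpt-4o-audio-preview: conversational audio generation with multimodal output (text + audio)
gpt-4o-mini-tts: real-time speech synthesis; the voice style can be steered through the prompt or the instructions parameter
tts-1-hd: previous-generation TTS model with high-definition audio quality
tts-1: standard TTS, balancing quality and speed

Performance tips: for the fastest response, request wav or pcm output; choose tts-1-hd for high-quality audio and gpt-4o-mini-tts for real-time applications.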
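If you request pcm for speed, the response carries raw samples without a header, so most players cannot open it directly. The sketch below wraps such samples in a WAV container using only the standard library. The sample layout (24 kHz, 16-bit signed little-endian, mono) is an assumption taken from OpenAI's documented pcm output; it is not stated on this page, so verify it against NUWA's actual output before relying on it. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave

# Assumed PCM layout (not confirmed by this page; adjust if NUWA differs).
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2  # 16-bit signed little-endian samples
CHANNELS = 1            # mono


def pcm_to_wav(pcm_bytes: bytes) -> bytes:
    """Wrap headerless PCM samples in a WAV container and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm_bytes)
    # wave.open does not close a file object it did not open itself,
    # so the buffer is still readable here.
    return buffer.getvalue()
```

A typical use would be to read the pcm response body into memory, pass it to `pcm_to_wav`, and write the result to a `.wav` file.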

API specification

from openai import OpenAI

client = OpenAI(
    api_key="sk-***",  # Key generated in the NUWA console
    base_url="https://api.nuwaapi.com/v1"
)

with client.audio.speech.with_streaming_response.create(
    model="<model_id>",  # tts-1 / tts-1-hd / gpt-4o-mini-tts
    input="your text",
    voice="alloy",
    response_format="mp3",
) as response:
    response.stream_to_file("speech.mp3")

print("Saved to speech.mp3")

model is the ID of the specific speech model.
The response body is a binary audio stream.

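Standard TTS parameters (tts-1 / tts-1-hd / gpt-4o-mini-tts)

Parameter	Type	Required	Description
`model`	string	Yes	One of `tts-1`, `tts-1-hd`, or `gpt-4o-mini-tts`
`input`	string	Yes	Text to synthesize, up to about 4096 characters
`voice`	string	No	Voice: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
`response_format`	string	No	Output format: `mp3` (default), `opus`, `aac`, `flac`, `wav`, `pcm`
`speed`	number	No	Speaking rate from 0.25 to 4.0, default 1.0; not supported by `gpt-4o-mini-tts`, where the pace is set in natural language instead
`instructions`	string	No	Supported by `gpt-4o-mini-tts` only; specifies accent, emotion, intonation, pace, and so on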
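The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the NUWA SDK: `build_speech_kwargs` is a hypothetical helper, and it treats the "about 4096 characters" figure as a hard cap, which is an assumption.

```python
from typing import Optional

MODELS = ("tts-1", "tts-1-hd", "gpt-4o-mini-tts")
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")
MAX_INPUT_CHARS = 4096  # assumption: the approximate limit is enforced strictly
INSTRUCTIONS_MODEL = "gpt-4o-mini-tts"


def build_speech_kwargs(
    model: str,
    text: str,
    voice: Optional[str] = None,
    response_format: str = "mp3",
    speed: Optional[float] = None,
    instructions: Optional[str] = None,
) -> dict:
    """Validate the parameters from the table and return request kwargs."""
    if model not in MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must contain 1 to {MAX_INPUT_CHARS} characters")
    if voice is not None and voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    kwargs = {"model": model, "input": text, "response_format": response_format}
    if voice is not None:
        kwargs["voice"] = voice
    if speed is not None:
        # speed is rejected for gpt-4o-mini-tts, per the table above.
        if model == INSTRUCTIONS_MODEL:
            raise ValueError("speed is not supported by gpt-4o-mini-tts")
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        kwargs["speed"] = speed
    if instructions is not None:
        # instructions are accepted only by gpt-4o-mini-tts.
        if model != INSTRUCTIONS_MODEL:
            raise ValueError("instructions are only supported by gpt-4o-mini-tts")
        kwargs["instructions"] = instructions
    return kwargs
```

The returned dictionary can then be unpacked into the call, for example `client.audio.speech.with_streaming_response.create(**kwargs)`.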

gpt-4o-audio-preview parameters

Parameter	Type	Required	Description
`model`	string	Yes	Set to `gpt-4o-audio-preview`
`modalities`	array	Yes	Must include both `"text"` and `"audio"`
`audio`	object	Yes	Audio configuration, e.g. `{ "voice": "alloy", "format": "mp3" }`
`messages`	array	Yes	Array of chat messages, in the same format as standard Chat Completions
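To make the required fields concrete, here is a small sketch that assembles a request body from the four parameters in the table and checks an existing body against them. Both function names are hypothetical helpers written for illustration; they are not part of any SDK.

```python
from typing import List

AUDIO_MODEL = "gpt-4o-audio-preview"


def build_audio_chat_request(prompt: str, voice: str = "alloy",
                             audio_format: str = "mp3") -> dict:
    """Assemble a request body containing every required field."""
    return {
        "model": AUDIO_MODEL,
        "modalities": ["text", "audio"],
        "audio": {"voice": voice, "format": audio_format},
        "messages": [{"role": "user", "content": prompt}],
    }


def find_request_problems(payload: dict) -> List[str]:
    """Return human-readable problems; an empty list means the body looks valid."""
    problems = []
    if payload.get("model") != AUDIO_MODEL:
        problems.append(f"model must be {AUDIO_MODEL}")
    modalities = payload.get("modalities") or []
    if not {"text", "audio"} <= set(modalities):
        problems.append('modalities must include both "text" and "audio"')
    if not isinstance(payload.get("audio"), dict):
        problems.append("audio must be an object such as {voice, format}")
    if not payload.get("messages"):
        problems.append("messages must be a non-empty array")
    return problems
```

With the OpenAI Python client, a body built this way can be passed as `client.chat.completions.create(**payload)`.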

Examples

Standard TTS (tts-1)

from openai import OpenAI

client = OpenAI(api_key="sk-***", base_url="https://api.nuwaapi.com/v1")

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    input="Welcome to the NUWA text-to-speech API.",
    voice="alloy",
    response_format="mp3",
) as response:
    response.stream_to_file("speech.mp3")

print("Saved to speech.mp3")
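Because `input` is limited to roughly 4096 characters, longer texts have to be split and synthesized in several requests. The sketch below shows one possible approach: pack whole sentences greedily into chunks and hard-split only when a single sentence exceeds the limit. `split_text` is a hypothetical helper, and the punctuation set used to detect sentence ends is an assumption you may need to extend.

```python
import re
from typing import List

# A sentence is a run of text ending in punctuation plus trailing whitespace,
# or a trailing run with no punctuation. Every character is kept.
_SENTENCE = re.compile(r"[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+")


def split_text(text: str, max_chars: int = 4096) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for piece in _SENTENCE.findall(text):
        # A single sentence longer than the limit is cut at the limit.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        # Start a new chunk when the next sentence would not fit.
        if len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate request and the resulting audio files played or concatenated in order.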

gpt-4o-mini-tts (specifying a voice style)

from openai import OpenAI

client = OpenAI(api_key="sk-***", base_url="https://api.nuwaapi.com/v1")

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    input="Please read this passage in a warm, clear voice.",
    voice="nova",
    instructions="Gentle, relaxed tone, slightly slower pace",
    response_format="mp3",
) as response:
    response.stream_to_file("soft.mp3")

print("Saved to soft.mp3")

gpt-4o-audio-preview (text + audio)

import base64
from pathlib import Path
from openai import OpenAI

client = OpenAI(api_key="sk-***", base_url="https://api.nuwaapi.com/v1")

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "mp3"},
    messages=[
        {"role": "user", "content": "Introduce how to use the NUWA TTS API in a friendly tone."}
    ],
)

# Text result: message.content is a string or None. With audio output it is
# usually None, and the spoken text arrives as message.audio.transcript.
msg = completion.choices[0].message
text_part = msg.content or (msg.audio.transcript if msg.audio else None)

if text_part:
    print(text_part)
else:
    print("No text returned; full message object:", msg)

# Audio result (Base64) written to an mp3 file
audio_b64 = msg.audio.data
audio_bytes = base64.b64decode(audio_b64)
out_path = Path("preview.mp3")
out_path.write_bytes(audio_bytes)
print(f"Audio saved to {out_path}")

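Response notes

The endpoint returns the binary audio stream directly; save it to a file with --output or from your program.
On failure, it returns an error message as JSON that includes the status code and the reason.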
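Since a call yields either audio bytes or a JSON error, client code that works with raw HTTP responses needs to tell the two apart before writing a file. The sketch below is one way to do that under stated assumptions: the exact error schema is not given on this page, so the JSON is passed through unparsed beyond decoding, and `TTSError` and `extract_audio` are hypothetical names.

```python
import json


class TTSError(Exception):
    """Raised when the response carries an error instead of audio."""

    def __init__(self, status_code: int, detail):
        super().__init__(f"TTS request failed with HTTP {status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def extract_audio(status_code: int, content_type: str, body: bytes) -> bytes:
    """Return the audio bytes, or raise TTSError with the decoded error body."""
    is_success = 200 <= status_code < 300
    if is_success and "json" not in content_type.lower():
        return body
    try:
        # Assumption: errors are UTF-8 JSON; the field layout is not documented here.
        detail = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        detail = body[:200]  # keep a short excerpt for diagnostics
    raise TTSError(status_code, detail)
```

Checking both the status code and the content type guards against writing an error message into a file that is named like an audio file.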
