AI in Practice: Local Audio-to-Text Transcription

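This post walks through an audio transcription script built on the OpenAI Whisper model. It supports:

Forcing the recognition language to Chinese, which skips language auto-detection and helps accuracy on Chinese audio
Traditional-to-Simplified Chinese conversion, for readers of Simplified Chinese
Segment-level output with timestamps, in the form [h:mm:ss - h:mm:ss]: text
GPU acceleration: CUDA is used when it is enabled and available, otherwise the script falls back to the CPU
Writing the transcript to a text file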

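Highlights

Uses a pretrained Whisper model for high-quality speech-to-text
Converts Traditional Chinese to Simplified Chinese automatically with the OpenCC library, matching mainland Chinese reading habits
Includes the start and end time of every speech segment, so a line can be located in the audio later
Lets you choose the model size and the device, trading accuracy against speed as needed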

Code Example

import datetime

import torch
import whisper
from opencc import OpenCC


def format_time(seconds):
    # str(timedelta) renders as h:mm:ss; fractional seconds are dropped
    return str(datetime.timedelta(seconds=int(seconds)))


def transcribe_with_timestamps(
    audio_path,
    output_path="output.txt",
    model_size="medium",
    use_cuda=True,
):
    # Use the GPU only when it is requested and actually available
    device = "cuda" if use_cuda and torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Load the model directly onto the chosen device
    model = whisper.load_model(model_size, device=device)

    # Force Chinese; keep word-level timestamps off to avoid slowing things down
    result = model.transcribe(audio_path, language="zh", word_timestamps=False)

    cc = OpenCC("t2s")  # Traditional -> Simplified Chinese

    with open(output_path, "w", encoding="utf-8") as f:
        for segment in result["segments"]:
            start = format_time(segment["start"])
            end = format_time(segment["end"])
            text = segment["text"].strip()
            text_simplified = cc.convert(text)
            line = f"[{start} - {end}]: {text_simplified}\n"
            f.write(line)
            print(line, end="")


if __name__ == "__main__":
    audio_file = "path/to/your/audio.mp3"
    transcribe_with_timestamps(audio_file, model_size="medium", use_cuda=True)
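One detail of format_time is worth spelling out: str(datetime.timedelta(...)) does not zero-pad the hours, so 83 seconds renders as 0:01:23 rather than 00:01:23. If fixed-width timestamps are preferred, a variant along the following lines could be swapped in. The name format_time_padded is hypothetical and not part of the original script; this is only a sketch.

```python
def format_time_padded(seconds):
    """Render a duration in seconds as zero-padded hh:mm:ss."""
    total = int(seconds)  # drop fractional seconds, as format_time does
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_time_padded(83.4))  # 00:01:23
```

Unlike the timedelta version, this keeps counting hours past 24 instead of switching to a "1 day, ..." prefix.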

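Usage

Install the Python dependencies:

pip install openai-whisper opencc-python-reimplemented torch

Whisper also needs the ffmpeg command-line tool to decode audio. On macOS with Homebrew:

brew install ffmpeg

On other systems, install ffmpeg with the platform's package manager.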
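Before running the script, it can help to confirm that everything installed correctly. The sketch below checks for the three Python modules the script imports (whisper, opencc, torch) and for an ffmpeg executable on the PATH. The function name missing_dependencies is made up for this illustration.

```python
import importlib.util
import shutil


def missing_dependencies(
    modules=("whisper", "opencc", "torch"),
    executables=("ffmpeg",),
):
    """Return the names of modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not importable
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when the executable is not on the PATH
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("Missing:", ", ".join(problems) if problems else "nothing")
```

The check only looks the names up; it does not import the modules, so it stays fast even though torch is large.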

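Prepare an audio file: common formats such as MP3 and WAV are supported.
Run the script: set audio_file in the script to the path of your file, then run it to get a timestamped Simplified Chinese transcript.
Result file: the transcript is saved to output.txt, one segment per line, in the form

[0:01:23 - 0:01:30]: transcribed text goes here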
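Because every line follows the same pattern, the result file is easy to read back, for example to jump to a position in the audio. Here is a minimal sketch of a parser for one such line. The names LINE_RE and parse_line are hypothetical, and the pattern accepts hours written with one or more digits.

```python
import re

# Matches "[h:mm:ss - h:mm:ss]: text"; hours may have one or more digits.
LINE_RE = re.compile(r"^\[(\d+):(\d{2}):(\d{2}) - (\d+):(\d{2}):(\d{2})\]: (.*)$")


def parse_line(line):
    """Return (start_seconds, end_seconds, text), or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    h1, m1, s1, h2, m2, s2 = (int(g) for g in match.groups()[:6])
    start = h1 * 3600 + m1 * 60 + s1
    end = h2 * 3600 + m2 * 60 + s2
    return start, end, match.group(7)


print(parse_line("[0:01:23 - 0:01:30]: hello"))  # (83, 90, 'hello')
```

Lines that do not fit the pattern return None, so blank lines or hand-edited notes in the file can simply be skipped.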

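Parameters

Parameter	Description	Default
`audio_path`	Path to the input audio file	required
`output_path`	Path to the output text file	`"output.txt"`
`model_size`	Whisper model size; affects accuracy and speed	`"medium"`
`use_cuda`	Whether to enable GPU acceleration	`True`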
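The script as written expects you to edit audio_file by hand. One possible alternative is to expose the same parameters on the command line with argparse. The flag names below (--output-path, --model-size, --no-cuda) are assumptions chosen for this sketch and are not part of the original script.

```python
import argparse


def build_parser():
    """Build a command-line parser that mirrors the function's parameters."""
    parser = argparse.ArgumentParser(description="Transcribe audio with Whisper")
    parser.add_argument("audio_path", help="input audio file")
    parser.add_argument("--output-path", default="output.txt", help="output text file")
    parser.add_argument("--model-size", default="medium", help="Whisper model size")
    # use_cuda defaults to True; passing --no-cuda sets it to False
    parser.add_argument("--no-cuda", dest="use_cuda", action="store_false",
                        help="run on the CPU even if CUDA is available")
    return parser


# An explicit argument list is used here for illustration;
# a real script would call parse_args() with no arguments.
args = build_parser().parse_args(["talk.mp3", "--model-size", "small"])
print(args.audio_path, args.output_path, args.model_size, args.use_cuda)
```

Since the parsed attribute names match the function's parameter names, the result could be passed on as transcribe_with_timestamps(**vars(args)).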

Output Preview

The audio sample comes from: https://www.gequbao.com/music/4190

[0:00:00 - 0:00:05]: 词曲 李宗盛
[0:00:30 - 0:00:32]: 对这个世界
[0:00:32 - 0:00:34]: 如果你有太多的抱怨
[0:00:34 - 0:00:35]: 跌倒了
[0:00:35 - 0:00:37]: 就不该继续往前走
[0:00:37 - 0:00:38]: 为什么
[0:00:38 - 0:00:41]: 人要这么的脆弱堕落
[0:00:41 - 0:00:44]: 请你打开电视看看多少人
[0:00:44 - 0:00:47]: 为生命在努力勇敢的走下去
[0:00:47 - 0:00:49]: 我们是不是该自主
[0:00:49 - 0:00:53]: 珍惜一切就算没有用
[0:00:53 - 0:00:57]: 还记得你说这是为了惩罚
[0:00:57 - 0:01:00]: 谁知道想得要继续奔跑
[0:01:00 - 0:01:01]: 别微笑
[0:01:01 - 0:01:04]: 像是狗怎么我知道
[0:01:04 - 0:01:05]: 像是我的梦我只能
[0:01:05 - 0:01:09]: 不要哭让烟火虫带著你逃跑
[0:01:09 - 0:01:12]: 像见到鸽鸭永远在依靠
[0:01:12 - 0:01:13]: 回家吧
[0:01:13 - 0:01:16]: 回到最初的美好
[0:01:16 - 0:01:18]: 回到最初的美好
[0:01:27 - 0:01:44]: 不要这么容易就想放弃就像我说的
[0:01:44 - 0:01:47]: 只不得多梦想我可梦不就得了
[0:01:47 - 0:01:49]: 为自己的热忱心愿丧失
[0:01:49 - 0:01:52]: 先把海豚身洗完的颜色
[0:01:52 - 0:01:53]: 笑一个吧
[0:01:53 - 0:01:55]: 公正名叫不是目的
[0:01:55 - 0:01:58]: 让自己快乐快乐的才叫做意义
[0:01:58 - 0:02:03]: 突然的这飞机现在终于飞回我身上
[0:02:03 - 0:02:05]: 我身为的那快乐
[0:02:05 - 0:02:08]: 吃饺在店里最轻盈最难累了
[0:02:08 - 0:02:09]: 都在睡
[0:02:09 - 0:02:11]: 我偏偏疯掉电脑跑了
[0:02:11 - 0:02:12]: 谁在多想呢
[0:02:12 - 0:02:16]: 我靠著大大冷吹著碰撞这个睡著了
[0:02:16 - 0:02:19]: 我和其他的聪明总更精锤
[0:02:19 - 0:02:22]: 烟火散在路上就不怕兴趣
[0:02:22 - 0:02:26]: 等惜一切就算没有拥有
[0:02:27 - 0:02:30]: 还记得你说这是为了操抱
[0:02:30 - 0:02:33]: 睡著到想可要继续奔跑
[0:02:33 - 0:02:37]: 为了回想像是狗的梦我知道
[0:02:37 - 0:02:39]: 像是狗的梦我知道
[0:02:39 - 0:02:42]: 不要哭让烟火冲淡成天堂堡
[0:02:42 - 0:02:45]: 想见的更要永远的依靠
[0:02:45 - 0:02:46]: 回家吧
[0:02:46 - 0:02:49]: 回到最初的美好
[0:02:49 - 0:02:50]: 回到最初的美好
[0:02:50 - 0:02:54]: 还记得你说这是为了操抱
[0:02:54 - 0:02:57]: 睡著到想可要继续奔跑
[0:02:57 - 0:03:01]: 为了回想像是狗的梦我知道
[0:03:02 - 0:03:06]: 不要哭让烟火冲淡成天堂堡
[0:03:06 - 0:03:09]: 想见的更要永远的依靠
[0:03:09 - 0:03:10]: 回家吧
[0:03:10 - 0:03:12]: 回到最初的美好
[0:03:20 - 0:03:42]: Zither Harp
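The last line of the preview covers 22 seconds with the text "Zither Harp", which may not correspond to any sung lyric. Each segment returned by Whisper carries a no_speech_prob field, so one possible mitigation is to drop segments the model itself considers unlikely to be speech before writing them out. The sketch below is untested against this audio. The function name and the 0.6 threshold are assumptions, and the sample values are made up.

```python
def drop_unlikely_speech(segments, max_no_speech_prob=0.6):
    """Keep segments whose no_speech_prob is at or below the threshold."""
    return [
        seg for seg in segments
        # segments without the field are kept
        if seg.get("no_speech_prob", 0.0) <= max_no_speech_prob
    ]


# Made-up segments for illustration only
sample = [
    {"text": "kept", "no_speech_prob": 0.05},
    {"text": "dropped", "no_speech_prob": 0.92},
]
print([seg["text"] for seg in drop_unlikely_speech(sample)])  # ['kept']
```

In the script, this would wrap result["segments"] in the for loop. The threshold needs tuning per recording, since music and background noise shift the probabilities.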


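Summary

The script is short and practical, and fits cases where Chinese speech needs to be turned into Simplified Chinese text with timestamps preserved. Combining Whisper with OpenCC covers both recognition accuracy and text localization.

Feel free to share your experience and suggestions for improvement in the comments!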
