Easily Transcribe Audio to Text with Faster-Whisper
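This article explains how to transcribe a large number of audio files with Faster-Whisper.
I recently needed to turn a batch of phone-call recordings into text, and the first thing that came to mind was OpenAI's Whisper API.
However, these recordings contain sensitive information, and I did not want to upload them to OpenAI. OpenAI's privacy policy says it will not train on user data, but in practice, who knows (shrug).
So I processed them with OpenAI's whisper on my own machine instead. The recordings are long and numerous, though, and whisper was rather slow, so I switched to faster-whisper, which sped things up considerably.
Here is how to use it.
I won't go into detail on setting up the Python environment and CUDA; let's get straight to the main points.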
First, install faster-whisper and python-docx:
pip install faster-whisper python-docx
Next, collect the absolute paths of the files to transcribe:
import os
path = '/path/to/your/filesDir/'
all_items = os.listdir(path)
full_paths = [os.path.join(path, f) for f in all_items if os.path.isfile(os.path.join(path, f))]
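The listing above picks up every regular file in the directory, including anything that is not audio. As a minimal sketch, assuming the recordings can be recognized by their file extension, a filter like the following could be used instead. The name list_audio_files and the extension set are hypothetical and not part of the original script; adjust them to your own recordings.

```python
import os

# Hypothetical set of extensions; adjust to whatever formats your recordings use.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}

def list_audio_files(directory, extensions=AUDIO_EXTENSIONS):
    """Return sorted absolute paths of the audio files directly inside directory."""
    paths = []
    # os.listdir returns names in arbitrary order, so sort for a stable run order
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        _, ext = os.path.splitext(name)
        # Keep regular files only, matching the extension case-insensitively
        if os.path.isfile(full_path) and ext.lower() in extensions:
            paths.append(os.path.abspath(full_path))
    return paths
```

With this helper, full_paths = list_audio_files(path) would replace the list comprehension, and subdirectories or stray text files in the folder would be skipped.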
Then define a function, generatorWord, that generates the document:
from docx import Document
from docx.shared import Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH

def generatorWord(transcript, title, filepath):
    # Create a new Word document
    doc = Document()
    title_paragraph = doc.add_heading(title, level=0)
    title_paragraph.runs[0].font.size = Pt(16)
    title_paragraph.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER
    # Add the text to the document, keeping the timestamp prefix
    for line in transcript:
        # Split the timestamp from the text at the first '] ' only,
        # so text that itself contains '] ' does not break the unpacking
        timestamp, text = line.split('] ', 1)
        p = doc.add_paragraph()
        run = p.add_run(timestamp + '] ')
        run.font.size = Pt(12)
        run = p.add_run(text)
        run.font.size = Pt(12)
        p.paragraph_format.space_after = Pt(2)  # 2 pt of space after the paragraph
        p.paragraph_format.space_before = Pt(0)  # no space before the paragraph
    doc.save(filepath)
    print("Document generated: %s" % (filepath))
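generatorWord expects every transcript line to have the shape "[timestamp] text". For long recordings, raw second counts can be hard to read, so here is a small sketch of how such a line could be built with HH:MM:SS timestamps instead. The helpers format_timestamp and format_line are hypothetical names introduced for illustration; they are not part of faster-whisper or python-docx.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (rounded to the nearest second)."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)

def format_line(start, end, text):
    """Build one transcript line in the "[start -> end] text" shape."""
    # strip() drops stray whitespace around the recognized text
    return "[%s -> %s] %s" % (format_timestamp(start), format_timestamp(end), text.strip())
```

A line built this way still contains the "] " separator after the timestamp, so it can be passed to generatorWord unchanged, for example format_line(segment.start, segment.end, segment.text).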
Finally, use faster-whisper to transcribe each audio file and call generatorWord to produce the corresponding docx document:
from faster_whisper import WhisperModel

model_size = "large-v3"
# Run on GPU with FP32
model = WhisperModel(model_size, device="cuda", compute_type="float32")
for full_path in full_paths:
    segments, info = model.transcribe(full_path, beam_size=5)
    print("Processing file %s | %s(%f)" % (full_path, info.language, info.language_probability))
    handler_item = []
    # segments is a generator: the transcription actually runs while iterating over it
    for segment in segments:
        handler_item.append("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    print("Transcription finished, converting to a Word document.")
    filename = os.path.basename(full_path)
    generatorWord(handler_item, filename, f'/path/to/your/outdir/{filename}.docx')
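Because the output name is built from the full file name, a recording such as call01.mp3 ends up as call01.mp3.docx. If you would rather drop the audio extension, and skip recordings that already have a document when re-running a large batch, something along these lines might work. This is only a sketch; docx_path_for and pending_files are hypothetical helpers, not part of the original script.

```python
from pathlib import Path

def docx_path_for(audio_path, out_dir):
    """Return out_dir/<audio file name without its last extension>.docx."""
    return Path(out_dir) / (Path(audio_path).stem + ".docx")

def pending_files(audio_paths, out_dir):
    """Keep only the audio paths whose .docx does not exist yet in out_dir."""
    return [p for p in audio_paths if not docx_path_for(p, out_dir).exists()]
```

In the loop, you would iterate over pending_files(full_paths, out_dir) and pass str(docx_path_for(full_path, out_dir)) as the last argument to generatorWord. Note that Path.stem removes only the final suffix, so a name like a.b.wav becomes a.b.docx.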