Installing and using the voice activity detector py-webrtcvad
Published: 2019-06-09


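The VAD that Google developed for the WebRTC project is widely regarded as one of the best free voice activity detectors available. webrtcvad is a Python interface to the WebRTC Voice Activity Detector (VAD), compatible with both Python 2 and Python 3. It classifies short frames of audio as either speech or non-speech (silence), which is useful for telephony and speech recognition.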

1. Install pip

yum -y install epel-release
yum -y install python-pip

2. Install webrtcvad

yum -y install python-devel
pip install webrtcvad

3. webrtcvad test script (test_webrtcvad.py)

import collections
import contextlib
import sys
import wave

import webrtcvad


def read_wave(path):
    """Read a WAV file and return (PCM bytes, sample rate)."""
    with contextlib.closing(wave.open(path, 'rb')) as wf:
        num_channels = wf.getnchannels()
        assert num_channels == 1          # mono only
        sample_width = wf.getsampwidth()
        assert sample_width == 2          # 16-bit samples only
        sample_rate = wf.getframerate()
        assert sample_rate in (8000, 16000, 32000)
        pcm_data = wf.readframes(wf.getnframes())
        return pcm_data, sample_rate


def write_wave(path, audio, sample_rate):
    """Write 16-bit mono PCM bytes to a WAV file."""
    with contextlib.closing(wave.open(path, 'wb')) as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(audio)


class Frame(object):
    """A slice of audio with its start time and duration in seconds."""
    def __init__(self, bytes, timestamp, duration):
        self.bytes = bytes
        self.timestamp = timestamp
        self.duration = duration


def frame_generator(frame_duration_ms, audio, sample_rate):
    """Yield consecutive Frames of frame_duration_ms from the PCM data."""
    # Bytes per frame: samples per frame times 2 bytes per 16-bit sample.
    n = int(sample_rate * (frame_duration_ms / 1000.0) * 2)
    offset = 0
    timestamp = 0.0
    duration = (float(n) / sample_rate) / 2.0
    while offset + n < len(audio):
        yield Frame(audio[offset:offset + n], timestamp, duration)
        timestamp += duration
        offset += n


def vad_collector(sample_rate, frame_duration_ms,
                  padding_duration_ms, vad, frames):
    """Yield the PCM bytes of each voiced segment found in frames."""
    num_padding_frames = int(padding_duration_ms / frame_duration_ms)
    # Sliding window over the most recent frames.
    ring_buffer = collections.deque(maxlen=num_padding_frames)
    triggered = False
    voiced_frames = []
    for frame in frames:
        # Print 1 for a speech frame and 0 for a non-speech frame.
        sys.stdout.write(
            '1' if vad.is_speech(frame.bytes, sample_rate) else '0')
        if not triggered:
            ring_buffer.append(frame)
            num_voiced = len([f for f in ring_buffer
                              if vad.is_speech(f.bytes, sample_rate)])
            # Start a segment once more than 90% of the window is speech.
            if num_voiced > 0.9 * ring_buffer.maxlen:
                sys.stdout.write('+(%s)' % (ring_buffer[0].timestamp,))
                triggered = True
                voiced_frames.extend(ring_buffer)
                ring_buffer.clear()
        else:
            voiced_frames.append(frame)
            ring_buffer.append(frame)
            num_unvoiced = len([f for f in ring_buffer
                                if not vad.is_speech(f.bytes, sample_rate)])
            # End the segment once more than 90% of the window is non-speech.
            if num_unvoiced > 0.9 * ring_buffer.maxlen:
                sys.stdout.write('-(%s)' % (frame.timestamp + frame.duration))
                triggered = False
                yield b''.join([f.bytes for f in voiced_frames])
                ring_buffer.clear()
                voiced_frames = []
    if triggered:
        sys.stdout.write('-(%s)' % (frame.timestamp + frame.duration))
    sys.stdout.write('\n')
    # Flush any segment still open at the end of the audio.
    if voiced_frames:
        yield b''.join([f.bytes for f in voiced_frames])


def main(args):
    if len(args) != 2:
        sys.stderr.write(
            'Usage: test_webrtcvad.py <aggressiveness> <path to wav file>\n')
        sys.exit(1)
    audio, sample_rate = read_wave(args[1])
    vad = webrtcvad.Vad(int(args[0]))
    frames = frame_generator(30, audio, sample_rate)
    frames = list(frames)
    segments = vad_collector(sample_rate, 30, 300, vad, frames)
    for i, segment in enumerate(segments):
        # Uncomment the two commented lines to save each segment as a WAV file.
        # path = 'chunk-%002d.wav' % (i,)
        print('--end')
        # write_wave(path, segment, sample_rate)


if __name__ == '__main__':
    main(sys.argv[1:])
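The start/stop logic in vad_collector is easier to follow without any audio involved. The following is a minimal sketch, not part of the original script, that applies the same 90% sliding-window rule to a list of precomputed speech flags. The names segment_flags, window and ratio are hypothetical and chosen only for illustration; in the script the flags come from vad.is_speech() and the window is 300 ms / 30 ms = 10 frames.

```python
import collections


def segment_flags(flags, window=10, ratio=0.9):
    """Group per-frame speech flags into (start, end) frame-index ranges.

    flags  -- booleans, one per frame (True = speech)
    window -- sliding-window length in frames (hypothetical name)
    ratio  -- fraction of the window that must agree to switch state
    The end index is exclusive and includes the trailing padding frames,
    as the script's voiced_frames list does.
    """
    ring = collections.deque(maxlen=window)
    triggered = False
    start = None
    last = -1
    segments = []
    for i, voiced in enumerate(flags):
        last = i
        ring.append((i, voiced))
        if not triggered:
            # Open a segment when the window is almost entirely speech.
            if sum(1 for _, v in ring if v) > ratio * ring.maxlen:
                triggered = True
                start = ring[0][0]
                ring.clear()
        else:
            # Close it when the window is almost entirely non-speech.
            if sum(1 for _, v in ring if not v) > ratio * ring.maxlen:
                triggered = False
                segments.append((start, i + 1))
                ring.clear()
    if triggered:
        # Audio ended while a segment was still open.
        segments.append((start, last + 1))
    return segments


flags = [False] * 20 + [True] * 30 + [False] * 20
print(segment_flags(flags))  # [(20, 60)]
```

With a 10-frame window, a segment opens only after 10 consecutive speech frames and closes only after 10 consecutive non-speech frames, so short dropouts inside an utterance do not split it.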

 

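4. Run the command. The first argument is the aggressiveness mode, an integer from 0 to 3: 0 is the least aggressive about filtering out non-speech and 3 is the most aggressive, so higher values classify fewer frames as speech. The second argument is the path to the WAV file. The script accepts only 16-bit mono files sampled at 8 kHz, 16 kHz or 32 kHz. Sample WAV file for download: 73.wav, link: https://pan.baidu.com/s/19YJB9u0zvCFGBLDRisK1KQ password: fgkf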
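Because read_wave rejects unsuitable files with a bare assert, it can help to inspect a file first. The sketch below is an assumption-laden helper, not part of the original script: check_wave and SUPPORTED_RATES are hypothetical names, and the checks simply mirror the three assertions in read_wave using the standard wave module.

```python
import contextlib
import wave

# The sample rates the test script asserts on.
SUPPORTED_RATES = (8000, 16000, 32000)


def check_wave(path):
    """Return a list of reasons the script would reject the file.

    An empty list means the file passes the script's three assertions.
    """
    problems = []
    with contextlib.closing(wave.open(path, 'rb')) as wf:
        channels = wf.getnchannels()
        width = wf.getsampwidth()
        rate = wf.getframerate()
    if channels != 1:
        problems.append('expected mono, got %d channels' % channels)
    if width != 2:
        problems.append('expected 16-bit samples, got %d-bit' % (8 * width))
    if rate not in SUPPORTED_RATES:
        problems.append('unsupported sample rate: %d Hz' % rate)
    return problems
```

Calling check_wave('/home/73.wav') before running the script would then list any mismatches instead of stopping at the first failed assertion.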

[root@host-10-0-251-159 ~]# python test_webrtcvad.py 2 /home/73.wav
00000000000000000000000000000000000000000000000000000000000000000000000000111111+(2.1)11111111111111111111111100000000-(3.36)--end
00000000000111111+(3.57)1111111111111111111111111111111111111001111111111111111111111111111111111111000000111111111111111111111111111111111111111111111111101111111111110000011110000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000111111111111111111111111111100111111111111111111111111111111111111111111111100000000000011100000000-(14.43)--end
000000000000000000000000000000000000011+(15.3)111100000001110000-(16.14)--end
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111+(21.21)11111111111111111111111110000000-(22.47)--end
00000000000111111+(22.68)111111111111111111111111111111111111111111100000000000-(24.6)--end
000000111111+(24.66)111111111111111111111111111111111111111111111111111110000000-(26.76)--end
1111111111+(26.76)1111111111111110000000000-(27.81)--end
000000001111+(27.87)11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000-(31.38)--end
0001111111+(31.38)11111111111111111110111111111111111000000-(32.91)--end
00000001111111111111+(33.21)111000111111111111111111111111111111111110000000000-(35.04)--end
000000000000000000000000000111111+(35.73)111111111111111111111111111111111111111111111111111000011111111111111111111111111000000011111111111111111111111111111111111111111111111111111111111111111111111111111000011100000000-(41.43)--end
000000000000000000000000000000000000000000000111111+(42.66)1111111111111111111110000000-(43.8)--end
000000001111111+(43.95)1111111111111111111111110011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111010000000-(51.03)--end
00000000111111+(51.15)1111111111111111111111111111111111111001111111111111111111111111111111111000000-(53.82)--end
0111111111+(53.82)11111111111111111111111111111111111111111111000011111111111111111111111111001111111111111111111111111111111111111111111111111111000111111111111111111111111111111111111111111111111111110000000-(59.85)--end
00000000000000000000000000111111+(60.51)11111111111111111111111111111111111000000111100111111111111111111111111111111111111111111111111111111111111111111111110011100000000-(64.74)--end
0000111000000000000000000001111111+(65.46)11111111111111111111111111111110011100000000000000-(67.26)--end
00000000111000000000111111+(67.74)111111111111111111111111111111111000100000000-(69.39)--end
00001111111+(69.42)11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100001111111111110001111111111111111111111111111111110000000-(74.55)--end
1111111111+(74.55)111111111111111111111111111111111111100000011111011111111111111111111111111111111111111111111111111111111111111111111111111111111111100111111111111111111111111111111111111111111111111111111111111111111111000000000-(81.24)--end
0011111000000111111+(81.51)111111111111111111111111111111111111111111111111111111111111111111111100000000111111111111111111111111111111111111111111111111111111111111111001111111111111111111111111111111111000000001100000000-(87.66)--end
000000000001111111+(87.9)1111111111111111111111111111110111111000001100000000-(89.76)--end
000000000000000000000000000000000000000000000000111111+(91.08)1111111111111100000000-(92.04)--end
0000000000000111111+(92.31)11111111111111110111011111111111111111111111110001111111111111111111111111111111111111111000001111111111111111111111111111111111111111100000000-(96.9)--end
000000000000000111111+(97.23)11111111111111111111111111111111111111100111111001111111111111111111111111111111111111111111111111001111111111111111111111111111111111111111100000000000000000-(102.27)--end
000111000000111111+(102.51)111111111111111111111111111111111111111111111110000000-(104.43)--end
0000111111+(104.43)111111111111111111111111111111110000000-(105.9)--end
11100111100000000011111111+(106.38)111000000011111111111111111111111111111100000000-(108.12)--end
00001111000000000011110111111+(108.69)111111111111111111111111111111110000000-(110.16)--end
000000000000000000000000011100111000111111+(111.12)111111111111111111100001111111111111111111111111110000000-(113.13)--end
0001111111+(113.13)111111111111111111111111111111111110000010000000-(114.87)--end
0111011111+(114.87)1111111111111111111111111111111111100000011111111111111111111111111111111111111111111110110000000-(118.08)--end
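In this output each 1 or 0 is one 30 ms frame, +(t) marks the time in seconds at which a voiced segment starts, and -(t) marks where it ends. As a rough sketch for post-processing, the markers can be pulled out with a regular expression. The names MARKER and parse_segments are hypothetical, and the code assumes the output has been captured as a string, for example by redirecting the script's stdout to a file.

```python
import re

# Matches +(2.1) or -(3.36) and captures the sign and the timestamp.
MARKER = re.compile(r'([+-])\((\d+(?:\.\d+)?)\)')


def parse_segments(output):
    """Return a list of (start_seconds, end_seconds) voiced segments."""
    segments = []
    start = None
    for sign, value in MARKER.findall(output):
        t = float(value)
        if sign == '+':
            start = t
        elif start is not None:
            # A '-' marker closes the most recently opened segment.
            segments.append((start, t))
            start = None
    return segments


sample = '000111111+(2.1)1111100000000-(3.36)--end\n0111111+(3.57)1110-(14.43)--end'
print(parse_segments(sample))  # [(2.1, 3.36), (3.57, 14.43)]
```

Applied to the run above, the first two segments would be (2.1, 3.36) and (3.57, 14.43), matching the markers on the first two output lines.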

 

Reposted from: https://www.cnblogs.com/zhenyuyaodidiao/p/9288455.html
