This blog post mainly records my understanding of the CQT and perceptual_weighting() functions in librosa.
1. CQT
def cqt(y, sr=22050, hop_length=512, fmin=None, n_bins=84, bins_per_octave=12,
        tuning=0.0, filter_scale=1, norm=1, sparsity=0.01, window="hann",
        scale=True, pad_mode="reflect", res_type=None, dtype=None):
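The function interface is shown above, where:
fmin: the lowest (starting) frequency; when left as None, librosa uses C1 ≈ 32.7 Hz.
So how is the highest frequency computed?
For convenience, take the lowest frequency as fmin = 32 Hz = $2^5$ Hz.
With n_bins = 84 and bins_per_octave = 12 the transform spans 7 octaves, so counting the starting point there are eight octave boundaries, at the following frequencies:
$2^5$ = 32 Hz, $2^6$ = 64 Hz, $2^7$ = 128 Hz, $2^8$ = 256 Hz,
$2^9$ = 512 Hz, $2^{10}$ = 1024 Hz, $2^{11}$ = 2048 Hz, $2^{12}$ = 4096 Hz.
So the covered range tops out at $2^{12}$ = 4096 Hz, seven octaves above fmin.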
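The octave arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes the usual geometric spacing of CQT bins, f_k = fmin * 2^(k / bins_per_octave); the function name `cqt_center_frequencies` is made up for illustration (librosa ships its own helper, `librosa.cqt_frequencies`, for the same purpose).

```python
def cqt_center_frequencies(fmin=32.0, n_bins=84, bins_per_octave=12):
    """Geometrically spaced center frequencies: fmin * 2**(k / bins_per_octave)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


freqs = cqt_center_frequencies()

# First bin of each of the 7 octaves: 32, 64, 128, 256, 512, 1024, 2048 Hz
octave_starts = freqs[::12]

# Center of the last (84th) bin, one semitone below the 4096 Hz boundary
top_bin = freqs[-1]

# Upper boundary of the covered range: 7 octaves above fmin
upper_edge = 32.0 * 2.0 ** (84 / 12)

print(octave_starts)
print(round(top_bin, 1), upper_edge)
```

Note that the last bin is centered slightly below 4096 Hz (at roughly 3866 Hz for fmin = 32 Hz); 4096 Hz is where an eighth octave would begin.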
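1.1 Parameter settings
With the default fmin and number of filters (n_bins), librosa raises an error when the sampling rate is too low (below 4186 Hz × 2), as in this case:
sound_clip, s = librosa.load(fn, sr=8000)
cqtpec = librosa.cqt(y=sound_clip, sr=s)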
Use a lower n_bins or a lower fmin. With the default fmin of 32.7Hz (musical C1), n_bins = 84, and bins_per_octave = 12, the highest bin falls 7 octaves higher, at 4186Hz (C8), but with a sampling rate of 8000Hz you can only deal with frequencies up to 4000Hz, so if you keep fmin the same, n_bins needs to be no more than 83.
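The rule of thumb in that answer can be turned into a small helper. This is only a sketch: `max_cqt_bins` is a hypothetical name, and the criterion (keep fmin * 2^(n_bins / bins_per_octave) at or below the Nyquist frequency) is a conservative approximation, not the exact check librosa performs internally, which also accounts for filter bandwidth.

```python
import math


def max_cqt_bins(sr, fmin=32.7, bins_per_octave=12):
    """Largest n_bins such that fmin * 2**(n_bins / bins_per_octave) <= sr / 2.

    Conservative rule of thumb (an assumption, not librosa's own check).
    """
    nyquist = sr / 2.0
    if fmin >= nyquist:
        return 0
    return math.floor(bins_per_octave * math.log2(nyquist / fmin))


print(max_cqt_bins(8000))   # 83, matching the limit quoted above
print(max_cqt_bins(22050))  # comfortably above the default of 84
```

For the 8000 Hz case one would then call `librosa.cqt(y=sound_clip, sr=s, n_bins=max_cqt_bins(s))`, or alternatively lower fmin.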
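1.2 Setting hop_length
hop_length is the frame shift, in samples.
Suppose f_min is set to 32 Hz = $2^5$, as above.
Then hop_length should be chosen as a multiple of 32,
so that the output has the correct number of frames. Some librosa releases are stricter still: they raise a ParameterError unless hop_length is a multiple of $2^{n_{octaves} - 1}$, which is $2^6$ = 64 for the default 7-octave CQT.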
# 192 = 6 * 32 (and 3 * 64), so the hop is a valid multiple; 188 would not be
spect = librosa.cqt(waveform, sr=9000, hop_length=192, fmin=32, filter_scale=1)
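To pick a safe hop without doing the arithmetic by hand, something like the helper below can snap a requested value to the nearest valid one. It is a sketch under one assumption: that hop_length must be a multiple of 2^(n_octaves - 1), which is the requirement enforced by the librosa releases that validate hop_length for CQT. The name `valid_hop_length` is hypothetical.

```python
import math


def valid_hop_length(hop_length, n_bins=84, bins_per_octave=12):
    """Round hop_length to the nearest positive multiple of 2**(n_octaves - 1)."""
    n_octaves = math.ceil(n_bins / bins_per_octave)
    step = 2 ** (n_octaves - 1)  # 64 for the default 7 octaves
    # Never return 0: fall back to a single step for very small requests
    return max(step, round(hop_length / step) * step)


print(valid_hop_length(188))  # 192
print(valid_hop_length(512))  # 512 (already valid)
```

Since 64 is itself a multiple of 32, any value returned here also satisfies the multiple-of-32 rule described in this section.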
References:
- https://blog.csdn.net/_/article/details/#t6
- https://stackoverflow.com/questions//how-can-i-extract-cqt-from-audio-with-sampling-rate-8000hz-librosa
