mfcc vs fbank
2024-10-21 09:29:45
There is some debate in the community about whether to apply the DCT at all, rather than using the log Mel filterbank features directly, particularly for acoustic models based on deep neural networks. Some research groups, like Google, use filterbanks (fbanks), while Kaldi mostly uses MFCCs, especially in its TDNN chain models. The DCT was originally introduced for GMM-based systems: filterbank energies are correlated across neighbouring bands, so they cannot be modelled well by a Gaussian mixture model with diagonal covariance matrices, and a discrete cosine transform (DCT) is applied to decorrelate them. Neural networks do not make the diagonal-covariance assumption, which is why the choice is open for them.
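The decorrelation argument can be checked numerically. The following is only a minimal sketch on synthetic data, not real speech: it assumes neighbouring bands are correlated like an AR(1) process with coefficient `rho`, and the names `dct_matrix`, `mean_abs_offdiag_corr`, `fbank` and `mfcc_like` are made up for this illustration. It builds an orthonormal DCT-II matrix by hand and compares the average absolute off-diagonal correlation before and after the transform.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n)."""
    k = np.arange(n)[:, None]   # index of the output coefficient
    m = np.arange(n)[None, :]   # index of the input band
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    mat[0, :] /= np.sqrt(2.0)   # extra scaling of the first row makes the matrix orthonormal
    return mat

def mean_abs_offdiag_corr(x):
    """Mean absolute correlation between different feature dimensions."""
    c = np.corrcoef(x, rowvar=False)
    off_diagonal = c[~np.eye(c.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off_diagonal)))

# Synthetic stand-in for log Mel filterbank frames (assumption: AR(1)-style
# correlation between neighbouring bands; real speech will differ).
rng = np.random.default_rng(0)
n_bands, n_frames, rho = 23, 5000, 0.9
idx = np.arange(n_bands)
cov = rho ** np.abs(idx[:, None] - idx[None, :])
fbank = rng.multivariate_normal(np.zeros(n_bands), cov, size=n_frames)

# Apply the DCT to every frame (each row is one frame).
mfcc_like = fbank @ dct_matrix(n_bands).T

corr_before = mean_abs_offdiag_corr(fbank)
corr_after = mean_abs_offdiag_corr(mfcc_like)
print(f"mean |off-diagonal correlation| before DCT: {corr_before:.3f}")
print(f"mean |off-diagonal correlation| after DCT:  {corr_after:.3f}")
```

On data like this the off-diagonal correlations should drop substantially after the DCT, which is what makes a diagonal-covariance GMM a more reasonable fit for MFCCs than for raw filterbank energies.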
Here is Dan Povey’s take on this:
The reason we use MFCC is because they are more easily compressible, being decorrelated; we dump them to disk with compression to 1 byte per coefficient. But we dump all the coefficients, so it’s equivalent to filterbanks times a full-rank matrix, no information is lost.
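The "no information is lost" point can also be illustrated with a small sketch. It does not use Kaldi's actual feature pipeline or its compression format; it only assumes an orthonormal DCT-II and random stand-in data, and the names `dct_matrix`, `fbank`, `full` and `truncated` are made up for this example. Keeping all coefficients is a multiplication by a full-rank matrix and can be undone exactly, whereas keeping only the first 13 coefficients (a common choice in GMM-era setups) cannot.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    mat[0, :] /= np.sqrt(2.0)
    return mat

rng = np.random.default_rng(1)
n_bands = 40
fbank = rng.normal(size=(100, n_bands))  # stand-in for log Mel filterbank frames
D = dct_matrix(n_bands)

# Keep every coefficient: a full-rank linear map of the filterbank features.
full = fbank @ D.T
rank = int(np.linalg.matrix_rank(D))
recovered = full @ D                     # D is orthonormal, so its inverse is its transpose
max_err_full = float(np.max(np.abs(recovered - fbank)))

# Keep only the first 13 coefficients: the map is no longer invertible.
truncated = full[:, :13]
approx = truncated @ D[:13, :]
max_err_trunc = float(np.max(np.abs(approx - fbank)))

print("rank of DCT matrix:", rank)
print("max reconstruction error, all coefficients:", max_err_full)
print("max reconstruction error, 13 coefficients: ", max_err_trunc)
```

Since a network's first layer is itself a linear map, it could in principle absorb or undo a full-rank transform like this one, which is consistent with the comment that the two feature types carry the same information when all coefficients are kept.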