【持续更新】
为了简便:import librosa
display
specshow (data[, x_coords, y_coords, x_axis, …]) |
Display a spectrogram/chromagram/cqt/etc. |
waveplot (y[, sr, max_points, x_axis, …]) |
Plot the amplitude envelope of a waveform. |
cmap (data[, robust, cmap_seq, cmap_bool, …]) |
Get a default colormap from the given data. |
TimeFormatter ([lag, unit]) |
A tick formatter for time axes. |
NoteFormatter ([octave, major]) |
Ticker formatter for Notes |
LogHzFormatter ([major]) |
Ticker formatter for logarithmic frequency |
ChromaFormatter |
A formatter for chroma axes |
TonnetzFormatter |
A formatter for tonnetz axes |
[1]中介绍了很多关于librosa的应用,同时提出librosa.display模块并不默认包含在librosa中,使用时要单独引入:
waveplot
Plot the amplitude envelope of a waveform.
If y is monophonic, a filled curve is drawn between [-abs(y), abs(y)].
If y is stereo, the curve is drawn between [-abs(y[1]), abs(y[0])], so that the left and right channels are drawn above and below the axis, respectively.
Long signals (duration >= max_points) are down-sampled to at most max_sr before plotting.
librosa.display.waveplot(y, sr=22050, max_points=50000.0, x_axis='time', offset=0.0, max_sr=1000, ax=None, **kwargs)
specshow
Display a spectrogram/chromagram/cqt/etc.
librosa.display.specshow(data, x_coords=None, y_coords=None, x_axis=None, y_axis=None, sr=22050, hop_length=512, fmin=None, fmax=None, tuning=0.0, bins_per_octave=12, ax=None, **kwargs)
注意:源码中 sr 默认是22050Hz,如果音频文件是8k或者16k,一定要指定采样率。
可以选择不同的尺度显示频谱图,y_axis={‘linear’, ‘log’, ‘mel’, ‘cqt_hz’,...}
stft / istft
短时傅里叶变换 / 逆短时傅里叶变换,参考librosa源码和博客[librosa语音信号处理]。
librosa.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, pad_mode='reflect')
librosa.core.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=<class 'numpy.complex64'>, pad_mode='reflect') # This function caches at level 20.
The STFT represents a signal in the time-frequency domain by computing discrete Fourier transforms (DFT) over short overlapping windows. This function returns a complex-valued matrix D such that
- np.abs(D[f, t]) is the magnitude of frequency bin f at frame t, and
- np.angle(D[f, t]) is the phase of frequency bin f at frame t.
Parameters: |
- y : np.ndarray [shape=(n,)], real-valued
-
input signal
- n_fft : int > 0 [scalar]
-
length of the windowed signal after padding with zeros. The number of rows in the STFT matrix D is (1 + n_fft/2). The default value, n_fft=2048 samples, corresponds to a physical duration of 93 milliseconds at a sample rate of 22050 Hz, i.e. the default sample rate in librosa. This value is well adapted for music signals. However, in speech processing, the recommended value is 512, corresponding to 23 milliseconds at a sample rate of 22050 Hz. In any case, we recommend setting n_fft to a power of two for optimizing the speed of the fast Fourier transform (FFT) algorithm.
- hop_length : int > 0 [scalar]
-
number of audio samples between adjacent STFT columns.
Smaller values increase the number of columns in D without affecting the frequency resolution of the STFT.
If unspecified, defaults to win_length / 4 (see below).
- win_length : int <= n_fft [scalar]
-
Each frame of audio is windowed by window() of length win_length and then padded with zeros to match n_fft.
Smaller values improve the temporal resolution of the STFT (i.e. the ability to discriminate impulses that are closely spaced in time) at the expense of frequency resolution (i.e. the ability to discriminate pure tones that are closely spaced in frequency). This effect is known as the time-frequency localization tradeoff and needs to be adjusted according to the properties of the input signal y.
If unspecified, defaults to win_length = n_fft .
- window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
-
Either:
- a window specification (string, tuple, or number); see
scipy.signal.get_window
- a window function, such as
scipy.signal.hanning
- a vector or array of length n_fft
Defaults to a raised cosine window (“hann”), which is adequate for most applications in audio signal processing.
- center : boolean
-
If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length].
If False, then D[:, t] begins at y[t * hop_length].
Defaults to True, which simplifies the alignment of D onto a time grid by means of librosa.core.frames_to_samples . Note, however, that center must be set to False when analyzing signals with librosa.stream .
- dtype : numeric type
-
Complex numeric type for D. Default is single-precision floating-point complex (np.complex64).
- pad_mode : string or function
-
If center=True, this argument is passed to np.pad for padding the edges of the signal y. By default (pad_mode=”reflect”), y is padded on both sides with its own reflection, mirrored around its first and last sample respectively. If center=False, this argument is ignored.
|
Returns: |
- D : np.ndarray [shape=(1 + n_fft/2, n_frames), dtype=dtype]
-
Complex-valued matrix of short-term Fourier transform coefficients.
|
librosa.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, length=None)
librosa.core.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, dtype=<class 'numpy.float32'>, length=None) # This function caches at level 30.
Converts a complex-valued spectrogram stft_matrix to time-series y by minimizing the mean squared error between stft_matrix and STFT of y as described in [2] up to Section 2 (reconstruction from MSTFT).
In general, window function, hop length and other parameters should be same as in stft, which mostly leads to perfect reconstruction of a signal from unmodified stft_matrix.
Parameters: |
- stft_matrix : np.ndarray [shape=(1 + n_fft/2, t)]
-
STFT matrix from stft
- hop_length : int > 0 [scalar]
-
Number of frames between STFT columns. If unspecified, defaults to win_length / 4.
- win_length : int <= n_fft = 2 * (stft_matrix.shape[0] - 1)
-
When reconstructing the time series, each frame is windowed and each sample is normalized by the sum of squared window according to the window function (see below).
If unspecified, defaults to n_fft.
- window : string, tuple, number, function, np.ndarray [shape=(n_fft,)]
-
- a window specification (string, tuple, or number); see
scipy.signal.get_window
- a window function, such as
scipy.signal.hanning
- a user-specified window vector of length n_fft
- center : boolean
-
- If True, D is assumed to have centered frames.
- If False, D is assumed to have left-aligned frames.
- dtype : numeric type
-
Real numeric type for y. Default is 32-bit float.
- length : int > 0, optional
-
If provided, the output y is zero-padded or clipped to exactly length samples.
|
Returns: |
- y : np.ndarray [shape=(n,)]
-
time domain signal reconstructed from stft_matrix
|
有用的函数
effects.split
librosa.effects.split(y, top_db=60, ref=<function amax at 0x7fa274a61d90>, frame_length=2048, hop_length=512)
Split an audio signal into non-silent intervals. 参数说明源码。
Parameters: |
- y : np.ndarray, shape=(n,) or (2, n)
-
An audio signal
- top_db : number > 0
-
The threshold (in decibels) below reference to consider as silence
- ref : number or callable
-
The reference power. By default, it uses np.max and compares to the peak power in the signal.
- frame_length : int > 0
-
The number of samples per analysis frame
- hop_length : int > 0
-
The number of samples between analysis frames
|
Returns: |
- intervals : np.ndarray, shape=(m, 2)
-
intervals[i] == (start_i, end_i) are the start and end time (in samples) of non-silent interval i.
|
参考
[1] https://www.cnblogs.com/xingshansi/p/6816308.html
[2] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. ASSP, vol.32, no.2, pp.236–243, Apr. 1984.
最新文章
- Web项目从Oracle转为Mysql,fluentnhibernate-1.0和NHibernate2.1.0升级到NHibernate3.3的注意事项
- HTTP 错误 404.3 – Not Found 由于扩展配置问题而无法提供您请求的页面。如果该页面是脚本,请添加处理程序。如果应下载文件,请添加 MIME 映射。
- bzoj2683
- java单例-积木系列
- css input checkbox和radio样式美化
- Javascript验证手机号码正则表达式
- [转]Oracle 树操作(select…start with…connect by…prior)
- Log4net日志GUI配置工具
- LoadRunner中web_custom_request 和 web_submit_data的差别
- httphelp web自动化
- *[topcoder]BracketExpressions
- 打开自定义链接新窗口(safari JS prompt的坑!)2016.03.08
- CImageList使用简要说明
- uva 10118,记忆化搜索
- Scala 对象
- 集成学习(ensemble learning)
- HTTP协议之chunk介绍
- SpringBoot入门 (九) MQ使用
- G. (Zero XOR Subset)-less(线性基)
- 王勇详谈 Linux Deepin 背后的故事
热门文章
- nodejs的child_process
- Loj #3045. 「ZJOI2019」开关
- linux 成功安装oracle后,为其创建一个登录账户
- ReentrantLock的实现原理及AQS和CAS
- Appium+python自动化(一)- 环境搭建—上(超详解)
- 如何防止短信API接口遍历
- SpringBoot 通过jjwt快速实现token授权
- netcore添加api帮助文档页-Swagger
- Markdown温故知新(2):详解七大标准语法
- 解决 new file()在IOS下不兼容 的问题