Speech 服务是认知服务的一种,提供了语音转文本,文本转语音, 语音翻译等,今天我们实战的是语音转文本(Speech To Text)。

STT支持两种访问方式,1.是SDK,2.是REST API。

其中:

SDK方式支持 识别麦克风的语音流 和 语音文件;

REST API方式仅支持语音文件;

准备工作:创建 认知服务之Speech服务:

创建完成后,两个重要的参数可以在页面查看:

一. REST API方式将语音文件转换成文本:

Azure global的 Speech API 终结点请参考:

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech-service/rest-speech-to-text#regions-and-endpoints

Azure 中国区 的 Speech API 终结点:

截至到2020.2月,仅中国东部2区域已开通Speech服务,服务终结点为:

https://chinaeast2.stt.speech.azure.cn/speech/recognition/conversation/cognitiveservices/v1

对于Speech To Text来说,有两种身份验证方式:

其中Authorization  Token有效期为10分钟。

为了简便,本文使用了Ocp-Apim-Subscription-Key的方式。

注意:如果要实现文本转语音,按照上表,则必须使用 Authorization Token形式进行身份验证。

构建请求的其他注意事项:

  1. 文件格式:

  2. 请求头:


    需要注意的是,Key或者Authorization是二选一的关系。

  3. 请求参数:

在Postman中的示例如下:

如果要在REST API中使用 Authorization Token,则需要先获得Token:

Global 获取Token的终结点:

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech-service/rest-speech-to-text#authentication

中国区获取Token的终结点:

截至2020.02,只有中国东部2有Speech服务,其Token终结点为:

https://chinaeast2.api.cognitive.azure.cn/sts/v1.0/issuetoken

Postman获取Token 参考如下:

二. SDK方式将语音文件转换成文本(Python示例):

在官网可以看到类似的代码,但需要注意的是,该代码仅在Azure Global的Speech服务中正常工作,针对中国区,需要做特定的修改(见下文)。

import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "chinaeast2").
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates an audio configuration that points to an audio file.
# Replace with your own audio filename.
audio_filename = "whatstheweatherlike.wav"
audio_input = speechsdk.AudioConfig(filename=audio_filename)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)

print("Recognizing first result...")


# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed. The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
cancellation_details = result.cancellation_details
print("Speech Recognition canceled: {}".format(cancellation_details.reason))
if cancellation_details.reason == speechsdk.CancellationReason.Error:
print("Error details: {}".format(cancellation_details.error_details))

  

代码提供页面:

https://docs.azure.cn/zh-cn/cognitive-services/speech-service/quickstarts/speech-to-text-from-file?tabs=linux&pivots=programming-language-python#create-a-python-application-that-uses-the-speech-sdk

针对中国区,需要使用自定义终结点的方式,才能正常使用SDK:

speech_key, service_region = "Your Key", "chinaeast2"
template = "wss://{}.stt.speech.azure.cn/speech/recognition" \
"/conversation/cognitiveservices/v1?initialSilenceTimeoutMs={:d}&language=zh-CN"
speech_config = speechsdk.SpeechConfig(subscription=speech_key,
endpoint=template.format(service_region, int(initial_silence_timeout_ms)))

中国区完整代码如下:

#!/usr/bin/env python
# coding: utf-8

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
"""
Speech recognition samples for the Microsoft Cognitive Services Speech SDK
"""

import time
import wave

try:
import azure.cognitiveservices.speech as speechsdk
except ImportError:
print("""
Importing the Speech SDK for Python failed.
Refer to
https://docs.microsoft.com/azure/cognitive-services/speech-service/quickstart-python for
installation instructions.
""")
import sys
sys.exit(1)


# Set up the subscription info for the Speech Service:
# Replace with your own subscription key and service region (e.g., "westus").
speech_key, service_region = "your key", "chinaeast2"

# Specify the path to an audio file containing speech (mono WAV / PCM with a sampling rate of 16
# kHz).
filename = "D:\FFOutput\speechtotext.wav"

def speech_recognize_once_from_file_with_custom_endpoint_parameters():
"""performs one-shot speech recognition with input from an audio file, specifying an
endpoint with custom parameters"""
initial_silence_timeout_ms = 15 * 1e3
template = "wss://{}.stt.speech.azure.cn/speech/recognition/conversation/cognitiveservices/v1?initialSilenceTimeoutMs={:d}&language=zh-CN"
speech_config = speechsdk.SpeechConfig(subscription=speech_key,
endpoint=template.format(service_region, int(initial_silence_timeout_ms)))
print("Using endpoint", speech_config.get_property(speechsdk.PropertyId.SpeechServiceConnection_Endpoint))
audio_config = speechsdk.audio.AudioConfig(filename=filename)
# Creates a speech recognizer using a file as audio input.
# The default language is "en-us".
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config) result = speech_recognizer.recognize_once()

# Check the result
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
cancellation_details = result.cancellation_details
print("Speech Recognition canceled: {}".format(cancellation_details.reason))
if cancellation_details.reason == speechsdk.CancellationReason.Error:
print("Error details: {}".format(cancellation_details.error_details))


speech_recognize_once_from_file_with_custom_endpoint_parameters()

需要注意的是,如果我们使用SDK识别麦克风中的语音,则将

speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config),修改为:

speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

最新文章

  1. BZOJ3813: 奇数国
  2. php代码效率测试
  3. Java学习网站
  4. Android中的图片压缩
  5. vs2010 无法创建 *.edmx(Entity Frame Work) 文件的问题
  6. Jquery 禁用 a 标签 onclick 事件30秒后可用
  7. Clamp函数
  8. 【zigbee】开启及清除NV_RESTORE信息的方法
  9. 多线程之join方法
  10. sqlserver跨数据库与跨服务器使用
  11. 《Language Implementation Patterns》之 增强解析模式
  12. Redis常用命令--Hashes
  13. 潭州课堂25班:Ph201805201 django 项目 第三十八课 后台 文章发布,FastDFS安装 配置(课堂笔记)
  14. 程序员的快速开发框架:Github上 10 大优秀的开源后台控制面板
  15. 小程序 wepy wx.createAnimation 向右滑动渐入渐出
  16. java混淆代码的使用
  17. protobuf 嵌套示例
  18. Oracle->mysql碰到的问题
  19. R序列seq
  20. view_countInfo

热门文章

  1. B: 最小代价
  2. js对数字的处理:取整、四舍五入、数字与字符串的转换
  3. FFmpeg——命令笔记
  4. 2020年java架构师是什么-java架构师基本要求
  5. IntelliJ IDEA Debug模式启动项目
  6. linux上实现jmeter分布式压力测试(转)
  7. CSS相关(1)
  8. CNN反向传播算法公式
  9. java学习-初级入门-面向对象③-类与对象-类与对象的定义和使用1
  10. linux零散知识