Windows Phone 8 Text-to-Speech (TTS): Letting an App Read Content Aloud

Published: 2013/12/22 10:50:59  Author: Admin  Views: 478


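Having covered voice commands and speech recognition, this article turns to text-to-speech. Compared with those two topics, it is somewhat easier.

The idea is to have the speech system read specified text aloud from inside the app. The Windows.Phone.Speech.Synthesis API generates synthesized speech, also called text-to-speech (TTS), which an app can use to prompt the user for input, read out message content, announce the current search results, and so on.

The steps and the key classes are described below:

(1) Declare the required capabilities

To develop an app that supports text-to-speech, add this capability to the manifest: ID_CAP_SPEECH_RECOGNITION

(2) A basic TTS sample

The simplest and quickest way to produce TTS is to call SpeechSynthesizer.SpeakTextAsync() with a plain-text string.

The language used for reading follows the phone's "Settings / Speech / Speech language" setting. For example, with Chinese selected, the sentence below starts out read as English, but the "15" is read as the Chinese "十五".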

private async void ButtonSimpleTTS_Click(object sender, RoutedEventArgs e)
{
  SpeechSynthesizer synth = new SpeechSynthesizer();
    
  await synth.SpeakTextAsync("You have a meeting with Peter in 15 minutes.");
}

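Normally you pair SpeakTextAsync() with the await keyword so that the text is read asynchronously. Because SpeakTextAsync() has to call into the system's speech engine, the asynchronous pattern lets the app keep handling other work while the text is spoken.

(3) Choosing the voice

Windows Phone 8 includes voices for many countries and regions. Each voice generates synthesized speech in a single language, and the installed set varies with the "Settings / System / Language+Region" setting.

How do you specify the voice language in code?

‧After creating a Windows.Phone.Speech.Synthesis.SpeechSynthesizer, you can specify the language of the voice to load.

‧A SpeechSynthesizer object can load any voice installed on the phone and use it to generate speech.

‧If no language is specified, the API loads the language from "Settings / Speech" by default.

How to find the voice you need in code

‧Use the collection of Windows.Phone.Speech.Synthesis.VoiceInformation objects and their Language property, together with a LINQ query for the language you need, to get the matching voices;

->Note that this finds only voices already installed on the device; if there is none, prompt the user to install one;

‧Call SpeechSynthesizer.SetVoice(VoiceInformation) to specify the voice to load;

->When choosing the VoiceInformation, note that the LINQ query may

(1) return only a female voice or only a male voice;

(2) return both a female and a male voice;

This happens because an installed language may come with two voices (male and female) or with only one. You therefore have to decide whether the female or the male voice speaks, which is why the sample picks a voice by index;

The following sample, taken from <Text-to-speech (TTS) for Windows Phone>, illustrates this: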

// Declare the SpeechSynthesizer object at the class level.
SpeechSynthesizer synth;
 
// Handle the button click event.
private async void SpeakFrench_Click_1(object sender, RoutedEventArgs e)
{
  // Initialize the SpeechSynthesizer object.
  synth = new SpeechSynthesizer();
 
  // Query for a voice that speaks French.
  IEnumerable<VoiceInformation> frenchVoices = from voice in InstalledVoices.All
                     where voice.Language == "fr-FR"
                     select voice;
            
  // Set the voice as identified by the query.
  synth.SetVoice(frenchVoices.ElementAt(0));
 
  // Count in French.
  await synth.SpeakTextAsync("un, deux, trois, quatre");
} 

After creating a SpeechSynthesizer object, the sample runs a LINQ query over InstalledVoices.All to get the voices installed on the device, then assigns the match to the object.
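The sample calls ElementAt(0), which throws when no French voice is installed. A more defensive lookup could be sketched as follows. VoicePicker and Find are hypothetical names rather than SDK members, and the sketch assumes the VoiceGender enumeration from the same namespace.

```csharp
using System.Linq;
using Windows.Phone.Speech.Synthesis;

// Hypothetical helper, not part of the Windows Phone SDK.
public static class VoicePicker
{
    // Returns an installed voice for the language, preferring the requested
    // gender, or null when no voice for that language is installed.
    public static VoiceInformation Find(string language, VoiceGender preferred)
    {
        var candidates = InstalledVoices.All
            .Where(voice => voice.Language == language)
            .ToList();

        // Fall back to whichever voice exists when the preferred gender is absent.
        return candidates.FirstOrDefault(voice => voice.Gender == preferred)
            ?? candidates.FirstOrDefault();
    }
}
```

A caller would pass the result to synth.SetVoice() only when it is not null, and otherwise prompt the user to install the language.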

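You can also use Speech Synthesis Markup Language (SSML) to specify a voice for the language you need;

see <Speech Synthesis Markup Language Reference>.

That should give you the basic idea of implementing TTS. The main classes and methods are described next:

Windows.Phone.Speech.Synthesis

This namespace defines the classes that start and configure the speech synthesis engine, so that you can create voice prompts, respond to events, or modify the characteristics of the voice.

SpeechSynthesizer manages the connection to the speech synthesis engine and its features, and can be given a voice for a specific language to speak with;

The PromptBuilder class appends content to the speech synthesis engine from text, SSML markup, or recorded audio files;

There are many more related classes; the ones used on WP8 are described below: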

(a) SpeechSynthesizer

The class that does the text-to-speech (TTS) work. Its important events and methods are:

Type | Name | Description
Event | BookmarkReached | An event that fires when a <mark> element is reached in a Speech Synthesis Markup Language (SSML) file.
Event | SpeechStarted | An event that fires when the synthesized voice begins output.
Method | CancelAll | Cancels all asynchronous text-to-speech calls that are in the active queue.
Method | Close | Performs application-defined tasks associated with freeing, releasing, or resetting allocated resources.
Method | SetVoice | Sets the synthesized voice.
Method | GetVoice | Gets the active synthesized voice.
Method | SpeakSsmlAsync(String) | Asynchronously speaks a string of text with Speech Synthesis Markup Language (SSML) markup with a text-to-speech voice.
Method | SpeakSsmlFromUriAsync(Uri) | Asynchronously speaks the content of a standalone Speech Synthesis Markup Language (SSML) document with a text-to-speech voice.
Method | SpeakTextAsync(String) | Asynchronously speaks the content of a plain-text string.

The synthesis API provides the three Speak methods above to start speech output: one reads plain text, one reads a string with SSML markup, and one loads a complete SSML document;

(b) VoiceInformation

Describes a text-to-speech voice. Its important properties are:

Property | Access type | Description
Description | Read-only | Gets the description of a text-to-speech (TTS) voice.
DisplayName | Read-only | Gets the display name of the text-to-speech (TTS) voice.
Gender | Read-only | Gets the gender of the text-to-speech (TTS) voice.
Id | Read-only | Gets the identifier of the text-to-speech (TTS) voice.
Language | Read-only | Gets the language of the text-to-speech (TTS) voice.

The sample above uses the Language property to identify the language to search for;

(c) InstalledVoices

Provides access to the synthesis voices installed on the device under "Settings / Speech".

Property | Access type | Description
All | Read-only | Gets the full set of synthesized voices that are available to use as part of the Speech feature.
Default | Read-only | Gets the default synthesized voice.

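Speech Synthesis Markup Language (SSML)

SSML is an XML-based markup language designed for speech synthesis applications. It is a recommendation of the W3C's Voice Browser Working Group.

It lets developers control many characteristics of synthesized speech, such as voice, language, and pronunciation. Microsoft's implementation is based on the World Wide Web Consortium's Speech Synthesis Markup Language (SSML) Version 1.0.

The SpeechSynthesizer class provides two methods for speaking SSML: SpeakSsmlAsync(String) and SpeakSsmlFromUriAsync(Uri).

The former takes a string (a short piece of SSML that defines what to read), which makes it easy to switch the spoken language on the fly in code;

The latter reads a complete SSML document, in which all of the spoken content and languages can be fully defined;

The structure of SSML is described below, following <Using SSML for advanced text-to-speech on Windows Phone 8>:

(1) An SSML document or string must always be wrapped in a <speak /> element

<speak /> is the root element of the document. It can also be used on its own, without any other elements inside it. For example: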

<speak version="1.0" 
       xmlns="http://www.w3.org/2001/10/synthesis" 
       xml:lang="string"> </speak>

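It has three attributes, of which xml:lang is the most important; as the name suggests, it defines the language in which the <speak /> content is spoken.

The simplest example with SpeakSsmlAsync(String) is:

private async void SpeakBySsmlString() 
{
    synth = new SpeechSynthesizer();
    // Define a simple <speak /> with en-US as the spoken language.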
    string ssmlText = "<speak version=\"1.0\" ";
    ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
    ssmlText += " Testing Windows Phone 8 TTS";
    ssmlText += "</speak>";
    await synth.SpeakSsmlAsync(ssmlText);
}
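Concatenating SSML by hand breaks as soon as the text contains a character such as & or <. A minimal sketch of a safer way to build the same string, assuming System.Xml.Linq is referenced by the project, is shown here. SsmlBuilder and Speak are hypothetical names, not SDK members.

```csharp
using System.Xml.Linq;

// Hypothetical helper, not part of the Windows Phone SDK.
public static class SsmlBuilder
{
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // Wraps plain text in a <speak> root element for the given language tag.
    public static string Speak(string language, string text)
    {
        var root = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", language),
            text); // XElement escapes &, < and > in the text content
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

The returned string can be passed straight to synth.SpeakSsmlAsync().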

(2) Adding sound files

Besides plain text inside <speak />, you can place an <audio /> element within the text to be spoken. For example:

in the sentence "this is a book." you may want "book" played from your own audio file, which can be written as

"this is a <audio src="ms-appx:///Assets/book.wav">book</audio>".

However, not every audio format works with <audio />. The audio file must meet these requirements:

‧PCM, a-law, or u-law format;

‧8-bit or 16-bit depth;

‧mono only (no stereo);

private async void SpeakByStringInAudio()
{
    string ssmlText = "<speak version=\"1.0\" ";
    ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
    ssmlText += "Here comes the dog, ";
    // Specify the audio file to play
    ssmlText += "<audio src=\"ms-appx:///Assets/cats.wav\">Dog </audio>";
    ssmlText += "</speak>";

    await synth.SpeakSsmlAsync(ssmlText);
}

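The src attribute specifies the audio file to load. If loading fails during reading, the format is not supported, or the file cannot be played for any other reason, the system falls back to reading the enclosed text with the default voice.

Some relative locations such as Assets/cats.wav may also work for src, but to be safe it is better to write a full URI including its scheme.

(3) Inserting pauses

The <break /> element inserts a pause into the speech, optionally of a given length. It takes two attributes:

‧strength: optional; one of none, x-weak, weak, medium, strong, or x-strong;

‧time: optional; the length of the pause, in seconds or milliseconds;

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// Specify one pause by duration and another by strength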
ssmlText += "There is a pause <break time=\"500ms\" /> here, ";
ssmlText += "and another one <break strength=\"x-strong\" /> here";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);
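When the pause length comes from code rather than a literal, the time attribute has to be formatted consistently. The sketch below turns a TimeSpan into a <break /> fragment expressed in milliseconds; SsmlBreak and Pause are hypothetical names, not SDK members.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the Windows Phone SDK.
public static class SsmlBreak
{
    // Builds a <break /> fragment whose time attribute is in milliseconds.
    public static string Pause(TimeSpan duration)
    {
        if (duration < TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException("duration");
        }

        long milliseconds = (long)Math.Round(duration.TotalMilliseconds);
        return "<break time=\""
            + milliseconds.ToString(CultureInfo.InvariantCulture)
            + "ms\" />";
    }
}
```

The fragment is meant to be concatenated into a <speak /> string in the same way as the literals above.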

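(4) Defining or changing the pronunciation of a word

SSML provides two ways to make the speech synthesizer adjust the pronunciation of a word:

‧Define a one-time, inline pronunciation for the word with <phoneme />

=>With this approach, every occurrence of the word has to be wrapped in <phoneme /> again;

=>It has two attributes: ph and alphabet;

For example:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// Define <phoneme /> and its attributes: ph is the pronunciation; alphabet is a fixed value
ssmlText += "<phoneme alphabet=\"x-microsoft-ups\" ph=\"O L AA\">hello</phoneme>";
ssmlText += ", I mean hello";
ssmlText += "</speak>";
 
await synth.SpeakSsmlAsync(ssmlText);

‧Define the pronunciation of several words in one place with <lexicon />

=>Using <lexicon /> requires a separate lexicon file. That file is also XML-based and maps written words to pronunciations. For example: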

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"  
        xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"  
        alphabet="x-microsoft-ups" xml:lang="en-US">  
    <lexeme>    
        <grapheme>wife</grapheme>    
        <phoneme> W AI F AI</phoneme>  
    </lexeme>
</lexicon>

Each word gets one <lexeme />, which contains a <grapheme /> (the written word that gets the custom pronunciation) and a <phoneme /> (how that word is pronounced).

=>To use the lexicon file with SpeechSynthesizer.SpeakSsmlAsync(), add a <lexicon /> element inside <speak /> with its uri and type attributes, as follows:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// Set the uri attribute
ssmlText += "<lexicon uri=\"ms-appx:///Assets/lexicon1.xml\"";
// Set the type attribute, which is the MIME type
ssmlText += " type=\"application/pls+xml\"/>";
ssmlText += "She is not my wife";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);

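Note that if <phoneme /> and <lexicon /> are both present in the same SSML, the speech synthesizer gives <phoneme /> the higher priority.

For more, see <lexicon Element SSML> and <Speech Synthesis Markup Language Reference>.

(5) Changing voices

There are several ways to change the voice used for reading, such as the SpeechSynthesizer.SetVoice() approach shown earlier, which searches InstalledVoices for the required language and assigns the result. SSML offers the <voice /> element for the same purpose. It has several attributes, all optional, but at least one must be present. The speech synthesizer treats these attributes as preferred values, so if one attribute value cannot be satisfied when loading the voice, it falls back on the other attributes.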

Attribute | Description
name | Optional. Specifies the name of the installed voice that will speak the contained text.
gender | Optional. Specifies the preferred gender of the voice that will speak the contained text. The allowed values are: male, female, and neutral.
age | Optional. Specifies the preferred age in years of the voice that will speak the contained text. The allowed values are: 10 (child), 15 (teen), 30 (adult), and 65 (senior).
xml:lang | Optional. Specifies the language that the voice must support. The value may contain either a lower-case, two-letter language code (such as en for English), or may optionally include an upper-case country/region or other variation in addition to the language code (such as zh-CN).
variant | Optional. An integer that specifies a preferred voice when more than one voice matches the values specified in any of the xml:lang, gender, or age parameters.

For example:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// Define <voice /> and its attributes
ssmlText += "<voice name=\"Microsoft Susan Mobile\" gender=\"female\" age=\"30\"";
ssmlText += " xml:lang=\"en-US\">";
ssmlText += "This is another test </voice>";
ssmlText += "</speak>";
 
await synth.SpeakSsmlAsync(ssmlText);

You can also use <p xml:lang="" /> and <s xml:lang="" /> to change the voice for part of the content:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-GB\">";
// Switch voices with <p /> and <s />
ssmlText += "<p>";
ssmlText += "<s>First sentence of a paragraph</s>";
ssmlText += "<s xml:lang=\"en-US\">And this is the second sentence</s>";
ssmlText += "</p>";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);

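(6) Changing the prosody of the voice

The <break /> element can pause the speech or adjust its pacing, and <prosody /> offers more attributes for finer control. For example: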

<prosody pitch="value" contour="value" 
         range="value" rate="value" 
         duration="value" volume="value"> </prosody>

Attribute | Description
pitch | Optional. Indicates the baseline pitch for the contained text.

This value may be expressed in one of three ways:

  • An absolute value, expressed as a number followed by "Hz" (Hertz). For example, 600Hz.
  • A relative value, expressed as a number preceded by "+" or "-" and followed by "Hz" or "st", that specifies an amount to change the pitch. For example +80Hz or -2st. The “st” indicates the change unit is semitone, which is half of a tone (a half step) on the standard diatonic scale.
  • An enumeration value, from among the following: x-low, low, medium, high, x-high, or default.

contour | Optional. Represents changes in pitch for speech content as an array of targets at specified time positions in the speech output.

Each target is defined by sets of parameter pairs, for example:

<prosody contour="(0%,+20Hz) (10%,-2st) (40%,+10Hz)">

The first value in each set of parameters specifies the location of the pitch change as a percentage of the duration of the contained text (a number followed by "%").

The second value specifies the amount to raise or lower the pitch, using a relative value or an enumeration value for pitch, see above.

range | Optional. A value that represents the range of pitch for the contained speech content.

This value may be expressed using the same absolute values, relative values, or enumeration values used to describe pitch, see above.

rate | Optional. Indicates the speaking rate of the contained text.

This value may be expressed in one of two ways:

  • A relative value, expressed as a number that acts as a multiplier of the default. For example, a value of 1 results in no change in the rate. A value of .5 results in a halving of the rate. A value of 3 results in a tripling of the rate.
  • An enumeration value, from among the following: x-slow, slow, medium, fast, x-fast, or default.

duration | Optional. A value in seconds or milliseconds for the period of time that should elapse while the speech synthesis (TTS) engine reads the contents of the element. For example 2s or 1800ms.

volume | Optional. Indicates the volume level of the speaking voice.

This value may be expressed in one of three ways:

  • An absolute value, expressed as a number in the range of 0.0 to 100.0, from quietest to loudest. For example, 75. The default is 100.0.
  • A relative value, expressed as a number preceded by "+" or "-" that specifies an amount to change the volume. For example +10 or -5.5.
  • An enumeration value, from among the following: silent, x-soft, soft, medium, loud, x-loud, or default.

Sample code:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
ssmlText += "Testing the ";
// Define <prosody />
ssmlText += "<prosody pitch=\"+100Hz\" volume=\"70.0\" >Prosody</prosody>";
ssmlText += " element. ";
ssmlText += "Normal,<prosody rate=\"2\"> Very Fast,</prosody>";
ssmlText += "<prosody rate=\"0.4\"> now slow,</prosody>";
ssmlText += "and normal again";
ssmlText += "</speak>";
 
await synth.SpeakSsmlAsync(ssmlText);
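If the rate multiplier is computed at run time, formatting it with the current culture can produce a decimal comma (for example "0,4"), which is not the number format shown in the table above. A minimal sketch that formats the value with the invariant culture follows; SsmlProsody and WithRate are hypothetical names, and System.Xml.Linq is assumed to be referenced.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper, not part of the Windows Phone SDK.
public static class SsmlProsody
{
    // Builds a <prosody> fragment with a relative rate multiplier.
    // The fragment has no xmlns of its own; it is meant to be concatenated
    // into a <speak> string that already declares the SSML namespace.
    public static string WithRate(string text, double rate)
    {
        if (rate <= 0)
        {
            throw new ArgumentOutOfRangeException("rate");
        }

        var element = new XElement("prosody",
            new XAttribute("rate", rate.ToString(CultureInfo.InvariantCulture)),
            text);
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```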

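(7) Tracking speech progress

If the app needs to act at specific points while text is being read, add a <mark /> element to the SSML at each point to track. When the speech synthesizer reaches a <mark /> during reading it raises the BookmarkReached event, and the event arguments identify that <mark />. The following code shows this: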

public MainPage()
{    
    InitializeComponent();    
    synth = new SpeechSynthesizer();    
    // Add the event handler for the speech progress events    
    synth.BookmarkReached += new TypedEventHandler<SpeechSynthesizer, 
            SpeechBookmarkReachedEventArgs>(synth_BookmarkReached);
} 
 
private async void Button7_Click(object sender, RoutedEventArgs e)
{    
    ssmlText = "<speak version=\"1.0\" ";    
    ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
    // Insert the <mark /> elements to be reported
    ssmlText += "<mark name=\"START\"/>";    
    ssmlText += "This is the first half of the speech.";    
    ssmlText += "<mark name=\"HALF\"/>";    
    ssmlText += "and this the second half. Ending now";    
    ssmlText += "<mark name=\"END\"/>";    
    ssmlText += "</speak>";    
    await synth.SpeakSsmlAsync(ssmlText);
 
} 
 
static void synth_BookmarkReached(object sender, SpeechBookmarkReachedEventArgs e)
{    
    Debugger.Log(1, "Info", e.Bookmark + " mark reached\n");
}

(8) Specifying content type and aliasing parts of a speech

Use <say-as /> to indicate a specific content type (for example a date or a number). Its format is:

<say-as interpret-as="string" format="digit string" detail="string"> </say-as>

Attribute | Description
interpret-as | Required. Indicates the content type of text contained in the element. The SSML 1.0 say-as attribute values specification defines six content types.
format | Optional. Provides additional information about the precise formatting of the contained text for content types that may have ambiguous formats. SSML defines formats for content types that use them.
detail | Optional. Indicates the level of detail to be spoken. For example, this attribute might request that the speech synthesis engine pronounce punctuation marks. There are no standard values defined for the detail attribute. Support for this attribute depends on the individual speech synthesis engine.

Common interpret-as values include:

Interpret-as | Format | Interpretation
date | dmy, mdy, ymd, ym, my, md, dm, d, m, y | The contained text is a date in the specified format.

In the format designations, d=day, m=month, and y=year.

The format for date indicates which date components are represented and their sequence.

The following is an example of a say-as element that contains a date:

Today is <say-as interpret-as="date" format="mdy">10-19-2003</say-as>

The speech synthesizer should pronounce “Today is October nineteenth two thousand three”.

cardinal | - | The contained text should be spoken as a cardinal number.

The following is an example of a say-as element that contains a cardinal number:

There are <say-as interpret-as="cardinal">3</say-as> alternatives.

The speech synthesizer should pronounce “There are three alternatives”.

ordinal | - | The contained text should be interpreted as an ordinal number.

The following is an example of a say-as element that contains an ordinal number:

Select the <say-as interpret-as="ordinal">3rd</say-as> option.

The speech synthesizer should pronounce “Select the third option”.

characters | - | Indicates that each letter in the contained text should be pronounced individually (spelled out).

The following is an example of a say-as element that contains a word that should be spoken as individual letters:

<say-as interpret-as="characters">test</say-as>.

The speech synthesizer should pronounce each letter: “T E S T”.

time | hms12, hms24 | The contained text is a time. Time may be expressed using either a 12-hour clock (hms12) or a 24-hour clock (hms24).

The format attribute indicates which clock to use. The following is an example of a say-as element that contains a time:

The train departs at <say-as interpret-as="time" format="hms12">4:00am</say-as>.

The speech synthesizer should speak “The train departs at four A M”.

Use a colon to separate numbers representing hours, minutes, and seconds.

The following time strings are all valid examples: 12:35, 1:14:32, 08:15, and 02:50:45.

telephone | digit string | The contained text is a telephone number. The format attribute may contain digits that represent a country code, for example “1” for the United States or “39” for Italy.

The speech synthesis engine may use this information to guide its pronunciation of a phone number.

The country code may also be included in the phone number, and if so, takes precedence over the country code in the format attribute if there is a mismatch. The following is an example of a say-as element that contains a telephone number:

The number is <say-as interpret-as="telephone" format="1">(888) 555-1212</say-as>.

The speech synthesizer should speak “The number is area code eight eight eight five five five one two one two”.

For example:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// Read as an ordinal
ssmlText += "<p>This is an ordinal number: <say-as interpret-as=\"ordinal\">121</say-as></p>";
// Read as a cardinal
ssmlText += "<p>This is a cardinal number: <say-as interpret-as=\"cardinal\">121</say-as></p>";
// Read as individual characters
ssmlText += "<p>And these are just individual numbers: <say-as interpret-as=\"characters\">121</say-as></p>";
ssmlText += "</speak>";
 
await synth.SpeakSsmlAsync(ssmlText);
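For dates, the text inside <say-as /> has to match the declared format. The sketch below formats a DateTime as month-day-year so that it agrees with format="mdy", mirroring the 10-19-2003 example in the table. SsmlSayAs and Date are hypothetical names, not SDK members.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the Windows Phone SDK.
public static class SsmlSayAs
{
    // Builds a <say-as> fragment for a date in month-day-year order.
    public static string Date(DateTime date)
    {
        // Invariant culture keeps the digits and separators fixed
        // regardless of the phone's regional settings.
        string text = date.ToString("MM-dd-yyyy", CultureInfo.InvariantCulture);
        return "<say-as interpret-as=\"date\" format=\"mdy\">" + text + "</say-as>";
    }
}
```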

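You can also use <sub /> to have a word read as a longer phrase, for example when the text contains an abbreviation that should be spoken in full.

It defines an alias. It may not be especially useful, since it works only as a one-off alias; its purpose is probably to let an SSML document make the relationship between the written and spoken forms of the text clearer. For example:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// Define an alias so that WP8 is read as "Windows Phone 8"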
ssmlText += "This code runs on <sub alias=\"Windows Phone 8\">WP8</sub>";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);

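(9) Playing an SSML document

An SSML document is itself an XML file that collects the elements and attributes described above into one file.

Use SpeakSsmlFromUriAsync() to load an SSML document packaged with the app by URI and read it aloud. A complete SSML document looks like this: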

<speak version="1.0" 
       xmlns="http://www.w3.org/2001/10/synthesis" 
       xml:lang="en-US">  
    <voice gender="male" xml:lang="en-US">    
        <prosody rate="0.8">      
            <p>Thanks for reading the article, and thanks for trying the examples</p>
            <p>Now be creative, and create amazing applications for this fantastic platform</p>      
            <voice gender="male" xml:lang="es">Adios</voice>    
        </prosody>  
    </voice>
</speak>

Load it with the following code:

await synth.SpeakSsmlFromUriAsync(new Uri("ms-appx:///Assets/SSML1.xml"));
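Before shipping an SSML file with the app, it can help to check that its text is well-formed and has the expected root element. The sketch below does only that basic check on a string; how the file text is read is left out, SsmlCheck and LooksLikeSsml are hypothetical names, and passing the check does not guarantee that the speech engine accepts every element in the document.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the Windows Phone SDK.
public static class SsmlCheck
{
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // Returns true when the text is well-formed XML whose root is
    // <speak version="1.0"> in the SSML namespace.
    public static bool LooksLikeSsml(string xml)
    {
        try
        {
            XElement root = XDocument.Parse(xml).Root;
            XAttribute version = root.Attribute("version");
            return root.Name == Ssml + "speak"
                && version != null
                && version.Value == "1.0";
        }
        catch (XmlException)
        {
            return false; // not well-formed XML
        }
    }
}
```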
