Implement SSML parsing at SpeechSynthesisUtterance
Reported by
guest271...@gmail.com,
Dec 15 2017
Issue description
UserAgent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.94 Chrome/62.0.3202.94 Safari/537.36
Steps to reproduce the problem:
1. Pass valid SSML as the first argument to SpeechSynthesisUtterance
What is the expected behavior? SSML to be parsed
What went wrong? SSML is not parsed
Did this work before? N/A
Does this work in other browsers? No
Chrome version: 62.0.3202.94
Channel: n/a
OS Version:
Flash Version:
SSML Example: https://www.w3.org/2004/Talks/05-www2004-voice/slide5-0.html
Specifications: https://w3c.github.io/speech-api/speechapi.html#utterance-attributes, https://www.w3.org/TR/speech-synthesis11/
,
Dec 15 2017
,
Dec 17 2017
,
Dec 18 2017
I'm confused by this report.
I've been able to pass SSML to Chrome's text-to-speech engine. Chrome clearly parses the XML because it does not speak the XML. However, Chrome does not interpret the SSML semantics; Chrome just throws away the XML portions and speaks the .textContent in some default manner.
For example, Chrome will speak the .textContent of
<speak version="1.0" xml:lang="en-US">
Here are <say-as interpret-as="characters">SSML</say-as> samples.
Hello world, how are you today?
Try a date: <say-as interpret-as="date" format="dmy" detail="2">10-9-1960</say-as>
<say-as interpret-as="date" format="ymd" detail="2">1960-10-09</say-as>.
The safe's combination is <say-as interpret-as="characters" detail="2 1 2 1 2">10-24-65</say-as>.
<say-as interpret-as="telephone">650-555-1234</say-as>.
<sub alias="World Wide Web Consortium">W3C</sub>
</speak>
without saying something like "less than speak version one point zero. XML colon lang en...", but Chrome will not follow the date format instructions or say "World Wide Web Consortium" (it says "W 3 C").
I think Chrome should follow the SSML hints.
Edge does SSML correctly. I just tried Firefox, and it speaks the XML markup.
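For anyone reproducing the behaviour described in this comment, here is a minimal sketch. The helper name speakSsml and the sample markup are illustrative assumptions; only SpeechSynthesisUtterance and speechSynthesis.speak() come from the Web Speech API. The call is guarded so the snippet does nothing outside a browser.

```javascript
// Sample SSML with a <sub> hint. An engine that follows SSML should say
// "World Wide Web Consortium"; an engine that only strips markup says "W3C".
var sampleSsml =
  '<?xml version="1.0"?>' +
  '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">' +
  '<sub alias="World Wide Web Consortium">W3C</sub>' +
  '</speak>';

// Hypothetical helper: queues the SSML string as an utterance.
// Returns true if it was queued, false if speech synthesis is unavailable.
function speakSsml(ssml) {
  if (typeof speechSynthesis === "undefined" ||
      typeof SpeechSynthesisUtterance === "undefined") {
    return false;
  }
  var utterance = new SpeechSynthesisUtterance(ssml);
  speechSynthesis.speak(utterance);
  return true;
}
```

Calling speakSsml(sampleSsml) in each browser and listening to the result is enough to tell whether the engine follows the hint, strips the markup, or reads the markup aloud.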
,
Dec 18 2017
PS. My comments above are for Windows 10 running Chrome 63. (not Linux).
,
Dec 18 2017
,
Dec 19 2017
Chromium on *nix does not parse SSML and speaks the XML; the same is true of Firefox 57.
,
Jan 4 2018
Firefox on Windows also speaks the markup.
Chrome on Windows strips the SSML.
Edge on Windows processes SSML.
It doesn't take much code to strip SSML.
var parser = new DOMParser();
// Get the text to speak.
var str = utterance.text;
// Check whether str looks like SSML: skip an optional XML declaration
// and look for a <speak> root element.
var isSsml = /^\s*(<\?xml[^>]*\?>\s*)?<speak[\s>]/.test(str);
if (isSsml) {
  // Parse the XML.
  var doc = parser.parseFromString(str, "application/xml");
  if (doc.getElementsByTagName("parsererror").length > 0) {
    // Parse failed: leave str unchanged and just speak the markup.
  } else {
    // Strip the XML, keeping only the text content.
    str = doc.documentElement.textContent;
  }
}
// str is now the text to speak.
IIRC, Chrome does not make parsererror the documentElement. That is, Chrome does not follow
https://w3c.github.io/DOM-Parsing/#the-domparser-interface
,
Jan 4 2018
For Chrome returning document with wrong root, see https://bugs.chromium.org/p/chromium/issues/detail?id=698130
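Given that difference, a parse-error check probably should not rely on the root element alone. A minimal sketch, assuming the error element is named parsererror wherever it ends up in the document; the helper name hasParserError is hypothetical.

```javascript
// Hypothetical helper: returns true if `doc` (a Document returned by
// DOMParser.parseFromString) reports an XML parse error.
function hasParserError(doc) {
  var root = doc.documentElement;
  // Case described by the DOM Parsing spec: <parsererror> is the root.
  if (root && root.nodeName === "parsererror") {
    return true;
  }
  // Fallback: <parsererror> embedded elsewhere in the returned document.
  return doc.getElementsByTagName("parsererror").length > 0;
}
```

Usage would look like: var doc = new DOMParser().parseFromString(str, "application/xml"); if (hasParserError(doc)) { /* speak the markup as-is */ }.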
,
Jan 5 2018
In /run/user/1000/speech-dispatcher/log, the entry speechd: Updating client specific settings "linux:chrome:extension_api" against emacs:* appears to show the socket client name used by Chromium, corresponding to conn_ = libspeechd_loader_.spd_open( "chrome", "extension_api", NULL, SPD_MODE_THREADED); (https://cs.chromium.org/chromium/src/chrome/browser/speech/tts_linux.cc?ssfr=1&l=119). spd-say has an -x/--ssml option which parses the input text as SSML. We should be able to call spd_set_data_mode(SPDConnection *connection, SPDDataMode mode) with SPD_DATA_SSML (https://cs.chromium.org/chromium/src/third_party/speech-dispatcher/libspeechd.h?q=SSML&dr=CSs&l=62) by defining the spd_set_data_mode option at https://cs.chromium.org/chromium/src/third_party/speech-dispatcher/BUILD.gn?q=spd_say&dr=C&l=16.
,
Jan 5 2018
On *nix, the speech-dispatcher program provides a means to set the unix socket data mode when the --enable-speech-dispatcher flag is set. For SSML to be parsed, we need to set the -x or --ssml flag as the default for all calls to spd-say (or, if necessary, the -m option for espeak) when the socket connection is established. If the text set at SpeechSynthesisUtterance is not SSML, the plain text should be synthesized. See also https://askubuntu.com/questions/991314/how-to-set-options-of-commands-called-by-browser.
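To illustrate the intended behaviour from the command-line side only, here is a rough Node.js sketch. It is not how Chromium talks to speech-dispatcher (Chromium uses libspeechd directly); the helper names looksLikeSsml, buildSpdSayArgs and speakWithSpdSay are hypothetical, and the only spd-say option used is the -x/--ssml flag mentioned above.

```javascript
// Rough SSML check: optional XML declaration, then a <speak> root element.
function looksLikeSsml(text) {
  return /^\s*(<\?xml[^>]*\?>\s*)?<speak[\s>]/.test(text);
}

// Hypothetical helper: build the spd-say argument list for `text`.
// SSML input gets the -x flag; plain text is passed through unchanged.
function buildSpdSayArgs(text) {
  return looksLikeSsml(text) ? ["-x", text] : [text];
}

// Hypothetical helper: run spd-say with those arguments.
// Node.js only, requires spd-say on the PATH; it is not called here.
function speakWithSpdSay(text) {
  var spawn = require("child_process").spawn;
  return spawn("spd-say", buildSpdSayArgs(text));
}
```

The point of the sketch is the branch in buildSpdSayArgs: SSML mode is switched on only when the text is SSML, and plain text is synthesized as before.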
,
Jan 5 2018
This bug report does not address conformance with the specification as to stripping SSML where the native application which synthesizes text is not capable of parsing SSML. Rather, it concerns *nix, where speech-dispatcher and spd-say are used within the Chromium source code: Chromium should simply pass the necessary option(s) to speech-dispatcher at each unix socket connection so that spd_set_data_mode turns SSML parsing on for the entire connection or, if necessary, for each spd-say or espeak call. The existing option of using spd_set_data_mode is simply omitted from the Chromium source code that handles the browser's unix socket connection to the native program.
,
Jan 5 2018
I could not locate any code where Chromium attempts to verify whether SSML is set at SpeechSynthesisUtterance, and so could not find any attempt in the Chromium source to strip SSML tags; in that respect the Chromium source code does not conform to the specification.
,
Jan 5 2018
Earlier Chrome platforms should support SSML https://bugs.chromium.org/p/chromium/issues/detail?id=88072
,
Jan 5 2018
Looking at the Windows implementation to see how SSML is handled: Microsoft SAPI handles SSML on its own (since 5.3). The Chrome code at https://cs.chromium.org/chromium/src/chrome/browser/speech/tts_win.cc?ssfr=1 shows that Chrome adds prefix/suffix information when the utterance sets a pitch, so a string (prefix .text suffix) is passed to Microsoft SAPI. I thought the SAPI XML prefix would wreck SSML processing, but SAPI still processes the SSML fragment even though it is embedded in the prefix/suffix pitch change.
,
Jan 5 2018
I am not certain how the Extension TTS API is related to the current issue. Perhaps the owners of Blink>Speech and Internals>SpeechSynthesis could chime in to verify the differences between the Extension TTS API and Blink>Speech. Unless I am missing something, I could not locate any code, in either the platform or the extension code for *nix, which explicitly attempts to set the SSML parsing option of the unix socket connection to speech-dispatcher.
,
Jan 30 2018
The attached file should fix this. Could somebody patch Chromium with it and confirm that it fixes the issue?
,
Feb 2 2018
Reporter@ - Thanks for filing the issue...!! Tried testing the issue by navigating to http://jsfiddle.net/8pyWZ/ from issue id: 88072 at comment #6. But unable to know what the issue is exactly. Could you please provide a screencast or screenshot for better understanding of the issue. This will help us in triaging the issue further. Thanks...!!
,
Feb 2 2018
#19 On which OSes have you tried the code at the linked jsfiddle? What is the audio output of speechSynthesis.speak() on each OS?
,
Feb 2 2018
Thank you for providing more feedback. Adding requester "krajshree@chromium.org" to the cc list and removing "Needs-Feedback" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 12 2018
Chrome on Windows does the right thing and just speaks individual letters "ABCD" for the jsfiddle described in Comment 19. I presume on other OSs, Chrome may speak the XML. Other browsers on Windows speak the markup. Firefox on Windows, for example, speaks the markup rather than just "ABCD". Edge on Windows speaks the markup, but that can be fixed by inserting xml:lang="en-US" into the document node; then Edge will just speak "ABCD". Chrome on Windows does not interpret the SSML markup; it just ignores the SSML hints and speaks the .textContent.
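The Edge workaround mentioned in this comment can be sketched as a small string helper. This is only an illustration: the name ensureXmlLang is hypothetical, and it uses a simple regex rather than a real XML parser, so it assumes a plain <speak ...> start tag with no ">" inside attribute values.

```javascript
// Hypothetical helper: add xml:lang to the <speak> root if it is missing.
function ensureXmlLang(ssml, lang) {
  // Find the <speak ...> start tag.
  var rootTag = /<speak(\s[^>]*)?>/.exec(ssml);
  if (!rootTag || /\sxml:lang\s*=/.test(rootTag[0])) {
    // No <speak> start tag found, or xml:lang is already set.
    return ssml;
  }
  // Insert the attribute right after the element name.
  return ssml.replace(/<speak(?=[\s>])/, '<speak xml:lang="' + lang + '"');
}
```

For example, ensureXmlLang(utterance.text, "en-US") could be applied before calling speechSynthesis.speak(); markup that already declares a language is left untouched.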
,
May 15 2018
Tried checking the issue on Chrome version 62.0.3202.94 using Ubuntu 14.04 by navigating to http://jsfiddle.net/8pyWZ/ from issue id: 88072 at comment #6. The audio output after navigating to the above URL is <?xml version="1.0"?>\r\n<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis". @Reporter: Providing a screencast describing the issue would help us triage it better. Thanks!
,
May 15 2018
For this report, a screenshot is worthless. The issue is what the audio sounds like. Since this is Blink>Speech, knowledge of the Web Speech API is presumed: https://w3c.github.io/speech-api/speechapi.html which also references the SSML spec: https://www.w3.org/TR/speech-synthesis/
If you navigate to http://jsfiddle.net/8pyWZ/ then Chrome should speak "Eh Bee Sea Dee". If you don't hear that, then the implementation is wrong. Chrome should not speak "ex em el version equals ...". This bug report complains that *nix Chrome does not speak "ABCD". Chrome on Windows speaks the jsfiddle correctly.
According to the Web Speech API, all implementations should recognize SSML and speak its text content, but not all implementations have to interpret or follow the SSML instructions. https://bugs.chromium.org/p/chromium/issues/detail?id=88072 is about following the SSML instructions, something that Chrome on Windows does not do even though it recognizes and strips SSML markup.
SSML markup may contain instructions about how to speak an acronym (e.g., W3C say-as "World Wide Web Consortium") or a date (e.g., "01-11-2018" is a DMY date and should be spoken "November 1st twenty eighteen" and not "January eleventh twenty eighteen").
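To make the date example concrete, here is a small sketch of why the format hint matters. It is not part of any browser or speech engine; the helper name interpretDate is hypothetical, and it assumes three-letter format strings such as "dmy", "mdy" and "ymd" like those used in the say-as samples earlier in this thread.

```javascript
var MONTHS = ["January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December"];

// Hypothetical helper: read a numeric date according to a format string
// made of the letters d, m and y (for example "dmy" or "mdy").
function interpretDate(text, format) {
  var parts = text.split(/[-\/.]/).map(Number);
  if (parts.length !== format.length) {
    throw new Error("date does not match format " + format);
  }
  // Map each format letter to the number in the same position.
  var fields = {};
  for (var i = 0; i < format.length; i++) {
    fields[format.charAt(i)] = parts[i];
  }
  return { day: fields.d, month: MONTHS[fields.m - 1], year: fields.y };
}
```

With this helper, interpretDate("01-11-2018", "dmy") yields day 1 in November, while the same string read as "mdy" yields day 11 in January, which is exactly the ambiguity an engine that ignores the SSML hint cannot resolve.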
,
Aug 29
https://github.com/w3c/speech-api/issues/37 is a relevant spec issue. No browser has implemented support for parsing SSML in SpeechSynthesisUtterance.
,
Aug 29
"No browser has implemented support for parsing SSML in SpeechSynthesisUtterance." False. Edge supports SSML 1.0.