Starred by 9 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 28
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Issue 812767: Speech Synthesis API bypasses audio autoplay policy

Reported by davidben@chromium.org, Feb 15 2018 Project Member

Issue description

Chrome Version       : 65.0.3325.51
OS Version: OS X 10.13.3
URLs (if applicable) : https://davidben.scripts.mit.edu/speech-test.html

What steps will reproduce the problem?
1. Visit https://davidben.scripts.mit.edu/speech-test.html
2. Get very annoyed

What is the expected result?
Speech should not play without user gesture.

What happens instead of that?
Speech plays without user gesture.

I'm not sure what the current state of audio autoplay is, but this suggests that, at least on Android, we don't allow audio autoplay without a user gesture:
https://cs.chromium.org/chromium/src/media/base/media_switches.cc?rcl=b0e6afca6430cd3aa67504b4453ec10e15e37dd9&l=355

One way or another, this API ultimately dumps the data into a system service, so everything we do to control sound must be reimplemented to account for it, if it is left in its current form.

Please provide any additional information below. Attach a screenshot if
possible.

I came across a frame-busting ad that was abusing this this morning.

UserAgentString: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.51 Safari/537.36
 

Comment 1 by krajshree@chromium.org, Feb 16 2018

Labels: Needs-Triage-M65

Comment 2 by mlamouri@chromium.org, Feb 16 2018

Cc: mlamouri@chromium.org
Components: -Internals>Media>Audio Blink>Media>Autoplay
Owner: dmazz...@chromium.org
Status: Untriaged (was: Unconfirmed)
dmazzoni@, would it make sense to block the Speech Synthesis API when autoplay isn't allowed on the page? Are there ways to notify the callers that the API can't be used?

Comment 3 by mlamouri@chromium.org, Feb 26 2018

Cc: susanjun...@techmahindra.com
 Issue 807538  has been merged into this issue.

Comment 4 by dmazz...@chromium.org, Feb 26 2018

Status: Assigned (was: Untriaged)
My understanding is that an <audio> or <video> element cannot autoplay, but we don't prevent a page from playing audio via JavaScript, for example via the Web Audio API. But maybe I'm wrong.

Basically I think speech should behave the same as the Web Audio API. If we have a mechanism to

Comment 5 by dmazz...@chromium.org, Feb 26 2018

Owner: rtoy@chromium.org
@rtoy, can you answer about the Web Audio API? If there is a mechanism to prevent autoplay, can you point me to it so we can reuse it for speech?

If we don't suppress the Web Audio API then I think we should close this as WontFix.

Comment 6 by mlamouri@chromium.org, Feb 26 2018

Cc: rtoy@chromium.org
Owner: dmazz...@chromium.org
We do restrict autoplay for Web Audio, see https://developers.google.com/web/updates/2017/09/autoplay-policy-changes#webaudio

Comment 7 by dmazz...@chromium.org, Feb 26 2018

OK, sounds good. Do you happen to know where this is triggered in the code? Do we have a common place to check if there's been an appropriate user gesture?

Comment 8 by dmazz...@chromium.org, Feb 26 2018

Also we'll need a way to bypass this for Chrome extensions - besides our internal stuff there are some other extensions in the web store that generate speech in the background page.

Comment 9 by mlamouri@chromium.org, Feb 26 2018

Extensions and Chrome Apps should be fine with the exception of WebView.

You should be able to check if you are allowed to play by doing this:
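```
#include "core/html/media/AutoplayPolicy.h"

// `document` is assumed to be a Document* already in scope at the call site.
bool IsAllowedToPlay() {
  // Any policy other than "document user activation required" allows playback here.
  if (AutoplayPolicy::GetAutoplayPolicyForDocument(*document) !=
      AutoplayPolicy::Type::kDocumentUserActivationRequired) {
    return true;
  }

  // Otherwise, ask whether this document is currently allowed to play.
  return AutoplayPolicy::IsDocumentAllowedToPlay(*document);
}
```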

Comment 10 by csharrison@chromium.org, Mar 5 2018

Labels: Hotlist-Abusive
Adding Hotlist-Abusive since the API is being actively abused.

Comment 11 by mlamouri@chromium.org, Mar 9 2018

 Issue 814129  has been merged into this issue.

Comment 12 by davidben@chromium.org, Apr 17 2018

Cc: cbentzel@chromium.org
Any news on this? This is being used by abusive ads and already falls under our existing autoplay behavior, so we really should plug this hole.

Comment 13 by davidben@chromium.org, May 24 2018

dmazzoni: Ping?

Comment 14 by cbentzel@chromium.org, May 24 2018

Cc: jkarlin@chromium.org csharrison@chromium.org
jkarlin, csharrison: Interested in picking this up?

Comment 15 by csharrison@chromium.org, May 24 2018

Yeah, I'd be happy to implement this if feature-owners are on board. I don't have much context in the accessibility space to know how much breakage to expect though.

Comment 16 by johnpallett@chromium.org, May 25 2018

related to 517317

Comment 17 by dmazzoni@google.com, Jun 14 2018

Owner: csharrison@chromium.org
I'm supportive of making this change.

Chrome also has a TTS extension API that predates the web speech synthesis API, so extension authors already have another option. Still, perhaps the safest route would be to whitelist extensions for now and just try to tackle the problem with web pages.

What sort of metrics could we collect? I'm wondering if there's any way we could determine how many instances of bad speech were blocked without violating privacy.

Assigning back to @csharrison, who volunteered to implement this, but please loop in me and katie@chromium.org (katydek@google.com) and we can try to help. I don't know how autoplay is implemented but I can help answer any questions about how speech synthesis works now, and Katie is the one who has done the most work on this recently.

Comment 18 by csharrison@chromium.org, Jun 15 2018

Thanks dmazzoni. I can volunteer some cycles to take a stab.

First thing I was thinking of implementing is:
1. UseCounter for speechSynthesis.speak
2. UseCounter for speechSynthesis.speak that would be blocked with autoplay policy.

The existing UseCounter for SpeechSynthesis (V8Window_SpeechSynthesis_AttributeGetter) shows hits on ~5% of pages. This is way higher than I expected, so hopefully implementing (1) and (2) can help measure some risk.
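As a rough illustration of what counters (1) and (2) would measure, here is a minimal sketch. It is not Chromium's UseCounter API: `SpeakUseCounter`, `recordSpeak`, and the feature names are hypothetical, and it assumes each feature is recorded at most once per page load.

```typescript
// Hypothetical feature names for the two proposed counters.
type SpeakFeature = "SpeakCalled" | "SpeakWouldBeBlocked";

class SpeakUseCounter {
  // Assumption: a feature is recorded at most once per page load,
  // so a set of observed features is enough.
  private readonly seen = new Set<SpeakFeature>();

  // Call on every speechSynthesis.speak(). `pageHasUserActivation` says whether
  // the autoplay policy would have allowed playback at that moment.
  recordSpeak(pageHasUserActivation: boolean): void {
    this.seen.add("SpeakCalled");
    if (!pageHasUserActivation) {
      this.seen.add("SpeakWouldBeBlocked");
    }
  }

  // Reports whether the given feature was observed on this page.
  has(feature: SpeakFeature): boolean {
    return this.seen.has(feature);
  }
}
```

Comparing how many pages hit the second counter against the first gives an estimate of how much existing usage the policy could break.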

I don't think we can measure "good" vs. "bad" speech easily. Maybe if the API was per-frame we could correlate speech with quick tab closing as a proxy for abuse. As-is we don't have an easy way to do that though.

Comment 20 by bugdroid1@chromium.org, Jun 26 2018

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/dcdefe4c3bc6c6d890c46079fb1c4b02fc7846cc

commit dcdefe4c3bc6c6d890c46079fb1c4b02fc7846cc
Author: Charlie Harrison <csharrison@chromium.org>
Date: Tue Jun 26 14:10:22 2018

Add TTS UseCounters to ukm_features

These counters are logged in < .1% of pages.

See blink-dev intent to deprecate:
https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/XpkevOngqUs

Bug:  812767 
Change-Id: I75e262fb04230da4a2ebb47ecac37ebaf602462f
Reviewed-on: https://chromium-review.googlesource.com/1113659
Reviewed-by: Robert Kaplow <rkaplow@chromium.org>
Commit-Queue: Charlie Harrison <csharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#570397}
[modify] https://crrev.com/dcdefe4c3bc6c6d890c46079fb1c4b02fc7846cc/chrome/browser/page_load_metrics/observers/use_counter/ukm_features.cc

Comment 21 by bugdroid1@chromium.org, Aug 3

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/76fa3b9b03cc2794038568845c57ae7336836f51

commit 76fa3b9b03cc2794038568845c57ae7336836f51
Author: Charlie Harrison <csharrison@chromium.org>
Date: Fri Aug 03 15:16:14 2018

Deprecate speechSynthesis.speak() without user activation

See intent to deprecate:
https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/XpkevOngqUs

The deprecation will target M71.

Bug:  812767 
Change-Id: Id4448a91047def16194a47efdacc152070dace82
Reviewed-on: https://chromium-review.googlesource.com/1157231
Reviewed-by: Dominic Mazzoni <dmazzoni@chromium.org>
Reviewed-by: Philip Jägenstedt <foolip@chromium.org>
Commit-Queue: Charlie Harrison <csharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#580550}
[modify] https://crrev.com/76fa3b9b03cc2794038568845c57ae7336836f51/third_party/blink/renderer/core/frame/deprecation.cc
[modify] https://crrev.com/76fa3b9b03cc2794038568845c57ae7336836f51/third_party/blink/renderer/modules/speech/speech_synthesis.cc

Comment 22 by mvolm...@gmail.com, Aug 26

Can anyone explain how this would work? Right now I have an app that makes heavy use of TTS to provide audible cues to people. The TTS is initiated by the user, so this change would not have to affect that part of the app, but the app then uses a timer to notify the user about things at the right time.
These subsequent notifications are not user initiated, so I guess they would break? If that's the case, this change will simply make my app unusable, and I would much prefer this to be handled with a permission (like notifications, Bluetooth, etc.) rather than a user interaction...

Comment 23 by csharrison@chromium.org, Aug 26

Hi mvolmaro, thanks for reaching out. Can you provide a link to your site? Are you seeing cases where the devtools deprecation warning is firing?

If I understand you correctly, I do not think you will be affected. The change requires the user to interact with the page at least _once_ before subsequent speaking will succeed. After some interaction, multiple calls to speak can be called on that same page without failing.
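The "interact once, then speak as often as needed" behavior described above can be modeled as a sticky flag. This is a simplified sketch, not Chromium's implementation; `StickyActivationGate` and its method names are hypothetical.

```typescript
// Simplified model: one user interaction unlocks speech for the rest of the page's life.
class StickyActivationGate {
  private activated = false;

  // Called for any qualifying user interaction (for example a click or key press).
  noteUserInteraction(): void {
    this.activated = true;
  }

  // Returns true if a speak() call made now would be allowed.
  maySpeak(): boolean {
    return this.activated;
  }
}

const gate = new StickyActivationGate();
const beforeInteraction = gate.maySpeak(); // false: speech would be rejected
gate.noteUserInteraction();
// Later calls, e.g. from timers, are all allowed after the single interaction.
const afterInteraction = [gate.maySpeak(), gate.maySpeak()]; // [true, true]
```

Under this model, timer-driven announcements keep working as long as the user has interacted with the page at some earlier point.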

Comment 24 by mvolm...@gmail.com, Aug 27

@csharrison: Right now I'm not seeing anything, as I'm not using M70. I just read about the upcoming changes at https://www.chromestatus.com/features and this caught my attention.
If TTS works normally after the first user interaction, that would work for me. Question: I'm initiating the TTS on user interaction (on click), not directly in the event handler but on the result of a promise fired by the event handler...

So: Click event handler > Promise.then > play utterance.

Will that work the same as if the utterance were played right in the event handler?

Comment 25 by csharrison@chromium.org, Aug 27

mvolmaro: Yes, speaking sometime after the user initiation but not on the actual event handler should work fine. Please do reach out if you see anything unexpected on Chrome 70.

Comment 26 by bugdroid1@chromium.org, Sep 28

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b469a6e8b042ebeb028c7f601f7c98990981b9a7

commit b469a6e8b042ebeb028c7f601f7c98990981b9a7
Author: Charlie Harrison <csharrison@chromium.org>
Date: Fri Sep 28 22:42:09 2018

Disallow speechSynthesis.speak autoplaying

See intent to remove:
https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/WsnBm53M4Pc

Even though speech API does not work properly on content shell, this
change can be tested in layout tests because it fails immediately without
calling into any synthesis code. To force autoplay, tests need to use
the new unified autoplay flag:
--autoplay-policy=document-user-activation-required

Bug:  812767 
Change-Id: I41bee6e37ab46ff2013d096c714b5124bd0ccc2c
Reviewed-on: https://chromium-review.googlesource.com/1225650
Commit-Queue: Charlie Harrison <csharrison@chromium.org>
Reviewed-by: Dominic Mazzoni <dmazzoni@chromium.org>
Reviewed-by: Philip Jägenstedt <foolip@chromium.org>
Cr-Commit-Position: refs/heads/master@{#595238}
[modify] https://crrev.com/b469a6e8b042ebeb028c7f601f7c98990981b9a7/third_party/WebKit/LayoutTests/NeverFixTests
[modify] https://crrev.com/b469a6e8b042ebeb028c7f601f7c98990981b9a7/third_party/WebKit/LayoutTests/TestExpectations
[modify] https://crrev.com/b469a6e8b042ebeb028c7f601f7c98990981b9a7/third_party/WebKit/LayoutTests/VirtualTestSuites
[add] https://crrev.com/b469a6e8b042ebeb028c7f601f7c98990981b9a7/third_party/WebKit/LayoutTests/virtual/speech-with-unified-autoplay/external/wpt/speech-api/README.txt
[modify] https://crrev.com/b469a6e8b042ebeb028c7f601f7c98990981b9a7/third_party/blink/renderer/core/frame/deprecation.cc
[modify] https://crrev.com/b469a6e8b042ebeb028c7f601f7c98990981b9a7/third_party/blink/renderer/modules/speech/speech_synthesis.cc

Comment 27 by csharrison@chromium.org, Sep 28

Status: Fixed (was: Assigned)

Comment 28 by earnolma...@gmail.com, Oct 9

What about when "SpeechRecognition" results are used to trigger a "speechSynthesis.speak" event?  Will that continue to work, or is that going to be broken by these changes?

This is going to affect voice only web applications that recognize speech and read text back to users. 

There are so many use-cases where sounds should play without direct user interaction, and SpeechRecognition result events firing speechSynthesis.speak events should continue to work without the user clicking on something directly since voice events are calling it.

It would be nice if Google quit inventing standards that should NOT exist.  There are perfectly valid instances when sounds should play without direct user interaction, and cases when they should NOT.  Deciding to take the blacklist approach and blocking sounds from playing without direct user interaction is just ridiculous and is not a very well thought out approach to handling this problem (if there even is one - since what you consider annoying may NOT be annoying to me).  Why change something that wasn't broken and was the web standard up until now just because someone is annoyed?  This is the internet.  If you don't like something, quit visiting the page.  Content creators should be able to decide when sounds play (with or without direct user interaction).  If you don't like it, get your panties out of a bunch and cry elsewhere.

Comment 29 by flor...@daschkiewicz.com, Nov 6

I also think that there are many use-cases where speechSynthesis.speak should work without user interaction!

It is (or better it was?) a very important feature in sense of human/machine interaction. For example in conjunction with the speechRecognition API to basically enable an audio-dialog between the user and the machine.

Sure, you have to ban abusive implementations of this feature, but the planned solution is not a good solution for most web developers out there!

Also think of use-cases where handicapped (blind) users want to take advantage of this feature. And in future they have to first click a button to get an audio response?! *lol*

Comment 30 by csharrison@chromium.org, Nov 6

I agree it is unfortunate that we need interactions for speech synthesis to work on the web. However, browsers have a responsibility to protect their users, and in this case we made a trade-off.

In this case the majority of usage of this API was for abuse.

Note that this doesn't change the ability for extensions to use the chrome.tts API to enable speech synthesis without an interaction.

Comment 31 by tgarifu...@gmail.com, Nov 14

Hi! Just got a deprecation warning for speechSynthesis.speak().
Can I suggest disabling it but letting the user turn it on in Chrome settings for specific domain(s)?
There are use cases where this is useful and improves UX. So, for my site I could prompt users to turn on such a Chrome setting (for my domain), and I'm sure those who are interested in my web app will do it.

Comment 32 by csharrison@chromium.org, Nov 14

Hey tgarifulin! We are currently developing a solution which allows sites whitelisted in chrome://settings/content/sound to autoplay content. This will include autoplay speech synthesis.

mlamouri: Is this expected to land in M71?

Comment 33 by mlamouri@chromium.org, Nov 15

I would strongly discourage websites from suggesting that users alter their settings. If your website doesn't work with the autoplay restrictions, you would spend more time explaining to your users how to enable autoplay than it would take to adapt your website to no autoplay. The setting is intended for advanced users who visit old, unmaintained websites. It is meant to help with backward compatibility.

Comment 34 by tgarifu...@gmail.com, Nov 15

csharrison, sounds great. 

mlamouri, I see your point, and I partially agree it is not ideal. But I don't see other options. Sure, I'll have to adapt to `no autoplay`, but as I said in my comment above, it will degrade the user experience for some of my web app's functionality:
it is something like autoplaying slides with animations, where a piece of text is spoken on each slide. How would you deal with this without autoplay?

Comment 35 by tgarifu...@gmail.com, Nov 15

Just read the whole thread, and now I think I don't understand the exact scenario.
Could you guys shed more light on the upcoming behavior?

Consider csharrison's comment:
>> The change requires the user to interact with the page at least _once_ before subsequent speaking will succeed. After some interaction, multiple calls to speak can be called on that same page without failing.

According to the comment my case should not be affected?

I got the deprecation warning
"[Deprecation] speechSynthesis.speak() without user activation is deprecated and will be removed in M71...."
as soon as I open a view in my web app that autoplays some text after a delay using setTimeout. Subsequent calls to speechSynthesis.speak() from later setTimeout handlers do not trigger the deprecation warning.
FYI, opening the view with the autoplay functionality doesn't trigger http request, just history.pushState().

Thanks, 
Timur Garifulin.

Comment 36 by csharrison@chromium.org, Nov 15

Hey Timur,
You can take a look at [1] for an explanation of the autoplay policies. The one difference for speech synthesis is that we don't support "muted" speech synthesis on the platform, since SSML can change the volume out from under us.

The deprecation messages should indicate when speech is disallowed, but I recommend you try out your app in Chrome Beta (M71) to see how it behaves with the policies enabled.

[1]: https://developers.google.com/web/updates/2017/09/autoplay-policy-changes

Comment 37 Deleted

Comment 38 by csharrison@chromium.org, Dec 3

Hey ikolosov,
You should just call speak() again once you have user activation if the first call failed.
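One way to follow that advice is to remember speech that was blocked and replay it from the first user-gesture handler. The sketch below is only an illustration: `SpeakRetrier` and `TrySpeak` are hypothetical names, and the synchronous boolean result is a simplification. In a browser, a blocked call would more likely be reported asynchronously through the utterance's `error` event, and that wiring is omitted here.

```typescript
// Hypothetical callback: attempts to speak and returns false if the call was blocked.
type TrySpeak = (text: string) => boolean;

class SpeakRetrier {
  private readonly trySpeak: TrySpeak;
  private pending: string[] = [];

  constructor(trySpeak: TrySpeak) {
    this.trySpeak = trySpeak;
  }

  // Speak now if allowed; otherwise remember the text for later.
  speak(text: string): void {
    if (!this.trySpeak(text)) {
      this.pending.push(text);
    }
  }

  // Call from a user-gesture handler (for example the first click)
  // to replay anything that was blocked earlier.
  onUserActivation(): void {
    const queued = this.pending;
    this.pending = [];
    for (const text of queued) {
      this.speak(text);
    }
  }
}
```

Queued text is replayed in the order it was requested, and anything still blocked during replay goes back on the queue.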

Comment 40 Deleted

Comment 41 by baklan...@gmail.com, Dec 14

This is not a bug at all. But a wrong/bad mindset.

Why should user pay for your decisions? The sound is already "harder" than visuals, but you try to make it even harder with confirmations/additional visual requests.

Consider the possibility of UI speech helper more than some shhty site needs in ads. 

I believe that the blocker is better than the confirmer.
Firefox has some sound off icon in the tabs, for example.

Let the sites compete for the user, browsers are already fine.
Bug submitter, huh.

Comment 42 by kbil4...@gmail.com, Dec 20

This is causing havoc on our virtual call center. As soon as the version is updated to Chrome v71 the agent can't hear a caller anymore but the caller can hear them. We are currently in the process of downgrading to Chrome v70 until there is a fix to auto allow the Speech Synthesis API for a site like you can with sound and microphone.

Most agents have switched to Firefox because of this.

Thanks for the test script David, works like a charm, please keep it online so we can test future releases.
