UKM is logging infeasibly large amounts of watch time from relatively obscure domains |
|||
Issue descriptionBased on early UKM data from M60 beta/stable, we're seeing a large amount of watch time (several fold what comes from YouTube) come from relatively obscure domains. Many of these domains tend to have live streams, so it is possible there is a bug which inaccurately reports watch time for live streams. A snapshot of the data that shows the inflated watch times can be found at https://colab.corp.google.com/v2/notebook#fileId=0ByYlJ-nwE4btTjhPbG9OUlZ6Q0k dalecurtis@, per our conversation, assigning to you to investigate
,
Aug 3 2017
Can you limit this to just desktop? I'm curious if Android HLS playback might be messing up the timestamps.
,
Aug 3 2017
Oh wait, there's no Android data right now so it can't be that.
,
Aug 3 2017
We do have Android data from canary/dev/beta. I updated the Colab to break down the data by platform. The largest watch times come from a combination of Windows & Android devices.
,
Aug 3 2017
Digging a bit deeper, it looks like this is caused by a small number of records that have large and very similar watch time values. For example: 9223372036848308 9223372036838476 9223372036794691 9223372036747836 9223372036715817 9223372036536457 9223372036497179 9223372036011412 9223372036010056 9223372035870254 9223372035774514 9223372035417587 9223372035284623 9223372035143779 9223372035082924 9223372035045945 9223372035019550 9223372034962027 9223372034863480 9223372034429944 9223372034317832 9223372033204259 9223372031394126 9223372029385441 This suggests it's related to an overflow issue. Interestingly, we report the same watch time values to UMA and UKM, but UMA is not affected (we know UMA is not affected since the sum of all UMA watch time for a given day is smaller than the watch time in a single record for these impacted UKM logs). dalecurtis@ is going to create a patch to prevent negative values (which are causing the overflow) from being reported. In the mean time, I am updating the queries to exclude any UKM records with watch time exceeding 10 hours.
,
Aug 3 2017
UMA will just report these as 10hours and the values are clamped before reported to the server.
,
Aug 4 2017
Okay, I think I worked out what's happening. The problem isn't negative values, those would be elided by not being >= kMinimumElapsedWatchTime. Instead it looks like we're subtracting from kInfiniteDuration, which is int64_t::max() or 9223372036854775808. The only way this could be happening is if we return a current time value of kInfiniteDuration, which is possible if we trigger "ended" while the duration == kInfiniteDuration. This value will be propogated to |paused_time_| and vended as the current time value during |ended_|. All the sites I looked at are MSE, so I think there is some case where we are ending while duration == kInfiniteDuration. I was able to manually force this locally to trigger the repro, but wasn't able to do so legitimately. +wolenetz in case there's some aspect of MSE that would allow this that I'm unaware of. I've got some local fixes for this that I'll clean up tomorrow and submit.
,
Aug 4 2017
Hmm. Two ideas here: A) Maybe due to this line (I'm scratching my head wondering why this line is there in our code.. It's from vrk@ long ago): https://cs.chromium.org/chromium/src/media/filters/chunk_demuxer.cc?rcl=65ef3a7c2973fc9aef9a338640b52750fc82c5a1&l=1319 --> If max_duration is really 0 (nothing is buffered currently), then endOfStream should set duration to 0, not just blindly return. I think. B) Maybe due to a race: 1. marking endofstream 2. renderers read the endofstream buffer(s) from chunkdemuxerstream(s) 3a. rendering signalling wmpi of ended state is in-flight trampolining threads racing 3b. JS app doing something that unmarks endOfStream (like SB.remove(..), SB.appendBuffer(..), SB.mode = ..., SB.timestampOffset = ...) + then updating MediaSource.duration to be +Infinity. If JS wins this race, I think it might be possible this is where at least some of the kInfiniteDuration cases might be arising.
,
Aug 4 2017
Speculating further, maybe also: C) duration is infinite at the point where some error occurs in MSE playback --> might this not update duration before it's used for watchtime reporting?
,
Aug 4 2017
For c#8-A: I think this can occur. I tried a local test:
Nothing appended yet
ms duration=NaN
videoTag duration=NaN
videoTag buffered range count:0
Init segment appended
ms duration=Infinity
videoTag duration=NaN
videoTag buffered range count:0
After forcing MediaSource duration to be +Infinity
ms duration=Infinity
videoTag duration=NaN
videoTag buffered range count:0
After calling endOfStream()
ms duration=Infinity
videoTag duration=NaN
videoTag buffered range count:0
videoTag 'ended' event
ms duration=Infinity
videoTag duration=Infinity
videoTag buffered range count:0
,
Aug 4 2017
Thanks for looking Matt. Keep in mind the watch time reporter must have been running, so we have > 7 seconds of watch time, yet at some point currentTime() returns inf.
,
Aug 4 2017
c#8A/c#10 show known incorrect behavior (part of bug 604353). I think removal of the https://cs.chromium.org/chromium/src/media/filters/chunk_demuxer.cc?rcl=65ef3a7c2973fc9aef9a338640b52750fc82c5a1&l=1319 line to fix this might be the right incremental way forward towards MSE compliance and fixing c#8A. I'll take a closer look today at this aspect. Dale, your watchtime cleanups might need to still be done to get data corrected on release branches ASAP, independently of the MSE compliance fixes. (Especially since at least 8B/9C might exist in our code for a while.)
,
Aug 4 2017
Yeah, hbengali@ is already filtering these values in UKM and they're ~0% of our UMA data since we cap to 10 hours. I'll land fixes to protect the watch time reporter today from this.
,
Aug 4 2017
w.r.t. #11 (and 8a/8b/10): I think internally we force currentTime to be duration when on-ended condition occurs (not mse-specific). So, if (in MSE) renderers get EOS buffers and transition to ended, but: (8a) we haven't updated duration correctly if nothing is currently buffered (keep in mind that it could have played lots of buffered media previously, but then remove(all) could have been done by the app before EOS), or (8b) the app is racing EOS + unmarking EOS and setting duration to +Infinity (for example, I think GPM/cast was seeing this at one point -- and I think this still needs fixing in Chrome since pipeline controller+renderers+demuxer need to coordinate on a "can we agree on EOS for realz if the renderers get there, or should we restart renderer on them getting EOS in case the app has unmarked EOS in the interim") - then it is likely our "on-ended" logic that sets currentTime to duration might be where +Infinity gets into watchtime metrics.
,
Aug 4 2017
For 8B, see bug 530698.
,
Aug 5 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/051fdf6190e29a09beb760e23997a2cc047fb493 commit 051fdf6190e29a09beb760e23997a2cc047fb493 Author: Dale Curtis <dalecurtis@chromium.org> Date: Sat Aug 05 00:21:13 2017 Fix a couple issues with watch time reporting. Add DCHECKs. There were two issues with watch time reporting due to the way WebMediaPlayerImpl calculates current time upon ended and how it handles play during an ongoing seek. 1. Upon |ended_| we clamp to duration, which can correctly be kInfiniteDuration == int64_t::max() for live streams. That value may be used to compute WT = int64_t::max() - start_time if a finalize or watch time update tick occurs after ended. I've fixed this issue by removing clamping from the version of the current time getter that watch time reporting uses. 2. If a clip has a positive start time we don't know this until it's been properly demuxed, so if we allow the watch time reporter to start while a seek is ongoing we can get the default seek time of zero as our starting point; artifically inflating watch time based on the start time. I've fixed this issue by not allowing watch time reporting to start while a seek is ongoing (this was already expected to be true; we cancel WTR before seeking). Unfortunately given the complexity of interactions between WMPI and WatchTimeReporter I wasn't able to create an extensive test, but did add a basic infinite duration behavior test and added copious DCHECKs. BUG= 751823 TEST=new unittest. dchecks don't fail. test-positive-start-time.webm has the correct elapsed duration after looping. Change-Id: I9494d36d5e01a586b79823cc0da04c01f80dec70 Reviewed-on: https://chromium-review.googlesource.com/601539 Reviewed-by: Dan Sanders <sandersd@chromium.org> Commit-Queue: Dale Curtis <dalecurtis@chromium.org> Cr-Commit-Position: refs/heads/master@{#492195} [modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/watch_time_reporter.cc [modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_cast_android.cc [modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_cast_android.h [modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_impl.cc [modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_impl.h [modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_impl_unittest.cc
,
Aug 9 2017
Don't think we need to merge this to M61 since you have workarounds for data processing Husain. Tentatively marking as fixed; let me know if you see any UKM records after 62.0.3178.0 which have massive values. |
|||
►
Sign in to add a comment |
|||
Comment 1 by hbengali@chromium.org
, Aug 2 2017