New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 751823 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug



Sign in to add a comment

UKM is logging infeasibly large amounts of watch time from relatively obscure domains

Project Member Reported by hbengali@chromium.org, Aug 2 2017

Issue description

Based on early UKM data from M60 beta/stable, we're seeing a large amount of watch time (several fold what comes from YouTube) come from relatively obscure domains. Many of these domains tend to have live streams, so it is possible there is a bug which inaccurately reports watch time for live streams.

A snapshot of the data that shows the inflated watch times can be found at https://colab.corp.google.com/v2/notebook#fileId=0ByYlJ-nwE4btTjhPbG9OUlZ6Q0k

dalecurtis@, per our conversation, assigning to you to investigate
 
Cc: dah...@chromium.org
Can you limit this to just desktop? I'm curious if Android HLS playback might be messing up the timestamps.
Oh wait, there's no Android data right now so it can't be that.
We do have Android data from canary/dev/beta.

I updated the Colab to break down the data by platform. The largest watch times come from a combination of Windows & Android devices.
Digging a bit deeper, it looks like this is caused by a small number of records that have large and very similar watch time values. For example:

9223372036848308
9223372036838476
9223372036794691
9223372036747836
9223372036715817
9223372036536457
9223372036497179
9223372036011412
9223372036010056
9223372035870254
9223372035774514
9223372035417587
9223372035284623
9223372035143779
9223372035082924
9223372035045945
9223372035019550
9223372034962027
9223372034863480
9223372034429944
9223372034317832
9223372033204259
9223372031394126
9223372029385441

This suggests it's related to an overflow issue.

Interestingly, we report the same watch time values to UMA and UKM, but UMA is not affected (we know UMA is not affected since the sum of all UMA watch time for a given day is smaller than the watch time in a single record for these impacted UKM logs).

dalecurtis@ is going to create a patch to prevent negative values (which are causing the overflow) from being reported. In the mean time, I am updating the queries to exclude any UKM records with watch time exceeding 10 hours.
UMA will just report these as 10hours and the values are clamped before reported to the server.
Cc: wolenetz@chromium.org
Okay, I think I worked out what's happening. The problem isn't negative values, those would be elided by not being >= kMinimumElapsedWatchTime. Instead it looks like we're subtracting from kInfiniteDuration, which is int64_t::max() or 9223372036854775808. The only way this could be happening is if we return a current time value of kInfiniteDuration, which is possible if we trigger "ended" while the duration == kInfiniteDuration. This value will be propogated to |paused_time_| and vended as the current time value during |ended_|.

All the sites I looked at are MSE, so I think there is some case where we are ending while duration == kInfiniteDuration. I was able to manually force this locally to trigger the repro, but wasn't able to do so legitimately. +wolenetz in case there's some aspect of MSE that would allow this that I'm unaware of.

I've got some local fixes for this that I'll clean up tomorrow and submit.
Hmm. Two ideas here:

A) Maybe due to this line (I'm scratching my head wondering why this line is there in our code.. It's from vrk@ long ago):
https://cs.chromium.org/chromium/src/media/filters/chunk_demuxer.cc?rcl=65ef3a7c2973fc9aef9a338640b52750fc82c5a1&l=1319
--> If max_duration is really 0 (nothing is buffered currently), then endOfStream should set duration to 0, not just blindly return. I think.


B) Maybe due to a race:

1. marking endofstream
2. renderers read the endofstream buffer(s) from chunkdemuxerstream(s)
3a. rendering signalling wmpi of ended state is in-flight trampolining threads
racing
3b. JS app doing something that unmarks endOfStream (like SB.remove(..), SB.appendBuffer(..), SB.mode = ..., SB.timestampOffset = ...) + then updating MediaSource.duration to be +Infinity.

If JS wins this race, I think it might be possible this is where at least some of the kInfiniteDuration cases might be arising.

Speculating further, maybe also:
C) duration is infinite at the point where some error occurs in MSE playback --> might this not update duration before it's used for watchtime reporting?
For c#8-A: I think this can occur. I tried a local test:

  Nothing appended yet
           ms duration=NaN
           videoTag duration=NaN
           videoTag buffered range count:0
  Init segment appended
           ms duration=Infinity
           videoTag duration=NaN
           videoTag buffered range count:0
  After forcing MediaSource duration to be +Infinity
           ms duration=Infinity
           videoTag duration=NaN
           videoTag buffered range count:0
  After calling endOfStream()
           ms duration=Infinity
           videoTag duration=NaN
           videoTag buffered range count:0
  videoTag 'ended' event
           ms duration=Infinity
           videoTag duration=Infinity
           videoTag buffered range count:0

Thanks for looking Matt. Keep in mind the watch time reporter must have been running, so we have > 7 seconds of watch time, yet at some point currentTime() returns inf. 
c#8A/c#10 show known incorrect behavior (part of bug 604353).
I think removal of the https://cs.chromium.org/chromium/src/media/filters/chunk_demuxer.cc?rcl=65ef3a7c2973fc9aef9a338640b52750fc82c5a1&l=1319 line to fix this might be the right incremental way forward towards MSE compliance and fixing c#8A. I'll take a closer look today at this aspect.

Dale, your watchtime cleanups might need to still be done to get data corrected on release branches ASAP, independently of the MSE compliance fixes. (Especially since at least 8B/9C might exist in our code for a while.)
Yeah, hbengali@ is already filtering these values in UKM and they're ~0% of our UMA data since we cap to 10 hours. I'll land fixes to protect the watch time reporter today from this.
w.r.t. #11 (and 8a/8b/10): I think internally we force currentTime to be duration when on-ended condition occurs (not mse-specific).

So, if (in MSE) renderers get EOS buffers and transition to ended, but:
(8a) we haven't updated duration correctly if nothing is currently buffered (keep in mind that it could have played lots of buffered media previously, but then remove(all) could have been done by the app before EOS), or
(8b) the app is racing EOS + unmarking EOS and setting duration to +Infinity (for example, I think GPM/cast was seeing this at one point -- and I think this still needs fixing in Chrome since pipeline controller+renderers+demuxer need to coordinate on a "can we agree on EOS for realz if the renderers get there, or should we restart renderer on them getting EOS in case the app has unmarked EOS in the interim")

- then it is likely our "on-ended" logic that sets currentTime to duration might be where +Infinity gets into watchtime metrics.
For 8B, see bug 530698.
Project Member

Comment 16 by bugdroid1@chromium.org, Aug 5 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/051fdf6190e29a09beb760e23997a2cc047fb493

commit 051fdf6190e29a09beb760e23997a2cc047fb493
Author: Dale Curtis <dalecurtis@chromium.org>
Date: Sat Aug 05 00:21:13 2017

Fix a couple issues with watch time reporting. Add DCHECKs.

There were two issues with watch time reporting due to the way
WebMediaPlayerImpl calculates current time upon ended and how
it handles play during an ongoing seek.

1. Upon |ended_| we clamp to duration, which can correctly be
kInfiniteDuration == int64_t::max() for live streams. That
value may be used to compute WT = int64_t::max() - start_time
if a finalize or watch time update tick occurs after ended. I've
fixed this issue by removing clamping from the version of the
current time getter that watch time reporting uses.

2. If a clip has a positive start time we don't know this until
it's been properly demuxed, so if we allow the watch time reporter
to start while a seek is ongoing we can get the default seek time
of zero as our starting point; artifically inflating watch time
based on the start time. I've fixed this issue by not allowing
watch time reporting to start while a seek is ongoing (this was
already expected to be true; we cancel WTR before seeking).

Unfortunately given the complexity of interactions between WMPI and
WatchTimeReporter I wasn't able to create an extensive test, but did
add a basic infinite duration behavior test and added copious DCHECKs.

BUG= 751823 
TEST=new unittest. dchecks don't fail. test-positive-start-time.webm
has the correct elapsed duration after looping.

Change-Id: I9494d36d5e01a586b79823cc0da04c01f80dec70
Reviewed-on: https://chromium-review.googlesource.com/601539
Reviewed-by: Dan Sanders <sandersd@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#492195}
[modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/watch_time_reporter.cc
[modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_cast_android.cc
[modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_cast_android.h
[modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_impl.cc
[modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_impl.h
[modify] https://crrev.com/051fdf6190e29a09beb760e23997a2cc047fb493/media/blink/webmediaplayer_impl_unittest.cc

Status: Fixed (was: Assigned)
Don't think we need to merge this to M61 since you have workarounds for data processing Husain. Tentatively marking as fixed; let me know if you see any UKM records after 62.0.3178.0 which have massive values.

Sign in to add a comment