New issue
Advanced search Search tips

Issue 785940 link

Starred by 3 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 772407
issue 887659



Sign in to add a comment

AnimationClock::currentTime is often incorrectly ahead of rAF time

Project Member Reported by raphael....@intel.com, Nov 16 2017

Issue description

This test, added in https://github.com/w3c/web-platform-tests/commit/5f00b0e203676b2ff7fbfb8727c0b7e3728d4173#diff-3d02e7b1e09cf4b9e5060e79092c5942 upstream, is currently blocking WPT imports.

See, for example, https://luci-milo.appspot.com/buildbot/chromium.infra.cron/wpt-importer/5341.

Although it passes the Blink trybots, all the {linux,mac,win}_chromium_rel_ng CQ bots are failing:

FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 60.492999999041785 but got 76.60099999884551

(the actual numbers vary)

For now, I'll skip the test to unblock imports.
 
Project Member

Comment 1 by bugdroid1@chromium.org, Nov 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e52e9775c8c8f61862763f77e7fc4e3a1fa938b5

commit e52e9775c8c8f61862763f77e7fc4e3a1fa938b5
Author: Raphael Kubo da Costa <raphael.kubo.da.costa@intel.com>
Date: Thu Nov 16 14:25:09 2017

Mark upcoming web animations WPT test as flaky.

external/wpt/web-animations/timing-model/timelines/document-timelines.html
is currently failing on the {linux,mac,win}_chromium_rel_ng bots even though
it passes the Blink trybots. Do this to unblock WPT imports.

TBR=ericwilligers@chromium.org

Bug: 785940
Change-Id: Ia777fcae000aa563cdd9182c21680c5c83913f57
No-Try: True
Reviewed-on: https://chromium-review.googlesource.com/774799
Reviewed-by: Raphael Kubo da Costa (rakuco) <raphael.kubo.da.costa@intel.com>
Commit-Queue: Raphael Kubo da Costa (rakuco) <raphael.kubo.da.costa@intel.com>
Cr-Commit-Position: refs/heads/master@{#517080}
[modify] https://crrev.com/e52e9775c8c8f61862763f77e7fc4e3a1fa938b5/third_party/WebKit/LayoutTests/TestExpectations

Description: Show this description
Hmm, so this test used to be 

    crbug.com/626703 [ Mac ] external/wpt/web-animations/interfaces/AnimationTimeline/document-timeline.html [ Failure ]

In any case, it is no longer failing only on Mac.
Did you have a chance to look into why the test was behaving differently on try bots and CQ?
Nope; I originally thought it was the DCHECK on-and-off issue, but since it's not a crash I couldn't figure out why only the CQ bots were failing :/

Comment 6 by nainar@chromium.org, Nov 16 2017

Labels: Test-Layout Update-Fortnightly

Comment 7 by foolip@chromium.org, Nov 21 2017

Cc: -foolip@chromium.org
Cc: flackr@chromium.org majidvp@chromium.org smcgruer@chromium.org
Cc: -smcgruer@chromium.org
Owner: smcgruer@chromium.org
Status: Assigned (was: Available)
From some poking around, both DocumentTimeline and rAF should be using the same clock:

PageAnimator::ServiceScriptedAnimations:
  => Calls Clock().UpdateTime(monotonic_animation_start_time);

DocumentTimeline::CurrentTimeInternal:
  => GetDocument()->GetAnimationClock().CurrentTime()
    => GetPage()->Animator().Clock()

So I suspect something is changing the clock between then. Looking now.

This fails on my local Linux workstation btw, so seems reproducible.
The simple answer is that AnimationClock is designed to increase in time whenever CurrentTime is called, and time passes between the rAF timestamp being recorded and DocumentTimeline being called:

[1:1:0216/122003.684589:INFO:PageAnimator.cpp(34)] Calling Clock().UpdateTime(2.15869e+06)
[1:1:0216/122003.684625:INFO:AnimationClock.cpp(52)] time_ now 2.15869e+06, task_for_which_time_was_calculated_: 101
[1:1:0216/122003.684688:INFO:PageAnimator.cpp(76)] Calling ServiceScriptedAnimations with 2.15869e+06
[1:1:0216/122003.684703:INFO:ScriptedAnimationController.cpp(147)] high_res_now_ms: 625.737, legacy_high_res_now_ms: 1.5188e+12
[1:1:0216/122003.691007:INFO:AnimationClock.cpp(60)] monotonically_increasing_time_, task_for_which_time_was_calculated_ != currently_running_task_
[1:1:0216/122003.691021:INFO:AnimationClock.cpp(62)] current_time is 2.15869e+06, time_ is 2.15869e+06
[1:1:0216/122003.691038:INFO:AnimationClock.cpp(64)] Advancing clock
[1:1:0216/122003.691051:INFO:AnimationClock.cpp(52)] time_ now 2.15869e+06, task_for_which_time_was_calculated_: 102
[1:1:0216/122003.691074:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime()
[1:1:0216/122003.691065:INFO:DocumentTimeline.cpp(236)] DocumentTimeline::CurrentTimeInternal, ZeroTime() == 2.15869e+06, playback_rate_: 1, GetDocument()->GetAnimationClock().CurrentTime(): 2.15869e+06, result: 642.404



Next step; dig into the spec to see what the expected behavior for DocumentTimeline is, and fix it if necessary.
"designed to increase in time whenever CurrentTime is called"

This statement is somewhat inaccurate. The AnimationClock documentation says:

"Maintains a stationary clock time during script execution."

This is done via a 'task counter', incremented by EndOfTaskRunner::WillProcessTask. So it is possible that the designers of AnimationClock expected the rAF calculation and the contents of the rAF to run in the same 'task', whilst that is not what appears to happen.
Hrm, it also appears I may be wrong about the time update causing the discrepancy:

[1:1:0216/131818.917373:INFO:PageAnimator.cpp(34)] Calling Clock().UpdateTime(2.16218e+06)
[1:1:0216/131818.917398:INFO:AnimationClock.cpp(52)] time_ now 2.16218e+06, task_for_which_time_was_calculated_: 99
[1:1:0216/131818.917418:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime()
[1:1:0216/131818.917441:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime()
[1:1:0216/131818.917432:INFO:DocumentTimeline.cpp(236)] DocumentTimeline::CurrentTimeInternal, ZeroTime() == 2.16218e+06, playback_rate_: 1, GetDocument()->GetAnimationClock().CurrentTime(): 2.16218e+06, result: 702.525
[1:1:0216/131818.917472:INFO:PageAnimator.cpp(76)] Calling ServiceScriptedAnimations with 2.16218e+06
[1:1:0216/131818.917490:INFO:ScriptedAnimationController.cpp(147)] high_res_now_ms: 691.824, legacy_high_res_now_ms: 1.51881e+12
[1:1:0216/131818.917706:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime()
[1:1:0216/131818.917747:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime()
[1:1:0216/131818.917736:INFO:DocumentTimeline.cpp(236)] DocumentTimeline::CurrentTimeInternal, ZeroTime() == 2.16218e+06, playback_rate_: 1, GetDocument()->GetAnimationClock().CurrentTime(): 2.16218e+06, result: 702.525


In this case, DocumentTimeline::CurrentTime was 702.525 before and after ServiceScriptedAnimations ran. 
Oh now this is interesting:

$ git diff third_party/WebKit/Source/core/page/PageAnimator.cpp
diff --git a/third_party/WebKit/Source/core/page/PageAnimator.cpp b/third_party/WebKit/Source/core/page/PageAnimator.cpp
index 43cffcaaeac8..563a09fc96c9 100644
--- a/third_party/WebKit/Source/core/page/PageAnimator.cpp
+++ b/third_party/WebKit/Source/core/page/PageAnimator.cpp
@@ -31,7 +31,9 @@ void PageAnimator::Trace(blink::Visitor* visitor) {
 void PageAnimator::ServiceScriptedAnimations(
     double monotonic_animation_start_time) {
   AutoReset<bool> servicing(&servicing_animations_, true);
   Clock().UpdateTime(monotonic_animation_start_time);
+  monotonic_animation_start_time = Clock().CurrentTime();
 
   HeapVector<Member<Document>, 32> documents;
   for (Frame* frame = page_->MainFrame(); frame;

With this change, the tests pass[0]. That is, after Clock().UpdateTime(monotonic_animation_start_time), we have:

Clock().CurrentTime() != monotonic_animation_start_time




[0] Well, nearly. There's a very small diffence in value that I suspect is a precision bug somewhere.
The time passed to PageAnimator::ServiceScriptedAnimations is in the 'past':

$ ./out/Release/content_shell --run-layout-test external/wpt/web-animations/timing-model/timelines/document-timelines.html
#READY

DevTools listening on ws://127.0.0.1:42259/devtools/browser/9ab5989d-dba2-411d-ae71-f24920b20440
...
[1:1:0216/134143.411763:INFO:AnimationClock.cpp(50)] Changed time from 0.000000 to 2163586.250000  // This update is caused by a call to GetCurrentTime

[1:1:0216/134143.426887:INFO:AnimationClock.cpp(50)] Changed time from 2163586.250000 to 2163586.266667  // This update is caused by a call to GetCurrentTime

[1:1:0216/134143.460957:INFO:PageAnimator.cpp(34)] monotonic_animation_start_time: 2163586.262056  // This is behind current AnimationClock!

[1:1:0216/134143.470656:INFO:PageAnimator.cpp(34)] monotonic_animation_start_time: 2163586.303269  // This is now ahead of AnimationClock

[1:1:0216/134143.470727:INFO:AnimationClock.cpp(50)] Changed time from 2163586.266667 to 2163586.303269  // This update is caused by PageAnimator

[1:1:0216/134143.490590:INFO:AnimationClock.cpp(50)] Changed time from 2163586.303269 to 2163586.336602  // This update is caused by a call to GetCurrentTime

[1:1:0216/134143.494700:INFO:PageAnimator.cpp(34)] monotonic_animation_start_time: 2163586.327320  // This is behind current AnimationClock!


So a few questions to be answered here:

  - Where is PageAnimator getting its monotonic_animation_start_time from?
  - Who is updating the clock in the meantime (via a call to GetCurrentTime)
>  - Where is PageAnimator getting its monotonic_animation_start_time from?

WebViewImpl::BeginFrame(double last_frame_time_monotonic), as far as I can tell. Where last_frame_time_monotonic is ultimately (I think) base::TimeTicks::Now() at the point where the compositor sends a BeginMainFrame request.
Cc: smcgruer@chromium.org
 Issue 800897  has been merged into this issue.
From  issue 800897 :

"AnimationClock should source its time the same way as RAF (during raf callback tasks). The approximation is only used for tasks outside frame callbacks.

https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/page/PageAnimator.cpp?sq=package:chromium&l=34"

"So if the delta is non-zero during requestAnimationFrame, then we must have incorrectly called AnimationClock::NotifyTaskStart (i.e. putting us in the outside frame callbacks approximation mode)."
Well, this is fun. Spot the difference!

$ while true; do DISPLAY=:0 ./out/Release/content_shell --run-layout-test external/wpt/web-animations/timing-model/timelines/document-timelines.html 2>&1 | grep FAIL; done
FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 990.9879999999999 but got 1000.4723332822323
FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 1011.59 but got 1019.8923335410655
FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 1057.684 but got 1071.201333310455
FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 1038.045 but got 1050.1523334532976

$ while true; do DISPLAY=:0 ./out/Default/content_shell --run-layout-test external/wpt/web-animations/timing-model/timelines/document-timelines.h
tml 2>&1 | grep FAIL; done
FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 2392.337 but got 2392.3369999974966
FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 2478.227 but got 2478.516333270818
FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 2526.4280000000003 but got 2526.4280000701547
FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 2356.47 but got 2356.469999998808

So with is_debug=true, the difference is just rounding errors (there is a bug there but it's small), but when is_debug=false we get the clock forwarding time.

I'm trying to see if I can find some IS_DCHECK_ON macro that may be involved, but there are... a lot of them. Worst case will be if it's a race condition :(.

Comment 19 Deleted

Little progress so far. Setting dcheck_always_on = true on my release build did not stop the large-difference failures, so it is likely that the difference is simply a symptom of slower/faster (uh-oh) rather than anything enabled/disabled by DCHECK.

Unfortunately I've had no luck capturing a real reproduction in my debug build (with or without rr running). I've had a few false starts where I thought I had captured it, but sadly not :(.
Cc: dstockwell@chromium.org
Yeah, so I don't get how this is ever meant to work :/.

Taking an example failure case (where P.A. falls behind). Leading digits stripped off for readability (irrelevant to the math):

Last good rAF time in P.A.: 0.100602, this caused a successful UpdateTime().

At some point after that rAF, user code called CurrentTime() (so new task).
  - current_time was 0.136429
  - 0.136429 - 0.100602 = 0.035827    # frame_shift = (current_time - last_) mod (1/60)
  - 0.035827 modulo (1/60) = 0.00249   # frame_shift = (current_time - last_) mod (1/60)
  - 0.136429 + ((1/60) - 0.00249) = 0.15060564   # new_time = current_time + ((1/60) - frame_shift)

The a new rAF triggers, with a time of .1417666. Our model immediately breaks down, because this is < 0.15060564.

It seems like the assumption is that frames would always be evenly spaced at 1/60ths, even between disjoint rAFs. But that isn't (afaik?) true even in the case where main isn't swamped - at the very least in this case the distance between consecutive rAFs was:

.1418666 - .100602 = 0.0412646 ~= 2.5 16ms 'frames'.

Under a model where consecutive frames may not be spaced in some multiple of 16ms, this math just... doesn't work to me? At least not if we wish to guarantee that whenever rAF fires, its value is ahead of AnimationClock.

Do I massively misunderstand how AnimationClock is meant to work? dstockwell@/flackr@?
Labels: Hotlist-Interop
Blocking: 772407
Ping for comment from dstockwell@, flackr@ - do either of you have thoughts on comment 21?
Summary: AnimationClock::currentTime is often incorrectly ahead of rAF time (was: animations: external/wpt/web-animations/timing-model/timelines/document-timelines.html is failing)
(Updating the title to make it clearer what issue this bug is tracking. And subtly re-pinging flackr@ and dstockwell@ for comment >_>.)
Cc: -dstockwell@chromium.org dstockwell@google.com
I think #21 is basically correct. The discrepancy where AnimationClock can get ahead of the frame time is unfortunate, but expected. AnimationClock is intended to be monotonic and as you've discovered is advanced by either fixed frame times or approximation. We cannot accurately know what frame time is going to be supplied in the future, clamping to 60hz is the approximation that we went with at the time (based on most users having 60hz displays, and idealised frame times aligning with that).

We certainly should align the time passed to RAF callbacks and timeline.currentTime. I would suggest you do just that -- have the time passed to RAF callbacks be the output of the animation clock.

I think fixing/reducing the discrepancy is lower priority issue. This should be infrequent as it should only be triggered when author code accesses currentTime (or triggers style recalc, etc.) outside of RAF callbacks. Doing that should be discouraged in general. What we need to provide is a reasonable approximation of currentTime so we have a fair amount of flexibility. The main use case is authors inspecting animation state or outputs when a RAF loop is not possible (eg. the page is in the background). So for example introducing a buffer where the approximation is always some XXms before monotonic time might be OK.
Thanks for the reply dstockwell@, I appreciate your expert advice here.

I gave aligning the clock a go in https://chromium-review.googlesource.com/c/chromium/src/+/1053415, which sadly lead to a lot of test failures. I didn't get a chance to debug them yet (some are probably expected).

It also concerns me since we would be taking rAF time out of sync with the other steps it should be in sync with (e.g. resize, scroll, etc, see https://html.spec.whatwg.org/multipage/webappapis.html#update-the-rendering). That said, I'm not sure if it is actually in sync with them now.
After dealing with a similar issue in  issue 888260 , I'd like to consider ways of making sure the animation clock doesn't get ahead of begin frame timestamps. I have a few thoughts:

From BeginMainFrame to response, we should hold the time as the begin frame time.  Issue 888260  seems to have produced a case where that's not true.

Outside of that, we could update the clock only once the complete current set of tasks had been processed (i.e. each time we set task_for_which_time_was_calculated_ we instead advance it by the current size of the task queue). This means that if a BeginMainFrame has already been posted we should see it before we artificially advance the clock again.

When artificially advancing the clock, advance it no further than the time the current time rounded down with the assumption that if a BeginMainFrame is posted soon, it will have a later time. Optionally, we could even save the time from when we set the current end of the task queue so that when we artifically advance it we are advancing to a time that we believe to be for sure before a rAF has been posted.
Issue 896052 has been merged into this issue.
Cc: yigu@chromium.org
Blocking: 887659
Owner: ----
Status: Available (was: Assigned)
Unassigning myself as I am not currently looking at this.

Sign in to add a comment