AnimationClock::currentTime is often incorrectly ahead of rAF time |
||||||||||||||
Issue descriptionThis test, added in https://github.com/w3c/web-platform-tests/commit/5f00b0e203676b2ff7fbfb8727c0b7e3728d4173#diff-3d02e7b1e09cf4b9e5060e79092c5942 upstream, is currently blocking WPT imports. See, for example, https://luci-milo.appspot.com/buildbot/chromium.infra.cron/wpt-importer/5341. Although it passes the Blink trybots, all the {linux,mac,win}_chromium_rel_ng CQ bots are failing: FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 60.492999999041785 but got 76.60099999884551 (the actual numbers vary) For now, I'll skip the test to unblock imports.
,
Nov 16 2017
,
Nov 16 2017
Hmm, so this test used to be
crbug.com/626703 [ Mac ] external/wpt/web-animations/interfaces/AnimationTimeline/document-timeline.html [ Failure ]
In any case, it is no longer failing only on Mac.
,
Nov 16 2017
Did you have a chance to look into why the test was behaving differently on try bots and CQ?
,
Nov 16 2017
Nope; I originally thought it was the DCHECK on-and-off issue, but since it's not a crash I couldn't figure out why only the CQ bots were failing :/
,
Nov 16 2017
,
Nov 21 2017
,
Feb 16 2018
,
Feb 16 2018
From some poking around, both DocumentTimeline and rAF should be using the same clock:
PageAnimator::ServiceScriptedAnimations:
=> Calls Clock().UpdateTime(monotonic_animation_start_time);
DocumentTimeline::CurrentTimeInternal:
=> GetDocument()->GetAnimationClock().CurrentTime()
=> GetPage()->Animator().Clock()
So I suspect something is changing the clock between then. Looking now.
This fails on my local Linux workstation btw, so seems reproducible.
,
Feb 16 2018
The simple answer is that AnimationClock is designed to increase in time whenever CurrentTime is called, and time passes between the rAF timestamp being recorded and DocumentTimeline being called: [1:1:0216/122003.684589:INFO:PageAnimator.cpp(34)] Calling Clock().UpdateTime(2.15869e+06) [1:1:0216/122003.684625:INFO:AnimationClock.cpp(52)] time_ now 2.15869e+06, task_for_which_time_was_calculated_: 101 [1:1:0216/122003.684688:INFO:PageAnimator.cpp(76)] Calling ServiceScriptedAnimations with 2.15869e+06 [1:1:0216/122003.684703:INFO:ScriptedAnimationController.cpp(147)] high_res_now_ms: 625.737, legacy_high_res_now_ms: 1.5188e+12 [1:1:0216/122003.691007:INFO:AnimationClock.cpp(60)] monotonically_increasing_time_, task_for_which_time_was_calculated_ != currently_running_task_ [1:1:0216/122003.691021:INFO:AnimationClock.cpp(62)] current_time is 2.15869e+06, time_ is 2.15869e+06 [1:1:0216/122003.691038:INFO:AnimationClock.cpp(64)] Advancing clock [1:1:0216/122003.691051:INFO:AnimationClock.cpp(52)] time_ now 2.15869e+06, task_for_which_time_was_calculated_: 102 [1:1:0216/122003.691074:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime() [1:1:0216/122003.691065:INFO:DocumentTimeline.cpp(236)] DocumentTimeline::CurrentTimeInternal, ZeroTime() == 2.15869e+06, playback_rate_: 1, GetDocument()->GetAnimationClock().CurrentTime(): 2.15869e+06, result: 642.404 Next step; dig into the spec to see what the expected behavior for DocumentTimeline is, and fix it if necessary.
,
Feb 16 2018
"designed to increase in time whenever CurrentTime is called" This statement is somewhat inaccurate. The AnimationClock documentation says: "Maintains a stationary clock time during script execution." This is done via a 'task counter', incremented by EndOfTaskRunner::WillProcessTask. So it is possible that the designers of AnimationClock expected the rAF calculation and the contents of the rAF to run in the same 'task', whilst that is not what appears to happen.
,
Feb 16 2018
Hrm, it also appears I may be wrong about the time update causing the discrepancy: [1:1:0216/131818.917373:INFO:PageAnimator.cpp(34)] Calling Clock().UpdateTime(2.16218e+06) [1:1:0216/131818.917398:INFO:AnimationClock.cpp(52)] time_ now 2.16218e+06, task_for_which_time_was_calculated_: 99 [1:1:0216/131818.917418:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime() [1:1:0216/131818.917441:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime() [1:1:0216/131818.917432:INFO:DocumentTimeline.cpp(236)] DocumentTimeline::CurrentTimeInternal, ZeroTime() == 2.16218e+06, playback_rate_: 1, GetDocument()->GetAnimationClock().CurrentTime(): 2.16218e+06, result: 702.525 [1:1:0216/131818.917472:INFO:PageAnimator.cpp(76)] Calling ServiceScriptedAnimations with 2.16218e+06 [1:1:0216/131818.917490:INFO:ScriptedAnimationController.cpp(147)] high_res_now_ms: 691.824, legacy_high_res_now_ms: 1.51881e+12 [1:1:0216/131818.917706:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime() [1:1:0216/131818.917747:INFO:AnimationClock.cpp(57)] AnimationClock::CurrentTime() [1:1:0216/131818.917736:INFO:DocumentTimeline.cpp(236)] DocumentTimeline::CurrentTimeInternal, ZeroTime() == 2.16218e+06, playback_rate_: 1, GetDocument()->GetAnimationClock().CurrentTime(): 2.16218e+06, result: 702.525 In this case, DocumentTimeline::CurrentTime was 702.525 before and after ServiceScriptedAnimations ran.
,
Feb 16 2018
Oh now this is interesting:
$ git diff third_party/WebKit/Source/core/page/PageAnimator.cpp
diff --git a/third_party/WebKit/Source/core/page/PageAnimator.cpp b/third_party/WebKit/Source/core/page/PageAnimator.cpp
index 43cffcaaeac8..563a09fc96c9 100644
--- a/third_party/WebKit/Source/core/page/PageAnimator.cpp
+++ b/third_party/WebKit/Source/core/page/PageAnimator.cpp
@@ -31,7 +31,9 @@ void PageAnimator::Trace(blink::Visitor* visitor) {
void PageAnimator::ServiceScriptedAnimations(
double monotonic_animation_start_time) {
AutoReset<bool> servicing(&servicing_animations_, true);
Clock().UpdateTime(monotonic_animation_start_time);
+ monotonic_animation_start_time = Clock().CurrentTime();
HeapVector<Member<Document>, 32> documents;
for (Frame* frame = page_->MainFrame(); frame;
With this change, the tests pass[0]. That is, after Clock().UpdateTime(monotonic_animation_start_time), we have:
Clock().CurrentTime() != monotonic_animation_start_time
[0] Well, nearly. There's a very small diffence in value that I suspect is a precision bug somewhere.
,
Feb 16 2018
The time passed to PageAnimator::ServiceScriptedAnimations is in the 'past': $ ./out/Release/content_shell --run-layout-test external/wpt/web-animations/timing-model/timelines/document-timelines.html #READY DevTools listening on ws://127.0.0.1:42259/devtools/browser/9ab5989d-dba2-411d-ae71-f24920b20440 ... [1:1:0216/134143.411763:INFO:AnimationClock.cpp(50)] Changed time from 0.000000 to 2163586.250000 // This update is caused by a call to GetCurrentTime [1:1:0216/134143.426887:INFO:AnimationClock.cpp(50)] Changed time from 2163586.250000 to 2163586.266667 // This update is caused by a call to GetCurrentTime [1:1:0216/134143.460957:INFO:PageAnimator.cpp(34)] monotonic_animation_start_time: 2163586.262056 // This is behind current AnimationClock! [1:1:0216/134143.470656:INFO:PageAnimator.cpp(34)] monotonic_animation_start_time: 2163586.303269 // This is now ahead of AnimationClock [1:1:0216/134143.470727:INFO:AnimationClock.cpp(50)] Changed time from 2163586.266667 to 2163586.303269 // This update is caused by PageAnimator [1:1:0216/134143.490590:INFO:AnimationClock.cpp(50)] Changed time from 2163586.303269 to 2163586.336602 // This update is caused by a call to GetCurrentTime [1:1:0216/134143.494700:INFO:PageAnimator.cpp(34)] monotonic_animation_start_time: 2163586.327320 // This is behind current AnimationClock! So a few questions to be answered here: - Where is PageAnimator getting its monotonic_animation_start_time from? - Who is updating the clock in the meantime (via a call to GetCurrentTime)
,
Feb 16 2018
> - Where is PageAnimator getting its monotonic_animation_start_time from? WebViewImpl::BeginFrame(double last_frame_time_monotonic), as far as I can tell. Where last_frame_time_monotonic is ultimately (I think) base::TimeTicks::Now() at the point where the compositor sends a BeginMainFrame request.
,
Mar 2 2018
,
Mar 2 2018
From issue 800897 : "AnimationClock should source its time the same way as RAF (during raf callback tasks). The approximation is only used for tasks outside frame callbacks. https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/page/PageAnimator.cpp?sq=package:chromium&l=34" "So if the delta is non-zero during requestAnimationFrame, then we must have incorrectly called AnimationClock::NotifyTaskStart (i.e. putting us in the outside frame callbacks approximation mode)."
,
Mar 2 2018
Well, this is fun. Spot the difference! $ while true; do DISPLAY=:0 ./out/Release/content_shell --run-layout-test external/wpt/web-animations/timing-model/timelines/document-timelines.html 2>&1 | grep FAIL; done FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 990.9879999999999 but got 1000.4723332822323 FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 1011.59 but got 1019.8923335410655 FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 1057.684 but got 1071.201333310455 FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 1038.045 but got 1050.1523334532976 $ while true; do DISPLAY=:0 ./out/Default/content_shell --run-layout-test external/wpt/web-animations/timing-model/timelines/document-timelines.h tml 2>&1 | grep FAIL; done FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 2392.337 but got 2392.3369999974966 FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 2478.227 but got 2478.516333270818 FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 2526.4280000000003 but got 2526.4280000701547 FAIL Document timelines report current time relative to navigationStart assert_equals: The current time matches requestAnimationFrame time expected 2356.47 but got 2356.469999998808 So with is_debug=true, the difference is just rounding errors (there is a bug there but it's small), but when is_debug=false we get the clock forwarding time. I'm trying to see if I can find some IS_DCHECK_ON macro that may be involved, but there are... a lot of them. Worst case will be if it's a race condition :(.
,
Mar 2 2018
Little progress so far. Setting dcheck_always_on = true on my release build did not stop the large-difference failures, so it is likely that the difference is simply a symptom of slower/faster (uh-oh) rather than anything enabled/disabled by DCHECK. Unfortunately I've had no luck capturing a real reproduction in my debug build (with or without rr running). I've had a few false starts where I thought I had captured it, but sadly not :(.
,
Mar 2 2018
Yeah, so I don't get how this is ever meant to work :/. Taking an example failure case (where P.A. falls behind). Leading digits stripped off for readability (irrelevant to the math): Last good rAF time in P.A.: 0.100602, this caused a successful UpdateTime(). At some point after that rAF, user code called CurrentTime() (so new task). - current_time was 0.136429 - 0.136429 - 0.100602 = 0.035827 # frame_shift = (current_time - last_) mod (1/60) - 0.035827 modulo (1/60) = 0.00249 # frame_shift = (current_time - last_) mod (1/60) - 0.136429 + ((1/60) - 0.00249) = 0.15060564 # new_time = current_time + ((1/60) - frame_shift) The a new rAF triggers, with a time of .1417666. Our model immediately breaks down, because this is < 0.15060564. It seems like the assumption is that frames would always be evenly spaced at 1/60ths, even between disjoint rAFs. But that isn't (afaik?) true even in the case where main isn't swamped - at the very least in this case the distance between consecutive rAFs was: .1418666 - .100602 = 0.0412646 ~= 2.5 16ms 'frames'. Under a model where consecutive frames may not be spaced in some multiple of 16ms, this math just... doesn't work to me? At least not if we wish to guarantee that whenever rAF fires, its value is ahead of AnimationClock. Do I massively misunderstand how AnimationClock is meant to work? dstockwell@/flackr@?
,
Mar 4 2018
,
Mar 31 2018
,
Apr 26 2018
Ping for comment from dstockwell@, flackr@ - do either of you have thoughts on comment 21?
,
May 9 2018
(Updating the title to make it clearer what issue this bug is tracking. And subtly re-pinging flackr@ and dstockwell@ for comment >_>.)
,
May 10 2018
I think #21 is basically correct. The discrepancy where AnimationClock can get ahead of the frame time is unfortunate, but expected. AnimationClock is intended to be monotonic and as you've discovered is advanced by either fixed frame times or approximation. We cannot accurately know what frame time is going to be supplied in the future, clamping to 60hz is the approximation that we went with at the time (based on most users having 60hz displays, and idealised frame times aligning with that). We certainly should align the time passed to RAF callbacks and timeline.currentTime. I would suggest you do just that -- have the time passed to RAF callbacks be the output of the animation clock. I think fixing/reducing the discrepancy is lower priority issue. This should be infrequent as it should only be triggered when author code accesses currentTime (or triggers style recalc, etc.) outside of RAF callbacks. Doing that should be discouraged in general. What we need to provide is a reasonable approximation of currentTime so we have a fair amount of flexibility. The main use case is authors inspecting animation state or outputs when a RAF loop is not possible (eg. the page is in the background). So for example introducing a buffer where the approximation is always some XXms before monotonic time might be OK.
,
May 29 2018
Thanks for the reply dstockwell@, I appreciate your expert advice here. I gave aligning the clock a go in https://chromium-review.googlesource.com/c/chromium/src/+/1053415, which sadly lead to a lot of test failures. I didn't get a chance to debug them yet (some are probably expected). It also concerns me since we would be taking rAF time out of sync with the other steps it should be in sync with (e.g. resize, scroll, etc, see https://html.spec.whatwg.org/multipage/webappapis.html#update-the-rendering). That said, I'm not sure if it is actually in sync with them now.
,
Sep 28
After dealing with a similar issue in issue 888260 , I'd like to consider ways of making sure the animation clock doesn't get ahead of begin frame timestamps. I have a few thoughts: From BeginMainFrame to response, we should hold the time as the begin frame time. Issue 888260 seems to have produced a case where that's not true. Outside of that, we could update the clock only once the complete current set of tasks had been processed (i.e. each time we set task_for_which_time_was_calculated_ we instead advance it by the current size of the task queue). This means that if a BeginMainFrame has already been posted we should see it before we artificially advance the clock again. When artificially advancing the clock, advance it no further than the time the current time rounded down with the assumption that if a BeginMainFrame is posted soon, it will have a later time. Optionally, we could even save the time from when we set the current end of the task queue so that when we artifically advance it we are advancing to a time that we believe to be for sure before a rAF has been posted.
,
Oct 23
Issue 896052 has been merged into this issue.
,
Nov 22
,
Jan 8
,
Jan 8
Unassigning myself as I am not currently looking at this. |
||||||||||||||
►
Sign in to add a comment |
||||||||||||||
Comment 1 by bugdroid1@chromium.org
, Nov 16 2017