Issue metadata
Sign in to add a comment
|
Several tests failing on chromium.perf Win High-DPI Perf due to crash in SyntheticGestureTargetBase::DispatchInputEventToPlatform |
||||||||||||||||||||||||
Issue descriptionRevision range first seen: Unknown Link to failing step log: https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%20Power%20High-DPI%20Perf/builds/21/steps/scheduler.tough_scheduling_cases/logs/stdio Link to bot: https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%20Power%20High-DPI%20Perf """ Exception raised when cleaning story run: Traceback (most recent call last): _RunStoryAndProcessErrorIfNeeded at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\story_runner.py:109 test.DidRunPage(state.platform) DidRunPage at C:\b\c\b\Win_Power_High_DPI_Perf\src\tools\perf\measurements\smoothness.py:66 self._tbm.DidRunStory(platform) DidRunStory at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\web_perf\timeline_based_measurement.py:303 platform.tracing_controller.StopTracing() StopTracing at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\core\tracing_controller.py:47 return self._tracing_controller_backend.StopTracing() StopTracing at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_controller_backend.py:108 self._IssueClockSyncMarker() _IssueClockSyncMarker at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_controller_backend.py:199 self._RecordIssuerClockSyncMarker) RecordClockSyncMarker at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\chrome_tracing_agent.py:184 sync_id, record_controller_clock_sync_marker_callback) _RecordClockSyncMarkerAsyncEvent at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\chrome_tracing_agent.py:167 raise ChromeClockSyncError() ChromeClockSyncError Locals: has_clock_synced : False record_controller_clock_sync_marker_callback : <bound method TracingControllerBackend._RecordIssuerClockSyncMarker of <telemetry.internal.platform.tracing_controller_backend.TracingControllerBackend object at 0x03401E10>> sync_id : 'ffa1a719-d665-4afa-a032-7120399e99d5' [ FAILED ] touch_handler_scrolling.html?super_slow_handler (9013 ms) """
,
Jul 25 2016
,
Jul 25 2016
zhenw@, nednguyen@, do either of you have any idea why these tests might be having problems with clock sync on this hardware config? If nothing strikes you, I can look deeper into it.
,
Jul 25 2016
Ah: scrolling up, it looks like this is actually due to the devtools target crashing:
"""
Traceback (most recent call last):
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 85, in _RunStoryAndProcessErrorIfNeeded
state.RunStory(results)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\page\shared_page_state.py", line 319, in RunStory
self._current_page.Run(self)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\page\__init__.py", line 99, in Run
self.RunPageInteractions(action_runner)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\tools\perf\page_sets\tough_scheduling_cases.py", line 19, in RunPageInteractions
action_runner.ScrollPage()
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\actions\action_runner.py", line 361, in ScrollPage
use_touch=use_touch, synthetic_gesture_source=synthetic_gesture_source))
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\actions\action_runner.py", line 45, in _RunAction
action.RunAction(self._tab)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\actions\scroll.py", line 106, in RunAction
tab.WaitForJavaScriptExpression('window.__scrollActionDone', 60)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\browser\web_contents.py", line 123, in WaitForJavaScriptExpression
util.WaitFor(IsJavaScriptExpressionTrue, timeout)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\core\util.py", line 86, in WaitFor
res = condition()
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\browser\web_contents.py", line 116, in IsJavaScriptExpressionTrue
return bool(self.EvaluateJavaScript(expr))
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\browser\web_contents.py", line 187, in EvaluateJavaScript
expr, context_id=None, timeout=timeout)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\browser\web_contents.py", line 215, in EvaluateJavaScriptInContext
expr, context_id=context_id, timeout=timeout)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py", line 35, in inner
inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py", line 32, in inner
return func(inspector_backend, *args, **kwargs)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py", line 203, in EvaluateJavaScript
return self._runtime.Evaluate(expr, context_id, timeout)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_runtime.py", line 45, in Evaluate
res = self._inspector_websocket.SyncRequest(request, timeout)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 110, in SyncRequest
res = self._Receive(timeout)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 149, in _Receive
data = self._socket.recv()
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 596, in recv
opcode, data = self.recv_data()
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 606, in recv_data
frame = self.recv_frame()
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 637, in recv_frame
self._frame_header = self._recv_strict(2)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 746, in _recv_strict
bytes = self._recv(shortage)
File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 730, in _recv
bytes = self.sock.recv(bufsize)
DevtoolsTargetCrashException: Devtools target crashed
"""
Is there anything that can cause devtools to crash much more on one machine than another?
,
Jul 25 2016
,
Jul 25 2016
Interesting: it looks like this (https://goo.gl/7VRkuQ) is the crash screenshot. Not quite sure what to make of that.
,
Jul 25 2016
I wonder if this has something to do with this device handling input events differently than others. Among the other junk in the STDIO file, I see a Windows stack trace that includes: content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+0x122 content::SyntheticSmoothMoveGesture::ForwardTouchInputEvents+0x169 content::SyntheticSmoothMoveGesture::ForwardInputEvents+0xad content::SyntheticSmoothScrollGesture::ForwardInputEvents+0x42 content::SyntheticGestureController::Flush+0xc8 content::RenderWidgetHostImpl::FlushInput+0x37 content::RenderWidgetHostViewBase::FlushInput+0x32 base::Timer::RunScheduledTask+0xd0 base::debug::TaskAnnotator::RunTask+0x184 base::MessageLoop::RunTask+0x468 base::MessageLoop::DeferOrRunPendingTask+0x52 base::MessageLoop::DoDelayedWork+0x1b4 base::MessagePumpForUI::DoRunLoop+0xa8 base::MessagePumpWin::Run+0x54 base::RunLoop::Run+0xed ChromeBrowserMainParts::MainMessageLoopRun+0xbc content::BrowserMainLoop::RunMainMessageLoopParts+0x6a content::BrowserMainRunnerImpl::Run+0x17 content::BrowserMain+0x10b content::RunNamedProcessTypeMain+0x16b content::ContentMainRunnerImpl::Run+0xcb content::ContentMain+0x30 That looks suspicious.
,
Jul 26 2016
Why do we care about the failure of non battor benchmarks?
,
Jul 26 2016
I think the clock sync failure is due to the fact that we try to do clock sync even when chrome has already crashed by the time we try to do clock sync.
,
Jul 26 2016
Ned: We are testing both the laptop and the BattOr on the FYI. We cannot purchase and rack in these new laptops until we know how to get the tests passing.
,
Jul 26 2016
+Bruce who may know how to interpret those crash stack. +Bruce: the stack is in https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%20Power%20High-DPI%20Perf/builds/21/steps/scheduler.tough_scheduling_cases/logs/stdio, just search for "Microsoft (R) Windows Debugger Version 10.0.10586.567 AMD64"
,
Jul 27 2016
Well that's weird...
I attached a debugger to a 64-bit canary browser and looked for this instruction in SyntheticGestureTargetBase::DispatchInputEventToPlatform:
mov byte ptr [<Unloaded_E> (00000000`00000000)],al
Here's the sequence where it occurs:
// Check that all touch pointers are within the content bounds.
for (unsigned i = 0; i < web_touch.touchesLength; i++)
00007FFE876BA635 xor esi,esi
00007FFE876BA637 cmp dword ptr [rbx+14h],esi
00007FFE876BA63A jbe content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+144h (07FFE876BA684h)
CHECK(web_touch.touches[i].state != WebTouchPoint::StatePressed ||
00007FFE876BA63C mov eax,esi
00007FFE876BA63E imul rcx,rax,38h
00007FFE876BA642 cmp dword ptr [rcx+rbx+30h],2
00007FFE876BA647 jne content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+129h (07FFE876BA669h)
00007FFE876BA649 cvttss2si r8d,dword ptr [rcx+rbx+40h]
00007FFE876BA650 cvttss2si edx,dword ptr [rcx+rbx+3Ch]
00007FFE876BA656 mov rcx,rdi
CHECK(web_touch.touches[i].state != WebTouchPoint::StatePressed ||
00007FFE876BA659 call content::SyntheticGestureTargetBase::PointIsWithinContents (07FFE876BA790h)
00007FFE876BA65E test al,al
00007FFE876BA660 jne content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+129h (07FFE876BA669h)
00007FFE876BA662 mov byte ptr [0],al
00007FFE876BA669 test r14b,1
00007FFE876BA66D je content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+13Dh (07FFE876BA67Dh)
00007FFE876BA66F and r14d,0FFFFFFFEh
00007FFE876BA673 lea rcx,[rsp+78h]
00007FFE876BA678 call logging::LogMessage::~LogMessage (07FFE857DC2E0h)+100000000h
So, shortly after calling PointIsWithinContents, within a CHECK statement, the compiler has generated code that unconditionally writes to address zero. Here's the CHECK statement and surrounding source code:
// Check that all touch pointers are within the content bounds.
for (unsigned i = 0; i < web_touch.touchesLength; i++)
CHECK(web_touch.touches[i].state != WebTouchPoint::StatePressed ||
PointIsWithinContents(web_touch.touches[i].position.x,
web_touch.touches[i].position.y))
<< "Touch coordinates are not within content bounds on TouchStart.";
There's another write to [0] slightly later, also within a CHECK statement.
Those two writes to [0] will always crash, so this is pretty serious.
I'm away from my desk all week so I won't be able to investigate in as much detail as I'd like, but this looks very odd - compiler bug? Change to the CHECK macro?
I'll try to investigate a bit more but I'm sharing this information now.
,
Jul 27 2016
I ran this command:
dumpbin /disasm chrome.dll | find /c "byte ptr [0000000000000000h]"
and found 3,331 hits, plus 15 hits where "byte ptr" was missing. I *think* that these all represent instructions that will crash 100% of the time, but this all seems too weird.
To sanity check I tested on a recent local 64-bit (I'm assuming this is 64-bit only) official build and saw zero. This build is from R407377, July 23rd. FWIW.
,
Jul 27 2016
+Dave: how do we build the binary for this machine?
,
Jul 27 2016
I just checked and realized the obvious: writing to location zero is how we intentionally crash when the CHECK fails. I'm not sure if this is new or not, but that's not really relevant. The crash is happening because a CHECK is failing, probably the one listed in comment #12.
,
Jul 27 2016
Thanks for your analysis, Bruce.
+Tim, owner of input. Can you or someone on your team help taking a look at this? Looks like this test is failing the CHECK in
// Check that all touch pointers are within the content bounds.
for (unsigned i = 0; i < web_touch.touchesLength; i++)
CHECK(web_touch.touches[i].state != WebTouchPoint::StatePressed ||
PointIsWithinContents(web_touch.touches[i].position.x,
web_touch.touches[i].position.y))
<< "Touch coordinates are not within content bounds on TouchStart.";
,
Jul 27 2016
FYI, this is happening on a new laptop we're evaluating adding to the perf lab (it is the same one that media team uses for hidpi coverage): https://www.amazon.com/Dell-Touchscreen-Laptop-Microsoft-Signature/dp/B010DODC66 Can I send this laptop to anyone to help them debug?
,
Jul 27 2016
Lan has been doing a bunch of work in this space. I suspect this was broken by https://chromium.googlesource.com/chromium/src/+/a9d9a6721c939076b6c688b137378f2c4b939eb9.
,
Jul 28 2016
It's also possible that it was a different CHECK that was breaking - I can only tell for sure what function it was. It's odd that the CHECK failure showed up as a crash - that seems bad.
,
Aug 31 2016
I'm now seeing these failures on (non-FYI) Win HighDPI perf bots, eg: https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%281%29/builds/59/steps/thread_times.tough_scrolling_cases/logs/stdio which makes it urgent from a perf bot health perspective. Lan, do you want to take a look to see if you can understand the crash (or even just temporarily disable the relevant CHECK) or should we revert your CL in order to try to get the bots green?
,
Aug 31 2016
Tests I've seen fail due to this: jitter thread_times.tough_scrolling_cases thread_times.tough_compositor_cases smoothness.tough_scrolling_cases scheduler.tough_scheduling_cases Since this appears to be just Windows High-DPI I suspect we're missing a scale factor compensation somewhere on Windows. You can probably trigger this crash with a DEBUG build on windows with --force-device-scale-factor=2 running any of these tests. More logs: https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%281%29/builds/59/steps/jitter/logs/stdio https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%282%29/builds/55/steps/thread_times.tough_compositor_cases/logs/stdio https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%282%29/builds/55/steps/smoothness.tough_scrolling_cases/logs/stdio https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%282%29/builds/55/steps/scheduler.tough_scheduling_cases/logs/stdio
,
Aug 31 2016
Also causing failures in: smoothness.top_25_smooth system_health.memory_desktop smoothness.tough_image_decode_cases v8.browsing_desktop v8.browsing_desktop_ignition Logs: https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%283%29/builds/35/steps/smoothness.top_25_smooth/logs/stdio https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%283%29/builds/35/steps/system_health.memory_desktop/logs/stdio https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%284%29/builds/24/steps/smoothness.tough_image_decode_cases/logs/stdio https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%284%29/builds/24/steps/v8.browsing_desktop/logs/stdio https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%284%29/builds/24/steps/v8.browsing_desktop_ignition/logs/stdio
,
Aug 31 2016
Making sure Bret has seen this since he's been doing the High-DPI work.
,
Aug 31 2016
,
Aug 31 2016
Hmm I recall oshima changed how synthetic input events work when we launched use-zoom-for-dsf, see https://codereview.chromium.org/2093343002 for example. Though that CL is only for mouse events. That's the only thing I know of off the top of my head that could be related.
,
Aug 31 2016
Thanks Bret. AFAIK we still have use-zoom-for-dsf enabled only on ChromeOS, not Windows or Linux yet, right? Talking with Lan and looking at her change, it seems likely to me that this has always been failing on HighDPI. Are these HighDPI bots new to chromium.perf with these crashes occurring from the start? I can't find any history for these bots showing the tests as having ever passed. Who added these bots to chromium.perf? Anyway I'll see if I can get these crashes to occur locally and verify they occur with or without Lan's change.
,
Aug 31 2016
It's enabled on Windows and Chrome OS. Not Linux yet.
,
Aug 31 2016
The bots are new. I added them last week. https://codereview.chromium.org/2275143002/ The bots were on FYI from May 27. https://codereview.chromium.org/2017163002/
,
Aug 31 2016
,
Aug 31 2016
bsep: Ah thanks. I didn't see any mention of enabling on Windows in issue 485650 - can you please add a comment there with the details of which CL enabled it on Windows? So perhaps enabling use-zoom-for-dsf is what caused these crashes to start occurring - that the synthetic gesture framework wasn't updated to account for use-zoom-for-dsf? I'm guessing we don't have the bot history anymore to verify that, but I can check to see if i get similar crashes on Linux when I run with the flag.
,
Sep 1 2016
Ok, I verified that I can reproduce these crashes on Linux if (and only if) I run with --enable-use-zoom-for-dsf. Eg. the following triggers the crash: tools/perf/run_benchmark --browser=debug --show-stdout smoothness.top_25_smooth --story-filter=".*barack.*" --extra-browser-args="--force-device-scale-factor=2 --enable-use-zoom-for-dsf" So enabling --enable-use-zoom-for-dsf is what caused these tests to start failing on Windows HighDPI perf bots. My guess is that the correct fix is to update the CssPixelsToPointF and CssPixelsToVector2dF functions in content/browser/devtools/protocol/input_handler.cc to take use-zoom-for-dsf's impact on page_scale_factor into account, similar to what https://codereview.chromium.org/2093343002 did. But I don't know the exact details of how use-zoom-for-dsf works. oshima (or bsep) can you please take a look?
,
Sep 1 2016
Will https://codereview.chromium.org/2269483002/ have fixed this?
,
Sep 1 2016
Nope. It's true I happened to be testing without that CL (hadn't sync'd in a couple days) but the bot failures include it. I just applied that CL locally and re-tested and still get the same crash. I'm not sure if the correct fix is in this Aura-specific conversion code or the more general content layer code I pointed too earlier (too many co-ordinate systems and not enough documentation on which variables are in which system!).
,
Sep 2 2016
This CL does not fix this problem. It made changes in SyntheticGestureTargetAura, which is after the touch position check. I think the problem is GpuBenchmarking::VisualViewportHeight() and VisualViewportWidth() are not zoom-independent, which are used for the touch start's default position. https://codesearch.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/actions/scroll.js?sq=package:chromium&dr=CSs&l=128.
,
Sep 6 2016
,
Sep 6 2016
,
Sep 17 2016
*Gentle Ping from your friendly neighborhood PerfBot Health Sheriff.*
,
Sep 20 2016
Issue 638412 has been merged into this issue.
,
Sep 20 2016
Issue 638413 has been merged into this issue.
,
Sep 29 2016
ClusterFuzz testcase is verified as fixed, closing issue. If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
,
Oct 5 2016
Issue 652601 has been merged into this issue.
,
Oct 7 2016
Reopening this: I don't see any indication in the comments that this should have been marked as fixed.
,
Oct 7 2016
,
Oct 7 2016
,
Sep 18 2017
We have made a bunch of changes on ClusterFuzz side, so resetting ClusterFuzz-Wrong label. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by charliea@chromium.org
, Jul 25 2016