New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 631196 link

Starred by 5 users

Issue metadata

Status: Duplicate
Merged: issue 599656
Owner:
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocking:
issue 599656
issue 631200



Sign in to add a comment

Several tests failing on chromium.perf Win High-DPI Perf due to crash in SyntheticGestureTargetBase::DispatchInputEventToPlatform

Project Member Reported by charliea@chromium.org, Jul 25 2016

Issue description

Revision range first seen: Unknown
Link to failing step log: https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%20Power%20High-DPI%20Perf/builds/21/steps/scheduler.tough_scheduling_cases/logs/stdio
Link to bot: https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%20Power%20High-DPI%20Perf

"""
Exception raised when cleaning story run: 

Traceback (most recent call last):
  _RunStoryAndProcessErrorIfNeeded at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\story_runner.py:109
    test.DidRunPage(state.platform)
  DidRunPage at C:\b\c\b\Win_Power_High_DPI_Perf\src\tools\perf\measurements\smoothness.py:66
    self._tbm.DidRunStory(platform)
  DidRunStory at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\web_perf\timeline_based_measurement.py:303
    platform.tracing_controller.StopTracing()
  StopTracing at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\core\tracing_controller.py:47
    return self._tracing_controller_backend.StopTracing()
  StopTracing at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_controller_backend.py:108
    self._IssueClockSyncMarker()
  _IssueClockSyncMarker at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_controller_backend.py:199
    self._RecordIssuerClockSyncMarker)
  RecordClockSyncMarker at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\chrome_tracing_agent.py:184
    sync_id, record_controller_clock_sync_marker_callback)
  _RecordClockSyncMarkerAsyncEvent at C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\chrome_tracing_agent.py:167
    raise ChromeClockSyncError()
ChromeClockSyncError

Locals:
  has_clock_synced                             : False
  record_controller_clock_sync_marker_callback : <bound method TracingControllerBackend._RecordIssuerClockSyncMarker of <telemetry.internal.platform.tracing_controller_backend.TracingControllerBackend object at 0x03401E10>>
  sync_id                                      : 'ffa1a719-d665-4afa-a032-7120399e99d5'

[  FAILED  ] touch_handler_scrolling.html?super_slow_handler (9013 ms)
"""
 
Summary: Several tests failing on chromium.perf.fyi Win Power High-DPI Perf due to clock sync failure (was: Several tests failing on Win Power High-DPI Perf due to clock sync failure)
Note that this is on the FYI waterfall, not the main waterfall. In order to eventually get this hardware on the main waterfall, though, we'll need to get this resolved.
Blocking: 631200
zhenw@, nednguyen@, do either of you have any idea why these tests might be having problems with clock sync on this hardware config? If nothing strikes you, I can look deeper into it.
Ah: scrolling up, it looks like this is actually due to the devtools target crashing:

"""
Traceback (most recent call last):
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 85, in _RunStoryAndProcessErrorIfNeeded
    state.RunStory(results)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\page\shared_page_state.py", line 319, in RunStory
    self._current_page.Run(self)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\page\__init__.py", line 99, in Run
    self.RunPageInteractions(action_runner)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\tools\perf\page_sets\tough_scheduling_cases.py", line 19, in RunPageInteractions
    action_runner.ScrollPage()
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\actions\action_runner.py", line 361, in ScrollPage
    use_touch=use_touch, synthetic_gesture_source=synthetic_gesture_source))
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\actions\action_runner.py", line 45, in _RunAction
    action.RunAction(self._tab)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\actions\scroll.py", line 106, in RunAction
    tab.WaitForJavaScriptExpression('window.__scrollActionDone', 60)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\browser\web_contents.py", line 123, in WaitForJavaScriptExpression
    util.WaitFor(IsJavaScriptExpressionTrue, timeout)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\core\util.py", line 86, in WaitFor
    res = condition()
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\browser\web_contents.py", line 116, in IsJavaScriptExpressionTrue
    return bool(self.EvaluateJavaScript(expr))
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\browser\web_contents.py", line 187, in EvaluateJavaScript
    expr, context_id=None, timeout=timeout)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\browser\web_contents.py", line 215, in EvaluateJavaScriptInContext
    expr, context_id=context_id, timeout=timeout)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py", line 35, in inner
    inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py", line 32, in inner
    return func(inspector_backend, *args, **kwargs)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py", line 203, in EvaluateJavaScript
    return self._runtime.Evaluate(expr, context_id, timeout)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_runtime.py", line 45, in Evaluate
    res = self._inspector_websocket.SyncRequest(request, timeout)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 110, in SyncRequest
    res = self._Receive(timeout)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 149, in _Receive
    data = self._socket.recv()
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 596, in recv
    opcode, data = self.recv_data()
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 606, in recv_data
    frame = self.recv_frame()
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 637, in recv_frame
    self._frame_header = self._recv_strict(2)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 746, in _recv_strict
    bytes = self._recv(shortage)
  File "C:\b\c\b\Win_Power_High_DPI_Perf\src\third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 730, in _recv
    bytes = self.sock.recv(bufsize)
DevtoolsTargetCrashException: Devtools target crashed
"""

Is there anything that can cause devtools to crash much more on one machine than another?
Summary: Several tests failing on chromium.perf.fyi Win Power High-DPI Perf due to devtools crashing (was: Several tests failing on chromium.perf.fyi Win Power High-DPI Perf due to clock sync failure)
Interesting: it looks like this (https://goo.gl/7VRkuQ) is the crash screenshot. Not quite sure what to make of that.
I wonder if this has something to do with this device handling input events differently than others. Among the other junk in the STDIO file, I see a Windows stack trace that includes:

content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+0x122
content::SyntheticSmoothMoveGesture::ForwardTouchInputEvents+0x169
content::SyntheticSmoothMoveGesture::ForwardInputEvents+0xad
content::SyntheticSmoothScrollGesture::ForwardInputEvents+0x42
content::SyntheticGestureController::Flush+0xc8
content::RenderWidgetHostImpl::FlushInput+0x37
content::RenderWidgetHostViewBase::FlushInput+0x32
base::Timer::RunScheduledTask+0xd0
base::debug::TaskAnnotator::RunTask+0x184
base::MessageLoop::RunTask+0x468
base::MessageLoop::DeferOrRunPendingTask+0x52
base::MessageLoop::DoDelayedWork+0x1b4
base::MessagePumpForUI::DoRunLoop+0xa8
base::MessagePumpWin::Run+0x54
base::RunLoop::Run+0xed
ChromeBrowserMainParts::MainMessageLoopRun+0xbc
content::BrowserMainLoop::RunMainMessageLoopParts+0x6a
content::BrowserMainRunnerImpl::Run+0x17
content::BrowserMain+0x10b
content::RunNamedProcessTypeMain+0x16b
content::ContentMainRunnerImpl::Run+0xcb
content::ContentMain+0x30

That looks suspicious.
Why do we care about the failure of non battor benchmarks?
I think the clock sync failure is due to the fact that we try to do clock sync even when chrome has already crashed by the time we try to do clock sync.
Ned: We are testing both the laptop and the BattOr on the FYI. We cannot purchase and rack in these new laptops until we know how to get the tests passing.
Cc: brucedaw...@chromium.org
+Bruce who may know how to interpret those crash stack.

+Bruce: the stack is in https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%20Power%20High-DPI%20Perf/builds/21/steps/scheduler.tough_scheduling_cases/logs/stdio, just search for "Microsoft (R) Windows Debugger Version 10.0.10586.567 AMD64"
Well that's weird...

I attached a debugger to a 64-bit canary browser and looked for this instruction in SyntheticGestureTargetBase::DispatchInputEventToPlatform:

mov     byte ptr [<Unloaded_E> (00000000`00000000)],al

Here's the sequence where it occurs:

    // Check that all touch pointers are within the content bounds.
    for (unsigned i = 0; i < web_touch.touchesLength; i++)
00007FFE876BA635  xor         esi,esi  
00007FFE876BA637  cmp         dword ptr [rbx+14h],esi  
00007FFE876BA63A  jbe         content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+144h (07FFE876BA684h)  
      CHECK(web_touch.touches[i].state != WebTouchPoint::StatePressed ||
00007FFE876BA63C  mov         eax,esi  
00007FFE876BA63E  imul        rcx,rax,38h  
00007FFE876BA642  cmp         dword ptr [rcx+rbx+30h],2  
00007FFE876BA647  jne         content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+129h (07FFE876BA669h)  
00007FFE876BA649  cvttss2si   r8d,dword ptr [rcx+rbx+40h]  
00007FFE876BA650  cvttss2si   edx,dword ptr [rcx+rbx+3Ch]  
00007FFE876BA656  mov         rcx,rdi  
      CHECK(web_touch.touches[i].state != WebTouchPoint::StatePressed ||
00007FFE876BA659  call        content::SyntheticGestureTargetBase::PointIsWithinContents (07FFE876BA790h)  
00007FFE876BA65E  test        al,al  
00007FFE876BA660  jne         content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+129h (07FFE876BA669h)  
00007FFE876BA662  mov         byte ptr [0],al  
00007FFE876BA669  test        r14b,1  
00007FFE876BA66D  je          content::SyntheticGestureTargetBase::DispatchInputEventToPlatform+13Dh (07FFE876BA67Dh)  
00007FFE876BA66F  and         r14d,0FFFFFFFEh  
00007FFE876BA673  lea         rcx,[rsp+78h]  
00007FFE876BA678  call        logging::LogMessage::~LogMessage (07FFE857DC2E0h)+100000000h  

So, shortly after calling PointIsWithinContents, within a CHECK statement, the compiler has generated code that unconditionally writes to address zero. Here's the CHECK statement and surrounding source code:

    // Check that all touch pointers are within the content bounds.
    for (unsigned i = 0; i < web_touch.touchesLength; i++)
      CHECK(web_touch.touches[i].state != WebTouchPoint::StatePressed ||
            PointIsWithinContents(web_touch.touches[i].position.x,
                                  web_touch.touches[i].position.y))
          << "Touch coordinates are not within content bounds on TouchStart.";

There's another write to [0] slightly later, also within a CHECK statement.

Those two writes to [0] will always crash, so this is pretty serious.

I'm away from my desk all week so I won't be able to investigate in as much detail as I'd like, but this looks very odd - compiler bug? Change to the CHECK macro?

I'll try to investigate a bit more but I'm sharing this information now.

I ran this command:

    dumpbin /disasm chrome.dll | find /c "byte ptr [0000000000000000h]"

and found 3,331 hits, plus 15 hits where "byte ptr" was missing. I *think* that these all represent instructions that will crash 100% of the time, but this all seems too weird.

To sanity check I tested on a recent local 64-bit (I'm assuming this is 64-bit only) official build and saw zero. This build is from R407377, July 23rd. FWIW.

Cc: dtu@chromium.org
+Dave: how do we build the binary for this machine?
I just checked and realized the obvious: writing to location zero is how we intentionally crash when the CHECK fails. I'm not sure if this is new or not, but that's not really relevant.

The crash is happening because a CHECK is failing, probably the one listed in comment #12.
Cc: tdres...@chromium.org
Thanks for your analysis, Bruce. 

+Tim, owner of input. Can you or someone on your team help taking a look at this?  Looks like this test is failing the CHECK in 

    // Check that all touch pointers are within the content bounds.
    for (unsigned i = 0; i < web_touch.touchesLength; i++)
      CHECK(web_touch.touches[i].state != WebTouchPoint::StatePressed ||
            PointIsWithinContents(web_touch.touches[i].position.x,
                                  web_touch.touches[i].position.y))
          << "Touch coordinates are not within content bounds on TouchStart.";
FYI, this is happening on a new laptop we're evaluating adding to the perf lab (it is the same one that media team uses for hidpi coverage): https://www.amazon.com/Dell-Touchscreen-Laptop-Microsoft-Signature/dp/B010DODC66

Can I send this laptop to anyone to help them debug?
Owner: lanwei@chromium.org
Status: Assigned (was: Untriaged)
Lan has been doing a bunch of work in this space. 

I suspect this was broken by https://chromium.googlesource.com/chromium/src/+/a9d9a6721c939076b6c688b137378f2c4b939eb9.
It's also possible that it was a different CHECK that was breaking - I can only tell for sure what function it was.

It's odd that the CHECK failure showed up as a crash - that seems bad.
Labels: -Pri-2 Performance-Sheriff-BotHealth Pri-1
Summary: Several tests failing on chromium.perf Win High-DPI Perf due to crash in SyntheticGestureTargetBase::DispatchInputEventToPlatform (was: Several tests failing on chromium.perf.fyi Win Power High-DPI Perf due to devtools crashing)
I'm now seeing these failures on (non-FYI) Win HighDPI perf bots, eg: https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%281%29/builds/59/steps/thread_times.tough_scrolling_cases/logs/stdio which makes it urgent from a perf bot health perspective.

Lan, do you want to take a look to see if you can understand the crash (or even just temporarily disable the relevant CHECK) or should we revert your CL in order to try to get the bots green?
Tests I've seen fail due to this:
jitter
thread_times.tough_scrolling_cases
thread_times.tough_compositor_cases
smoothness.tough_scrolling_cases
scheduler.tough_scheduling_cases

Since this appears to be just Windows High-DPI I suspect we're missing a scale factor compensation somewhere on Windows.  You can probably trigger this crash with a DEBUG build on windows with --force-device-scale-factor=2 running any of these tests.

More logs:

https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%281%29/builds/59/steps/jitter/logs/stdio

https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%282%29/builds/55/steps/thread_times.tough_compositor_cases/logs/stdio

https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%282%29/builds/55/steps/smoothness.tough_scrolling_cases/logs/stdio

https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf%20%282%29/builds/55/steps/scheduler.tough_scheduling_cases/logs/stdio
Cc: bsep@chromium.org
Making sure Bret has seen this since he's been doing the High-DPI work.
Cc: charliea@chromium.org

Comment 25 by bsep@chromium.org, Aug 31 2016

Hmm I recall oshima changed how synthetic input events work when we launched use-zoom-for-dsf, see https://codereview.chromium.org/2093343002 for example. Though that CL is only for mouse events. That's the only thing I know of off the top of my head that could be related.
Thanks Bret.  AFAIK we still have use-zoom-for-dsf enabled only on ChromeOS, not Windows or Linux yet, right?

Talking with Lan and looking at her change, it seems likely to me that this has always been failing on HighDPI.  Are these HighDPI bots new to chromium.perf with these crashes occurring from the start?  I can't find any history for these bots showing the tests as having ever passed.  Who added these bots to chromium.perf?

Anyway I'll see if I can get these crashes to occur locally and verify they occur with or without Lan's change.

Comment 27 by bsep@chromium.org, Aug 31 2016

It's enabled on Windows and Chrome OS. Not Linux yet.

Comment 28 by dtu@chromium.org, Aug 31 2016

The bots are new. I added them last week.
https://codereview.chromium.org/2275143002/

The bots were on FYI from May 27.
https://codereview.chromium.org/2017163002/
Blocking: 599656
bsep: Ah thanks.  I didn't see any mention of enabling on Windows in issue 485650 - can you please add a comment there with the details of which CL enabled it on Windows?

So perhaps enabling use-zoom-for-dsf is what caused these crashes to start occurring - that the synthetic gesture framework wasn't updated to account for use-zoom-for-dsf?  I'm guessing we don't have the bot history anymore to verify that, but I can check to see if i get similar crashes on Linux when I run with the flag. 
Cc: lanwei@chromium.org
Owner: osh...@chromium.org
Ok, I verified that I can reproduce these crashes on Linux if (and only if) I run with --enable-use-zoom-for-dsf.  Eg. the following triggers the crash:

tools/perf/run_benchmark --browser=debug --show-stdout smoothness.top_25_smooth --story-filter=".*barack.*" --extra-browser-args="--force-device-scale-factor=2 --enable-use-zoom-for-dsf"

So enabling --enable-use-zoom-for-dsf is what caused these tests to start failing on Windows HighDPI perf bots.  

My guess is that the correct fix is to update the CssPixelsToPointF and CssPixelsToVector2dF functions in content/browser/devtools/protocol/input_handler.cc to take use-zoom-for-dsf's impact on page_scale_factor into account, similar to what https://codereview.chromium.org/2093343002 did.  But I don't know the exact details of how use-zoom-for-dsf works.  oshima (or bsep) can you please take a look?

Will https://codereview.chromium.org/2269483002/ have fixed this?
Nope.  It's true I happened to be testing without that CL (hadn't sync'd in a couple days) but the bot failures include it.  I just applied that CL locally and re-tested and still get the same crash.  I'm not sure if the correct fix is in this Aura-specific conversion code or the more general content layer code I pointed too earlier (too many co-ordinate systems and not enough documentation on which variables are in which system!).
Cc: bokan@chromium.org
This CL does not fix this problem. It made changes in SyntheticGestureTargetAura, which is after the touch position check. I think the problem is GpuBenchmarking::VisualViewportHeight() and VisualViewportWidth() are not zoom-independent, which are used for the touch start's default position. 
https://codesearch.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/actions/scroll.js?sq=package:chromium&dr=CSs&l=128.
Cc: eyaich@chromium.org vmi...@chromium.org
 Issue 644279  has been merged into this issue.
Cc: bccheng@chromium.org
*Gentle Ping from your friendly neighborhood PerfBot Health Sheriff.*
 Issue 638412  has been merged into this issue.
 Issue 638413  has been merged into this issue.
Project Member

Comment 40 by ClusterFuzz, Sep 29 2016

Labels: ClusterFuzz-Verified
Status: Verified (was: Assigned)
ClusterFuzz testcase is verified as fixed, closing issue.

If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
 Issue 652601  has been merged into this issue.
Labels: -ClusterFuzz-Verified ClusterFuzz-Wrong
Reopening this: I don't see any indication in the comments that this should have been marked as fixed.
Status: Assigned (was: Verified)
Mergedinto: 599656
Status: Duplicate (was: Assigned)
I landed a fix in  crbug.com/599656 
Labels: -ClusterFuzz-Wrong
We have made a bunch of changes on ClusterFuzz side, so resetting ClusterFuzz-Wrong label.

Sign in to add a comment