New issue
Advanced search Search tips

Issue 891270 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Oct 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: ----



Sign in to add a comment

telemetry_perf_unittests causing a DCHECK in paint on chromium.linux/Linux Tests (dbg)(1)

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Oct 2

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of clamy@google.com

telemetry_perf_unittests failing on chromium.linux/Linux Tests (dbg)(1)

Builders failed on: 
- Linux Tests (dbg)(1): 
  https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20Tests%20%28dbg%29%281%29

I suspect this was caused by the latest catapult roll: https://chromium-review.googlesource.com/c/chromium/src/+/1256077
 
Owner: sullivan@chromium.org
Status: Assigned (was: Available)
Owner: mattcary@chromium.org
mattcary: can you check if this was caused by your CL from the roll:

2018-10-02 mattcary@chromium.org Tracing: add java_base* executable memory metrics.
Looking.

Telemetry perf tests are timing out on one shard.

It seems unlikely to be my change, which added a small amount of extra processing of memory stats in order to add a new metric.

The perf tests have been sporadically timing out since yesterday, before my change was submitted. So I suspect that whatever caused that to start happening hasn't fixed itself yet :)

There seems to be something strange going on in the SystemHealthSmokeTests, but as that seems to be a generated benchmark I'm not entirely sure what's going on.
Cc: wangxianzhu@chromium.org
I'm not clear how to read the output from the telemetry perf tests, but based on https://chromium-swarm.appspot.com/task?id=4053080c07276210&refresh=10&show_raw=1 it looks like there may be a crash happening?

"""
  Found minidump via globbing in minidump dir
...
  Thread 0 (crashed)
   0  libc-2.19.so + 0x36c37
...
      Found by: given as instruction pointer in context
   1  libbase.so!base::debug::BreakDebugger() + 0x18
...
   8  libbase.so!~LogMessage [logging.cc : 833 + 0x3e]
...
   9  libblink_core.so!blink::FindVisualRectNeedingUpdateScopeBase::FindVisualRectNeedingUpdateScopeBase(blink::LayoutObject const&, blink::PaintInvalidatorContext const&, blink::LayoutRect const&) + 0x121
...

~LogMessage implies a DCHECK or LOG(FATAL) to me. The FindVisualRectNeedingUpdateScopeBase constructor does have two of them:

  DCHECK(context.tree_builder_context_actually_needed_);
  DCHECK(context.NeedsVisualRectUpdate(object));

A long shot, but wangxianzhu@ was involved in fixing  issue 836097  which also mentioned a DCHECK similar to the latter (but in a different context), so I'm cc-ing them. wangxianzhu@, can you add any information about FindVisualRectNeedingUpdateScopeBase ? (Please note this may be a red herring and I may be way off base).
Note: I am seeing this minidump consistently in the failing telemetry_perf_unittests cases; every run I've checked since 2018-10-02 10:04 AM.


Cc: vmp...@chromium.org
Components: Blink>Paint>Invalidation
vmpstr@, this seems related to the recent change about paint invalidation dirty flags (https://chromium-review.googlesource.com/c/chromium/src/+/1237246). It seems the case that some ancestor previously needed visual rect update but we just marked it through the ancestor chain of some descendant needing visual rect update.
Owner: vmp...@chromium.org
Good call, that seems plausible to me.

Reassigning the bug as I'm clearly out of my depth here.
We had two optimizations here. One that is mentioned in #6, which could cause the DCHECK(context.NeedsVisualRectUpdate(object)); dcheck. We also had one in https://chromium-review.googlesource.com/c/chromium/src/+/1252941 which could cause DCHECK(context.tree_builder_context_actually_needed_); to trigger as well. 

The latter is in the blamelist of https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20Tests%20%28dbg%29%281%29/74779 which is the first build where it started failing a lot.

I'm going to revert a part of that change that I suspect cause this and then investigate. 
Project Member

Comment 9 by bugdroid1@chromium.org, Oct 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5d1aacaa63cd61c15db0d96d63047a29a527cd35

commit 5d1aacaa63cd61c15db0d96d63047a29a527cd35
Author: Vladimir Levin <vmpstr@chromium.org>
Date: Wed Oct 03 19:36:43 2018

Partially revert an isolation boundary optimization.

This patch reverts a part of crrev.com/367a97e. This seems to have
caused DCHECK failures on one of the telemetry tests.

R=chrishtr@chromium.org, pdr@chromium.org, wangxianzhu@chromium.org

Bug:  891270 
Cq-Include-Trybots: luci.chromium.try:linux_layout_tests_slimming_paint_v2;master.tryserver.blink:linux_trusty_blink_rel
Change-Id: Ib99768154470fe69796e3a47645ffb1c981661a9
Reviewed-on: https://chromium-review.googlesource.com/c/1259298
Reviewed-by: Philip Rogers <pdr@chromium.org>
Reviewed-by: Xianzhu Wang <wangxianzhu@chromium.org>
Reviewed-by: Chris Harrelson <chrishtr@chromium.org>
Commit-Queue: vmpstr <vmpstr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#596325}
[modify] https://crrev.com/5d1aacaa63cd61c15db0d96d63047a29a527cd35/third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc

I can confirm locally that it's the latter dcheck from #8:

on www.westpac.com.au (telemetry archive):
[1:1:1003/131039.617050:FATAL:find_paint_offset_and_visual_rect_needing_update.h(72)] Check failed: context.tree_builder_context_actually_needed_. 
#0 0x7f23945fae3d base::debug::StackTrace::StackTrace()
#1 0x7f239430388a base::debug::StackTrace::StackTrace()
#2 0x7f239437369b logging::LogMessage::~LogMessage()
#3 0x7f23761de961 blink::FindVisualRectNeedingUpdateScopeBase::FindVisualRectNeedingUpdateScopeBase()
#4 0x7f2376f1cbf0 blink::FindObjectVisualRectNeedingUpdateScope::FindObjectVisualRectNeedingUpdateScope()
#5 0x7f2376f1b65c blink::PaintInvalidator::InvalidatePaint()
#6 0x7f2376f7d1ba blink::PrePaintTreeWalk::WalkInternal()
...

From initial logging:
LayoutBlockFlow UL class='slider js-slider' descendant needs paint property update
LayoutListItem (relative positioned) LI class='slide slide-classic' needs paint property update
LayoutBlockFlow (positioned) <pseudo:before> needs paint property update
LayoutTextFragment (anonymous) not required
^^^^^^^^^^^^^^^^^^ - that one actually needs a visual rect update.

Will update if I can reduce, or if I have further leads.

Here's the bug:

When creating a tree builder context, we consider only the parent context's subtree needs visual rect update. That fails, so we don't actually need the tree builder context. Then, we update for self (this is a debug build so we always create a context).

After that we hit this path: https://cs.chromium.org/chromium/src/third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc?dr&sq=package:chromium&g=0&l=334

in this case, the clip_changed is an inherited property from the parent. Because of this, we set the subtree needs visual rect update, but we don't have a valid tree builder context (we wouldn't have it in release), which triggers a dcheck later on.

The fix is probably to remove this line as well, since it follows the same reasoning: clip changing should not cause the subtree to need visual rect updates _especially_ if the clip changed is inherited from the parent.

I'll be writing a test and put up a patch with the reland of #10 as well as the fix to ensure we don't set the subtree needs visual rect update here as well.
Cc: sullivan@chromium.org mattcary@chromium.org
Summary: telemetry_perf_unittests causing a DCHECK in paint on chromium.linux/Linux Tests (dbg)(1) (was: telemetry_perf_unittests failing on chromium.linux/Linux Tests (dbg)(1))
+sullivan, +mattcary

The patch in 9 fixed the crash, which I'm no longer seeing in builds. However, the builder isn't green all the way. Ie there are still telemetry_perf_unittests and telemetry_unittests failures very frequently. I think this needs to be re-triaged. 

I'm renaming this issue to be specific to the DCHECK that was happening.
Project Member

Comment 13 by bugdroid1@chromium.org, Oct 4

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3fa6c505ea3f2e3dc229164c357052b11ca81023

commit 3fa6c505ea3f2e3dc229164c357052b11ca81023
Author: Vladimir Levin <vmpstr@chromium.org>
Date: Thu Oct 04 19:00:56 2018

Reland of visual rect subtree walk optimization.

This patch relands a previous revert (crrev.com/5d1aacaa) with a fix.
The problem was that if self clip changed, then it would set a
bitfield requiring a subtree walk, but at that point we already don't
have a valid tree builder context. This should only have triggered in
debug, but it's good to fix anyway since the release will cause an
unnecessary recursion.

Added a test that fails the same DCHECK with just a pure re-land, and
passes with the additional changes.

R=chrishtr@chromium.org, pdr@chromium.org, wangxianzhu@chromium.org

Bug:  891270 
Cq-Include-Trybots: luci.chromium.try:linux_layout_tests_slimming_paint_v2;master.tryserver.blink:linux_trusty_blink_rel
Change-Id: I20bec21e474fe22752eaf2737e43a121125e20e2
Reviewed-on: https://chromium-review.googlesource.com/c/1261930
Reviewed-by: Xianzhu Wang <wangxianzhu@chromium.org>
Commit-Queue: vmpstr <vmpstr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#596788}
[modify] https://crrev.com/3fa6c505ea3f2e3dc229164c357052b11ca81023/third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc
[modify] https://crrev.com/3fa6c505ea3f2e3dc229164c357052b11ca81023/third_party/blink/renderer/core/paint/pre_paint_tree_walk_test.cc

Labels: -Sheriff-Chromium
Status: Fixed (was: Assigned)
Thanks for your fix!

I'll look at the bot failures and create a bug as appropriate.
The linux telemetry tests seem stable now.

Sign in to add a comment