New issue
Advanced search Search tips

Issue 842019 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 840988
issue 846951

Blocking:
issue 844270
issue 851868



Sign in to add a comment

Calling WebMemoryCoordinator::OnPurgeMemory in RenderThreadImpl::ReleaseFreeMemory causes WebGL test failures

Project Member Reported by kbr@chromium.org, May 11 2018

Issue description

In https://chromium-review.googlesource.com/1052987 there was an attempt to call WebMemoryCoordinator::OnPurgeMemory from RenderThreadImpl::ReleaseFreeMemory.

It turns out that this introduced flaky timeouts in the WebGL conformance tests; see  Issue 840988 . It was difficult to track down this CL as the cause because the flakiness was highly intermittent. One of the bots in particular was failing at least one test on almost every run, and that helped identify the culprit.

It's unclear why this change caused that sort of flakiness. Does that call cause Oilpan GCs? Could there be bugs in WebGL's tracing code (or elsewhere – perhaps in the implementation of requestAnimationFrame?) that are causing objects to be inadvertently GCd at the wrong time? That might explain these and other flaky timeouts. Studying the code for:
https://cs.chromium.org/chromium/src/third_party/blink/renderer/platform/memory_coordinator.cc?q=OnPurgeMemory&sq=package:chromium&l=103&dr=CSs

seems unlikely.

However, some other clients look suspicious, like this one:
https://cs.chromium.org/chromium/src/third_party/blink/renderer/core/script/classic_pending_script.cc?q=OnPurgeMemory&sq=package:chromium&l=381&dr=CSs

Does this mean that if ClassicPendingScript::OnPurgeMemory is called, a script tag won't be executed? That would explain the random failures across a range of tests. From  Issue 840988 , here were some of the random failures:

WebglConformance_deqp_functional_gles3_shadermatrix_add_assign
WebglConformance_deqp_functional_gles3_texturespecification_basic_teximage3d_2d_array_02
WebglConformance_conformance_more_conformance_constants
WebglConformance_conformance_glsl_functions_glsl_function_clamp_float
WebglConformance_deqp_functional_gles3_shaderoperator_angle_and_trigonometry_03
WebglConformance_deqp_functional_gles3_shadermatrix_add_const
WebglConformance_conformance_glsl_functions_glsl_function_clamp_float

Gyuyoung, could you please investigate why your CL caused these failures? It seems that there's some underlying bug which needs to be tracked down and fixed.

Thanks.

 

Comment 1 by gyuyoung...@lge.com, May 11 2018

>Gyuyoung, could you please investigate why your CL caused these failures? It >seems that there's some underlying bug which needs to be tracked down and fixed.

ok, I will investigate what was the problem next week. Thanks!
Project Member

Comment 2 by sheriffbot@chromium.org, May 11 2018

Labels: Hotlist-Recharge-BouncingOwner
Owner: ----
Status: Untriaged (was: Assigned)
The assigned owner "gyuyoung.kim@lge.com" is not able to receive e-mails, please re-triage.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Cc: gyuyoung...@lge.com
Status: Assigned (was: Untriaged)
crbug is not happy with the bug assigned to gyuyoung.kim@lge.com. gyuyoung@, do you have a chromium account we should use instead?
Owner: haraken@chromium.org
Assigning to haraken@ in the meanwhile to track progress.

Comment 5 by gyuyoung...@lge.com, May 12 2018

Owner: gyuyoung...@lge.com

Comment 6 by gyuyoung...@lge.com, May 12 2018

Owner: gyuyoung...@chromium.org

Comment 7 by gyuyoung...@lge.com, May 12 2018

>crbug is not happy with the bug assigned to gyuyoung.kim@lge.com. gyuyoung@, do you have a chromium account we should use instead?

ah, I registered my @chromium.org account. Thanks.

Comment 8 by gyuyoung...@lge.com, May 14 2018

I started to investigate this issue today. Once I verified |blink::MemoryCoordinator::OnPurgeMemory()| was called by my CL during the webgl test. I'm checking it continually.

Comment 9 by gyuyoung...@lge.com, May 16 2018

Once I found that my CL made us call blink::DecommitFreeableMemory() twice. But, I'm not sure if this caused the problem yet. Still investigating it.


  if (blink_platform_impl_) {
    // Purge Skia font cache, resource cache, and image filter.
    SkGraphics::PurgeAllCaches();
    blink::DecommitFreeableMemory();
    blink::WebMemoryCoordinator::OnPurgeMemory();
  }

  
void MemoryCoordinator::OnPurgeMemory() {
  for (auto& client : clients_)
    client->OnPurgeMemory();
  ...
  ImageDecodingStore::Instance().Clear();
  WTF::Partitions::DecommitFreeableMemory();

Comment 10 by kbr@chromium.org, May 18 2018

Blocking: 844270
Let me share the result of my testing about this issue for last week. Calling |blink::DecommitFreeableMemory()| twice caused the timeout failure more often, but not 100% caused. And, the timeout test failure happened less when I call |blink::DecommitFreeableMemory()| once. So I don't 100% convince that calling the function twice caused the timeout failure yet. But, I feel like that we need to check how is the result of the bots when I land a new CL that calls |blink::DecommitFreeableMemory()| once.

Comment 12 by kbr@chromium.org, May 25 2018

Cc: hirosh...@chromium.org
gyuyoung.kim@: thanks for continuing to investigate this. I suspect that one of the classes that purges caches when MemoryCoordinator::OnPurgeMemory is called is doing the wrong thing, and that it is currently buggy, and the cause of intermittent timeouts in these tests. Your CL probably just exacerbated an existing problem.

When I glanced through the code earlier:
https://cs.chromium.org/chromium/src/third_party/blink/renderer/platform/memory_coordinator.cc?type=cs&q=MemoryCoordinator::OnPurgeMemory&g=0&l=103

one of the clients affected by this is ClassicPendingScript:
https://cs.chromium.org/chromium/src/third_party/blink/renderer/core/script/classic_pending_script.cc?type=cs&g=0&l=385

I don't know the semantics of these methods, but to me, having ClassicPendingScript::OnPurgeMemory call ClassicPendingScript::CancelStreaming seems wrong. Doesn't this mean that if we purge memory, then any script nodes which are still in the process of being loaded will never be loaded? This could easily explain random timeouts of these tests, if the script nodes sometimes aren't executed at all.

Cc: vogelheim@chromium.org tasak@google.com
+vogelheim@ for script streaming, +tasak@ for the long-standing testing issue around memory coordinator.

Expectation is that when we cancel the streaming there, still ClassicPendingScript is finished via non-streaming path, but this path is not well tested. I'll try to add unit tests and see whether the expectation holds.
FYI testing issues: Issue 671502 and Issue 682662.
The current implementation of ClassicPendingScript::OnPurgeMemory() is wrong:
PendingScriptClient is not notified of finish (i.e. the expectation in Comment #13 doesn't hold).
See the unit test CancellingStreamingByPurgeAfterResourceFinish in https://chromium-review.googlesource.com/#/c/chromium/src/+/1073990 (which fails).
Cc: kouhei@chromium.org
Components: Blink>HTML>Script
As this is a bug that affects functional behavior (not performance-only bug), I feel it would be helpful to merge a fix to beta/stable.

I feel disabling the streaming cancel in OnPurgeMemory() (i.e. remove ClassicPendingScript::OnPurgeMemory()) is the simplest fix. tasak@, do you expect performance/memory regressions caused by this?

Fixing this so that we can correctly cancel the streaming will be more complicated.
IIUC OnPurgeMemory() is the only code path that cancels the streaming from ClassicPendingScript, either before or after resource load finish.
Also, this is the only caller of ClassicPendingScript::CancelStreaming() that needs finish notification to PendingScriptClient after CancelStreaming(), because other callers are for disposing PendingScript and thus we no longer need any action after CancelStreaming().

I feel we should cleanup/simplify the semantics of script streaming cancellation, as we have similar but different cancel-like methods like:
- ClassicPendingScript::CancelStreaming()
- ScriptStreamer::Cancel()
- SourceStream::Cancel()
- ScriptStreamer::SuppressStreaming()

vogelheim@, WDYT?

Comment 17 by kbr@chromium.org, May 25 2018

hiroshige@: this issue has probably been there for a long time so I'm not convinced that we should decide up front to merge a fix to beta, and especially not stable. First we should figure out what the fix is and what sort of risk it carries.

Should we file a new bug about this, or should we make fixes in this area under this bug ID?

Filed  Issue 846951  for the specific ClassicPendingScript issue.
Blockedon: 846951
hiroshige@, FYI, I tested the webGL tests after disabling |CancelSreaming| with my CL yesterday. I ran it 3 times and there was no timeout during the 3 times. It seems to me that CancelStreaming() might cause the timeout intermittently.


void ClassicPendingScript::OnPurgeMemory() {
  CheckState();
//  CancelStreaming();
}
kbr@, It seems to me that the failures of webgl have been disappeared since #1637 build. 

- https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20%28NVIDIA%29/1637

Comment 22 by kbr@chromium.org, Jun 4 2018

Yes, a suppression was landed for the failing test.

Project Member

Comment 23 by bugdroid1@chromium.org, Jun 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f3ab72bc34a678e7c8e164869a52b951c2cf051b

commit f3ab72bc34a678e7c8e164869a52b951c2cf051b
Author: Gyuyoung Kim <gyuyoung.kim@lge.com>
Date: Tue Jun 05 01:26:01 2018

Reland: Call blink::WebMemoryCoordinator::OnPurgeMemory in RenderThreadImpl::ReleaseFreeMemory

Orignal commit: f4544427fd7e40abba353339146780a2a2b90051

This CL was reverted because it caused the timeout test failure on fyi bots.
But, it looks like the issue was solved. So now this CL lands original CL without
a duplicated function invocation.

Bug:  842019 
Test: Covered by RenderThreadImplDiscardableMemoryBrowserTest.ReleaseFreeMemory
Change-Id: I52a4bafc1688822a11fa2408ad7422331fa62a46
Reviewed-on: https://chromium-review.googlesource.com/1084369
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Gyuyoung Kim <gyuyoung.kim@lge.com>
Cr-Commit-Position: refs/heads/master@{#564331}
[modify] https://crrev.com/f3ab72bc34a678e7c8e164869a52b951c2cf051b/content/renderer/render_thread_impl.cc

Status: Fixed (was: Assigned)
After re-landing this CL, the bots are still fine. So I'd like to close this bug. If regressions happen again, I'll open this bug again.
Blocking: 851868

Sign in to add a comment