New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 718952 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 721550



Sign in to add a comment

Fix audio_CrasSanity and reenable the test

Project Member Reported by victorhsieh@chromium.org, May 5 2017

Issue description

Original issue: https://bugs.chromium.org/p/chromium/issues/detail?id=718171

https://codereview.chromium.org/2858763002 was the change the triggered the problem, but the change itself seems to be legit.  Since this only happened to audio_CrasSanity, the problem might be on the test itself, or the telemetry API it uses.


TimeoutException: Timed out while waiting 5s for py_utils.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5).
  Traceback (most recent call last):
    File "/usr/local/autotest/common_lib/test.py", line 817, in _call_test_function
      return func(*args, **dargs)
    File "/usr/local/autotest/common_lib/test.py", line 470, in execute
      dargs)
    File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry
      postprocess_profiled_run, args, dargs)
    File "/usr/local/autotest/common_lib/test.py", line 380, in _call_run_once
      self.run_once(*args, **dargs)
    File "/usr/local/autotest/tests/audio_CrasSanity/audio_CrasSanity.py", line 88, in run_once
      self._cr.browser.tabs[total_tests].Close()
    File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/tab.py", line 100, in Close
      self._tab_list_backend.CloseTab(self.id)
    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/tab_list_backend.py", line 67, in CloseTab
      py_utils.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5)
    File "/usr/local/telemetry/src/third_party/catapult/common/py_utils/py_utils/__init__.py", line 132, in WaitFor
      (timeout, GetConditionString()))
  TimeoutException: Timed out while waiting 5s for py_utils.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5).
 
Cc: bccheng@chromium.org nednguyen@chromium.org
It looks like page_cycler_v2 also suffers:

https://bugs.chromium.org/p/chromium/issues/detail?id=718481
https://bugs.chromium.org/p/chromium/issues/detail?id=718687
Cc: josa...@chromium.org laszio@chromium.org
Labels: -Pri-2 Pri-1
Allow me raise this to P1, as it will block master-chromium-pfq shortly.
Cc: dgozman@chromium.org
Status: Available (was: Untriaged)
nednguyen@, from the code it looks like devtool's server does not return the expected tab IDs (but I can't confirm yet).  Do you have any insight that why https://codereview.chromium.org/2858763002 may change the behavior?
Cc: vsu...@chromium.org
A straight revert of https://codereview.chromium.org/2858763002 does not seem to fix the problem.
Owner: achuith@chromium.org
Status: Assigned (was: Available)
#5: I can actually reliably reproduce the problem only with the change.

Achuith will look into this next Monday.
Cc: hychao@chromium.org
Cc: jamescook@chromium.org
I'm gardener this week.

Comment 9 by warx@chromium.org, May 9 2017

Cc: shivanisha@chromium.org warx@chromium.org
shivanisha@ asked me to run the newest patch in https://codereview.chromium.org/2721933002/ regarding of issue 716109. But I am hitting this timeout locally on tricky, so could not verify it right now.
Blocking: 716109
OK, so the reason why my revert was ineffective appears to be due to a tooling configuration error on my PC. Reverting ttps://codereview.chromium.org/2858763002 does indeed fix this problem. Unfortunately, when the change to chrome/browser/devtools/chrome_devtools_manager_delegate.cc is reverted,  https://crbug.com/717003  comes back. I would however push for a revert at this stage because #717003 appears to be a fairly minor bug, whereas the TimeoutException has managed to our page_cycler and smoothness metrics for about a week so far.

Reverting devtools patch will bring back the highly visible to-be-merged-to-59 regression.

As pointed in https://bugs.chromium.org/p/chromium/issues/detail?id=718171#c2, I think that combination of telemetry and this specific test have a problem.

Does this affect anything else except for a single test?
On CrOS, it also affects smoothness.top_25_smooth, page_cycler_v2.typical_25, all the blink_perf tests and anything based on any of these.

	
Issue 718687 claims the problem is in 8GB hardcoded somewhere. Do you have any specific links to failures?
So to avoid conflating things: the telemetry failure (i.e. the TimeoutException) seems to go away when https://codereview.chromium.org/2858763002 is reverted. The CL is most likely unrelated to the underlying test. I also filed a ticket on the telemetry issue at  https://crbug.com/718848 .
Thank you for the info. I've created a simple patch [1] which should address the problem (although I still think there is a bug in test/telemetry). Could you please try it locally to see whether timeouts go away?

[1] https://codereview.chromium.org/2873393002/
Owner: ----
Status: Untriaged (was: Assigned)
I don't think it makes sense for me to own this bug. 

Richard: I cannot assign to you since you're not a project member, but can you own this? 
OK, https://codereview.chromium.org/2873393002/ seems to fix the telemetry problem, at least on an old checkout. I'm just going to verify it against a more recent tip.
OK, seems to work on the master branch, at least as it was on 0300 UTC. Investigating the root cause a bit more, I added some logging[1] into this section of ChromeDevToolsManagerDelegate::GetTargetType:

  for (TabContentsIterator it; !it.done(); it.Next()) {
    if (*it == web_contents)
      return DevToolsAgentHost::kTypePage;
  }

I think the intention of this code is to iterate over all the content::WebContent pointers (from a quick read of the code, WebContent doesn't seem to overload equality) in the current browser process and check if they match up with what's retrieved from the content::RenderFrameHost that's being operated on. Interestingly though, this doesn't seem to behave correctly on CrOS. Clicking on "Inspect" in the context menu of a New Tab page might gives this:

61:[3879:3879:0510/183836.792871:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb277b680 web_contents=0xb3d7c480
72:[3879:3879:0510/183836.793502:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb277b680 web_contents=0xb3f8e200

Then, after opening a new tab and doing it, again, it emits this:

88:[3879:3879:0510/183856.655102:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb277b680 web_contents=0xb44a9b00 [2]
133:[3879:3879:0510/183900.037841:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb277b680 web_contents=0xb3d7c000
134:[3879:3879:0510/183900.037859:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb44a9b00 web_contents=0xb3d7c000

So there are now get two distinct tab web_contents, but there are also two new ChromeDevToolsManagerDelegate::GetTargetType web_contents. This is a typical example, but sometimes (very rarely) the check does succeed, because the two pointer values happen to line up. Is this worth fixing?

[1] LOG(INFO) << "ChromeDevToolsManagerDelegate::GetTargetType *it=" << *it << " web_contents=" << web_contents;
[2] Just in case you're wondering, 3879 is the browser process in this case.

Just a reminder, this is going to block master-chromium-pfq today.  The last successful AutoFDO profile was generated on 4/27 and it has been 14 days (a predetermined timeout) because of this.  Although we can extend the timeout, the performance of the CrOS images may already start to degrade, which is why the timeout exists.  It does not affect users before the branch point, but people who look at performance dashboard might feel confused.
achuith, out of curiosity, should the chromeos telemetry bots have caught this issue?

Thanks laszio for the pre-emptive update! If this CL lands before 8 PM Pacific we should be good for Chrome PFQ tonight.

Blocking: 721550
Most of the telemetry-based tests in the BVT seem to be fine, so seems unlikely. I think most of the telemetry unit tests are passing on the chromeos telemetry bots as well.
Project Member

Comment 24 by bugdroid1@chromium.org, May 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8e75b1bc9944bfe1a4a1a006e212418243e96abd

commit 8e75b1bc9944bfe1a4a1a006e212418243e96abd
Author: dgozman <dgozman@chromium.org>
Date: Thu May 11 22:52:05 2017

[DevTools] Fix telemetry flakiness based on target type

For some unknown reason, crrev.com/468815 caused multiple
telemetry tests to timeout. This patch partially reverts
that revision, applying another fix to original bug by
re-requesting target type on demand.

BUG=718952, 717003 

Review-Url: https://codereview.chromium.org/2873393002
Cr-Commit-Position: refs/heads/master@{#471102}

[modify] https://crrev.com/8e75b1bc9944bfe1a4a1a006e212418243e96abd/chrome/browser/devtools/chrome_devtools_manager_delegate.cc
[modify] https://crrev.com/8e75b1bc9944bfe1a4a1a006e212418243e96abd/content/browser/devtools/render_frame_devtools_agent_host.cc
[modify] https://crrev.com/8e75b1bc9944bfe1a4a1a006e212418243e96abd/content/browser/devtools/render_frame_devtools_agent_host.h

Blocking: -716109
Project Member

Comment 26 by bugdroid1@chromium.org, May 15 2017

Labels: merge-merged-3071
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cadfb3115e3b20f61e342523a12325a94ded2786

commit cadfb3115e3b20f61e342523a12325a94ded2786
Author: Dmitry Gozman <dgozman@chromium.org>
Date: Mon May 15 22:38:39 2017

Merge to 3071 "[DevTools] Fix telemetry flakiness based on target type"

For some unknown reason, crrev.com/468815 caused multiple
telemetry tests to timeout. This patch partially reverts
that revision, applying another fix to original bug by
re-requesting target type on demand.

BUG=718952, 717003 

Review-Url: https://codereview.chromium.org/2873393002
Cr-Commit-Position: refs/heads/master@{#471102}
(cherry picked from commit 8e75b1bc9944bfe1a4a1a006e212418243e96abd)

TBR=pfeldman

Review-Url: https://codereview.chromium.org/2886583002 .
Cr-Commit-Position: refs/branch-heads/3071@{#571}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/cadfb3115e3b20f61e342523a12325a94ded2786/chrome/browser/devtools/chrome_devtools_manager_delegate.cc
[modify] https://crrev.com/cadfb3115e3b20f61e342523a12325a94ded2786/content/browser/devtools/render_frame_devtools_agent_host.cc
[modify] https://crrev.com/cadfb3115e3b20f61e342523a12325a94ded2786/content/browser/devtools/render_frame_devtools_agent_host.h

Components: Infra>Client>ChromeOS
Hi, this bug has not been updated recently and remains untriaged. Please acknowledge the bug and provide status within two weeks (6/8/2018), or the bug will be closed. Thank you.
Cc: -nednguyen@chromium.org
Components: -Infra>Client>ChromeOS OS>Kernel>Audio
Owner: rohi...@chromium.org
Status: Assigned (was: Untriaged)
This looks still relevant.

This isn't really an Infra problem, and I don't know who, exactly should
own it.  But ... maybe rohitbm@ knows?

Owner: vsu...@chromium.org
Assigning to Vin Audio/Video test owner.

Sign in to add a comment