Fix audio_CrasSanity and reenable the test |
|||||||||||||||||
Issue descriptionOriginal issue: https://bugs.chromium.org/p/chromium/issues/detail?id=718171 https://codereview.chromium.org/2858763002 was the change the triggered the problem, but the change itself seems to be legit. Since this only happened to audio_CrasSanity, the problem might be on the test itself, or the telemetry API it uses. TimeoutException: Timed out while waiting 5s for py_utils.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5). Traceback (most recent call last): File "/usr/local/autotest/common_lib/test.py", line 817, in _call_test_function return func(*args, **dargs) File "/usr/local/autotest/common_lib/test.py", line 470, in execute dargs) File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry postprocess_profiled_run, args, dargs) File "/usr/local/autotest/common_lib/test.py", line 380, in _call_run_once self.run_once(*args, **dargs) File "/usr/local/autotest/tests/audio_CrasSanity/audio_CrasSanity.py", line 88, in run_once self._cr.browser.tabs[total_tests].Close() File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/tab.py", line 100, in Close self._tab_list_backend.CloseTab(self.id) File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/tab_list_backend.py", line 67, in CloseTab py_utils.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5) File "/usr/local/telemetry/src/third_party/catapult/common/py_utils/py_utils/__init__.py", line 132, in WaitFor (timeout, GetConditionString())) TimeoutException: Timed out while waiting 5s for py_utils.WaitFor(lambda: tab_id not in self.IterContextIds(), timeout=5).
,
May 5 2017
Allow me raise this to P1, as it will block master-chromium-pfq shortly.
,
May 5 2017
nednguyen@, from the code it looks like devtool's server does not return the expected tab IDs (but I can't confirm yet). Do you have any insight that why https://codereview.chromium.org/2858763002 may change the behavior?
,
May 5 2017
,
May 5 2017
A straight revert of https://codereview.chromium.org/2858763002 does not seem to fix the problem.
,
May 5 2017
#5: I can actually reliably reproduce the problem only with the change. Achuith will look into this next Monday.
,
May 8 2017
,
May 8 2017
I'm gardener this week.
,
May 9 2017
shivanisha@ asked me to run the newest patch in https://codereview.chromium.org/2721933002/ regarding of issue 716109. But I am hitting this timeout locally on tricky, so could not verify it right now.
,
May 9 2017
,
May 10 2017
OK, so the reason why my revert was ineffective appears to be due to a tooling configuration error on my PC. Reverting ttps://codereview.chromium.org/2858763002 does indeed fix this problem. Unfortunately, when the change to chrome/browser/devtools/chrome_devtools_manager_delegate.cc is reverted, https://crbug.com/717003 comes back. I would however push for a revert at this stage because #717003 appears to be a fairly minor bug, whereas the TimeoutException has managed to our page_cycler and smoothness metrics for about a week so far.
,
May 10 2017
Reverting devtools patch will bring back the highly visible to-be-merged-to-59 regression. As pointed in https://bugs.chromium.org/p/chromium/issues/detail?id=718171#c2, I think that combination of telemetry and this specific test have a problem. Does this affect anything else except for a single test?
,
May 10 2017
On CrOS, it also affects smoothness.top_25_smooth, page_cycler_v2.typical_25, all the blink_perf tests and anything based on any of these.
,
May 10 2017
Issue 718687 claims the problem is in 8GB hardcoded somewhere. Do you have any specific links to failures?
,
May 10 2017
So to avoid conflating things: the telemetry failure (i.e. the TimeoutException) seems to go away when https://codereview.chromium.org/2858763002 is reverted. The CL is most likely unrelated to the underlying test. I also filed a ticket on the telemetry issue at https://crbug.com/718848 .
,
May 10 2017
Thank you for the info. I've created a simple patch [1] which should address the problem (although I still think there is a bug in test/telemetry). Could you please try it locally to see whether timeouts go away? [1] https://codereview.chromium.org/2873393002/
,
May 10 2017
I don't think it makes sense for me to own this bug. Richard: I cannot assign to you since you're not a project member, but can you own this?
,
May 10 2017
OK, https://codereview.chromium.org/2873393002/ seems to fix the telemetry problem, at least on an old checkout. I'm just going to verify it against a more recent tip.
,
May 10 2017
OK, seems to work on the master branch, at least as it was on 0300 UTC. Investigating the root cause a bit more, I added some logging[1] into this section of ChromeDevToolsManagerDelegate::GetTargetType:
for (TabContentsIterator it; !it.done(); it.Next()) {
if (*it == web_contents)
return DevToolsAgentHost::kTypePage;
}
I think the intention of this code is to iterate over all the content::WebContent pointers (from a quick read of the code, WebContent doesn't seem to overload equality) in the current browser process and check if they match up with what's retrieved from the content::RenderFrameHost that's being operated on. Interestingly though, this doesn't seem to behave correctly on CrOS. Clicking on "Inspect" in the context menu of a New Tab page might gives this:
61:[3879:3879:0510/183836.792871:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb277b680 web_contents=0xb3d7c480
72:[3879:3879:0510/183836.793502:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb277b680 web_contents=0xb3f8e200
Then, after opening a new tab and doing it, again, it emits this:
88:[3879:3879:0510/183856.655102:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb277b680 web_contents=0xb44a9b00 [2]
133:[3879:3879:0510/183900.037841:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb277b680 web_contents=0xb3d7c000
134:[3879:3879:0510/183900.037859:INFO:chrome_devtools_manager_delegate.cc(368)] ChromeDevToolsManagerDelegate::GetTargetType it=0xb44a9b00 web_contents=0xb3d7c000
So there are now get two distinct tab web_contents, but there are also two new ChromeDevToolsManagerDelegate::GetTargetType web_contents. This is a typical example, but sometimes (very rarely) the check does succeed, because the two pointer values happen to line up. Is this worth fixing?
[1] LOG(INFO) << "ChromeDevToolsManagerDelegate::GetTargetType *it=" << *it << " web_contents=" << web_contents;
[2] Just in case you're wondering, 3879 is the browser process in this case.
,
May 11 2017
Just a reminder, this is going to block master-chromium-pfq today. The last successful AutoFDO profile was generated on 4/27 and it has been 14 days (a predetermined timeout) because of this. Although we can extend the timeout, the performance of the CrOS images may already start to degrade, which is why the timeout exists. It does not affect users before the branch point, but people who look at performance dashboard might feel confused.
,
May 11 2017
achuith, out of curiosity, should the chromeos telemetry bots have caught this issue? Thanks laszio for the pre-emptive update! If this CL lands before 8 PM Pacific we should be good for Chrome PFQ tonight.
,
May 11 2017
,
May 11 2017
Most of the telemetry-based tests in the BVT seem to be fine, so seems unlikely. I think most of the telemetry unit tests are passing on the chromeos telemetry bots as well.
,
May 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8e75b1bc9944bfe1a4a1a006e212418243e96abd commit 8e75b1bc9944bfe1a4a1a006e212418243e96abd Author: dgozman <dgozman@chromium.org> Date: Thu May 11 22:52:05 2017 [DevTools] Fix telemetry flakiness based on target type For some unknown reason, crrev.com/468815 caused multiple telemetry tests to timeout. This patch partially reverts that revision, applying another fix to original bug by re-requesting target type on demand. BUG=718952, 717003 Review-Url: https://codereview.chromium.org/2873393002 Cr-Commit-Position: refs/heads/master@{#471102} [modify] https://crrev.com/8e75b1bc9944bfe1a4a1a006e212418243e96abd/chrome/browser/devtools/chrome_devtools_manager_delegate.cc [modify] https://crrev.com/8e75b1bc9944bfe1a4a1a006e212418243e96abd/content/browser/devtools/render_frame_devtools_agent_host.cc [modify] https://crrev.com/8e75b1bc9944bfe1a4a1a006e212418243e96abd/content/browser/devtools/render_frame_devtools_agent_host.h
,
May 15 2017
,
May 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/cadfb3115e3b20f61e342523a12325a94ded2786 commit cadfb3115e3b20f61e342523a12325a94ded2786 Author: Dmitry Gozman <dgozman@chromium.org> Date: Mon May 15 22:38:39 2017 Merge to 3071 "[DevTools] Fix telemetry flakiness based on target type" For some unknown reason, crrev.com/468815 caused multiple telemetry tests to timeout. This patch partially reverts that revision, applying another fix to original bug by re-requesting target type on demand. BUG=718952, 717003 Review-Url: https://codereview.chromium.org/2873393002 Cr-Commit-Position: refs/heads/master@{#471102} (cherry picked from commit 8e75b1bc9944bfe1a4a1a006e212418243e96abd) TBR=pfeldman Review-Url: https://codereview.chromium.org/2886583002 . Cr-Commit-Position: refs/branch-heads/3071@{#571} Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641} [modify] https://crrev.com/cadfb3115e3b20f61e342523a12325a94ded2786/chrome/browser/devtools/chrome_devtools_manager_delegate.cc [modify] https://crrev.com/cadfb3115e3b20f61e342523a12325a94ded2786/content/browser/devtools/render_frame_devtools_agent_host.cc [modify] https://crrev.com/cadfb3115e3b20f61e342523a12325a94ded2786/content/browser/devtools/render_frame_devtools_agent_host.h
,
Nov 10 2017
,
May 31 2018
Hi, this bug has not been updated recently and remains untriaged. Please acknowledge the bug and provide status within two weeks (6/8/2018), or the bug will be closed. Thank you.
,
May 31 2018
,
Jun 1 2018
This looks still relevant. This isn't really an Infra problem, and I don't know who, exactly should own it. But ... maybe rohitbm@ knows?
,
Jun 1 2018
Assigning to Vin Audio/Video test owner.
,
Jun 4 2018
The test seems to be running well on bvt-perbuild: https://stainless.corp.google.com/search?test=%5Eaudio%5C_CrasSanity%24&exclude_non_release=true&exclude_cts=true&col=build&row=board_model&view=matrix&first_date=2018-05-29&last_date=2018-06-04 |
|||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||
Comment 1 by laszio@chromium.org
, May 5 2017