Issue 776108

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 785446
Owner:
Closed: Jan 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression




memory.top_10_mobile failing on chromium.perf/Android One Perf due to failure starting browser backend

Project Member Reported by charliea@google.com, Oct 18 2017

Issue description

memory.top_10_mobile failing on chromium.perf/Android One Perf

Builders failed on: 
- Android One Perf: 
  https://build.chromium.org/p/chromium.perf/builders/Android%20One%20Perf

Seems like this has been going on for about as long as I can scroll back (50 builds). No single story seems to fail consistently.

In each failure (https://chromium-swarm.appspot.com/task?id=39483c4f29919f10&refresh=10&show_raw=1, https://chromium-swarm.appspot.com/task?id=3941939af9398310&refresh=10&show_raw=1), the failure seems to be preceded by a failure when starting the browser backend:

(ERROR) 2017-10-17 05:32:43,837 browser.__init__:68  Failed with Exception while starting the browser backend.

Followed by a later:

Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 97, in _RunStoryAndProcessErrorIfNeeded
    state.WillRunStory(story)
  File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 240, in WillRunStory
    self._StartBrowser(page)
  File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 201, in _StartBrowser
    self._browser = self._possible_browser.Create(self._finder_options)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_finder.py", line 136, in Create
    browser_backend, self._platform_backend, self._credentials_path)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 60, in __init__
    self._LogBrowserInfo()
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 117, in _LogBrowserInfo
    logging.info('Browser started (pid=%s).', self._browser_backend.pid)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py", line 236, in pid
    '%s' % (self._backend_settings.package, pids))
Exception: At most one instance of process org.chromium.chrome expected but found pids: defaultdict(<type 'list'>, {'org.chromium.chrome:sandboxed_process0': ['24794'], 'org.chromium.chrome:privileged_process0': ['24839'], 'org.chromium.chrome': ['24763', '24899']})

This failure sure makes it look like we're failing to start the browser backend because other Chrome processes are already running. 
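
For anyone reading the exception above, here is a minimal sketch of how the reported pid mapping can be checked for multiple instances of the same process name. The helper name `find_duplicate_instances` is hypothetical and is not part of Telemetry; the mapping is copied from the exception message.

```python
def find_duplicate_instances(pids_by_process):
    """Return only the process names that have more than one pid.

    Hypothetical helper for illustration; not Telemetry's implementation.
    """
    return {
        name: pids
        for name, pids in pids_by_process.items()
        if len(pids) > 1
    }


# Mapping copied from the exception message in this report.
pids = {
    'org.chromium.chrome:sandboxed_process0': ['24794'],
    'org.chromium.chrome:privileged_process0': ['24839'],
    'org.chromium.chrome': ['24763', '24899'],
}

duplicates = find_duplicate_instances(pids)
print(duplicates)
```

Only the main `org.chromium.chrome` process has two pids here; the sandboxed and privileged child processes have one each.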

Assigning this to perezju@, the benchmark owner. Juan, I remember you having problems with zombie Chrome instances before. Is it possible that this is something similar?


 
I did see this on another bot, however it seemed like a flake there.

Anyway, having two Chrome processes on Android sounds like a bug in Chrome? Or how can that happen?

+primiano in case you have some thoughts.

Sad that we didn't catch this when it started happening, a return code bisect might have been able to find the culprit :(
Cc: primiano@chromium.org
actually +primiano, see #1
We probably should actually be able to see the revision range at which this starts, but can't because of bug 776432.
Actually, I lied: we can just hit "Next" to go to the next page. I'll see if that helps us identify when this started.
Seems like the first instance of this failure was at 8d4a33b33a9e1151ddad2acf1a353bb3f1ce4b22. Given that this is flaky, it's a little bit hard to decide where to start the bisect at.

It's failed 19 out of the last 25 runs, which means that it fails 76% of the time. That means that, for a given run, there's a 24% chance that it won't flake. If we want to get up to a 99.99% chance that testing at a given revision will indeed fail if the problem exists at that revision, we can do a little math:

0.24 ^ N <= .0001

ln(0.24 ^ N) <= ln(.0001)
N * ln(0.24) <= ln(.0001)
N >= ln(.0001) / ln(0.24) 
N >= 6.45

That means that if we have a repeat count of 7, we have at least a 99.99% chance that we'll see the failure at least once (assuming the runs flake independently).
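
The arithmetic above can be double-checked with a short script. This is only a sketch: the function name `min_repeat_count` is made up for illustration, and it assumes each run fails independently with the observed 19/25 failure rate.

```python
import math


def min_repeat_count(failure_rate, confidence):
    """Smallest N such that at least one of N runs fails with the given confidence.

    Assumes runs are independent (an assumption, not something the bots guarantee).
    """
    pass_rate = 1.0 - failure_rate   # chance a single run does not flake
    miss_chance = 1.0 - confidence   # acceptable chance that all N runs pass
    return int(math.ceil(math.log(miss_chance) / math.log(pass_rate)))


failure_rate = 19.0 / 25             # 19 failures in the last 25 runs
n = min_repeat_count(failure_rate, 0.9999)
chance_of_seeing_failure = 1.0 - (1.0 - failure_rate) ** n
print(n, chance_of_seeing_failure)
```

A repeat count of 6 falls just short of the 99.99% target under these assumptions, which is why the result rounds up to 7.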

Now, for choosing the revision range, we can apply the same logic: if we want to be 99.99% confident that the revision range we choose will contain the introduction of the problem, we should go back 7 revisions from the first instance of the failure. That gives us a "good" revision of 7eabc099a4dcbf647d1bffcf009df0fff6b5b33a. 

As for my own confidence in my math, I'd say I'm about 10% confident.

Anyhow, I'll give it a shot.
Project Member

Comment 7 by 42576172...@developer.gserviceaccount.com, Oct 19 2017


=== BISECT JOB RESULTS ===
NO Test failure found

Bisect Details
  Configuration: android_one_perf_bisect
  Benchmark    : memory.top_10_mobile
  Metric       : memory:chrome:gpu_process:reported_by_os:system_memory:java_heap:private_dirty_size_avg/foreground/https_m_facebook_com_rihanna

Revision             Exit Code      N
chromium@506279      0 +- N/A       7      good
chromium@506959      0 +- N/A       7      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests memory.top_10_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8965285243887288272


For feedback, file a bug with component Speed>Bisection
Labels: Performance-Memory
Labels: Pri-2 Type-Bug-Security
Labels: -Type-Bug-Security Type-Bug-Regression
Mergedinto: 785446
Status: Duplicate (was: Assigned)
This is a duplicate of the now-fixed issue 785446.
