memory.top_10_mobile failing on chromium.perf/Android One Perf due to failure starting browser backend
Issue description

memory.top_10_mobile failing on chromium.perf/Android One Perf

Builders failed on:
- Android One Perf: https://build.chromium.org/p/chromium.perf/builders/Android%20One%20Perf

Seems like this has been going on for about as long as I can scroll back (50 builds). There doesn't seem to be a consistent story that's failing.

In each failure (https://chromium-swarm.appspot.com/task?id=39483c4f29919f10&refresh=10&show_raw=1, https://chromium-swarm.appspot.com/task?id=3941939af9398310&refresh=10&show_raw=1), the failure seems to be preceded by a failure when starting the browser backend:

(ERROR) 2017-10-17 05:32:43,837 browser.__init__:68 Failed with Exception while starting the browser backend.

Followed by a later:

Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 97, in _RunStoryAndProcessErrorIfNeeded
    state.WillRunStory(story)
  File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 240, in WillRunStory
    self._StartBrowser(page)
  File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 201, in _StartBrowser
    self._browser = self._possible_browser.Create(self._finder_options)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_finder.py", line 136, in Create
    browser_backend, self._platform_backend, self._credentials_path)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 60, in __init__
    self._LogBrowserInfo()
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 117, in _LogBrowserInfo
    logging.info('Browser started (pid=%s).', self._browser_backend.pid)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py", line 236, in pid
    '%s' % (self._backend_settings.package, pids))
Exception: At most one instance of process org.chromium.chrome expected but found pids: defaultdict(<type 'list'>, {'org.chromium.chrome:sandboxed_process0': ['24794'], 'org.chromium.chrome:privileged_process0': ['24839'], 'org.chromium.chrome': ['24763', '24899']})

This failure sure makes it look like we're failing to start the browser backend because other Chrome processes are already running.

Assigning this to perezju@, the benchmark owner. Juan, I remember you having problems with zombie Chrome instances before. Is it possible that this is something similar?
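The exception reports two pids for the main org.chromium.chrome process. Below is a minimal sketch of that kind of check, assuming a mapping from process name to pid list shaped like the one in the message. The helper name find_extra_instances is hypothetical and this is not Telemetry's actual code.

```python
from collections import defaultdict


def find_extra_instances(pids_by_name, package):
    """Return the pids for `package` if more than one is running, else [].

    Hypothetical helper for illustration only; not part of Telemetry.
    """
    pids = pids_by_name.get(package, [])
    return list(pids) if len(pids) > 1 else []


# Process table copied from the exception message in this report.
pids_by_name = defaultdict(list, {
    'org.chromium.chrome:sandboxed_process0': ['24794'],
    'org.chromium.chrome:privileged_process0': ['24839'],
    'org.chromium.chrome': ['24763', '24899'],
})

# The main browser process has two pids, which is what trips the check.
extra = find_extra_instances(pids_by_name, 'org.chromium.chrome')
```

The sandboxed and privileged child processes each have a single pid here, so only the main package name is flagged.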
,
Oct 19 2017
actually +primiano, see #1
,
Oct 19 2017
We probably should actually be able to see the revision range at which this starts, but can't because of bug 776432.
,
Oct 19 2017
Actually, I lied: we can just hit "Next" to go to the next page. I'll see if that helps us identify when this started.
,
Oct 19 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8965285243887288272
,
Oct 19 2017
Seems like the first instance of this failure was at 8d4a33b33a9e1151ddad2acf1a353bb3f1ce4b22.

Given that this is flaky, it's a little bit hard to decide where to start the bisect at. It's failed 19 out of the last 25 runs, which means that it fails 76% of the time. That means that, for a given run, there's a 24% chance that it won't flake. If we want to get up to a 99.99% chance that testing at a given revision will indeed fail if the problem exists at that revision, we can do a little math:

0.24 ^ N <= .0001
ln(0.24 ^ N) <= ln(.0001)
N * ln(0.24) <= ln(.0001)
N >= ln(.0001) / ln(0.24)
N >= 6.45

That means that if we have a repeat count of 7, we have a 99.99% chance that we'll see the failure at least once.

Now, for choosing the revision range, we can apply the same logic: if we want to be 99.99% confident that the revision range we choose will contain the introduction of the problem, we should go back 7 revisions from the first instance of the failure. That gives us a "good" revision of 7eabc099a4dcbf647d1bffcf009df0fff6b5b33a.

As for my own confidence in my math, I'd say I'm about 10% confident. Anyhow, I'll give it a shot.
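The repeat-count arithmetic can be checked with a short script. This is a minimal sketch, assuming each run fails independently at a fixed rate. The helper name min_repeats is hypothetical and not part of the bisect tooling. Note that the inequality flips when dividing by ln(0.24), because that logarithm is negative.

```python
import math


def min_repeats(failure_rate, confidence):
    """Smallest N for which N independent runs show at least one failure
    with probability >= confidence.

    Hypothetical helper; assumes runs are independent with a fixed rate.
    """
    pass_rate = 1.0 - failure_rate
    # Solve pass_rate ** N <= 1 - confidence for the smallest integer N.
    return math.ceil(math.log(1.0 - confidence) / math.log(pass_rate))


failure_rate = 19 / 25          # 19 failures in the last 25 runs
repeats = min_repeats(failure_rate, 0.9999)
```

With the numbers from this thread, `repeats` comes out to 7, matching the repeat count chosen for the bisect.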
,
Oct 19 2017
=== BISECT JOB RESULTS ===
NO Test failure found

Bisect Details
  Configuration: android_one_perf_bisect
  Benchmark    : memory.top_10_mobile
  Metric       : memory:chrome:gpu_process:reported_by_os:system_memory:java_heap:private_dirty_size_avg/foreground/https_m_facebook_com_rihanna

Revision           Exit Code     N
chromium@506279    0 +- N/A      7     good
chromium@506959    0 +- N/A      7     bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests memory.top_10_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8965285243887288272

For feedback, file a bug with component Speed>Bisection
,
Jan 2 2018
This is a duplicate of (now fixed) issue 785446.
Comment 1 by perezju@chromium.org, Oct 19 2017