
Issue 729089 link

Starred by 1 user

Issue metadata

Status: Duplicate
Merged: issue 728785
Owner:
Closed: Jun 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: ----
Type: ----




system_health.common_desktop's browse:media:flickr_infinite_scroll generating a too-big trace on Windows

Project Member Reported by simonhatch@chromium.org, Jun 2 2017

Issue description

I see this in log:


Traceback (most recent call last):
  RunBenchmark at c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\internal\story_runner.py:384
    expectations=expectations)
  Run at c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\internal\story_runner.py:253
    _RunStoryAndProcessErrorIfNeeded(story, results, state, test)
  _RunStoryAndProcessErrorIfNeeded at c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\internal\story_runner.py:90
    state.RunStory(results)
  traced_function at c:\b\s\w\ir\third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:75
    return func(*args, **kwargs)
  RunStory at c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\page\shared_page_state.py:296
    self._current_page.Run(self)
  Run at c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\page\__init__.py:112
    self.RunPageInteractions(action_runner)
  RunPageInteractions at c:\b\s\w\ir\tools\perf\page_sets\system_health\system_health_story.py:142
    self._DidLoadDocument(action_runner)
  _DidLoadDocument at c:\b\s\w\ir\tools\perf\page_sets\system_health\browsing_stories.py:807
    self._Scroll(action_runner, self.SCROLL_DISTANCE, self.SCROLL_STEP)
  _Scroll at c:\b\s\w\ir\tools\perf\page_sets\system_health\browsing_stories.py:828
    raise Exception('Scrolling stuck at %d' % remaining)
Exception: Scrolling stuck at 18560

Locals:
  action_runner : <telemetry.internal.actions.action_runner.ActionRunner object at 0x040508D0>
  distance      : 25000
  new_remaining : 18560
  remaining     : 18560
  retry_count   : 3
  step_size     : 1000
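
For anyone reading the traceback without the source handy, here is a rough sketch of the kind of scroll-with-retry loop that would produce the locals above. This is NOT the actual code in browsing_stories.py; the function name, the two callbacks, and max_retries are hypothetical stand-ins chosen only to mirror the local variable names in the log.

```python
def scroll_with_retries(get_scroll_position, scroll_by, distance, step_size,
                        max_retries=3):
    """Scroll until `distance` pixels are covered, or raise if no progress.

    Hypothetical sketch: `get_scroll_position` and `scroll_by` are assumed
    callbacks, not Telemetry APIs.
    """
    remaining = distance - get_scroll_position()
    retry_count = 0
    while remaining > 0:
        # Ask the page to scroll one more step (or less, near the end).
        scroll_by(min(step_size, remaining))
        new_remaining = distance - get_scroll_position()
        if new_remaining == remaining:
            # No progress: the page did not move at all on this attempt.
            retry_count += 1
            if retry_count >= max_retries:
                raise Exception('Scrolling stuck at %d' % remaining)
        else:
            retry_count = 0
        remaining = new_remaining
```

With distance=25000 and step_size=1000, a page that stops growing at 6440 pixels would fail this loop with remaining == new_remaining == 18560 and retry_count == 3, matching the log.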

I saw a MemoryError in this log: https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FWin_10_High-DPI_Perf%2F663%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.common_desktop_on_Intel_GPU_on_Windows_on_Windows-10-10240%2F0%2Fstdout

I bet this is related to me enabling the CPU time metric and the "toplevel" trace category yesterday, which would definitely cause the size of the traces to grow. It seems like all of the failures are on our Windows bots, where Python is limited to 2GB.
(Python is limited to 2GB because we use the 32-bit version, and 32-bit processes on Windows get only 2GB of user-mode address space by default.)
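
A quick way to confirm which Python a bot is running, as a minimal sketch using only the standard library (the helper names are made up for illustration):

```python
import struct
import sys


def python_pointer_bits():
    # Size of a C pointer in this interpreter, in bits (32 or 64).
    return struct.calcsize("P") * 8


def is_32bit_python():
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit ones.
    return sys.maxsize <= 2**32


if __name__ == "__main__":
    print("%d-bit Python" % python_pointer_bits())
```

If this reports 32-bit on the Windows bots, a MemoryError while handling a trace of a gigabyte or more would be consistent with the address-space limit.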
One thing that's interesting: the Mac version of the same trace (browse:media:flickr_infinite_scroll) seems to have a *significantly* smaller trace (150MB on Mac, as opposed to 1.2GB on Windows). That story, which causes the rest of the Windows system health stories to not even run, is among the largest Mac traces, but isn't _the_ largest. misc:multitab, for example, is right around the same size, as is browse:news:nytimes.

I think that something on Windows is causing the trace size to be significantly larger than it should be.
Revert in progress here: https://chromium-review.googlesource.com/c/522887
Unfortunately, the only trace that was collected on Windows before the failure was browse_news_cnn, which is disabled on Mac. Comparing it to Linux, though, it's actually pretty comparable: the Windows trace is something like 110MB, and the Linux trace is about 90MB. This makes me think that there's not some universal problem where Windows always issues way more trace events than other platforms.

However, looking at flickr_infinite_scroll, it looks like there may be some problem specific to Windows. The Linux trace size is 400MB, whereas the Windows trace size is more like 2GB.
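
To put rough numbers on that comparison, here is the arithmetic as a small sketch. The sizes are the approximate figures quoted in this thread (with "2GB" taken as 2000MB), not fresh measurements, and size_ratio is just an illustrative helper.

```python
def size_ratio(windows_mb, linux_mb):
    # How many times larger the Windows trace is than the Linux one.
    return float(windows_mb) / float(linux_mb)


# Approximate sizes from the comments above, in MB.
cnn_ratio = size_ratio(110, 90)        # browse_news_cnn
flickr_ratio = size_ratio(2000, 400)   # flickr_infinite_scroll

print("cnn:    %.2fx" % cnn_ratio)
print("flickr: %.2fx" % flickr_ratio)
```

That works out to roughly 1.2x for cnn versus 5x for flickr_infinite_scroll, which is why this looks story-specific rather than a general Windows overhead.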
Ack! I don't think this is actually my CL's fault: I think this is due to Ned's migration of the v8.infinite_scroll stories to the system health pageset. His CL (https://codereview.chromium.org/2890283002) is in the list of changelists for the first failing CL. I'm going to go ahead and disable this story on Windows.
Owner: nednguyen@chromium.org
Status: Assigned (was: Available)
Summary: system_health.common_desktop's browse:media:flickr_infinite_scroll generating a too-big trace on Windows (was: system_health.common_desktop failing on 5 builders)
Going to go ahead and assign this to Ned.
Mergedinto: 728785
Status: Duplicate (was: Assigned)
Aaaand there's already been someone that did all of this investigation :-(
(Just to save someone the trouble of clicking the dupe link: this story was disabled on Windows by Stephen 24h ago and is passing in the last run: https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FWin_10_High-DPI_Perf%2F664%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.common_desktop_on_Intel_GPU_on_Windows_on_Windows-10-10240%2F0%2Fstdout)

=== BISECT JOB RESULTS ===
Bisect failed for unknown reasons

Please contact the team (see below) and report the error.


Bisect Details
  Configuration: winx64_high_dpi_perf_bisect
  Benchmark    : system_health.common_desktop
  Metric       : after_load:power_avg/after_load:power_avg


To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests system_health.common_desktop

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8977876431121517248

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5825151280611328


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!

=== BISECT JOB RESULTS ===
Bisect failed for unknown reasons

Please contact the team (see below) and report the error.


Bisect Details
  Configuration: winx64_high_dpi_perf_bisect
  Benchmark    : system_health.common_desktop
  Metric       : after_load:power_avg/after_load:power_avg


To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests system_health.common_desktop

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8977797067289172512

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5825151280611328


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!
