New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 620550 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

Trace size of battor + chrome is too big, leading to python OOM

Project Member Reported by nedngu...@google.com, Jun 16 2016

Issue description

Log: https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%20Power%20Perf%20%28DELL%29/builds/75/steps/BattOr.BattOrCases/logs/stdio

I think the fix here is we update the call in https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/value/trace.py#L51 to use python subprocess module to launch the command "tracing/bin/html2trace <chrome_trace.tmp> <battor_trace.tmp> ..." so that there is less memory pressure on the current telemetry python process.

Randy: can you take over making this fix?
 
Also can you add logging about the trace size in trace_data.TraceData so we can be sure that this MemoryError is due to the trace size but not something else?
One thing to note is that all the Windows machines use 32 bit python, which is the version available in depot_tools. That means it's limited to 4GB RAM. We could consider adding 64 bit python to depot_tools, but we will hit this anyway when the 2GB and 4GB laptops come online.
Project Member

Comment 3 by sheriffbot@chromium.org, Jun 16 2016

Labels: Hotlist-Google
Owner: rnep...@chromium.org
Status: Started (was: Untriaged)
There is apparently no easy way to get memory footprints for python objects/dicts.
Randy: once we go to the route of using trace2html as a python subprocess, we will need to dump the dictionary object or string to files to do "trace2html <file1> <file2> ...". At that point, we can log the files size.
Yeah, I came to that realization a few minutes ago. Thanks!
Project Member

Comment 8 by bugdroid1@chromium.org, Jun 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/56226927344ff0bd246de7fb1f19d2720fa64fa8

commit 56226927344ff0bd246de7fb1f19d2720fa64fa8
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Sat Jun 18 02:11:38 2016

Roll src/third_party/catapult/ f7df8568a..713623b39 (11 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/f7df8568ae1f..713623b39217

$ git log f7df8568a..713623b39 --date=short --no-merges --format='%ad %ae %s'

BUG= 621035 , 620550 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2079983002
Cr-Commit-Position: refs/heads/master@{#400571}

[modify] https://crrev.com/56226927344ff0bd246de7fb1f19d2720fa64fa8/DEPS

Unfortunatly it looks like that will not solve the memory problem...

(INFO) 2016-06-17 21:23:26,381 trace._GetTempFileHandle:65  Trace of size 7840261 bytes saved.
(INFO) 2016-06-17 21:24:28,430 trace._GetTempFileHandle:65  Trace of size 258748382 bytes saved.
(INFO) 2016-06-17 21:24:28,436 trace._GetTempFileHandle:65  Trace of size 40 bytes saved.
(INFO) 2016-06-17 21:24:28,487 trace._GetTempFileHandle:65  Trace of size 856 bytes saved.
Traceback (most recent call last):
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\value\..\..\..\tracing\bin\trace2html", line 14, in <module>
    sys.exit(trace2html.Main(sys.argv))
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\tracing\tracing_build\trace2html.py", line 47, in Main
    WriteHTMLForTracesToFile(args.trace_files, f, config_name=args.config_name)
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\tracing\tracing_build\trace2html.py", line 110, in WriteHTMLForTracesToFile
    trace_data = json.loads(trace_data)
  File "C:\b\depot_tools\python276_bin\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\b\depot_tools\python276_bin\lib\json\decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\b\depot_tools\python276_bin\lib\json\decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
MemoryError
Traceback (most recent call last):
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 86, in _RunStoryAndProcessErrorIfNeeded
    test.Measure(state.platform, results)
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\web_perf\timeline_based_measurement.py", line 280, in Measure
    trace_value = trace.TraceValue(results.current_page, trace_result)
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\value\trace.py", line 37, in __init__
    self._temp_file = self._GetTempFileHandle(trace_data)
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\value\trace.py", line 72, in _GetTempFileHandle
    subprocess.check_call(cmd)
  File "C:\b\depot_tools\python276_bin\lib\subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['python', 'C:\\b\\build\\slave\\Win_Power_Perf__DELL_\\build\\src\\third_party\\catapult\\telemetry\\telemetry\\value\\..\\..\\..\\tracing\\bin\\trace2html', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpoww80r\\0.trace', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpoww80r\\1.trace', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpoww80r\\2.trace', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpoww80r\\3.trace', '--output', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpmuofao.html']' returned non-zero exit status 1

Would increasing the swap space on the laptop solve this? I dont know enough about how memory is handled by python to know.

And even if it does, will that change the behavior of the system in ways that would be detrimental to the perf tests?
Project Member

Comment 11 by bugdroid1@chromium.org, Jun 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e3576191681a39b8df1093d43c541b2d18ab4f25

commit e3576191681a39b8df1093d43c541b2d18ab4f25
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Sat Jun 18 06:57:26 2016

Roll src/third_party/catapult/ ea37a065d..bc8cee3e0 (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/ea37a065d0ca..bc8cee3e012f

$ git log ea37a065d..bc8cee3e0 --date=short --no-merges --format='%ad %ae %s'

BUG= 620550 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2077873004
Cr-Commit-Position: refs/heads/master@{#400582}

[modify] https://crrev.com/e3576191681a39b8df1093d43c541b2d18ab4f25/DEPS

Project Member

Comment 12 by bugdroid1@chromium.org, Jun 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fe573658df0fc425dc958daf925866d3eaf95d41

commit fe573658df0fc425dc958daf925866d3eaf95d41
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Sat Jun 18 07:48:39 2016

Roll src/third_party/catapult/ bc8cee3e0..184187257 (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/bc8cee3e012f..184187257422

$ git log bc8cee3e0..184187257 --date=short --no-merges --format='%ad %ae %s'

BUG= 620550 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2081503002
Cr-Commit-Position: refs/heads/master@{#400585}

[modify] https://crrev.com/fe573658df0fc425dc958daf925866d3eaf95d41/DEPS

Project Member

Comment 13 by bugdroid1@chromium.org, Jun 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6dd6bf24186bcf7bb585698150a17762677da939

commit 6dd6bf24186bcf7bb585698150a17762677da939
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Sat Jun 18 13:08:02 2016

Roll src/third_party/catapult/ 184187257..0dd659eb5 (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/184187257422..0dd659eb524f

$ git log 184187257..0dd659eb5 --date=short --no-merges --format='%ad %ae %s'

BUG= 620550 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2076983004
Cr-Commit-Position: refs/heads/master@{#400591}

[modify] https://crrev.com/6dd6bf24186bcf7bb585698150a17762677da939/DEPS

Project Member

Comment 14 by bugdroid1@chromium.org, Jun 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/09621481168cd94f09dfe06cfeb4c09120b16b8e

commit 09621481168cd94f09dfe06cfeb4c09120b16b8e
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Sat Jun 18 14:49:20 2016

Roll src/third_party/catapult/ 0dd659eb5..de3ea6fd8 (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/0dd659eb524f..de3ea6fd8d96

$ git log 0dd659eb5..de3ea6fd8 --date=short --no-merges --format='%ad %ae %s'

BUG= 620550 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2078043003
Cr-Commit-Position: refs/heads/master@{#400593}

[modify] https://crrev.com/09621481168cd94f09dfe06cfeb4c09120b16b8e/DEPS

After #12, the battor.power_cases is greening, but still quite flaky with this stack:
Traceback (most recent call last):
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_controller_backend.py", line 259, in CollectAgentTraceData
    os.remove(self._trace_log)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpqfhwih'
(INFO) 2016-06-20 05:50:31,869 trace._GetTempFileHandle:65  Trace (TraceDataPart("powerTraceAsString")) of size 7861861 bytes saved.
(INFO) 2016-06-20 05:51:35,315 trace._GetTempFileHandle:65  Trace (TraceDataPart("traceEvents")) of size 261223556 bytes saved.
(INFO) 2016-06-20 05:51:35,319 trace._GetTempFileHandle:65  Trace (TraceDataPart("tabIds")) of size 40 bytes saved.
(INFO) 2016-06-20 05:51:35,322 trace._GetTempFileHandle:65  Trace (TraceDataPart("telemetry")) of size 858 bytes saved.
c:\users\chrome~1.lab\appdata\local\temp\tmpmb3i6g.html
Traceback (most recent call last):
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\value\failure.py", line 41, in _GetExcInfoFromMessage
    raise Exception(message)
Exception: Unknown stack
------------------------------------------------------
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 86, in _RunStoryAndProcessErrorIfNeeded
    test.Measure(state.platform, results)
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\web_perf\timeline_based_measurement.py", line 284, in Measure
    self._ComputeTimelineBasedMetric(results, trace_value)
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\web_perf\timeline_based_measurement.py", line 302, in _ComputeTimelineBasedMetric
    trace_value.filename, metric, extra_import_options)
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\tracing\tracing\metrics\metric_runner.py", line 31, in RunMetric
    result = RunMetricOnTraces([filename], metric, extra_import_options)
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\tracing\tracing\metrics\metric_runner.py", line 42, in RunMetricOnTraces
    map_results = runner.RunMapper()
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\perf_insights\perf_insights\map_runner.py", line 94, in RunMapper
    self._wq.Run()
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\perf_insights\perf_insights\mre\threaded_work_queue.py", line 38, in Run
    self._RunMultiThreaded()
  File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\perf_insights\perf_insights\mre\threaded_work_queue.py", line 106, in _RunMultiThreaded
    t.start()
  File "C:\b\depot_tools\python276_bin\lib\threading.py", line 745, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread

[  FAILED  ] about:blank (67334 ms)


I suspect this is still due to the high memory usage.

At this point, I think the best action is to either reduce the chrome tracing categories or reduce the length of the story (currently around 45s in total). I would vote for reduce the length of story to only wait 20s (is currently 30s) after the loading here.

Anyone has any other thoughts?
Project Member

Comment 16 by bugdroid1@chromium.org, Jun 21 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ad5510111c3ffc65f0a122cc88bf57e80996bbdf

commit ad5510111c3ffc65f0a122cc88bf57e80996bbdf
Author: rnephew <rnephew@chromium.org>
Date: Tue Jun 21 19:04:34 2016

[Telemetry] Decrease run time of power_cases page set from 30 to 10.

BUG= 620550 

Review-Url: https://codereview.chromium.org/2084553005
Cr-Commit-Position: refs/heads/master@{#401066}

[modify] https://crrev.com/ad5510111c3ffc65f0a122cc88bf57e80996bbdf/tools/perf/page_sets/power_cases.py

Status: Fixed (was: Started)
I think this is okay to close, right? I think that the answer that we came up with is that collecting a trace w/ all of the default categories enabled for 30s on a normal page is just too much information. Reducing this to 10s improved the situation drastically.

If work still needs to be done on this, feel free to reopen.

Sign in to add a comment