Trace size of battor + chrome is too big, leading to python OOM |
||||
Issue descriptionLog: https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%20Power%20Perf%20%28DELL%29/builds/75/steps/BattOr.BattOrCases/logs/stdio I think the fix here is we update the call in https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/value/trace.py#L51 to use python subprocess module to launch the command "tracing/bin/html2trace <chrome_trace.tmp> <battor_trace.tmp> ..." so that there is less memory pressure on the current telemetry python process. Randy: can you take over making this fix?
,
Jun 16 2016
One thing to note is that all the Windows machines use 32 bit python, which is the version available in depot_tools. That means it's limited to 4GB RAM. We could consider adding 64 bit python to depot_tools, but we will hit this anyway when the 2GB and 4GB laptops come online.
,
Jun 16 2016
,
Jun 16 2016
,
Jun 16 2016
There is apparently no easy way to get memory footprints for python objects/dicts.
,
Jun 16 2016
Randy: once we go to the route of using trace2html as a python subprocess, we will need to dump the dictionary object or string to files to do "trace2html <file1> <file2> ...". At that point, we can log the files size.
,
Jun 16 2016
Yeah, I came to that realization a few minutes ago. Thanks!
,
Jun 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/56226927344ff0bd246de7fb1f19d2720fa64fa8 commit 56226927344ff0bd246de7fb1f19d2720fa64fa8 Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Sat Jun 18 02:11:38 2016 Roll src/third_party/catapult/ f7df8568a..713623b39 (11 commits). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/f7df8568ae1f..713623b39217 $ git log f7df8568a..713623b39 --date=short --no-merges --format='%ad %ae %s' BUG= 621035 , 620550 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2079983002 Cr-Commit-Position: refs/heads/master@{#400571} [modify] https://crrev.com/56226927344ff0bd246de7fb1f19d2720fa64fa8/DEPS
,
Jun 18 2016
Unfortunatly it looks like that will not solve the memory problem...
(INFO) 2016-06-17 21:23:26,381 trace._GetTempFileHandle:65 Trace of size 7840261 bytes saved.
(INFO) 2016-06-17 21:24:28,430 trace._GetTempFileHandle:65 Trace of size 258748382 bytes saved.
(INFO) 2016-06-17 21:24:28,436 trace._GetTempFileHandle:65 Trace of size 40 bytes saved.
(INFO) 2016-06-17 21:24:28,487 trace._GetTempFileHandle:65 Trace of size 856 bytes saved.
Traceback (most recent call last):
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\value\..\..\..\tracing\bin\trace2html", line 14, in <module>
sys.exit(trace2html.Main(sys.argv))
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\tracing\tracing_build\trace2html.py", line 47, in Main
WriteHTMLForTracesToFile(args.trace_files, f, config_name=args.config_name)
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\tracing\tracing_build\trace2html.py", line 110, in WriteHTMLForTracesToFile
trace_data = json.loads(trace_data)
File "C:\b\depot_tools\python276_bin\lib\json\__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "C:\b\depot_tools\python276_bin\lib\json\decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\b\depot_tools\python276_bin\lib\json\decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
MemoryError
Traceback (most recent call last):
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 86, in _RunStoryAndProcessErrorIfNeeded
test.Measure(state.platform, results)
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\web_perf\timeline_based_measurement.py", line 280, in Measure
trace_value = trace.TraceValue(results.current_page, trace_result)
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\value\trace.py", line 37, in __init__
self._temp_file = self._GetTempFileHandle(trace_data)
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\value\trace.py", line 72, in _GetTempFileHandle
subprocess.check_call(cmd)
File "C:\b\depot_tools\python276_bin\lib\subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['python', 'C:\\b\\build\\slave\\Win_Power_Perf__DELL_\\build\\src\\third_party\\catapult\\telemetry\\telemetry\\value\\..\\..\\..\\tracing\\bin\\trace2html', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpoww80r\\0.trace', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpoww80r\\1.trace', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpoww80r\\2.trace', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpoww80r\\3.trace', '--output', 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpmuofao.html']' returned non-zero exit status 1
,
Jun 18 2016
Would increasing the swap space on the laptop solve this? I dont know enough about how memory is handled by python to know. And even if it does, will that change the behavior of the system in ways that would be detrimental to the perf tests?
,
Jun 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e3576191681a39b8df1093d43c541b2d18ab4f25 commit e3576191681a39b8df1093d43c541b2d18ab4f25 Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Sat Jun 18 06:57:26 2016 Roll src/third_party/catapult/ ea37a065d..bc8cee3e0 (1 commit). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/ea37a065d0ca..bc8cee3e012f $ git log ea37a065d..bc8cee3e0 --date=short --no-merges --format='%ad %ae %s' BUG= 620550 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2077873004 Cr-Commit-Position: refs/heads/master@{#400582} [modify] https://crrev.com/e3576191681a39b8df1093d43c541b2d18ab4f25/DEPS
,
Jun 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fe573658df0fc425dc958daf925866d3eaf95d41 commit fe573658df0fc425dc958daf925866d3eaf95d41 Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Sat Jun 18 07:48:39 2016 Roll src/third_party/catapult/ bc8cee3e0..184187257 (1 commit). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/bc8cee3e012f..184187257422 $ git log bc8cee3e0..184187257 --date=short --no-merges --format='%ad %ae %s' BUG= 620550 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2081503002 Cr-Commit-Position: refs/heads/master@{#400585} [modify] https://crrev.com/fe573658df0fc425dc958daf925866d3eaf95d41/DEPS
,
Jun 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6dd6bf24186bcf7bb585698150a17762677da939 commit 6dd6bf24186bcf7bb585698150a17762677da939 Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Sat Jun 18 13:08:02 2016 Roll src/third_party/catapult/ 184187257..0dd659eb5 (1 commit). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/184187257422..0dd659eb524f $ git log 184187257..0dd659eb5 --date=short --no-merges --format='%ad %ae %s' BUG= 620550 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2076983004 Cr-Commit-Position: refs/heads/master@{#400591} [modify] https://crrev.com/6dd6bf24186bcf7bb585698150a17762677da939/DEPS
,
Jun 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/09621481168cd94f09dfe06cfeb4c09120b16b8e commit 09621481168cd94f09dfe06cfeb4c09120b16b8e Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Sat Jun 18 14:49:20 2016 Roll src/third_party/catapult/ 0dd659eb5..de3ea6fd8 (1 commit). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/0dd659eb524f..de3ea6fd8d96 $ git log 0dd659eb5..de3ea6fd8 --date=short --no-merges --format='%ad %ae %s' BUG= 620550 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2078043003 Cr-Commit-Position: refs/heads/master@{#400593} [modify] https://crrev.com/09621481168cd94f09dfe06cfeb4c09120b16b8e/DEPS
,
Jun 20 2016
After #12, the battor.power_cases is greening, but still quite flaky with this stack:
Traceback (most recent call last):
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\internal\platform\tracing_controller_backend.py", line 259, in CollectAgentTraceData
os.remove(self._trace_log)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\chrome~1.lab\\appdata\\local\\temp\\tmpqfhwih'
(INFO) 2016-06-20 05:50:31,869 trace._GetTempFileHandle:65 Trace (TraceDataPart("powerTraceAsString")) of size 7861861 bytes saved.
(INFO) 2016-06-20 05:51:35,315 trace._GetTempFileHandle:65 Trace (TraceDataPart("traceEvents")) of size 261223556 bytes saved.
(INFO) 2016-06-20 05:51:35,319 trace._GetTempFileHandle:65 Trace (TraceDataPart("tabIds")) of size 40 bytes saved.
(INFO) 2016-06-20 05:51:35,322 trace._GetTempFileHandle:65 Trace (TraceDataPart("telemetry")) of size 858 bytes saved.
c:\users\chrome~1.lab\appdata\local\temp\tmpmb3i6g.html
Traceback (most recent call last):
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\value\failure.py", line 41, in _GetExcInfoFromMessage
raise Exception(message)
Exception: Unknown stack
------------------------------------------------------
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 86, in _RunStoryAndProcessErrorIfNeeded
test.Measure(state.platform, results)
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\web_perf\timeline_based_measurement.py", line 284, in Measure
self._ComputeTimelineBasedMetric(results, trace_value)
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\telemetry\telemetry\web_perf\timeline_based_measurement.py", line 302, in _ComputeTimelineBasedMetric
trace_value.filename, metric, extra_import_options)
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\tracing\tracing\metrics\metric_runner.py", line 31, in RunMetric
result = RunMetricOnTraces([filename], metric, extra_import_options)
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\tracing\tracing\metrics\metric_runner.py", line 42, in RunMetricOnTraces
map_results = runner.RunMapper()
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\perf_insights\perf_insights\map_runner.py", line 94, in RunMapper
self._wq.Run()
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\perf_insights\perf_insights\mre\threaded_work_queue.py", line 38, in Run
self._RunMultiThreaded()
File "C:\b\build\slave\Win_Power_Perf__DELL_\build\src\third_party\catapult\perf_insights\perf_insights\mre\threaded_work_queue.py", line 106, in _RunMultiThreaded
t.start()
File "C:\b\depot_tools\python276_bin\lib\threading.py", line 745, in start
_start_new_thread(self.__bootstrap, ())
error: can't start new thread
[ FAILED ] about:blank (67334 ms)
I suspect this is still due to the high memory usage.
At this point, I think the best action is to either reduce the chrome tracing categories or reduce the length of the story (currently around 45s in total). I would vote for reduce the length of story to only wait 20s (is currently 30s) after the loading here.
Anyone has any other thoughts?
,
Jun 21 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ad5510111c3ffc65f0a122cc88bf57e80996bbdf commit ad5510111c3ffc65f0a122cc88bf57e80996bbdf Author: rnephew <rnephew@chromium.org> Date: Tue Jun 21 19:04:34 2016 [Telemetry] Decrease run time of power_cases page set from 30 to 10. BUG= 620550 Review-Url: https://codereview.chromium.org/2084553005 Cr-Commit-Position: refs/heads/master@{#401066} [modify] https://crrev.com/ad5510111c3ffc65f0a122cc88bf57e80996bbdf/tools/perf/page_sets/power_cases.py
,
Jun 24 2016
I think this is okay to close, right? I think that the answer that we came up with is that collecting a trace w/ all of the default categories enabled for 30s on a normal page is just too much information. Reducing this to 10s improved the situation drastically. If work still needs to be done on this, feel free to reopen. |
||||
►
Sign in to add a comment |
||||
Comment 1 by nedngu...@google.com
, Jun 16 2016