system_health.common_desktop failing on Win 10 High-DPI Perf |
||
Issue descriptionhttps://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%2010%20High-DPI%20Perf?numbuilds=200 So system_health.common_Desktop is still failing after the fix for crbug.com/799106 . It looks like it has been timing out since at least November in tracing_controller_backend.py line 291 trying to access a log file (https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py?q=tracing_controller_backend&sq=package:chromium&l=291) (INFO) 2018-01-12 02:42:15,437 cache_temperature.EnsurePageCacheTemperature:181 PageCacheTemperature: any (INFO) 2018-01-12 02:44:10,438 chrome_tracing_agent.RecordClockSyncMarker:183 Chrome version: 3320 (INFO) 2018-01-12 02:45:30,969 battor_wrapper.CollectTraceData:271 CollectTraceData serial log: (INFO) 2018-01-12 02:45:30,969 cloud_storage.Insert:378 Uploading c:\b\s\w\it2ypqoi\tmpwpyjyr to gs://chrome-telemetry-output/battor-serial-log-2018-01-12_02-45.txt-65225.txt View BattOr serial log at https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/battor-serial-log-2018-01-12_02-45.txt-65225.txt (ERROR) 2018-01-12 02:45:37,596 tracing_controller_backend.CollectAgentTraceData:295 Error when deleting c:\b\s\w\it2ypqoi\tmpvs6vtp, will try again at exit. Traceback (most recent call last): File "c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_controller_backend.py", line 291, in CollectAgentTraceData os.remove(self._trace_log) WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\b\\s\\w\\it2ypqoi\\tmpvs6vtp' (INFO) 2018-01-12 02:45:37,878 trace_data.Serialize:191 Trace sizes in bytes: {'cpuSnapshots': 1595363L, 'traceEvents': 368114071L, 'telemetry': 98040L, 'tabIds': 143L, 'powerTraceAsString': 25980869L} I can't tell if it is due to a particular story from the logs. Ned and Charlie, you had a few CLs in early November around the return value of TraceValue. They could be completely unrelated given I don't know when it started timing out, but do either of you have insight onto what file it could be locked on?
,
Jan 16 2018
I have a potential fix for this in the CQ. My plan is to try each of these, in order, stopping when the problem is fixed: 1) Add logging after trace2html completes 2) If the problem persists and it's apparent that trace2html isn't completing, add a heartbeat during the trace2html command. If it's apparent that it's the metric processing, add a heartbeat during the metric computation command I'll check back tomorrow to see if that first fix works.
,
Jan 16 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/da6be7b0dc364fb96c688e504b2d9c70073f83d7 commit da6be7b0dc364fb96c688e504b2d9c70073f83d7 Author: Charlie Andrews <charliea@chromium.org> Date: Tue Jan 16 15:43:18 2018 Emit output when trace2html is complete We should be emitting output after all long-running subtasks are complete in order to give hints to swarming that we're making forward progress and therefore shouldn't run up against the I/O timeout. In this case, it's not unusual for this command to take upwards of a minute and it's immediately followed by another long-running task (calculating metrics from the trace). Bug: chromium:801596 Change-Id: Ia393ca6d5120cc3d7f5e46c82911fb82bca26204 Reviewed-on: https://chromium-review.googlesource.com/867935 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Charlie Andrews <charliea@chromium.org> [modify] https://crrev.com/da6be7b0dc364fb96c688e504b2d9c70073f83d7/telemetry/telemetry/timeline/trace_data.py
,
Jan 18 2018
I've confirmed that this seems to have fixed the problem for now. If this timeout happens in the future, on this platform or any other, we may have to consider either adding heartbeat messages as the tracing metrics are computed or trying to optimize the tracing metric computation script. |
||
►
Sign in to add a comment |
||
Comment 1 by charliea@chromium.org
, Jan 16 2018Status: Assigned (was: Untriaged)