"telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testCollectAgentTraceDataFormat" is flaky |
|||||||||
Issue description"telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testCollectAgentTraceDataFormat" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNygwELEgVGbGFrZSJ4dGVsZW1ldHJ5LmludGVybmFsLnBsYXRmb3JtLnRyYWNpbmdfYWdlbnQuY3B1X3RyYWNpbmdfYWdlbnRfdW5pdHRlc3QuQ3B1VHJhY2luZ0FnZW50VGVzdC50ZXN0Q29sbGVjdEFnZW50VHJhY2VEYXRhRm9ybWF0DA. Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs
,
Jan 18 2018
Issue 803249 has been merged into this issue.
,
Jan 18 2018
Issue 803229 has been merged into this issue.
,
Jan 18 2018
Issue 803211 has been merged into this issue.
,
Jan 18 2018
It's possible that this is due to Ehsan's tracing change. But for now I am suspecting Juan's recent work
,
Jan 18 2018
Errors appear to be on:
telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest
Traceback (most recent call last):
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent_unittest.py", line 61, in testStartAgentTracing
self.assertTrue(self._agent.StartAgentTracing(self._config, 0))
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 289, in StartAgentTracing
self._KeepTakingSnapshots()
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 298, in _KeepTakingSnapshots
(self._collector.GetProcesses(), trace_time.Now()))
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 119, in GetProcesses
processes = super(WindowsProcessCollector, self).GetProcesses()
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 69, in GetProcesses
self._ParseProcessString(proc_string) for proc_string in proc_strings
TypeError: 'NoneType' object is not iterable
From: https://chromium-swarm.appspot.com/task?id=3b1c2155bc1bb210&refresh=10&show_raw=1&wide_logs=true
Looking into it ...
,
Jan 18 2018
Actually, it looks like the error is coming from here: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/cpu_tracing_agent.py?rcl=fbd65219a8245c27c19f00ea37d3136e00d9a993&l=144 Which from the logs shows up as: Error when running wmic command, which gave output: (note: no output is given) I don't think this is related to my changes. Would there be a way to get more diagnostic info? As the message is logged as a warning, is it safe to just return an empty list here to make the error non-fatal?
,
Jan 18 2018
Looks like that warning message was added by Charlie on issue 732584 to investigate some similar flakiness. And the problem just went away that time. Assigning to Charlie to decide what to do next.
,
Jan 18 2018
I see no flakes for this since 2018-01-17 20:29:31 UTC. Dropping from sheriff queue for now, but please do your best to fix this test soon or a future sheriff will disable it.
,
Jan 18 2018
Have a patch in the CQ now which adds logging of the exit code of the command, which should help us debug this. Going to snooze this for 4 days, during which time hopefully another flake happens.
,
Jan 18 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/031058b18eab12c71dd9a0122dd0d8a525d01f72 commit 031058b18eab12c71dd9a0122dd0d8a525d01f72 Author: Charlie Andrews <charliea@chromium.org> Date: Thu Jan 18 20:37:47 2018 Log wmic exit code when it fails Previously, when wmic failed, I tried to log the output that it gave, hoping that it would give some clue as to why it failed. Unfortunately, it looks like when it fails, _no_ output is given. wmic does give very specific exit codes when it fails, though, so I'm hoping that this latest logging will help us diagnose the failure. TBR=nednguyen@google.com Bug: chromium:803210 Change-Id: I9f31a08712545940c4c9f2a82aac77fb99fb013e Reviewed-on: https://chromium-review.googlesource.com/874599 Reviewed-by: Charlie Andrews <charliea@chromium.org> Commit-Queue: Charlie Andrews <charliea@chromium.org> [modify] https://crrev.com/031058b18eab12c71dd9a0122dd0d8a525d01f72/telemetry/telemetry/internal/platform/tracing_agent/cpu_tracing_agent.py
,
Jan 19 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/9886544c22cf9a4763e6d8a400ae3a1f1c02777e commit 9886544c22cf9a4763e6d8a400ae3a1f1c02777e Author: Charlie Andrews <charliea@chromium.org> Date: Fri Jan 19 15:29:51 2018 Reraise exception when wmic fails This was suggested by perezju@, and ensures that this root exception is raised instead of a more obscure one down the line. R=perezju@chromium.org Bug: chromium:803210 Change-Id: I432b4699e17209ae67b25b3b5f841099d3ed3dd3 Reviewed-on: https://chromium-review.googlesource.com/875912 Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org> Commit-Queue: Charlie Andrews <charliea@chromium.org> [modify] https://crrev.com/9886544c22cf9a4763e6d8a400ae3a1f1c02777e/telemetry/telemetry/internal/platform/tracing_agent/cpu_tracing_agent.py
,
Jan 30 2018
Detected 3 new flakes for test/step "telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testTraceSpecifiesTelemetryClockDomain". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyigELEgVGbGFrZSJ_dGVsZW1ldHJ5LmludGVybmFsLnBsYXRmb3JtLnRyYWNpbmdfYWdlbnQuY3B1X3RyYWNpbmdfYWdlbnRfdW5pdHRlc3QuQ3B1VHJhY2luZ0FnZW50VGVzdC50ZXN0VHJhY2VTcGVjaWZpZXNUZWxlbWV0cnlDbG9ja0RvbWFpbgw. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
,
Jan 30 2018
Logs:
=== testTraceSpecifiesTelemetryClockDomain ===
[115/292] telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testTraceSpecifiesTelemetryClockDomain queuedERROR:
Description =
[116/292] telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testTraceSpecifiesTelemetryClockDomain failed unexpectedly 0.7080s:
wmic failed with error code -1073738817 when running command, which gave output:
Traceback (most recent call last):
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent_unittest.py", line 143, in testTraceSpecifiesTelemetryClockDomain
self._agent.StartAgentTracing(self._config, 0)
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 289, in StartAgentTracing
self._collector.Init()
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 116, in Init
self._GetProcessesAsStrings()
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 143, in _GetProcessesAsStrings
self._GET_PERF_DATA_SHELL_COMMAND).strip().split('\n')[2:]
File "c:\infra-system\bin\lib\subprocess.py", line 219, in check_output
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['wmic', 'path', 'Win32_PerfFormattedData_PerfProc_Process', 'get', 'CreatingProcessID,IDProcess,Name,PercentProcessorTime,WorkingSet']' returned non-zero exit status -1073738817
[116/292] telemetry.internal.results.artifact_results_unittest.ArtifactResultsUnittest.testAddAndMove queued
== testCollectAgentTraceDataFormat ===
ERROR:
Description = [1/3] telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testCollectAgentTraceDataFormat failed unexpectedly 0.7670s:
wmic failed with error code -1073738817 when running command, which gave output:
Traceback (most recent call last):
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent_unittest.py", line 107, in testCollectAgentTraceDataFormat
self._agent.StartAgentTracing(self._config, 0)
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 289, in StartAgentTracing
self._collector.Init()
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 116, in Init
self._GetProcessesAsStrings()
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 143, in _GetProcessesAsStrings
self._GET_PERF_DATA_SHELL_COMMAND).strip().split('\n')[2:]
File "c:\infra-system\bin\lib\subprocess.py", line 219, in check_output
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['wmic', 'path', 'Win32_PerfFormattedData_PerfProc_Process', 'get', 'CreatingProcessID,IDProcess,Name,PercentProcessorTime,WorkingSet']' returned non-zero exit status -1073738817
ERROR:
=== testStartAgentTracing ===
ERROR:
Description = [2/3] telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testStartAgentTracing failed unexpectedly 0.7750s:
wmic failed with error code -1073738817 when running command, which gave output:
Traceback (most recent call last):
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent_unittest.py", line 61, in testStartAgentTracing
self.assertTrue(self._agent.StartAgentTracing(self._config, 0))
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 289, in StartAgentTracing
self._collector.Init()
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 116, in Init
self._GetProcessesAsStrings()
File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 143, in _GetProcessesAsStrings
self._GET_PERF_DATA_SHELL_COMMAND).strip().split('\n')[2:]
File "c:\infra-system\bin\lib\subprocess.py", line 219, in check_output
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['wmic', 'path', 'Win32_PerfFormattedData_PerfProc_Process', 'get', 'CreatingProcessID,IDProcess,Name,PercentProcessorTime,WorkingSet']' returned non-zero exit status -1073738817
,
Jan 30 2018
the pending CL is https://chromium-review.googlesource.com/c/catapult/+/892862.
,
Jan 30 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/857b65094956a0c65aa76fa2247d6217129cc3e7 commit 857b65094956a0c65aa76fa2247d6217129cc3e7 Author: vitaliii <vitaliii@chromium.org> Date: Tue Jan 30 12:34:10 2018 Disable some telemetry.i.p.t.c.CpuTracingAgentTest tests on Win (flaky). Disable: telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testTraceSpecifiesTelemetryClockDomain telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testCollectAgentTraceDataFormat telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testStartAgentTracing on Win, because flaky. TBR=perezju@chromium.org Bug: chromium:803210 Change-Id: I27b61d197a6831d0a5133ac2cfeb30f8f8d8bf1d Reviewed-on: https://chromium-review.googlesource.com/892862 Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org> Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org> [modify] https://crrev.com/857b65094956a0c65aa76fa2247d6217129cc3e7/telemetry/telemetry/internal/platform/tracing_agent/cpu_tracing_agent_unittest.py
,
Jan 30 2018
Please have a look and reenable the tests. Thank you!
,
Jan 30 2018
Great! I was waiting for this to happen. Thanks for reporting it vitaliii@. It looks like wmic is failing with error code -1073738817 as interpreted by Python. Python interprets this error code as a 32 bit signed integer, whereas in reality, it's actually a 32 bit unsigned integer. Using http://www.binaryconvert.com/result_signed_int.html?decimal=045049048055051055051056056049055, I can find out that the number is equivalent to 0xC0000BBF in hex. I then go to the WMI error constants page (https://msdn.microsoft.com/en-us/library/aa394559(v=vs.85).aspx) and look up that hex number and... nothing. brucedawson@, is there any chance you could help me figure out what this means? I'm stumped.
,
Feb 1 2018
I search the VS include files for C0000BBF and found two hits. Unfortunately error codes are not namespaced and it's not clear which (if either) of these is applicable. Here they are: C:\Program Files (x86)\Windows Kits\10\Include\10.0.15063.0\um\PdhMsg.h #define PDH_CSTATUS_NO_COUNTERNAME 0xC0000BBFL ...\Windows Kits\10\Include\10.0.15063.0\um\WindowsSearchErrors.h // // MessageId: EVENT_GATHERER_PERFMON // // MessageText: // // Performance monitoring cannot be initialized for the gatherer object, because the counters are not loaded or the shared memory object cannot be opened. This only affects availability of the perfmon counters. Restart the computer.%1 // #define EVENT_GATHERER_PERFMON 0xC0000BBFL The name of the second one sounds encouraging but WindowsSearchErrors.h seems like an odd location for a WMI error. So maybe WMI uses Pdh? BTW, I filed bug 803617 for the confusion caused by printing the inherently hexadecimal error codes as signed decimal.
,
Feb 2 2018
Thanks for the tip Bruce! Searching around, I also found https://msdn.microsoft.com/en-us/library/windows/desktop/aa373214(v=vs.85).aspx In there is: WMI, Event 10 Event filter with query "select * from __instancemodificationevent within 30 where targetinstance isa ‘Win32_PerfFormattedData_PerfDisk_LogicalDisk’ and targetinstance.PercentFreeSpace < 1 and targetinstance.Name != ‘_Total’" could not be reactivated in namespace "//./root/CIMV2" because of error 0xc0000bbf. Events cannot be delivered through this filter until the problem is corrected. WMI, Event 5612: Windows Management Instrumentation has stopped WMIPRVSE.EXE because a quota reached a warning value. Quota: HandleCount Value: 12853 Maximum value: 12288 WMIPRVSE PID: 7232 http://bit.ly/2nC6KGf also seems to suggest that the counters can get into a corrupted state and need to be cleared. It's unfortunate that this seems to be happening every ~10 days, though. That's not very reliable :-/ Both also seem to suggest that, if I'm serious about debugging this, then finding some way to dump the event logs from when the WMI command was run is the next step.
,
Feb 2 2018
I'm probably going to put this on my backlog, but not work on it right now. It doesn't seem like there's a really obvious fix to this, and it seems like the sort of thing that I could spend an entire week trying to debug. There are also no severe ramifications of it, given that things are working well enough when this code is used in Telemetry that we haven't had any errors reported.
,
Feb 6 2018
Issue 753257 has been merged into this issue.
,
Feb 6 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/874bc7c4eabb1a1b08968210723ba4137b4a76aa commit 874bc7c4eabb1a1b08968210723ba4137b4a76aa Author: Charlie Andrews <charliea@chromium.org> Date: Tue Feb 06 21:43:59 2018 Disable all CPU tracing agent integration tests on Windows All of these tests are susceptible to a bug where wmic fails for seemingly no reason. TBR=nednguyen@google.com Bug: chromium:803210 Change-Id: Ice0d0bec30a4564b6437c5dd8b5d1dd489893c44 Reviewed-on: https://chromium-review.googlesource.com/905284 Reviewed-by: Charlie Andrews <charliea@chromium.org> Commit-Queue: Charlie Andrews <charliea@chromium.org> [modify] https://crrev.com/874bc7c4eabb1a1b08968210723ba4137b4a76aa/telemetry/telemetry/internal/platform/tracing_agent/cpu_tracing_agent_unittest.py
,
Jan 16
(6 days ago)
,
Jan 16
(6 days ago)
|
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by nedngu...@google.com
, Jan 18 2018Owner: perezju@chromium.org
Status: Assigned (was: Untriaged)