New issue
Advanced search Search tips

Issue 803617 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2018
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Windows crash return codes should be printed in recognizable hex when appropriate

Project Member Reported by brucedaw...@chromium.org, Jan 18 2018

Issue description

 Bug 644525  leads to failure messages like these:

C:/b/depot_tools/python276_bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py event.proto events_chunk.proto --protoc ./../Release/protoc --proto-in-dir ../../components/tracing/proto --plugin proto_zero_plugin.exe --plugin-out-dir gen/components/tracing/proto --plugin-options wrapper_namespace=pbzero
--plugin_out: protoc-gen-plugin: Plugin failed with status code 3221225477.

Protoc has returned non-zero status: 1 .


C:/b/depot_tools/python276_bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py profile_reset_report.proto --protoc ./../Release/protoc --proto-in-dir ../../chrome/browser/profile_resetter --cc-out-dir gen/chrome/browser/profile_resetter --py-out-dir pyproto/chrome/browser/profile_resetter
Protoc has returned non-zero status: -1073741819 .


c:/src/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../build/gn_run_binary.py genperf.exe ../../third_party/yasm/source/patched-yasm/modules/arch/x86/x86regtmod.gperf gen/third_party/yasm/x86regtmod.c
genperf.exe failed with exit code -1073741819




It is not at all obvious that -1073741819 and 3221225477 are the same number. It is also not obvious that 0xC0000005 is also the same number. That's a shame because 0xC0000005 is much easier to read and is more easily identifiable and searchable as meaning Access Violation.

When we notice that we have a large negative error code we should print it out as a four-digit hex number. Note that there are multiple places where this fix will be needed, including gn_run_binary.py, protoc_wrapper.py, and in the protoc plugin manager as well. If we can at least get the more common ones that would be a good start.

 

See also these confusing errors from bug 803210:

=== testStartAgentTracing ===

ERROR:

Description = [2/3] telemetry.internal.platform.tracing_agent.cpu_tracing_agent_unittest.CpuTracingAgentTest.testStartAgentTracing failed unexpectedly 0.7750s:
  wmic failed with error code -1073738817 when running command, which gave output: 
  Traceback (most recent call last):
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent_unittest.py", line 61, in testStartAgentTracing
      self.assertTrue(self._agent.StartAgentTracing(self._config, 0))
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 289, in StartAgentTracing
      self._collector.Init()
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 116, in Init
      self._GetProcessesAsStrings()
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py", line 143, in _GetProcessesAsStrings
      self._GET_PERF_DATA_SHELL_COMMAND).strip().split('\n')[2:]
    File "c:\infra-system\bin\lib\subprocess.py", line 219, in check_output
      raise CalledProcessError(retcode, cmd, output=output)
  CalledProcessError: Command '['wmic', 'path', 'Win32_PerfFormattedData_PerfProc_Process', 'get', 'CreatingProcessID,IDProcess,Name,PercentProcessorTime,WorkingSet']' returned non-zero exit status -1073738817
More recently this error code was hit:

[34/32000] ACTION //chrome/browser:conflicts_module_list_proto_gen(//build/toolchain/win:win_clang_x64)
FAILED: gen/chrome/browser/conflicts/proto/module_list.pb.h gen/chrome/browser/conflicts/proto/module_list.pb.cc pyproto
/chrome/browser/conflicts/proto/module_list_pb2.py
C:/python_27_amd64/files/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py module_list.proto --protoc ./protoc.exe
 --proto-in-dir ../../chrome/browser/conflicts/proto --cc-out-dir gen/chrome/browser/conflicts/proto --py-out-dir pyprot
o/chrome/browser/conflicts/proto
Protoc has returned non-zero status: -1073740791 .

-1073740791 (0xC0000409) is easily confused with -1073741819 (0xC0000005) in decimal but is clearly different in hex. Fixing this will help with bug reporting and triage. There are several print strings to look for:

    a) --plugin_out: protoc-gen-plugin: Plugin failed with status code 3221225477.
    b) Protoc has returned non-zero status: -1073741819 .
    c) 'CreatingProcessID,IDProcess,Name,PercentProcessorTime,WorkingSet']' returned non-zero exit status -1073738817
    d) genperf.exe failed with exit code -1073741819
    e) wmic failed with error code -1073738817 when running command, which gave output:

I might as well fixed the inconsistent trailing punctuation while I'm at it :-)

The sources of the five confusing types of error messages shown above appear to be:

    a) third_party\protobuf\src\google\protobuf\compiler\subprocess.cc
    b) tools\protoc_wrapper\protoc_wrapper.py
    c) ???
    d) build\gn_run_binary.py
    e) third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\cpu_tracing_agent.py

b) and d) seem to be the most important so I'll create a CL to address them.

Project Member

Comment 3 by bugdroid1@chromium.org, Feb 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8c50b3ed049c736406943ac24dc40fc8ac1e782c

commit 8c50b3ed049c736406943ac24dc40fc8ac1e782c
Author: Bruce Dawson <brucedawson@chromium.org>
Date: Thu Feb 15 00:08:36 2018

Print Windows crash codes in hex

Failure codes such as STATUS_ACCESS_VIOLATION are easily recognizable
and easily differentiated when printed in hex (0xC0000005) but are
cryptic and conflated when printed as decimal (-1073741819). This change
teaches two of our wrapper scripts to print large negative numbers as
hex so that those skilled in the Windows arts can automatically say
"access violation" or "not access violation."

I also removed an inelegant trailing period, for consistency.

In testing with artificially inserted error codes the output is:
  Protoc has returned non-zero status: -99
  Protoc has returned non-zero status: 0xC0000005
  genperf.exe failed with exit code -99
  re2c.exe failed with exit code 0xC0000005

Bug:  803617 , 644525 
Change-Id: I627754976ff04e334010d36e5734d73421523e47
Reviewed-on: https://chromium-review.googlesource.com/917101
Commit-Queue: Bruce Dawson <brucedawson@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#536899}
[modify] https://crrev.com/8c50b3ed049c736406943ac24dc40fc8ac1e782c/build/gn_run_binary.py
[modify] https://crrev.com/8c50b3ed049c736406943ac24dc40fc8ac1e782c/tools/protoc_wrapper/protoc_wrapper.py

Status: Fixed (was: Assigned)
Filed an internal bug (73395471) for protobuf's plugin error codes (version a).

Version c is an error formatted by wmic so it is out of our control, and it is covered to a large extent by issue e.

Filed this issue for catapult (version e):

https://github.com/catapult-project/catapult/issues/4264

So that's it and I'm going to close this as fixed given that the remaining issues are reported, external, and low priority.

Status: Started (was: Fixed)
Reopening this bug because a new instance was found:

Task ran but no result was found: shard 0 test output was missing
some shards did not complete: 0
step returned non-zero exit code: -1073741515

-1073741515 is cryptic and unrecognizable. -1073741515 maps to 0xC0000135 which is searchable and recognizable. This error printing comes from infra\infra\recipes-py\recipe_engine\run.py. I will fix this next.

Project Member

Comment 6 by bugdroid1@chromium.org, Feb 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/recipes-py/+/c12134ce50260d37d0b39aee92d1fc53c6565a0c

commit c12134ce50260d37d0b39aee92d1fc53c6565a0c
Author: Bruce Dawson <brucedawson@chromium.org>
Date: Thu Feb 22 01:02:59 2018

Print Windows crash codes in hex

Failure codes such as STATUS_ACCESS_VIOLATION are easily recognizable
and easily differentiated when printed in hex (0xC0000005) but are
cryptic and conflated when printed as decimal (-1073741819). This change
teaches run.py to print large negative numbers as hex so that those
skilled in the Windows arts can automatically say "access violation" or
"not access violation."

Bug:  803617 
Change-Id: If21428af470fa0d513f7f8a9ece6e3096d09567d
Reviewed-on: https://chromium-review.googlesource.com/927800
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Erik Staab <estaab@chromium.org>
Commit-Queue: Bruce Dawson <brucedawson@chromium.org>

[modify] https://crrev.com/c12134ce50260d37d0b39aee92d1fc53c6565a0c/recipe_engine/run.py

Status: Fixed (was: Started)
The good news is that this build break is a great test of my change to print Windows error codes in much more readable hex:

[36434/36541] ACTION //tools/v8_context_snapshot:generate_v8_context_snapshot(//build/toolchain/win:x86)
FAILED: v8_context_snapshot.bin 
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../build/gn_run_binary.py ./v8_context_snapshot_generator --output_file=v8_context_snapshot.bin
./v8_context_snapshot_generator failed with exit code 0xC0000005
[36435/36541] LINK(DLL) chrome.dll chrome.dll.lib chrome.dll.pdb
ninja: build stopped: subcommand failed.
step returned non-zero exit code: 1

http://build.chromium.org/p/chromium.fyi/builders/Chromium%20Win%20PGO%20Builder/builds/14854

The bad news is I have no idea why this failed with an access violation. Previously I would have assumed that this was  crbug.com/644525  but that bug is fixed. Random failure? PGO code-gen bug? ???

Anyway, I'm closing this bug as fixed and I'll open a bug for the PGO failure if it happens again, although PGO builders (VC++) are not long for this world.

Sign in to add a comment