Incorrect encoding when symbolizing a trace on Windows
Issue description
An error occurs intermittently when symbolizing a trace on Windows.
A trace collected with chrome://tracing and the command-line flag '--enable-heap-profiling=native' contains StackFrame entries that hold raw addresses. The symbolization script converts these addresses to function names using the information in the PDB.
Unfortunately, it is common to hit a UTF-8 encoding error.
The same trace may or may not produce an error.
Updating trace file...
Traceback (most recent call last):
  File "third_party\catapult\tracing\bin\symbolize_trace", line 519, in <module>
    main()
  File "third_party\catapult\tracing\bin\symbolize_trace", line 513, in main
    json.dump(trace, trace_file)
  File "c:\src\depot_tools\python276_bin\lib\json\__init__.py", line 189, in dump
    for chunk in iterable:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 332, in _iterencode_list
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 390, in _iterencode_dict
    yield _encoder(value)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xaf in position 2000: invalid start byte
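The decode failure can be reproduced in isolation. The sketch below is an illustration only, written for Python 3; the traceback above comes from Python 2.7.6, where json.dump appears to decode byte strings as UTF-8 implicitly. The helper name first_invalid_utf8_offset and the sample bytes are hypothetical and are not taken from symbolize_trace.

```python
MAX_SYM_NAME = 2000  # buffer size mentioned in the issue


def first_invalid_utf8_offset(raw):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical symbol output: 2000 valid ASCII bytes followed by a stray 0xaf,
# which is a continuation byte and therefore cannot start a UTF-8 sequence.
symbol_output = b"A" * MAX_SYM_NAME + b"\xaf"
print(first_invalid_utf8_offset(symbol_output))
```

A clean name yields None, so a check like this could be used to locate which symbol carries the stray byte.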
Apr 20 2017
I've found the bug (I hope).
The 'random' character is at position 2000, which is, strangely, the same value as MAX_SYM_NAME (dbghelp):
#define MAX_SYM_NAME 2000
The Python script uses addr2line-pdb, which uses the DbgHelp API to resolve the symbol:
https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/windows/addr2line-pdb.c?l=155
  pSymbol->SizeOfStruct = sizeof(SYMBOL_INFO);
  pSymbol->MaxNameLen = MAX_SYM_NAME;
  if (print_function_name) {
    if (SymFromAddr(process, (DWORD64)absaddr, NULL, pSymbol)) {
      printf("%s\n", pSymbol->Name);
    } else {
      printf("??\n");
    }
  }
As I understand the Microsoft documentation, the terminating '\0' may not be present:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms680686(v=vs.85).aspx
The PDB classification. These values are defined in Dbghelp.h in the SymTagEnum enumeration type.
NameLen
The length of the name, in characters, not including the null-terminating character.
MaxNameLen
The size of the Name buffer, in characters. If this member is 0, the Name member is not used.
Name
The name of the symbol. The name can be undecorated if the SYMOPT_UNDNAME option is used with the SymSetOptions function.
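To see why a missing terminator produces trailing garbage, the behaviour can be modelled in Python. This is only a simulation under stated assumptions: memory is represented as a bytes object, read_c_string imitates how printf("%s") scans until the first NUL, and read_bounded imitates a read limited to the buffer size. Both helper names and the sample bytes are hypothetical; this is not the addr2line-pdb code.

```python
MAX_SYM_NAME = 2000  # size of the Name buffer, per the issue


def read_c_string(memory, start):
    """Imitate printf("%s"): read from start up to the first NUL byte."""
    end = memory.find(b"\x00", start)
    if end == -1:
        end = len(memory)
    return memory[start:end]


def read_bounded(memory, start, max_len):
    """Read at most max_len bytes, stopping early if a NUL is found."""
    return read_c_string(memory[start:start + max_len], 0)


# A name that fills the buffer exactly, so no NUL fits inside it,
# followed by unrelated bytes that happen to sit next in memory.
name = b"f" * MAX_SYM_NAME
memory = name + b"\xaf\x10junk\x00"

unbounded = read_c_string(memory, 0)            # runs past the buffer
bounded = read_bounded(memory, 0, MAX_SYM_NAME)  # stays inside the buffer
print(len(unbounded), len(bounded))
```

In this model the unbounded read picks up the byte 0xaf at offset 2000, which matches the position reported in the traceback.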
Apr 20 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/0ff178d157b08717ac405502ad648e6f7bfecd1d
commit 0ff178d157b08717ac405502ad648e6f7bfecd1d
Author: etienneb <etienneb@chromium.org>
Date: Thu Apr 20 19:58:02 2017
Fix potential missing nul character on resolved symbol names
The symbol name returned by SymFromName may not contain a NUL character when the symbol name is exactly the size of the buffer. I believe this may also happen when the symbol name is too long and truncated.
The original code is based on:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms680580(v=vs.85).aspx
A correct implementation can be found here:
https://cs.chromium.org/chromium/src/base/debug/stack_trace_win.cc?l=145&rcl=f4ecb9e37e9e2d59e32b8b96f23ac4a1e33b9552
As described here:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms680686(v=vs.85).aspx
NameLen
The length of the name, in characters, not including the null-terminating character.
MaxNameLen
The size of the Name buffer, in characters. If this member is 0, the Name member is not used.
This issue was causing the catapult symbolisation script to encode incorrect (random) characters into the symbol names. See the example in the bug.
R=wfh@chromium.org, chrisha@chromium.org, erikchen@chromium.org, ajwong@chromium.org
BUG= 713741
Review-Url: https://codereview.chromium.org/2832643004
Cr-Commit-Position: refs/heads/master@{#466098}
[modify] https://crrev.com/0ff178d157b08717ac405502ad648e6f7bfecd1d/third_party/tcmalloc/README.chromium
[modify] https://crrev.com/0ff178d157b08717ac405502ad648e6f7bfecd1d/third_party/tcmalloc/chromium/src/windows/addr2line-pdb.c
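Independently of the C fix in this commit, a consumer of the symbol names could guard against stray bytes before serializing. The following is a minimal Python 3 sketch of that idea and is not what the commit or symbolize_trace does; sanitize_symbol, the 2000-byte limit as a default, and the "stackFrames" layout are all assumptions made for illustration.

```python
import json


def sanitize_symbol(raw, max_len=2000):
    """Truncate to max_len bytes, then decode, replacing invalid UTF-8 bytes."""
    return raw[:max_len].decode("utf-8", errors="replace")


# Hypothetical raw name: a full buffer followed by two stray bytes.
raw_name = b"g" * 2000 + b"\xaf\x10"

# Hypothetical trace layout, used only to show that serialization succeeds.
trace = {"stackFrames": {"1": {"name": sanitize_symbol(raw_name)}}}
encoded = json.dumps(trace)
print(len(encoded))
```

Replacing invalid bytes hides the symptom rather than the cause, so a guard like this would complement the fix in addr2line-pdb, not replace it.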
May 1 2017
Comment 1 by etienneb@chromium.org, Apr 20 2017