Issue 713741

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Incorrect encoding when symbolizing a trace on Windows

Project Member Reported by etienneb@chromium.org, Apr 20 2017

Issue description

An error may occur when symbolizing a trace on Windows.

A trace collected with chrome://tracing and the command-line flag '--enable-heap-profiling=native' contains StackFrame entries with raw addresses. The symbolization script converts these addresses to function names using the information in the PDB.

Unfortunately, it's common to hit a UTF-8 encoding error.
The failure is intermittent: the same trace may or may not produce the error.

Updating trace file...
Traceback (most recent call last):
  File "third_party\catapult\tracing\bin\symbolize_trace", line 519, in <module>
    main()
  File "third_party\catapult\tracing\bin\symbolize_trace", line 513, in main
    json.dump(trace, trace_file)
  File "c:\src\depot_tools\python276_bin\lib\json\__init__.py", line 189, in dump
    for chunk in iterable:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 332, in _iterencode_list
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "c:\src\depot_tools\python276_bin\lib\json\encoder.py", line 390, in _iterencode_dict
    yield _encoder(value)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xaf in position 2000: invalid start byte
 
The error happens when the resulting JSON is encoded and written to disk.
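
For context on the "invalid start byte" message: 0xaf has the bit pattern 10101111, which UTF-8 reserves for continuation bytes, so it can never begin a character. The sketch below is only an illustration of that rule, not code from the symbolization script. The names demo_utf8_lead_len and demo_first_bad_byte are hypothetical, and the check is simplified (it looks at lead and continuation bytes only, and does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>

/* Returns the sequence length implied by a UTF-8 lead byte,
 * or 0 if the byte cannot start a sequence. */
static int demo_utf8_lead_len(unsigned char b) {
  if (b <= 0x7F) return 1;              /* ASCII */
  if (b >= 0xC2 && b <= 0xDF) return 2; /* 2-byte sequence */
  if (b >= 0xE0 && b <= 0xEF) return 3; /* 3-byte sequence */
  if (b >= 0xF0 && b <= 0xF4) return 4; /* 4-byte sequence */
  /* 0x80..0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5..0xFF
   * never appear in valid UTF-8. */
  return 0;
}

/* Returns the index of the first byte at which decoding fails,
 * or len if the whole buffer passes this simplified check. */
static size_t demo_first_bad_byte(const unsigned char *buf, size_t len) {
  size_t i = 0;
  assert(buf != NULL);
  while (i < len) {
    int n = demo_utf8_lead_len(buf[i]);
    int k;
    if (n == 0 || (size_t)n > len - i) return i; /* bad lead or truncated */
    for (k = 1; k < n; ++k) {
      if ((buf[i + k] & 0xC0) != 0x80) return i; /* not a continuation byte */
    }
    i += (size_t)n;
  }
  return len;
}
```

Run over a symbol name that is followed by stray memory, a check like this would report the index of the first stray byte, which is how a position such as 2000 in the traceback can be read.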

The following stack frame can't be encoded. It turns out there is an invalid character at the end.

base::internal::Invoker<base::internal::BindState<void (__cdecl QuotaPolicyChannelIDStore::*)(base::Callback<void __cdecl(std::unique_ptr<std::vector<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> >,std::allocator<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> > > >,std::default_delete<std::vector<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> >,std::allocator<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> > > > > >),1,1> const & __ptr64,std::unique_ptr<std::vector<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> >,std::allocator<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> > > >,std::default_delete<std::vector<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> >,std::allocator<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> > > > > >) __ptr64,scoped_refptr<QuotaPolicyChannelIDStore>,base::Callback<void __cdecl(std::unique_ptr<std::vector<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> >,std::allocator<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> > > >,std::default_delete<std::vector<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> >,std::allocator<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> > > > > >),1,1> >,void __cdecl(std::unique_ptr<std::vector<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelID> 
>,std::allocator<std::unique_ptr<net::ChannelIDStore::ChannelID,std::default_delete<net::ChannelIDStore::ChannelĀ»

I've found the bug (I hope).

The 'random' character is at position 2000,
which is suspiciously the same as MAX_SYM_NAME (dbghelp):

    #define MAX_SYM_NAME            2000

The Python script uses addr2line-pdb, which uses the DbgHelp API to resolve the symbol:

https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/windows/addr2line-pdb.c?l=155

    pSymbol->SizeOfStruct = sizeof(SYMBOL_INFO);
    pSymbol->MaxNameLen = MAX_SYM_NAME;
    if (print_function_name) {
      if (SymFromAddr(process, (DWORD64)absaddr, NULL, pSymbol)) {
        printf("%s\n", pSymbol->Name);
      } else {
        printf("??\n");
      }
    }


As I understand the MS docs, it is possible that the terminating '\0' is not present:
  https://msdn.microsoft.com/en-us/library/windows/desktop/ms680686(v=vs.85).aspx


The PDB classification. These values are defined in Dbghelp.h in the SymTagEnum enumeration type.
NameLen
The length of the name, in characters, not including the null-terminating character.
MaxNameLen
The size of the Name buffer, in characters. If this member is 0, the Name member is not used.
Name
The name of the symbol. The name can be undecorated if the SYMOPT_UNDNAME option is used with the SymSetOptions function.
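
Given those field descriptions, one defensive option is to never treat Name as a C string and instead copy at most min(NameLen, MaxNameLen) characters, adding the terminator yourself. The following is a portable sketch of that idea, not the actual addr2line-pdb code. struct demo_symbol is a hypothetical stand-in for the name-related fields of SYMBOL_INFO, and demo_resolve is a hypothetical resolver that assumes the suspected behavior (no NUL written when the name fills the buffer, and the untruncated length reported).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_MAX_NAME = 8 }; /* tiny stand-in for MAX_SYM_NAME */

/* Hypothetical stand-in for the name-related fields of SYMBOL_INFO. */
struct demo_symbol {
  unsigned long name_len;     /* reported length, excluding any NUL */
  unsigned long max_name_len; /* capacity of name[], in characters */
  char name[DEMO_MAX_NAME];
};

/* Hypothetical resolver mimicking the suspected behavior: it copies at
 * most max_name_len characters and writes a NUL only if there is room. */
static void demo_resolve(struct demo_symbol *sym, const char *full_name) {
  size_t len = strlen(full_name);
  size_t n = len < sym->max_name_len ? len : sym->max_name_len;
  memcpy(sym->name, full_name, n);
  if (n < sym->max_name_len) sym->name[n] = '\0';
  sym->name_len = (unsigned long)len; /* assumption: untruncated length */
}

/* Copies the symbol name into out, bounded by both length fields and by
 * out_size, and always NUL-terminates. Returns the characters copied. */
static size_t demo_copy_name(const struct demo_symbol *sym, char *out,
                             size_t out_size) {
  size_t n;
  assert(sym != NULL && out != NULL);
  if (out_size == 0) return 0;
  n = sym->name_len;
  if (n > sym->max_name_len) n = sym->max_name_len; /* never read past name[] */
  if (n > out_size - 1) n = out_size - 1;           /* leave room for NUL */
  memcpy(out, sym->name, n);
  out[n] = '\0';
  return n;
}
```

For direct printing, the same bound can be passed as a precision, as in printf("%.*s\n", (int)n, name), which stops after n characters whether or not a terminator is present.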




Project Member

Comment 3 by bugdroid1@chromium.org, Apr 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0ff178d157b08717ac405502ad648e6f7bfecd1d

commit 0ff178d157b08717ac405502ad648e6f7bfecd1d
Author: etienneb <etienneb@chromium.org>
Date: Thu Apr 20 19:58:02 2017

Fix potential missing nul character on resolved symbol names

The symbol name returned by SymFromName may not contain a NUL character
when the symbol name is exactly the size of the buffer. I believe this
may also happen when the symbol name is too long and truncated.

The original code is based on:
  https://msdn.microsoft.com/en-us/library/windows/desktop/ms680580(v=vs.85).aspx

A correct implementation can be found here:
  https://cs.chromium.org/chromium/src/base/debug/stack_trace_win.cc?l=145&rcl=f4ecb9e37e9e2d59e32b8b96f23ac4a1e33b9552

As described here:
  https://msdn.microsoft.com/en-us/library/windows/desktop/ms680686(v=vs.85).aspx

  NameLen
    The length of the name, in characters, not including the null-terminating character.
  MaxNameLen
    The size of the Name buffer, in characters. If this member is 0, the Name member is not used.

This issue was causing the catapult symbolisation script to encode incorrect (random) characters into the symbol names.
See the example in the bug.

R=wfh@chromium.org, chrisha@chromium.org, erikchen@chromium.org, ajwong@chromium.org
BUG= 713741 

Review-Url: https://codereview.chromium.org/2832643004
Cr-Commit-Position: refs/heads/master@{#466098}

[modify] https://crrev.com/0ff178d157b08717ac405502ad648e6f7bfecd1d/third_party/tcmalloc/README.chromium
[modify] https://crrev.com/0ff178d157b08717ac405502ad648e6f7bfecd1d/third_party/tcmalloc/chromium/src/windows/addr2line-pdb.c
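
One common way to guard against the missing terminator described in the commit message is to zero the whole buffer first and then advertise one character less than its real capacity, so the final slot always stays NUL. The sketch below illustrates that pattern in portable C. It is an assumption about the general technique, not a copy of the committed change. struct demo_sym_buf and demo_fill are hypothetical stand-ins for SYMBOL_INFO and the DbgHelp lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_BUF_CHARS = 8 }; /* tiny stand-in for MAX_SYM_NAME */

/* Hypothetical stand-in for SYMBOL_INFO plus its trailing name buffer. */
struct demo_sym_buf {
  unsigned long max_name_len; /* capacity advertised to the resolver */
  char name[DEMO_BUF_CHARS];  /* real capacity */
};

/* Hypothetical resolver: writes at most max_name_len characters and
 * never writes a terminator (the worst case discussed in this bug). */
static void demo_fill(struct demo_sym_buf *sym, const char *full_name) {
  size_t len = strlen(full_name);
  size_t n = len < sym->max_name_len ? len : sym->max_name_len;
  memcpy(sym->name, full_name, n);
}

/* Zeroes the buffer and reserves the last character, so name is a valid
 * C string afterwards even if the resolver writes no terminator. */
static void demo_resolve_terminated(struct demo_sym_buf *sym,
                                    const char *full_name) {
  assert(sym != NULL && full_name != NULL);
  memset(sym, 0, sizeof *sym);
  sym->max_name_len = DEMO_BUF_CHARS - 1; /* keep one slot for the NUL */
  demo_fill(sym, full_name);
}
```

With this setup an overlong name is truncated to DEMO_BUF_CHARS - 1 characters, and printing name with "%s" stays within the buffer.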

Status: Fixed (was: Assigned)
