quipper UnitTests fail with libc++ |
|||
Issue descriptionhttps://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/13923 command line used: cbuildbot --remote -g '574646 510029 509896' --latest-toolchain ${board}-release quipper-0.0.1-r1885: [----------] Global test environment tear-down quipper-0.0.1-r1885: [==========] 148 tests from 16 test cases ran. (11070 ms total) quipper-0.0.1-r1885: [ PASSED ] 141 tests. quipper-0.0.1-r1885: [ FAILED ] 7 tests, listed below: quipper-0.0.1-r1885: [ FAILED ] DsoTest.ReadsBuildId quipper-0.0.1-r1885: [ FAILED ] DsoTest.ReadsBuildId_PrefersGnuBuildid quipper-0.0.1-r1885: [ FAILED ] PerfParserTest.ReadsBuildidsInMountNamespace quipper-0.0.1-r1885: [ FAILED ] PerfParserTest.ReadsBuildidsInMountNamespace_TriesOwningProcess quipper-0.0.1-r1885: [ FAILED ] PerfParserTest.ReadsBuildidsInMountNamespace_TriesRootFs quipper-0.0.1-r1885: [ FAILED ] PerfParserTest.ReadsBuildidsInMountNamespace_TriesRootFsNoInodeToReject quipper-0.0.1-r1885: [ FAILED ] PerfParserTest.OverwriteBuildidIfAlreadyKnown
,
Aug 29 2017
Lakshaman told me that CWP is going to move quipper and corresponding UnitTests to a newer version at https://github.com/google/perf_data_converter . So not much point in debugging these fails.
,
Aug 30 2017
One of the testcases fail with the newer version of quipper when libc++ is used to build. This testcase expects buildids to be overwritten if there are elfs file containing different buildids. During the setup phase of the testcase, expected buildids are written to elf files. Later these files are read to overwrite some buidlids. The issue is with writing to elf files. The buildids provided by the testcase are not written correctly to the elf files and they end up having corrupted data. We found that the issue is not with the testcase or the code that creates an elf object with the expected data. Rather. it looks like the elf library is messing up with the data. FYI, at least the elf library from the internal source code base doesn't have any cpp code.
,
Aug 30 2017
elf library should not matter. It is already present in chroot, is not being rebuilt and therefore, using libc++ should not change its behavior.
Looking at quipper source code, My guess is UnitTest is relying on a library dependent iteration order of map/unordered_map. Obviously, I could be wrong but this is something to verify.
bool PerfParser::FillInDsoBuildIds() {
std::map<string, string> filenames_to_build_ids;
reader_->GetFilenamesToBuildIDs(&filenames_to_build_ids);
std::map<string, string> new_buildids;
for (std::pair<const string, DSOInfo>& kv : name_to_dso_) { // This iteration order is library dependent. I believe there are more instances of these iterations.
DSOInfo& dso_info = kv.second;
e.g. If a map has elements [ 2, 4, 5, 7] , libc++ may iterate as [ 5, 7, 2, 4] while libstdc++(gcc's c++ library) order may be [ 4, 7, 2, 5 ].
,
Aug 30 2017
my above comment is only for std::unordered_set/unordered_map. Iterating on std::map should be same in both libraries.
,
Aug 30 2017
I wrote a small program that reproduces the exact issue. So, we could understand it better. This program writes the string "\xfe\xfe\xfe\xfe" twice to an elf file at the path "/tmp/quipper_elf_file". Copy the program into quipper source directory. Run the quipper make command and build this program. A sample command to build the program: g++ -Imybase -Icompat/ext -I../third_party -I. -I.. -I../third_party/protobuf/src -I../third_party/googletest/googletest/include -std=c++11 -g -Wall -Werror -Wall -Wno-error -o dso_test_utils_check dso_test_utils_check.cc compat/ext/detail/log_level.o mybase/base/logging.o string_utils.o -lpthread -lgcov -lelf -lssl -lcrypto ../third_party/protobuf/src/.libs/libprotobuf.a -lz -I../third_party/googletest/googletest ../third_party/googletest/googletest/src/gtest-all.cc -lcap This program would print out the data to be written at the location 0x00000040 in the file. In both the builds with/without libc++ the data printed out to stdout are the same (i.e., correct). But if you do a `hexdump -x <file>` of the generated elf file, the data written in the row beginning with 0000040 and 0000050 are different. File generated by the build without libc++ would have the data printed out to stdout. But the file generated by the build with libc++ would have garbage value at this location. So, the data provided to the elf object are the same in both the builds. But the elf file generated by the elf library has incorrect data.
,
Aug 30 2017
Thanks Lakshman, I am surprised that the generated file is different. I'll try to debug one my end and report back my observations.
,
Sep 21 2017
I looked at the quipper fails today and was able to root cause the bug.
Root cause: elf data is pointing to a string data content in ElfDataCache data_cache object. data_cache object is local to WriteElfWithMultipleBuildids function and is destroyed after the call.
void WriteElfWithMultipleBuildids(
...
ElfDataCache data_cache;
...
Elf_Data *data = data_cache.AddDataToSection(section, data_str); // The contents of data->d_buf are invalid after data_cache Object goes out of scope.
...
}
As a result, data pointer is pointing to a piece of deallocated memory.
Later calls to read the ReadBuildId fail because the memory address pointed by data->d_buf in elf is invalid.
Following change to use heap memory fixes the underlying issue.
--- a/chromiumos-wide-profiling/dso_test_utils.cc
+++ b/chromiumos-wide-profiling/dso_test_utils.cc
@@ -54,7 +54,9 @@ class ElfDataCache {
string &d = cache_.back();
Elf_Data *data = elf_newdata(section);
CHECK(data) << elf_errmsg(-1);
- data->d_buf = const_cast<char*>(d.data()); /// d_buf is pointing to string contents. This does not work once the vector object goes out of scope.
+// data->d_buf = const_cast<char*>(d.data());
+ data->d_buf = new char[d.size()]; // Create memory on heap and copy the contents
+ memcpy(data->d_buf, d.data(), d.size());
data->d_size = d.size();
return data;
}
Assigning to Lakshaman to fix the memory issue.
,
Sep 21 2017
This is a working patch to fix the issue.
diff --git a/chromiumos-wide-profiling/dso_test_utils.cc b/chromiumos-wide-profiling/dso_test_utils.cc
index d83e5842f..8e63d2615 100644
--- a/chromiumos-wide-profiling/dso_test_utils.cc
+++ b/chromiumos-wide-profiling/dso_test_utils.cc
@@ -45,22 +45,28 @@ class ElfStringTable {
};
// Helper to control the scope of data in Elf_Data.d_buf added to a Elf_Scn.
-// Basically, makes sure that the .data() set to d_buf remains valid, and
+// Basically, makes sure that the d_buf pointer remains valid, and
// holds on to a copy of your buffer so you don't have to.
class ElfDataCache {
public:
Elf_Data *AddDataToSection(Elf_Scn *section, const string &data_str) {
- cache_.emplace_back(data_str);
- string &d = cache_.back();
Elf_Data *data = elf_newdata(section);
CHECK(data) << elf_errmsg(-1);
- data->d_buf = const_cast<char*>(d.data());
- data->d_size = d.size();
+ char *data_storage = new char[data_str.size() + 1]; // avoid zero memory allocation
+ cache_.emplace_back(data_storage);
+ memcpy(data_storage, data_str.data(), data_str.size());
+ data->d_buf = data_storage;
+ data->d_size = data_str.size();
return data;
}
+ ~ElfDataCache() {
+ for(auto& s: cache_) {
+ delete[] s;
+ }
+ }
private:
- std::vector<string> cache_;
+ std::vector<char*> cache_;
};
} // namespace
@@ -88,7 +94,7 @@ void WriteElfWithMultipleBuildids(
CHECK(elf_update(elf, ELF_C_NULL) > 0) << elf_errmsg(-1);
ElfStringTable string_table;
- ElfDataCache data_cache;
+ static ElfDataCache data_cache; // Static so that cache object is not destroyed after the function call.
// Note section(s)
for (const auto& entry : section_buildids) {
,
Oct 31 2017
Verified that the problem is fixed. |
|||
►
Sign in to add a comment |
|||
Comment 1 by manojgupta@chromium.org
, Aug 24 2017