stressapptest can crash when trying to print Hardware Errors
Issue description
Running:
stressapptest -M 1500 -s 100000
...on hardware with memory errors crashes stressapptest rather than printing the errors.
---
tl;dr: We need this patch to stressapptest:
$ diff -Naur os.cc.bak os.cc
--- os.cc.bak 2017-01-30 10:33:56.570855707 -0800
+++ os.cc 2017-01-30 10:34:34.947362047 -0800
@@ -63,6 +63,7 @@
dynamic_mapped_shmem_ = false;
mmapped_allocation_ = false;
shmid_ = 0;
+ channels_ = NULL;
time_initialized_ = 0;
---
An example crash was:
[168032.493255] stressapptest[21400]: unhandled level 2 translation fault (11) at 0xa0a24ff5, esr 0x92000006
[168032.503417] pgd = ffffffc09c8f6000
[168032.506963] [a0a24ff5] *pgd=00000000792c3003, *pud=00000000792c3003, *pmd=0000000000000000
[168032.515534]
[168032.517221] CPU: 2 PID: 21400 Comm: stressapptest Tainted: G W 4.4.21 #501
[168032.525331] Hardware name: Google Kevin (DT)
[168032.529770] task: ffffffc0ca223800 ti: ffffffc0785d8000 task.ti: ffffffc0785d8000
[168032.537383] PC is at 0xab0012b0
[168032.540671] LR is at 0x33333333
[168032.543935] pc : [<00000000ab0012b0>] lr : [<0000000033333333>] pstate: 200d0030
[168032.551490] sp : 00000000f3cb8d58
[168032.554948] x12: 0000000055555555
[168032.558515] x11: 0000000001010101 x10: 00000000010f0f0f
[168032.564001] x9 : 00000000f3cb8de0 x8 : 0000000000000100
[168032.569519] x7 : 00000000f3cb8d78 x6 : 0000000000000000
[168032.575061] x5 : 0000000000000000 x4 : 0000000006060603
[168032.580550] x3 : 0000000000000000 x2 : 000000000002ac6c
[168032.586036] x1 : 00000000a0a24ff5 x0 : 0000000010a228f8
[168032.591529]
Segmentation fault (core dumped)
---
Here's how I debugged it:
As far as I can tell, it always crashes while dereferencing r1, and r1 is always 0xa0a24ff5 (shown as x1 in the kernel log above).
I downloaded the stressapptest.debug symbols from the server, then combined them with the stripped binary:
eu-unstrip /b/tip/tmp/stressapptest /b/tip/tmp/9202_symb/stressapptest.debug
I then disassembled it with objdump:
armv7a-cros-linux-gnueabi-objdump -drSF /b/tip/tmp/9202_symb/stressapptest.debug | less
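Matching only the last three hex digits of the PC works, assuming 4 KiB pages and a position-independent binary mapped at a page-aligned base: the load base then changes only the upper bits of each instruction's runtime address, so the low 12 bits of the faulting PC equal those of the instruction's link-time address. A minimal sketch of that filter and of recovering the implied load base; the helper names are hypothetical, and 0x42b0 is the candidate the listing identifies.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: 4 KiB pages, so a page-aligned load base never touches the low 12 bits.
constexpr std::uint32_t kPageMask = 0xfff;

// Keeps the link-time addresses whose page offset matches the faulting PC.
std::vector<std::uint32_t> MatchPageOffset(
    std::uint32_t fault_pc, const std::vector<std::uint32_t>& candidates) {
  std::vector<std::uint32_t> hits;
  for (std::uint32_t addr : candidates) {
    if ((addr & kPageMask) == (fault_pc & kPageMask)) hits.push_back(addr);
  }
  return hits;
}

// For a matching candidate, the load base is the difference between the two addresses.
std::uint32_t ImpliedLoadBase(std::uint32_t fault_pc, std::uint32_t link_addr) {
  return fault_pc - link_addr;
}
```

For PC 0xab0012b0 and the instruction at 0x42b0, the implied base is 0xaaffd000. In a large binary the page-offset filter alone usually leaves several candidates; checking which of them dereferences r1 narrows it down.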
I then looked for an instruction at an address ending in 2b0 that dereferences r1. That gave me one hit:
---
00004238 <_ZN7OsLayer8FindDimmEyPci> (File Offset: 0x4238):
4238: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
423c: af03 add r7, sp, #12
423e: b085 sub sp, #20
4240: 494d ldr r1, [pc, #308] ; (4378 <_ZN7OsLayer8FindDimmEyPci+0x140> (File Offset: 0x4378))
4242: 4479 add r1, pc
4244: 6809 ldr r1, [r1, #0]
4246: 6809 ldr r1, [r1, #0]
4248: 9104 str r1, [sp, #16]
424a: 6bc1 ldr r1, [r0, #60] ; 0x3c
424c: e9d7 9806 ldrd r9, r8, [r7, #24]
4250: 2900 cmp r1, #0
4252: d063 beq.n 431c <_ZN7OsLayer8FindDimmEyPci+0xe4> (File Offset: 0x431c)
4254: e9d0 6510 ldrd r6, r5, [r0, #64] ; 0x40
4258: f04f 3c55 mov.w ip, #1431655765 ; 0x55555555
425c: f04f 3e33 mov.w lr, #858993459 ; 0x33333333
4260: f640 7a0f movw sl, #3855 ; 0xf0f
4264: f04f 3b01 mov.w fp, #16843009 ; 0x1010101
4268: 6c80 ldr r0, [r0, #72] ; 0x48
426a: f2c0 1a0f movt sl, #271 ; 0x10f
426e: 4016 ands r6, r2
4270: 401d ands r5, r3
4272: ea0c 0456 and.w r4, ip, r6, lsr #1
4276: 1b36 subs r6, r6, r4
4278: ea0e 0496 and.w r4, lr, r6, lsr #2
427c: f026 36cc bic.w r6, r6, #3435973836 ; 0xcccccccc
4280: 4434 add r4, r6
4282: ea0c 0655 and.w r6, ip, r5, lsr #1
4286: 1bae subs r6, r5, r6
4288: eb04 1414 add.w r4, r4, r4, lsr #4
428c: ea0e 0596 and.w r5, lr, r6, lsr #2
4290: f026 36cc bic.w r6, r6, #3435973836 ; 0xcccccccc
4294: ea04 040a and.w r4, r4, sl
4298: 442e add r6, r5
429a: fb04 f40b mul.w r4, r4, fp
429e: eb06 1616 add.w r6, r6, r6, lsr #4
42a2: ea06 060a and.w r6, r6, sl
42a6: fb06 f60b mul.w r6, r6, fp
42aa: 4066 eors r6, r4
42ac: f3c6 6600 ubfx r6, r6, #24, #1
operator[](size_type __n) _GLIBCXX_NOEXCEPT
{
#if __google_stl_debug_vector
_M_range_check(__n);
#endif
return *(this->_M_impl._M_start + __n);
42b0: 6809 ldr r1, [r1, #0]
42b2: ea46 0646 orr.w r6, r6, r6, lsl #1
---
I ran:
armv7a-cros-linux-gnueabi-gdb /b/tip/tmp/9202_symb/stressapptest.debug
...and gdb handles this better: "disass /s 0x4238" shows that we are in "OsLayer::FindDimm", more specifically:
279 uint32 high = static_cast<uint32>((addr & channel_hash_) >> 32);
280 vector<string>& channel = (*channels_)[
281 __builtin_parity(high) ^ __builtin_parity(low)];
0x00004272 <+58>: and.w r4, r12, r6, lsr #1
0x00004276 <+62>: subs r6, r6, r4
0x00004278 <+64>: and.w r4, lr, r6, lsr #2
0x0000427c <+68>: bic.w r6, r6, #3435973836 ; 0xcccccccc
0x00004280 <+72>: add r4, r6
0x00004282 <+74>: and.w r6, r12, r5, lsr #1
0x00004286 <+78>: subs r6, r5, r6
0x00004288 <+80>: add.w r4, r4, r4, lsr #4
0x0000428c <+84>: and.w r5, lr, r6, lsr #2
0x00004290 <+88>: bic.w r6, r6, #3435973836 ; 0xcccccccc
0x00004294 <+92>: and.w r4, r4, r10
0x00004298 <+96>: add r6, r5
0x0000429a <+98>: mul.w r4, r4, r11
0x0000429e <+102>: add.w r6, r6, r6, lsr #4
0x000042a2 <+106>: and.w r6, r6, r10
0x000042a6 <+110>: mul.w r6, r6, r11
0x000042aa <+114>: eors r6, r4
0x000042ac <+116>: ubfx r6, r6, #24, #1
/usr/bin/../lib/gcc/armv7a-cros-linux-gnueabi/4.9.x/include/g++-v4/bits/stl_vector.h:
866 return *(this->_M_impl._M_start + __n);
0x000042b0 <+120>: ldr r1, [r1, #0]
0x000042b2 <+122>: orr.w r6, r6, r6, lsl #1
728 return size_type(this->_M_impl._M_finish - this->_M_impl._M_start);
0x000042b6 <+126>: ldr.w r4, [r1, r6, lsl #2]
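The long run of and/sub/mul instructions from 0x4272 to 0x42ac is the compiler's inlined __builtin_parity: a SWAR popcount of each masked 32-bit half (masks 0x55555555, 0x33333333, then a nibble fold), a multiply by 0x01010101 that sums the byte counts into the top byte, an XOR of the two products, and a ubfx of bit 24, which is the low bit of the count. A portable sketch of the same channel-index computation; it assumes `low` is the lower 32 bits of `addr & channel_hash_` (the source above shows only `high`, but `ands r6, r2` / `ands r5, r3` mask both halves), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// SWAR population count using the same masks as the disassembly.
std::uint32_t PopCount32(std::uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                  // 2-bit partial counts
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  // 4-bit partial counts
  v = (v + (v >> 4)) & 0x0f0f0f0fu;                  // 8-bit partial counts
  return (v * 0x01010101u) >> 24;                    // sum of bytes in top byte
}

// Parity is the lowest bit of the population count.
std::uint32_t Parity32(std::uint32_t v) { return PopCount32(v) & 1u; }

// Hypothetical stand-in for the index expression in OsLayer::FindDimm:
// the XOR of the parities of the two masked halves selects channel 0 or 1.
std::uint32_t ChannelIndex(std::uint64_t addr, std::uint64_t channel_hash) {
  std::uint64_t masked = addr & channel_hash;
  std::uint32_t low = static_cast<std::uint32_t>(masked);
  std::uint32_t high = static_cast<std::uint32_t>(masked >> 32);
  return Parity32(high) ^ Parity32(low);
}
```

None of this arithmetic can fault; the only memory accesses in the sequence are the loads through r1, which is why the crash lands on the `ldr r1, [r1, #0]` at 0x42b0.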
---
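So, if I understand correctly, we are trying to dereference "channels_"? The listing even shows FindDimm loading channels_ ([r0, #60]) and branching away (beq to 0x431c) when it is zero, so only a garbage non-zero value gets past that check. I think channels_ will be uninitialized if SetDramMappingParams() hasn't been called. My C++ is rusty, but I _think_ there's no guarantee that uninitialized member variables are 0, right?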
...and it looks as if SetDramMappingParams() is only called if:
if (channels_.size() > 0) {
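C++ indeed gives no such guarantee: when a class has a user-provided constructor that never assigns a raw pointer member, that member is default-initialized and holds an indeterminate value (only objects with static storage duration are zeroed first). Assuming, as the cmp/beq at 0x4250 suggests, that FindDimm treats a NULL channels_ as "no mapping", a garbage pointer defeats that check. The sketch below is a simplified, hypothetical model of the pattern, not stressapptest's real OsLayer (class and method names are stand-ins); it applies the same fix as the patch at the top, initializing the pointer in the constructor.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified model of the channels_ pattern in os.cc.
// NOT the real OsLayer; names and signatures are illustrative only.
class OsLayerModel {
 public:
  typedef std::vector<std::vector<std::string> > ChannelList;

  OsLayerModel() {
    // The fix: give the pointer a known value. Without this assignment,
    // channels_ is indeterminate and the NULL check below can pass on garbage.
    channels_ = NULL;
  }

  // Stand-in for SetDramMappingParams(): only called when channels exist.
  void SetChannels(ChannelList* channels) { channels_ = channels; }

  // Writes the first DIMM name of `channel` into buf, or "unknown".
  int LookupDimm(std::size_t channel, char* buf, std::size_t len) const {
    if (channels_ == NULL || channel >= channels_->size() ||
        (*channels_)[channel].empty()) {
      std::snprintf(buf, len, "unknown");
      return -1;
    }
    std::snprintf(buf, len, "%s", (*channels_)[channel][0].c_str());
    return 0;
  }

 private:
  ChannelList* channels_;  // Not owned.
};
```

In C++11 and later, a default member initializer at the declaration (`ChannelList* channels_ = nullptr;`) has the same effect and cannot be forgotten when another constructor is added.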
Jan 30 2017
Dunno if we need a local copy or not. I was thinking we could just fix it "upstream", submit a roll to portage to pick it up, and then get the newer portage package. That seems like the cleanest way...
Jan 31 2017
Yeah, we want to stop building random tarballs+source out of the autotools repo and instead have dedicated ebuilds in chromiumos-overlay (or portage-stable if possible). If you want to post a CL that fixes the ebuild, we can get that into Gentoo.
Jan 31 2017
Here's the fixed upstream release: https://github.com/stressapptest/stressapptest/releases (1.0.8). vapier, can you pull the new version into Gentoo, then Chrome OS?
Jan 31 2017
Ah, easy enough! :) Now in Gentoo: https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=8c2a6c26aa3f4902ea1b86a0fac9b1d27d51874d
You should be able to pull it into portage-stable via cros_portage_upgrade.
Feb 3 2017
Should be checked in now.
Apr 17 2017
May 30 2017
Aug 1 2017
Oct 14 2017
Comment 1 by nsanders@chromium.org, Jan 30 2017