Starred by 2 users

Issue metadata

Status: Fixed
Closed: Mar 29
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 0
Type: Bug

moblab-paladin flake: ERROR: Unhandled UpstartServiceNotRunning: Upstart service moblab-gsoffloader-init not in running state.

Project Member Reported by, Mar 7

Issue description

This is hitting both moblab-generic-vm-paladin and guado_moblab-paladin.
guado_moblab is already marked experimental for hardware reasons, so the failure shows up on moblab-generic-vm.

This started recently and is hitting often enough that the paladin needs immediate attention or has to be marked experimental.

I suspect an obvious recent CL made gs_offloader fail to come up consistently.
haddowk: I'm deputy so it's hard for me to focus on one thing this week.
Can you help find the culprit?

I think it should be a CL in the last few consistently green builds:

Last CQ run failed just because of this. Gotta mark this experimental :(

Updated tree status.
Status: Started
One thing I see in /var/log/messages for the bad runs but not the good run I am comparing against is:

2018-03-07T05:39:46.231493+00:00 INFO kernel: [   31.769299] crossystem[3301]: segfault at 22 ip 000055c0b9496e2e sp 00007fffbf5b1d50 error 4 in crossystem[55c0b9494000+9000]

From the crash dump, it looks like crossystem is crashing.
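
As a side check on the kernel log line above, the faulting instruction pointer can be turned into an offset inside the crossystem binary by subtracting the mapping base that the kernel prints in brackets. A minimal sketch in C, assuming the usual `... ip <hex> ... in name[<base>+<size>]` layout of that message; `segfault_ip_offset` is a hypothetical helper name and not part of any ChromeOS tool:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the offset of the faulting instruction
 * pointer within its mapping from a kernel "segfault at ..." log line.
 * Assumes the layout "... ip <hex> ... in <name>[<base>+<size>]".
 * Returns 0 and stores the offset on success, -1 if the line does not
 * match or the ip lies outside the reported mapping.
 */
int segfault_ip_offset(const char *line, uint64_t *offset)
{
    const char *ip_field = strstr(line, " ip ");
    const char *in_field = strstr(line, " in ");
    if (!ip_field || !in_field)
        return -1;

    const char *bracket = strchr(in_field, '[');
    if (!bracket)
        return -1;

    /* All three numbers are printed by the kernel as bare hex. */
    uint64_t ip = strtoull(ip_field + 4, NULL, 16);

    char *end = NULL;
    uint64_t base = strtoull(bracket + 1, &end, 16);
    if (*end != '+')
        return -1;
    uint64_t size = strtoull(end + 1, NULL, 16);

    if (ip < base || ip - base >= size)
        return -1;

    *offset = ip - base;
    return 0;
}
```

For the line above this gives 0x55c0b9496e2e - 0x55c0b9494000 = 0x2e2e. The symbolized dump in this thread shows rip 0x5a6189851e2e in a crossystem module loaded at 0x5a618984f000, which is the same offset 0x2e2e, so both reports appear to point at the same crash site.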

Figured out how to get a stack dump with symbols:

Operating system: Linux
                  0.0.0 Linux 4.4.120-13171-g478a13d28edc #1 SMP PREEMPT Wed Mar 7 09:04:25 PST 2018 x86_64
CPU: amd64
     family 6 model 42 stepping 1
     8 CPUs


Crash reason:  SIGSEGV
Crash address: 0x0
Process uptime: not available

Thread 0 (crashed)
 0  crossystem!vb2_get_nv_storage + 0x2c
    rax = 0x0000000000000000   rdx = 0x00005a61898590c0
    rcx = 0x0000000000000000   rbx = 0x0000000000000013
    rsi = 0x0000000000000000   rdi = 0x00005a6189859140
    rbp = 0x00007ffe755d2660   rsp = 0x00007ffe755d2650
     r8 = 0x0000000000000000    r9 = 0x00000000000000ef
    r10 = 0x00007ba172d71530   r11 = 0x0000000000000246
    r12 = 0x00007ffe755d66a8   r13 = 0x0000000000000000
    r14 = 0x0000000000000000   r15 = 0x0000000000000000
    rip = 0x00005a6189851e2e
    Found by: given as instruction pointer in context
 1  crossystem!VbGetSystemPropertyInt.part.2 + 0x50
    rbx = 0x00005a6189854f48   rbp = 0x00007ffe755d6690
    rsp = 0x00007ffe755d2670   r12 = 0x00007ffe755d66a8
    r13 = 0x0000000000000000   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00005a618985263d
    Found by: call frame info
 2  crossystem!PrintAllParams + 0x60
    rbx = 0x00005a6189858360   rbp = 0x00007ffe755d86d0
    rsp = 0x00007ffe755d66a0   r12 = 0x00007ffe755d66a8
    r13 = 0x0000000000000000   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00005a6189851a43
    Found by: call frame info
 3!__libc_start_main [libc-start.c : 289 + 0x1a]
    rbx = 0x0000000000000000   rbp = 0x00007ffe755d87a0
    rsp = 0x00007ffe755d86e0   r12 = 0x00005a6189854d60
    r13 = 0x00007ffe755d87c0   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00007ba172c5d736
    Found by: call frame info
 4  crossystem!_start + 0x29
    rbx = 0x0000000000000000   rbp = 0x0000000000000000
    rsp = 0x00007ffe755d87b0   r12 = 0x00005a61898515d0
    r13 = 0x00007ffe755d87c0   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00005a61898515f9
    Found by: call frame info
 5  0x7ffe755d87b8
    rbx = 0x0000000000000000   rbp = 0x0000000000000000
    rsp = 0x00007ffe755d87b8   r12 = 0x00005a61898515d0
    r13 = 0x00007ffe755d87c0   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00007ffe755d87b8
    Found by: call frame info
 6  crossystem!main + 0x290
    rsp = 0x00007ffe755d8900   rip = 0x00005a61898515d0
    Found by: stack scanning

Loaded modules:
0x5a618984f000 - 0x5a6189857fff  crossystem  ???  (main)
0x7ba172c3d000 - 0x7ba172dddfff  ???
0x7ba172fe8000 - 0x7ba17300bfff  ???
0x7ffe755e8000 - 0x7ffe755e9fff  ???
2018-03-07 13:40:08: INFO: Minidump closing minidump

Adding the last two people who made changes in sigsegv - not that these changes are likely to have caused the issue but in case they might be able to help me figure out why crossystem is crashing.
s/who made changes in sigsegv/who made changes in crossystem/
The good news is that it is reproducible on the VM, though not on actual guados.

moblab@localhost /var/log/bootup $ crossystem
arch                   = x86                            # Platform architecture
Segmentation fault (core dumped)

Issue 819576 has been merged into this issue.
It is unclear why the crossystem crash makes gs_offloader fail. On my VM gs_offloader started; however, after a segv there are crash reporters running, etc., so I am not sure how reliable the system is at that point.

The autotest code that calls crossystem should be hardened to deal more gracefully with crossystem failures.
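
One way to read "deal more gracefully" is for the caller to tell a tool that crashed apart from one that merely exited non-zero. The sketch below shows the general POSIX pattern in C; it is not the autotest code (which is not shown in this thread), and `run_and_classify` and the `run_result` enum are hypothetical names introduced only for illustration:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification of how a child process ended. */
enum run_result {
    RUN_OK,               /* exited with status 0 */
    RUN_NONZERO_EXIT,     /* exited with a non-zero status */
    RUN_KILLED_BY_SIGNAL, /* terminated by a signal, e.g. SIGSEGV */
    RUN_SPAWN_ERROR       /* fork() or waitpid() failed */
};

/*
 * Run argv[0] with the given NULL-terminated argument vector and report
 * how it ended. *detail receives the exit status or the signal number.
 * If exec fails, the child exits with 127, which is reported as
 * RUN_NONZERO_EXIT.
 */
enum run_result run_and_classify(char *const argv[], int *detail)
{
    pid_t pid = fork();
    if (pid < 0)
        return RUN_SPAWN_ERROR;

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return RUN_SPAWN_ERROR;
    }

    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return RUN_KILLED_BY_SIGNAL;
    }
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return *detail == 0 ? RUN_OK : RUN_NONZERO_EXIT;
    }
    return RUN_SPAWN_ERROR;
}
```

With this shape, a caller that invokes crossystem could log "killed by signal 11" and fall back to a default, instead of treating the truncated output as valid.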

I am trying to build a debug crossystem to get a good traceback for the problem and find a culprit CL, or else just add some failure handling to the code to prevent the segv.
I guess it only crashes when run inside the VM? Odd that it only crashes on moblab, as we run quite a few VM tests elsewhere.

Running `crossystem` on my VM from vboot_reference-1.0-r1465 works fine.
This CL is the cause of the failure

Reverting it gets crossystem to work again on the moblab VM.
Double-checking my change...
I think the problem is this:

VbSharedDataHeader* sh = VbSharedDataRead();
if (sh->flags & VBSD_NVDATA_V2)

It works on a real system, but I'm not checking the return value before I dereference it, so if it returns NULL on the VM it'll die.  The other places where that's called all check if (!sh).
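
The missing check can be sketched in isolation as follows. This is not the actual vboot_reference change: `struct shared_header` and `NVDATA_V2_FLAG` are stand-ins for `VbSharedDataHeader` and `VBSD_NVDATA_V2` with a made-up flag value, and treating a NULL header as "not V2" is an assumption about the desired fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for VBSD_NVDATA_V2; the value here is made up. */
#define NVDATA_V2_FLAG 0x1u

/* Stand-in for VbSharedDataHeader, reduced to the one field used. */
struct shared_header {
    uint32_t flags;
};

/*
 * Returns 1 if the header reports V2 nvdata, 0 otherwise.
 * A NULL header (for example, when no shared data is available on a VM)
 * is treated as "not V2" instead of being dereferenced. That fallback
 * is an assumption made for this sketch.
 */
int nvdata_is_v2(const struct shared_header *sh)
{
    if (!sh)
        return 0;
    return (sh->flags & NVDATA_V2_FLAG) != 0;
}
```

Calling `nvdata_is_v2(NULL)` returns 0 rather than faulting, which is the behavior the `if (!sh)` checks at the other call sites provide.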

Fix coming imminently.

Thanks for testing the fix.

Not sure if you want to keep the bug open to cover hardening autotest so it'll report a more graceful error the next time I break crossystem.
Project Member

Comment 20 by, Mar 8

The following revision refers to this bug:

commit bff4a078938035f865b320fcecb9f456f866c7da
Author: Prathmesh Prabhu <>
Date: Thu Mar 08 08:37:51 2018

chromeos_config: Mark more flaky paladins experimental.

These are currently marked experimental via tree status. They have
failed more than 2 builds in the last 24 hours. Bugs are being actively
worked on. Tree status is an unreliable place for this, so mark them

BUG= chromium:819695 

Change-Id: I8c2d3a9339dfd32e9123bf66a5a0adb0107d5032
Commit-Ready: ChromeOS CL Exonerator Bot <>
Tested-by: Prathmesh Prabhu <>
Reviewed-by: Prathmesh Prabhu <>
Reviewed-by: Richard Barnette <>


Labels: -Pri-0 Pri-2
Downgrading the bug - the fix is stuck in the CQ - but coming.

Leaving the bug open to try to do a better job of handling errors when crossystem crashes.
Thanks much rspangler@ and haddowk@ for digging into this one.
It was a pesky flaky failure, so thanks for going after it.
Project Member

Comment 23 by, Mar 8

The following revision refers to this bug:

commit 0bdb8713be40abfe963d9ef625dbb67961068840
Author: Randall Spangler <>
Date: Thu Mar 08 19:33:26 2018

crossystem: Fix null pointer dereference on VMs

Check the result of VbSharedDataRead() before dereferencing it.

BUG=chromium:789276, chromium:819695 
TEST=make runtests

Change-Id: I1b1cc90bdc2fca61a9aad6b02e8b7e1f6a919797
Signed-off-by: Randall Spangler <>
Commit-Ready: Keith Haddow <>
Reviewed-by: Keith Haddow <>
Reviewed-by: Mike Frysinger <>


Labels: -Pri-2 Pri-0
This does not seem to have been the only problem - looking again
Working on why devserver is catching a SIGTERM and shutting down

[10/Mar/2018:08:09:15] ENGINE Listening for SIGHUP.
[10/Mar/2018:08:09:15] ENGINE Listening for SIGTERM.
[10/Mar/2018:08:09:15] ENGINE Listening for SIGUSR1.
[10/Mar/2018:08:09:15] ENGINE Bus STARTING
[10/Mar/2018:08:09:15] ENGINE Started monitor thread '_TimeoutMonitor'.
[10/Mar/2018:08:09:15] ENGINE Serving on :::8080
[10/Mar/2018:08:09:15] ENGINE Bus STARTED
[10/Mar/2018:08:10:53] ENGINE Caught signal SIGTERM.
[10/Mar/2018:08:10:53] ENGINE Bus STOPPING
[10/Mar/2018:08:10:53] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('::', 8080)) shut down
[10/Mar/2018:08:10:53] ENGINE Stopped thread '_TimeoutMonitor'.
[10/Mar/2018:08:10:53] ENGINE Bus STOPPED
[10/Mar/2018:08:10:53] ENGINE Bus EXITING
[10/Mar/2018:08:10:53] ENGINE Bus EXITED
[10/Mar/2018:08:10:53] ENGINE Waiting for child threads to terminate...
Project Member

Comment 26 by, Mar 11

The following revision refers to this bug:

commit db292eb9a421630f2881e33ba969e9bbe5660c17
Author: Keith Haddow <>
Date: Sun Mar 11 05:12:30 2018

[moblab] Add sleep before trying to detect USB drive.

On the VM it seems like readlink command will hang if called too
early in the boot, this stops the USB drive being mounted and
results in the dev server crashing.

For now add a sleep - however the real issues is that on VM when
cros-disks is started the filesystem still seems unstable.

BUG= chromium:819695 
TEST=local tests on moblab vm

Change-Id: I1f6125a62f5e29f6b4f16a46f23f50e6c261c60c
Commit-Ready: Keith Haddow <>
Tested-by: Keith Haddow <>
Reviewed-by: Keith Haddow <>


Project Member

Comment 27 by, Mar 12

The following revision refers to this bug:

commit 5923868cc36379039823cf1fb836b70ec866c41d
Author: Keith Haddow <>
Date: Mon Mar 12 09:45:25 2018

[moblab] Improve debugging in the upstart scripts

Move all the init script to start with moblab*

Add new upstart handler that logs all the events and return
codes to moblab*

Set -e on scripts so we try to exit as quickly as possible on
script errors

Add ethtool command because -e will not exit when in an if or
in a piped command - but we want it to fail quickly.

BUG= chromium:819695 
TEST=tryjobs on both vm and guado_moblab

Change-Id: Iaba4f11c0ce14c2cbe7e1944bc0f451fe9ad1881
Commit-Ready: Keith Haddow <>
Tested-by: Keith Haddow <>
Reviewed-by: Keith Haddow <>


Project Member

Comment 28 by, Mar 19

The following revision refers to this bug:

commit 5bc6f2a2ef1d3c849783565209ffa16af11a9b77
Author: Keith Haddow <>
Date: Mon Mar 19 21:19:07 2018

[chromite] Make moblab-vm important again

Build has been green for some time

BUG= chromium:819695 

Change-Id: I8b0ddcf992b0a06e1d27d45ca5e29eb4f113b4fd
Commit-Ready: Keith Haddow <>
Tested-by: Keith Haddow <>
Reviewed-by: Prathmesh Prabhu <>


Project Member

Comment 29 by, Mar 20

The following revision refers to this bug:

commit bb4fdab05c5a9dccbf93ee2f598d081316c54890
Author: <>
Date: Tue Mar 20 00:15:24 2018

Roll src/third_party/chromite/ 8d50e94a5..5978d7dd7 (2 commits)

$ git log 8d50e94a5..5978d7dd7 --date=short --no-merges --format='%ad %ae %s'
2018-03-14 dgarrett builder_status_lib: Remove GetBuilderStatusFromCIDB.
2018-03-13 haddowk [chromite] Make moblab-vm important again

Created with:
  roll-dep src/third_party/chromite
BUG= chromium:821986 , chromium:819695 

The AutoRoll server is located here:

Documentation for the AutoRoller is here:

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

Change-Id: Ie1aad5cf3d65e808393afe189cd50940170a2762
Commit-Queue: Chromite Chromium Autoroll <>
Reviewed-by: Chromite Chromium Autoroll <>
Cr-Commit-Position: refs/heads/master@{#544225}

Status: Fixed
