New issue
Advanced search Search tips

Issue 785509 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

use helper process in a sandbox to clean up in imageloader

Project Member Reported by xiaochu@chromium.org, Nov 15 2017

Issue description

This is working (call helper process in a sandbox) when manually executing imageloader as root user. but in autotest, minijail crashes after imageloader returns successfully.

The autotest issues:

return subprocess.check_output([
                '/usr/sbin/imageloader', '--dry_run', '--unmount_all',
        ]).splitlines()
I could clearly see imageloader message sent back and force between imageloader and helper process and imageloader eventually exits successfully. However, minijail crashes after imageloader exits which leaves autotest in a state where it is not able to recover. 

suspected causes:
autotest already does sandboxing and sandboxing again in imageloader causes conflicts.

 
Sorry, what's the problem? Can you provide more background in the description?
Description: Show this description
This issue is also discussed in our code review: https://chromium-review.googlesource.com/c/chromiumos/platform/imageloader/+/767024
So imageloader prints all the paths out and then crashes? You can paste an example or something?
Manually run imageloader (call helper process to CleanupAll):

localhost ~ # imageloader --unmount_all --dry_run                                                                                                                                                                 
/run/imageloader/cros-termina/10081.0.2
(works properly)

Run autotest on DuT:

Autotest stops here since subprocess never returns the result: 
return subprocess.check_output([
                '/usr/sbin/imageloader', '--dry_run', '--unmount_all',
        ]).splitlines()

/var/log/message:
2017-11-15T15:41:15.738867-08:00 NOTICE dbus[599]: [system] Activating service name='org.chromium.ImageLoader' (using servicehelper)
2017-11-15T15:41:15.785694-08:00 NOTICE dbus[599]: [system] Successfully activated service 'org.chromium.ImageLoader'
2017-11-15T15:41:37.277888-08:00 INFO kernel: [82789.022062] imageloader[19574]: segfault at 76f21ebb1ac0 ip 000076f21ebb1ac0 sp 00007ffd21414418 error 15
2017-11-15T15:41:37.307499-08:00 INFO crash_reporter[19619]: libminijail[19619]: mount /dev/log -> /dev/log type ''
2017-11-15T15:41:37.307530-08:00 INFO crash_reporter[19619]: libminijail[19619]: mount /dev/pstore -> /dev/pstore type ''
2017-11-15T15:41:37.610540-08:00 DEBUG kernel: [82789.355193] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
2017-11-15T15:41:37.615545-08:00 DEBUG kernel: [82789.360069] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
2017-11-15T15:41:37.629911-08:00 WARNING crash_reporter[19619]: Could not load the device policy file.
2017-11-15T15:41:37.630303-08:00 WARNING crash_reporter[19619]: [user] Received crash notification for imageloader[19574] sig 11, user 220 (developer build - not testing - always dumping)
2017-11-15T15:41:37.632704-08:00 INFO crash_reporter[19619]: State of crashed process [19574]: S (sleeping)
2017-11-15T15:41:37.667771-08:00 INFO crash_reporter[19619]: Stored minidump to /var/spool/crash/imageloader.20171115.154137.19574.dmp
2017-11-15T15:41:37.668204-08:00 INFO crash_reporter[19619]: Leaving core file at /var/spool/crash/imageloader.20171115.154137.19574.core due to developer image
2017-11-15T15:41:37.668959-08:00 WARNING crash_reporter[19619]: [ARC] Received crash notification for imageloader[19574] sig 11, user 220 (ignoring - crash origin is not ARC)

crash trace dump:
#0  0x000076f21ebb1ac0 in ?? () from /usr/lib64/libstdc++.so.6
No symbol table info available.
#1  0x00005d7ff8023ebb in std::default_delete<brillo::dbus_utils::ExportedObjectManager>::operator() (__ptr=0x76f21ebafd98 <std::wclog+216>, this=<optimized out>)
    at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/4.9.x/include/g++-v4/bits/unique_ptr.h:76
No locals.
#2  std::unique_ptr<brillo::dbus_utils::ExportedObjectManager, std::default_delete<brillo::dbus_utils::ExportedObjectManager> >::~unique_ptr (this=0x7ffd21414968)
    at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/4.9.x/include/g++-v4/bits/unique_ptr.h:236
No locals.
#3  brillo::DBusServiceDaemon::~DBusServiceDaemon (this=0x7ffd214146b0) at ../../../../../../../../usr/include/brillo/daemons/dbus_daemon.h:52
No locals.
#4  0x00005d7ff802009e in main (argc=1, argv=0x7ffd21414b58) at ../../../imageloader-9999/platform/imageloader/imageloader_main.cc:221

Noting that imageloader_main.cc has total 221 loc in my change. It looks like minijail crashes due to imageloader(or helper process) crash. In this case the root case isn't in minijail.
Ok, we should root cause the crash before submitting your CL in case it's related to your CL specifically.
Ok, so autotest is using subprocess.check_output to run imageloader which is wrong. The correct way to run command in autotest is:

utils.system_output(...);

And I have no idea why subprocess.check_output in autotest fails to cleanup the service process which causes crash in DBusServiceDaemon destructor.

I will send a CL to fix autotest.
Thanks for root causing that.
I observed another interesting case. It looks like there is always an imageloader crash after each dbus call. The reason is following:

1. Each dbus call brings up imageloader service (imageloader_wrapper) and imageloader executes following code:

imageloader::ImageLoader daemon(std::move(config), std::move(helper_process));
daemon.Run();

2. imageloader blocks on daemon.Run() for several seconds. Then imageloader exits but daemon crashes. It looks like daemon is not properly cleaning up itself (or not initializing itself properly). 

crash trace dump:

#0  0x0000746d38efcac0 in ?? () from /usr/lib64/libstdc++.so.6
No symbol table info available.
#1  0x00005fafa07ffebb in std::default_delete<brillo::dbus_utils::ExportedObjectManager>::operator() (__ptr=0x746d38efad98 <std::wclog+216>, this=<optimized out>)
    at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/4.9.x/include/g++-v4/bits/unique_ptr.h:76
No locals.
#2  std::unique_ptr<brillo::dbus_utils::ExportedObjectManager, std::default_delete<brillo::dbus_utils::ExportedObjectManager> >::~unique_ptr (this=0x7fff98b29c08)
    at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/4.9.x/include/g++-v4/bits/unique_ptr.h:236
No locals.
#3  brillo::DBusServiceDaemon::~DBusServiceDaemon (this=0x7fff98b29950) at ../../../../../../../../usr/include/brillo/daemons/dbus_daemon.h:52
No locals.
#4  0x00005fafa07fc09e in main (argc=1, argv=0x7fff98b29df8) at ../../../imageloader-9999/platform/imageloader/imageloader_main.cc:194
Project Member

Comment 10 by bugdroid1@chromium.org, Nov 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/74554cc517ff14531ef5aa71a13101b4eeb4e0ca

commit 74554cc517ff14531ef5aa71a13101b4eeb4e0ca
Author: Xiaochu Liu <xiaochu@chromium.org>
Date: Sat Nov 18 04:18:42 2017

platform_ImageLoader: use utils.system_output()

Uses utils.system_output(...) instead of subprocess.check_output(...).

subprocess.check_output fails to cleanup DBusServbiceDaemon and causes
crash.

BUG= chromium:785509 
TEST=autotest

Change-Id: Iba59c90ffca323f59fa54d6fb358f49e7c54fdc3
Reviewed-on: https://chromium-review.googlesource.com/775032
Commit-Ready: Xiaochu Liu <xiaochu@chromium.org>
Tested-by: Greg Kerr <kerrnel@chromium.org>
Tested-by: Xiaochu Liu <xiaochu@chromium.org>
Reviewed-by: Greg Kerr <kerrnel@chromium.org>

[modify] https://crrev.com/74554cc517ff14531ef5aa71a13101b4eeb4e0ca/client/site_tests/platform_ImageLoader/platform_ImageLoader.py

Thanks for the analysis. Is the crash in comment #9 the crash fixed by the CL in comment #10, is there a separate crash that I should look into?
Both are caused by the same reason.

I observed flakeness of autotest after my imageloader change: https://wmatrix.googleplex.com/unfiltered?hide_missing=True&releases=tot&tests=platform_ImageLoaderServer

The crash happens on other 'subprocess.call()' in autotest which I ought to fix in my last CL. I'm working on a patch to fix this.
Ok, to be clear, does this crash affect actual users or is it just a crash in the autotest?
Yes, it crashes for actual users. 
So there's a crash that's totally separate from the subprocess.check_output() issue?
subprocess.check_output() is the effect of that crash. 

In my observation, each execution of 'daemon.Run();' in imageloader_main causes crash after it exits. I suspect this crash was there before my imageloader change but only observed in autotest after my change (my change invokes helper process which invokes imageloader_wrapper and imageloader_main).
Project Member

Comment 17 by bugdroid1@chromium.org, Nov 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/fb2e7420ab1882ac1bab4f499996072aafa98bf8

commit fb2e7420ab1882ac1bab4f499996072aafa98bf8
Author: Xiaochu Liu <xiaochu@chromium.org>
Date: Tue Nov 21 07:25:46 2017

platform_ImageLoader: use utils.system()

Uses utils.system(...) instead of subprocess.call(...).

This autotest is flaky (crashes at subprocess.call()) on certain boards.

BUG= chromium:785509 
TEST=autotest, print return value of utils.system to verify its return
value is retcode (0 on success).

Change-Id: I8e74bd5e352dbbdee3e765b0626e2f2948ea0d45
Reviewed-on: https://chromium-review.googlesource.com/779634
Commit-Ready: Xiaochu Liu <xiaochu@chromium.org>
Tested-by: Xiaochu Liu <xiaochu@chromium.org>
Reviewed-by: Greg Kerr <kerrnel@chromium.org>

[modify] https://crrev.com/fb2e7420ab1882ac1bab4f499996072aafa98bf8/client/site_tests/platform_ImageLoader/platform_ImageLoader.py

Hmm. I ran imageloader, without sending it any commands, and it exits clean:

localhost ~ # imageloader_wrapper                                                                                                               
localhost ~ # echo $?                                                                                                                           
0

Nothing in the logs reports a crash. Let me try sending it some commands and then I'll look for a crash.
This might be relevant. I checked ps aux after running the autotest, and it looks all the helper process's are sticking around. Is it possible those aren't be reaped correctly? 
localhost ~ # ps aux | grep imageloader                                                                                                         
root      3114  0.0  0.7  30676  7872 ?        Ss   11:10   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root      3131  0.0  0.7  30604  7416 ?        Ss   11:10   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root      3209  0.0  0.7  30604  7460 ?        Ss   11:10   0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/CorruptTestFlashComponent1/23.0.0.207 --mount_helper_fd=2
root      3220  0.0  0.7  30604  7432 ?        Ss   11:10   0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/FlashLoadedAtPath/23.0.0.207 --mount_helper_fd=2
root      3230  0.0  0.7  30604  7472 ?        Ss   11:10   0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/TestFlashComponent/24.0.0.186 --mount_helper_fd=2
root      3240  0.0  0.7  30604  7264 ?        Ss   11:10   0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2

Ok that's a separate issue I caused when I first wrote imageloader, I'll file a separate bug report. I can't reproduce the crash though, can you still repro it?
yes, I guess main function of all these commands exited correctly but for whatever reason the service process it invoked failed. 

Another way to reproduce this is to call a dbus API (e.g. LoadComponent) and then there should be a crash after imageloader_wrapper exits.
Can you copy and paste exact reproduction instructions and the crash? Where do you see the crash (logs, command line, etc)? Because I can't repro this and I have your latest change. What board does the crash happen on?
Status: Fixed (was: Untriaged)

Comment 24 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Comment 25 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)

Sign in to add a comment