New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 889552 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 5
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

[Eve, Nocturne] chromeos-firmwareupdate: updater4.sh seccomp failure with readlink syscall

Reported by asst.eve...@gmail.com, Sep 26

Issue description

IMPORTANT: Your crash has already been automatically reported to our crash system. Please file this bug only if you can provide more information about it.


Chrome Version: 71.0.3562.0
Operating System: Linux 11101.0.0

URL (if applicable) where crash occurred:

Can you reproduce this crash?yes

What steps will reproduce this crash? (If it's not reproducible, what were you doing just before the crash?)
1.voice query"who is Martin Luther King"
2.tap on suggestion chip
3.

****DO NOT CHANGE BELOW THIS LINE****
Crash ID: crash/a9e96337ddbac17d

 
Labels: OS-Chrome
The crash report is a SIGSYS @ 0x00000000 inside libc off the coreutils(!) binary 

Cc: igo@chromium.org
Components: OS>Firmware
Labels: ReleaseBlock-Beta M-71
Status: Untriaged (was: Unconfirmed)
My chromebox just hung and had to be rebooted, and on started it apparently encountered this crash eight(!) times.

I see reports as far back as September 25th, and according to Crash this signature first appeared around September ~21st.
Cc: w...@chromium.org morlovich@chromium.org
Adding a couple people to CC that have commented. We need to find an owner for this issue as it marked as RBB. Thanks
Cc: -morlovich@chromium.org
Cc: aaboagye@chromium.org gwendal@chromium.org dnojiri@chromium.org
Adding a couple more to CC to help find an owner as this is marked RBB.

crash/ info link
https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+stable_signature%3D%2783c89e4e_5292d683_f8ff28b9_65f9c78f_8812b52e%27
Cc: bleung@chromium.org
not sure this is firmware?
Components: -OS>Firmware UI>Shell>Assistant
Agreed. Not low level firmware issue.
Cc: muyuanli@chromium.org updowndota@chromium.org xiaoh...@chromium.org omrilio@chromium.org
Adding a couple people from UI>Shell>Assistant to help find an owner.
Cc: -aaboagye@chromium.org
As comment #1 mentioned, this is a low level crash in libc/coreutils. I sampled through some of the crash reports, many of them are not assistant UI related because they are from different boards that do not have assistant enabled.

I believe this is a different crash from b/116173111. We need to find some platform experts to shed some light on what this stack trace means.
Cc: yunlian@chromium.org vapier@chromium.org xzhou@chromium.org
I dig a little bit more on this.  Since the stack trace has ld in it, it must be loading some .so. Assistant feature does use a shared library and we started building libassistant.so on many boards from Sep 20th, which suspiciously correlates to the uptick of when this crash signature started to ramp up. https://screenshot.googleplex.com/H0NUFPoDUAG.png

Another piece of information is the signal in the stack, SIGSYS.  According to the man page it's "Bad system call" and related to secomp(2). 
http://man7.org/linux/man-pages/man7/signal.7.html

Looking at seccomp(2) http://man7.org/linux/man-pages/man2/seccomp.2.html it seems to mean that certain sys calls can be allowed/blocked by a predefined set of rules. And it a syscall is blocked, it would likely cause a SIGSYS. 

I know we sandbox chrome process on ChromeOS. Maybe this is why there is a SIGSYS when ld is trying to load some .so and issued some syscall that's not on the whitelist.

Again I am not an expert on this, we will need some better minds to further diagnose this issue.  I am ccing people that I know might have a better idea.
Components: -UI>Shell>Assistant OS>Systems OS>Firmware
Labels: -Restrict-View-EditIssue Pri-2 Type-Bug
Summary: chromeos-firmwareupdate: updater4.sh seccomp failure with readlink syscall (was: crashes when tap on any suggestion chip-bug id:116116863)
fairly certain this has nothing to do with assistant.  this isn't "ld" as in the linker from binutils.  ld-xxx.so is simply the runtime program loader which every program uses.  so libassociation.so doesn't factor in here.

along those lines, pretty sure none of these crashes are related to anything UI-wise.  if you want to track some UI crash, you'll want to file a new bug.  i'm focusing this on the crash/ reports linked here.

unfortunately, the relevant details aren't exposed in crash/.  i filed b/113029256 to fix that, but they haven't made progress.  so triaging this requires looking at the minidump directly.

if we run `minidump_dump` on the minidump in crash/a9e96337ddbac17d, we see:

Stream MD_LINUX_CMD_LINE:
/usr/bin/coreutils\0
--coreutils-prog-shebang=readlink\0
/usr/bin/readlink\0
/tmp/tmp.j7rDBPf9A5/bin/flashrom\0

Stream MD_LINUX_ENVIRON:
OLDPWD=/\0
TARGET_FWID=Google_Eve.9584.171.0\0
TARGET_PLATFORM=Google_Eve\0
UNIBUILD=\0
PATH=/tmp/tmp.j7rDBPf9A5/bin:/usr/bin:/usr/sbin:/sbin:/bin:/usr/local/sbin:/usr/local/bin:/tmp/tmp.j7rDBPf9A5\0
TARGET_SCRIPT=updater4.sh\0
TARGET_ECID=eve_v1.1.6585-859d2ea99\0
TARGET_RO_FWID=Google_Eve.9584.107.0\0
PWD=/tmp/tmp.j7rDBPf9A5\0
TARGET_PDID=\0

as noted, the failure is SIGSYS, and that pretty much only shows up from seccomp failures.

we know from the syscall(2) man page (Architecture calling conventions) that on x86_64, the syscall # is passed through rax.
  http://man7.org/linux/man-pages/man2/syscall.2.html

so we look at the registers from the dump:
MDRawContextAMD64
...
  rax           = 0x59
...

on x86_64, we look at unistd_64.h to see what syscall 0x59 (89) is, and we see it's __NR_readlink.  which makes sense when `readlink` is the program that's crashing.

my guess is started because of changes landed in chromite's lddtree.py like:
  https://chromium-review.googlesource.com/1195164
which caused us to run `readlink` when we weren't before.

going by the env, it seems pretty likely that this is the updater4.sh shell script running its bundled copy of flashrom.
  https://chromium.googlesource.com/chromiumos/platform/firmware/+/release-R70-11021.B/pack_stub

what isn't clear is who ran chromeos-firmwareupdate in the first place.  that tool doesn't use any minijail/seccomp filters which means whoever ran chromeos-firmwareupdate imposed a restrictive seccomp failure.

unfortunately, crash reports don't show process trees, only the crashing process itself.

the '/tmp/.org.chromium.Chromium.z5Qjmn/' usage is a bit weird.  that path shows up from libbase's tempfile handling (TempFileName).  so i think some CrOS C++ platform code is executing chromeos-firmwareupdate.

my instinct is to say something mosys related because of recent crashes we've seen over the last few releases with it turning on seccomp filters and then executing random shell scripts/tools.  but i'm not familiar enough with the various ways mosys gets run to say for sure.

otherwise i don't have an ideas of who might be running this in a seccomp filter it shouldn't be ...

the reason these crashes don't have symbols is because they're in an lddtree created env which pulls libs that are diff of what is shipped in the rootfs.  prob not the best approach, but we're doing away with all of that anyways with the move to updater5/futility based update in  issue 882445 .  so prob just to let the symbols stay missing and not worry about it.
Mike, cr50 update scripts seem to call chromeos-firmwareupdate: src/platform2/crosh/removable.d/50-crosh.sh

I can look further tomorrow. Do you know if we are minijailing scripts like that?
removable.d is only loaded when booted off a USB stick in dev mode, so it shouldn't show up on any verified system that uploads crash reports.
plus that'd require a user to open up crosh and manually run something, so hopefully if they were doing that, they'd let us know.
plus systems in dev-mode should tag themselves as such in the uploaded crash reports, and i don't see that tag in these.
Labels: -Pri-2 Pri-1
Can we assign an owner to this issue? Beta is coming up very soon and needs a fix in asap. Thanks
BTW, it was already planned to remove bundled programs from updater, and also get updater4.sh replaced by native programs 'futility' (go/updater5).

If this is a serious issue then we may consider speeding up the migration.
the problem is in m71 which is branched now, so migration is probably out. plus, we still have a scenario where someone is applying a seccomp filter to a program it doesn't maintain which is bad.
Per #15, This is a P1 RBB requiring an owner and consideration for shifting to RBS, etc.  Thanks
Owner: bleung@chromium.org
Owner: ----
Cc: ivanpe@chromium.org
Labels: Restrict-View-Google
Owner: vapier@chromium.org
Status: Assigned (was: Untriaged)
Re #17: Mike, I tagged this RBB on the basis that eight crash reports on every boot is likely to cause problems, not least by masking other crashes due to throttling.  Are there other issues that these crashes will cause, besides crash-spam?

Tagging RVG and adding ivanpe@ to advise on whether the crash-spam will be an issue; if the crash spam is somehow acceptable and there's no user-facing impact then we can remove RB label.
Owner: ----
Status: Untriaged (was: Assigned)
we don't have throttles locally for creating crash reports.  we do have a spool-dir max as documented:
https://chromium.googlesource.com/chromiumos/platform2/+/master/crash-reporter/#crash-report-storage

wrt uploading throttles, we already deal with that locally in crash-sender as documented:
https://chromium.googlesource.com/chromiumos/platform2/+/master/crash-reporter/#uploading-crashes

so there shouldn't be any server side issues
Owner: f...@chromium.org
+fqj who recently made lots of changes to sepolicy and see if he has some ideas for how this can be fixed, or should we revert / disable some changes done in M71 to let seccomp filter work like before.
I really don't think there's anything selinux related here. seccomp is a completely unrelated subsystem.

is there no one with a device reproducing the issue locally?
I hit this locally, as per comment #2, but checking chrome://crashes now, I don't see any instances of this issue any more.  Perhaps it's specific to when we e.g. have firmware updates pending?
Owner: ----
Sorry, seems like seccomp and sepolicy are different. Removed fqj@.

And regarding who called firmware updater,

the update engine will call updater if
 (1) Signed AU is received;
 (2) On updater3 family, call chromeos-boot-update if AU scheduled that;
 (3) Every 60 seconds after each boot, call chromeos-setgoodfirmware =>
    board-setgoodfirmware => on SKL & KBL family, this leads to a special ME hack
    => "cros_spi_descriptor lock" => call chromeos-firmwareupdate --sb_extract,
    which does not call updater4.

So it is more possible updater4.sh is called due to AU?

Re#25 which device do you see this issue?
Re #26: I observed this issue on a Teemo device.
i've posted a CL to have crash-reporter gather parent process tree details:
  https://crrev.com/c/1296089

that should help figure out who is calling us here.
Following up on comment #21 - do we know if this issue is user facing? We need to make a call to delay beta due to this issue or remove RBB label. Thanks
If this was reported as long ago as Sept 25th per #2 it's likely not a M71 regression, and thus not a beta blocker for M71.

Has anyone been able to bisect?
i saw it on my pixelbook, so i spent time putting the device into dev mode and see if i could reproduce.  and i can.  so what i did:
- grab 11187.0.0 recovery image:
  gs://chromeos-releases/canary-channel/eve/11187.0.0/chromeos_11187.0.0_eve_recovery_canary-channel_mp.bin
- put pixelbook into dev mode and enable usb boot
- boot that image
- login with test account
- see that steady state has no crashes
- go to chrome://settings/help and trigger an update (and got 11189.0.0)
- see crashes eventually show up

attaching my /var/log/messages and UE logs.  lets look at snippets (ignoring TZ weirdness due to initial system setup).
also attaching the crash report generated from there.  we see it's inside the temp dir noted in the logs and from that timestamp.

from messages:
2018-10-24T19:45:39.094095+00:00 NOTICE temp_logger[6784]: 00:80 02:37 03:36 04:49 05:43 06:80 07:57
2018-10-24T19:45:55.384246+00:00 INFO kernel: [  447.433897] EXT4-fs (mmcblk0p5): mounting ext2 file system using the ext4 subsystem
2018-10-24T19:45:55.385244+00:00 INFO kernel: [  447.434935] EXT4-fs (mmcblk0p5): mounted filesystem without journal. Opts: 
2018-10-24T19:46:16.709647+00:00 WARNING mosys.elf[7243]: libminijail[7243]: not locking any securebits
2018-10-24T19:46:16.709661+00:00 WARNING mosys.elf[7243]: libminijail[7243]: SECURE_NOROOT not set, not dropping bounding set
2018-10-24T19:46:30.605403+00:00 WARNING mosys[7596]: libminijail[7596]: not locking any securebits
2018-10-24T19:46:30.605418+00:00 WARNING mosys[7596]: libminijail[7596]: SECURE_NOROOT not set, not dropping bounding set
2018-10-24T19:46:30.613870+00:00 INFO crash_reporter[7600]: libminijail[7600]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-10-24T19:46:30.627950+00:00 WARNING crash_reporter[7600]: [user] Received crash notification for coreutils[7599] sig 31, user 0 group 0 (handling)
2018-10-24T19:46:30.628875+00:00 INFO crash_reporter[7600]: State of crashed process [7599]: S (sleeping)
2018-10-24T19:46:30.629126+00:00 INFO crash_reporter[7600]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/6'
2018-10-24T19:46:30.634246+00:00 INFO crash_reporter[7600]: Stored minidump to /var/spool/crash/coreutils.20181024.154630.7599.dmp

from UE, we see that crash happened right in the middle:
[1024/124555:INFO:action_processor.cc(116)] ActionProcessor: finished FilesystemVerifierAction with code ErrorCode::kSuccess
[1024/124555:INFO:action_processor.cc(143)] ActionProcessor: starting PostinstallRunnerAction
[1024/124555:INFO:postinstall_runner_action.cc(172)] Performing postinst (postinst at /tmp/.org.chromium.Chromium.UDVWUG/postinst) installed on device /dev/mmcblk0p5 and mountable device /dev/mmcblk0p5
[1024/124555:INFO:postinstall_runner_action.cc(179)] Format file for new postinst is: data
[1024/124659:INFO:subprocess.cc(157)] Subprocess output:
dm:dm bht[DEBUG] Setting block_count 588800
dm:dm bht[DEBUG] Setting depth to 3.
dm:dm bht[DEBUG] depth: 0 entries: 1
dm:dm bht[DEBUG] depth: 1 entries: 36
dm:dm bht[DEBUG] depth: 2 entries: 4600
PostInstall Configured: (B, /dev/mmcblk0p5, /dev/mmcblk0p4, /dev/mmcblk0p12)

 Current Kernel Info: sysname(Linux) nodename(localhost) release(4.4.161-15348-ge2d0d39f1235) version(#1 SMP PREEMPT Tue Oct 23 20:28:42 PDT 2018) machine(x86_64)

lsb-release inside the new rootfs:
CHROMEOS_ARC_ANDROID_SDK_VERSION=28
CHROMEOS_ARC_VERSION=5087515
CHROMEOS_AUSERVER=https://tools.google.com/service/update2
CHROMEOS_BOARD_APPID={01906EA2-3EB2-41F1-8F62-F0B7120EFD2E}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
CHROMEOS_DEVSERVER=
CHROMEOS_RELEASE_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
CHROMEOS_RELEASE_BOARD=eve-signed-mpkeys
CHROMEOS_RELEASE_BRANCH_NUMBER=0
CHROMEOS_RELEASE_BUILDER_PATH=eve-release/R72-11189.0.0
CHROMEOS_RELEASE_BUILD_NUMBER=11189
CHROMEOS_RELEASE_BUILD_TYPE=Official Build
CHROMEOS_RELEASE_CHROME_MILESTONE=72
CHROMEOS_RELEASE_DESCRIPTION=11189.0.0 (Official Build) canary-channel eve 
CHROMEOS_RELEASE_KEYSET=mp
CHROMEOS_RELEASE_NAME=Chrome OS
CHROMEOS_RELEASE_PATCH_NUMBER=0
CHROMEOS_RELEASE_TRACK=canary-channel
CHROMEOS_RELEASE_VERSION=11189.0.0
DEVICETYPE=CHROMEBOOK
GOOGLE_RELEASE=11189.0.0

Set boot target to /dev/mmcblk0p5: Partition 5, Slot B
SetImage
KERNEL_CONFIG: console= loglevel=7 init=/sbin/init cros_secure oops=panic panic=-1 root=/dev/dm-0 rootwait ro dm_verity.error_behavior=3 dm_verity.max_bios=-1 dm_verity.dev_wait=1 dm="1 vroot none ro 1,0 4710400 verity payload=PARTUUID=%U/PARTNROFF=1 hashtree=PARTUUID=%U/PARTNROFF=1 hashstart=4710400 alg=sha1 root_hexdigest=73ae7f3db69dfd124ad0114589ead0a1b85ec4f0 salt=a4e18be2fe283eeccfee8d38620e4ceddde314270a11f9f44cda122f2ed27b3b" noinitrd vt.global_cursor_default=0 kern_guid=%U add_efi_memmap boot=local noresume noswap i915.modeset=1 tpm_tis.force=1 tpm_tis.interrupts=0 nmi_watchdog=panic,lapic disablevmx=off  
Setting up verity.
Finished after 20 seconds.
Clearing network driver boot cache: /var/lib/preload-network-drivers.
Syncing filesystems before changing boot order...
Finished after 1 seconds.
Updating Partition Table Attributes using CgptManager...
Updated kernel 4 with Successful = 0 and NumTriesLeft = 6
Checking /mnt/stateful_partition/unencrypted permission.
Permission is ok.
Unlinked file /var/lib/ureadahead/pack
Starting board post install script (//usr/sbin/board-postinst /)
Command: //usr/sbin/board-postinst /
Finished after 0 seconds.
Board post install succeeded
Starting firmware updater (//usr/sbin/chromeos-firmwareupdate --mode=autoupdate)
Command: //usr/sbin/chromeos-firmwareupdate --mode=autoupdate
Starting Google_Eve firmware updater v4 (autoupdate)...
 - Updater package: [RO:Google_Eve.9584.107.0 RW:Google_Eve.9584.174.0 / EC:eve_v1.1.6585-859d2ea99]
 - Current system:  [RO:Google_Eve.9584.107.0 , ACT:Google_Eve.9584.174.0 / EC:eve_v1.1.6643-e18316ecc]
 - Write protection: Hardware: ON, Software: Main=off
Read the current FLMSTR values...
flashrom v0.9.9  : 02aea36 : Oct 05 2018 22:43:50 UTC on Linux 4.4.161-15348-ge2d0d39f1235 (x86_64)
Calibrating delay loop... OK.
coreboot table found at 0x7aabe000.
WARNING: SPI Configuration Lockdown activated.
Reading flash... SUCCESS
No need to modify image file.
 * invoke: flashrom -p host -r _current/bios.bin
Firmware update available: Google_Eve.9584.174.0. 
 * invoke: flashrom -p host --fast-verify -w bios.bin -i RW_SECTION_B
On reboot EC update may occur.
Firmware update (autoupdate) completed.
Finished after 18 seconds.
Firmware update completed.
Starting command: //usr/share/cros/cr50-set-board-id.sh unknown
Command: //usr/share/cros/cr50-set-board-id.sh unknown
ERROR: Board ID and flag have already been set.
ignored: cr50-set-board-id failure (2).
Finished after 0 seconds.
Failed Command: //usr/share/cros/cr50-set-board-id.sh unknown - Exit Code 2
Starting command: //usr/share/cros/cr50-update.sh /
Command: //usr/share/cros/cr50-update.sh /
Finished after 15 seconds.
cr50 setup complete.
ChromeosChrootPostinst complete
Syncing filesystem at end of postinst...
messages.xz
36.6 KB Download
update_engine.xz
9.6 KB Download
coreutils.20181024.154630.7599.meta.xz
184 bytes Download
coreutils.20181024.154630.7599.dmp.xz
4.1 KB Download
once i have 11187.0.0 on the system, it's easy to keep reproducing:
- invalidate the update by corrupting the new kernel:
  dd if=/dev/zero of=/dev/mmcblk0p4
- clear the crashes:
  rm -rf /var/spool/crash/*
- clear the logs:
  > /var/log/messages
  > /var/log/update_engine.log
- reboot
- check that you're still in 11187.0.0
- in a new shell run:
  cd /var/log
  tail -f messages update_engine.log
- trigger a system update again
- watch the log files.  see that the timestamps line up and that the crashes start showing up in the middle of the postinst run.
Owner: gmeinke@chromium.org
here's an strace of the failing UE run.  i've filtered out a lot of noise, but hopefully not too much to show the issue.

we see the first SIGSYS failure here:
5049  --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0x795e0fff76e7, si_syscall=__NR_readlink, si_arch=AUDIT_ARCH_X86_64} ---
5049  +++ killed by SIGSYS (core dumped) +++

pid 5049 is indeed readlink:
5049  execve("/usr/bin/readlink", ["readlink", "/tmp/tmp.Be4FUlcWox/bin/flashrom"], 0x5c3c642df810 /* 10 vars */) = 0

so lets look up to see where that call came from.  we see:
- pid 5049 was forked (cloned) from pid 5048
- pid 5048 is the flashrom shell wrapper:
  5048  execve("/tmp/tmp.Be4FUlcWox/bin/flashrom", ["flashrom", "-p", "host", "-i", "RW_NVRAM:/tmp/flashrom_r2icQw", "-r"], 0x7fff90e20f38 /* 10 vars */) = 0
- pid 5048 was forked (cloned) from pid 5046
- pid 5046 is system mosys:
  5046  execve("/usr/sbin/mosys", ["/usr/sbin/mosys", "nvram", "vboot", "write", "601000000102001a00feff0000ffffe6"], 0x7ffea67a3878 /* 10 vars */) = 0
- mosys locks itself down:
  5046  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) = 0
  5046  open("/usr/share/policy/mosys-seccomp.policy", O_RDONLY|O_CLOEXEC) = 4
  5046  seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, {len=133, filter=0x55c5586eb000}) = 0

mosys likes to install a restrictive seccomp filter and then run a lot of programs underneath it.  we know this has been problematic, but that's where we're at.  it's a bug in mosys.

for posterity, the mosys call came from:
5044  execve("/tmp/tmp.Be4FUlcWox/bin/crossystem", ["crossystem", "fw_try_next=B"], 0x56a111b34960 /* 10 vars */) = 0

which i think came from the updater4.sh logic:
# Update Engine - Received Update
mode_autoupdate() {
...
  if [ -n "$try_next" ]; then
    cros_set_prop "fw_try_next=$try_next"
  fi

but i'm done debugging this :p
strace.log.xz
818 KB Download
vapier@; thanks for the work!  I need to determine if this was already pushed to Beta or Stable for M70 or earlier for consideration of M71.  Given the earlier reports I assume it was introduced prior to M71.  Any way to confirm?  I'd rather not block Beta on an issue already out there.  Thanks.
Labels: -Restrict-View-Google
if it's because of the CL i noted in comment #12, that landed in 11032.0.0, so it prob broke around that time
Project Member

Comment 36 by bugdroid1@chromium.org, Oct 25

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7f854db22fd877ac891904cc8f7f743e25cd9ed2

commit 7f854db22fd877ac891904cc8f7f743e25cd9ed2
Author: Mike Frysinger <vapier@chromium.org>
Date: Thu Oct 25 03:07:20 2018

crash: improve error message for unknown files

The UserCrashTest._run_crasher_process_and_analyze helper goes through
the crash spool directory and makes sure all files created with reports
are known/expected.  If it finds any it doesn't know about, it fails.

Unfortunately, the "unknown file" code path is geared towards a failure
mode that no longer really comes up: it assumes that if a file is new,
it's because breakpad itself created a minidump and left it behind.  So
any time crash-reporter creates a new file, we get the confusing error:
  ERROR: crasher_nobreakpad did generate breakpad minidump
It doesn't tell us what this new file is (could be a log file), and the
grammar being slightly broken adds to the confusion.

Further, if crash-reporter creates two new files, we get the error:
  ERROR: Breakpad wrote multiple minidumps
Which doesn't make sense if crash-reporter is creating logs and not any
breakpad related files.

Lets nuke all this special case logic and just add a generic "unknown
file" whenever we see files we don't expect.  Now we'll get an explicit
error about the file that we don't handle:
  ERROR: Crash reporter created an unknown file: crasher_nobreakpad.20181023.131230.12664.proclog

BUG= chromium:889552 
TEST=`test_that ... logging_UserCrash` still passes, and writes a better error message w/CL:1296089

Change-Id: Ieedd59b8b21264e669da4df18a96ac8773dace1e
Reviewed-on: https://chromium-review.googlesource.com/1297191
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Ben Chan <benchan@chromium.org>

[modify] https://crrev.com/7f854db22fd877ac891904cc8f7f743e25cd9ed2/client/cros/crash/user_crash_test.py

Thanks for #35...

This appears to be limited to Nocturne when reviewing the crash report:
https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+product.Version%3D%2711151.4.0%27+AND+stable_signature%3D%2783c89e4e_5292d683_f8ff28b9_65f9c78f_8812b52e%27

If that's the case let's add [Nocturne] to the title; if'll help tremendously for the sake of release planning.
Summary: [Nocturne] chromeos-firmwareupdate: updater4.sh seccomp failure with readlink syscall (was: chromeos-firmwareupdate: updater4.sh seccomp failure with readlink syscall)
Summary: chromeos-firmwareupdate: updater4.sh seccomp failure with readlink syscall (was: [Nocturne] chromeos-firmwareupdate: updater4.sh seccomp failure with readlink syscall)
it isn't limited to Nocturne.  i reproduced on eve, and a bunch of the reports i saw above were for eve.  there's nothing in the CLs listed that are board specific.
Status: Started (was: Untriaged)
I have reproduced on my eve and working on it.
Labels: Restrict-View-Google
Okay, I'm seeing this for Eve as well.  The earlier query must have filtered on the board.  The better query is here:

https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+product.Version%3D%2711151.4.0%27+AND+stable_signature%3D%2783c89e4e_5292d683_f8ff28b9_65f9c78f_8812b52e%27

This isn't a low hitter crash, but it's not esp heavy either.  Right now this would be the only beta blocker so not sure if it's worth that block.

Restricting the view to Googlers as we do with crash analysis.

> Restricting the view to Googlers as we do with crash analysis.

i don't think that's the policy.  RVG is used when the info is either known or unknown to be sensitive, or has security implications.  we know in this case the info isn't sensitive and there are no security implications.  so RVG is unnecessary.
Re #21, it is not clear to me what would be the net daily total increase in crashes.
crrev.com/c/1299955 in flight should fix the problem
Per #42, I did that to mimik the crash template; fine if we remove it.
FYI, my plan is to skip Eve and Nocturne on the Dev / Beta RCs but proceed with other boards; thanks.
Labels: -Restrict-View-Google
Summary: [Eve, Nocturne] chromeos-firmwareupdate: updater4.sh seccomp failure with readlink syscall (was: chromeos-firmwareupdate: updater4.sh seccomp failure with readlink syscall)
Adding [Eve, Nocturne] to summary to assist with release strategy.
Project Member

Comment 49 by bugdroid1@chromium.org, Oct 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/mosys/+/70b8fae91316e742cc928c3d1f60b075bdeb7d39

commit 70b8fae91316e742cc928c3d1f60b075bdeb7d39
Author: Gregory Meinke <gmeinke@chromium.org>
Date: Sat Oct 27 04:10:42 2018

Add readlink to mosys-seccomp.policy

To test added a readlink call to mosys (and then removed) to verify the
seccomp policy would cause the crash. With readlink added to the policy
mosys did not crash.

BUG= chromium:889552 
TEST=local device tests

Change-Id: I20cd5e7a85a84b01feed67b5d893d83af1914019
Reviewed-on: https://chromium-review.googlesource.com/1299955
Commit-Ready: Gregory Meinke <gmeinke@chromium.org>
Tested-by: Gregory Meinke <gmeinke@chromium.org>
Reviewed-by: C Shapiro <shapiroc@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/70b8fae91316e742cc928c3d1f60b075bdeb7d39/seccomp/mosys-seccomp-arm64.policy
[modify] https://crrev.com/70b8fae91316e742cc928c3d1f60b075bdeb7d39/seccomp/mosys-seccomp-arm.policy
[modify] https://crrev.com/70b8fae91316e742cc928c3d1f60b075bdeb7d39/seccomp/mosys-seccomp-amd64.policy

Project Member

Comment 50 by bugdroid1@chromium.org, Oct 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/16823561072f35fc6245b90c41e5491ba7ed6011

commit 16823561072f35fc6245b90c41e5491ba7ed6011
Author: Mike Frysinger <vapier@chromium.org>
Date: Sat Oct 27 04:10:42 2018

crash: overhaul report file processing

The current code hardcodes each file type that we know might be part
of a crash report which makes it hard to add more -- it requires a
good amount of copy & paste.

Lets rewrite the logic to have a set for required & optional file
types in each crash report, and have a generic loop to gather all
the files before we detect missing/unknown files.

Now if we want to add more required or optional files, we only have
to update the two class sets.

BUG= chromium:889552 
TEST=`test_that ... logging_UserCrash` still passes

Change-Id: I8bbebdc8be607c58e130dfdc710ed2584b793d94
Reviewed-on: https://chromium-review.googlesource.com/1297192
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Ben Chan <benchan@chromium.org>

[modify] https://crrev.com/16823561072f35fc6245b90c41e5491ba7ed6011/client/cros/crash/user_crash_test.py

Project Member

Comment 51 by bugdroid1@chromium.org, Oct 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f47c5e62851e8d8733bd452a0688326a0a18f30e

commit f47c5e62851e8d8733bd452a0688326a0a18f30e
Author: Mike Frysinger <vapier@chromium.org>
Date: Sat Oct 27 11:34:23 2018

crash: allow crash reports to include "proclog" files

We're starting to create a "proclog" file for each user crash report.
Don't flag this as an error.

BUG= chromium:889552 
TEST=`test_that ... logging_UserCrash` passes w/CL:1296089

Change-Id: I2dcdb18b05d436c23f5e837208592d45646dfbd9
Reviewed-on: https://chromium-review.googlesource.com/1297193
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Ben Chan <benchan@chromium.org>

[modify] https://crrev.com/f47c5e62851e8d8733bd452a0688326a0a18f30e/client/cros/crash/user_crash_test.py

Status: Fixed (was: Started)
Labels: Merge-TBD
[Auto-generated comment by a script] We noticed that this issue is targeted for M-71; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-71 label, otherwise remove Merge-TBD label. Thanks.
Labels: -Merge-TBD Merge-Request-71
Status: Started (was: Fixed)
pretty sure you'll want to cherry pick this to R71
Project Member

Comment 55 by sheriffbot@chromium.org, Oct 29

Labels: -Merge-Request-71 Hotlist-Merge-Review Merge-Review-71
This bug requires manual review: M71 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
I need some context on testing the changes on ToT before I can merge approve on M71.    Also, is the user impact limited to the crash?  Working to identify risk against the blocking status.  Thanks
Also, I'm seeing a big drop for this crash with last week's DEV release; but that could be because we skipped Eve.  Coincidence, or anticipated result?

Note the first entry

https://crash.corp.google.com/browse?q=product.name%3D%27ChromeOS%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&compProp=product.Version&v1=11151.4.0&v2=11151.11.0
Ping on #24.  This is a high impact P1; need to move it along.  Thanks :-)
Project Member

Comment 59 by bugdroid1@chromium.org, Oct 31

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/aecfa24bfa878cff73328a95b49f045a8c8075d6

commit aecfa24bfa878cff73328a95b49f045a8c8075d6
Author: Mike Frysinger <vapier@chromium.org>
Date: Wed Oct 31 15:17:29 2018

crash: gather process tree info for every user crash

When a process crashes, knowing its execution environment (who spawned
it and with what caps/seccomp/uid/gid settings) can help debug.  We have
such a crash now where its parent sets restrictive security settings it
shouldn't have and we don't know who is spawning it.

I've used "proclog" as we have some places currently that are sensitive
to extensions like logging_UserCrash which only sees "log" when using
".proc.log", and the upload code uses "proclog" as a key for arbitrary
files on the server.

There shouldn't be any PII issues here as we're gathering from each
parent process:
- cmdline (the same as `top`)
- status (contains numeric security/signal/memory stats)
Plus we run the log through the common StripSensitiveData in case any
parent process includes GUIDs or similar things in its command line.

BUG= chromium:889552 
TEST=unittests pass
TEST=crashed a proc and checked the log to see it walk back up
CQ-DEPEND=CL:1297193

Change-Id: Iba8ae8cbedaf8a1be56fc7d15809c12f62d5bea2
Reviewed-on: https://chromium-review.googlesource.com/1296089
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Ben Chan <benchan@chromium.org>

[modify] https://crrev.com/aecfa24bfa878cff73328a95b49f045a8c8075d6/crash-reporter/user_collector_base.cc
[modify] https://crrev.com/aecfa24bfa878cff73328a95b49f045a8c8075d6/crash-reporter/crash_collector_test.cc
[modify] https://crrev.com/aecfa24bfa878cff73328a95b49f045a8c8075d6/crash-reporter/crash_collector.cc
[modify] https://crrev.com/aecfa24bfa878cff73328a95b49f045a8c8075d6/crash-reporter/crash_collector.h

I don't know what you're pinging about on comment #24

eng has requested meeting to 71 for just the mosys seccomp changes and is waiting for approval on that.
I was pinging on #24 to ensure we had adequate test coverage, esp in those instances where we could have reproduced and we could check the delta.

I'm still asking if we've been able to verify prior to the approval...
we have no unittest coverage in this area and it's pretty unlikely to happen any time soon.  so it's just manual verification and seeing if the crash/ rates in this space have gone down/disappeared since his CL's landed.
Labels: -Merge-Review-71 Merge-Approved-71
Okay, cool.  Thanks for the details.

Approving merge to M71 Chrome OS.
Project Member

Comment 64 by sheriffbot@chromium.org, Nov 5

Cc: kbleicher@google.com
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Patching initial CL:1318709 in now. Will need to also patch in CL:1306534.
Project Member

Comment 66 by bugdroid1@chromium.org, Nov 5

Labels: merge-merged-release-R71-11151.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/mosys/+/942f163122808f35400e8a8e7d6a5473a5700121

commit 942f163122808f35400e8a8e7d6a5473a5700121
Author: Gregory Meinke <gmeinke@chromium.org>
Date: Mon Nov 05 22:50:39 2018

Add readlink to mosys-seccomp.policy

To test added a readlink call to mosys (and then removed) to verify the
seccomp policy would cause the crash. With readlink added to the policy
mosys did not crash.

BUG= chromium:889552 
TEST=local device tests

Change-Id: I20cd5e7a85a84b01feed67b5d893d83af1914019
Previous-Reviewed-on: https://chromium-review.googlesource.com/1299955
(cherry picked from commit 70b8fae91316e742cc928c3d1f60b075bdeb7d39)
Previous-Reviewed-on: https://chromium-review.googlesource.com/1318709
(cherry picked from commit ee214757c100f1adb694344a1f9c13483669dfce)
Previous-Reviewed-on: https://chromium-review.googlesource.com/1318709
(cherry picked from commit 6d7175d69564b7929fe1520e33333a851839c54b)
Reviewed-on: https://chromium-review.googlesource.com/c/1318709
Reviewed-by: C Shapiro <shapiroc@chromium.org>
Commit-Queue: Gregory Meinke <gmeinke@chromium.org>
Tested-by: Gregory Meinke <gmeinke@chromium.org>
Trybot-Ready: Gregory Meinke <gmeinke@chromium.org>

[modify] https://crrev.com/942f163122808f35400e8a8e7d6a5473a5700121/seccomp/mosys-seccomp-arm64.policy
[modify] https://crrev.com/942f163122808f35400e8a8e7d6a5473a5700121/seccomp/mosys-seccomp-arm.policy
[modify] https://crrev.com/942f163122808f35400e8a8e7d6a5473a5700121/seccomp/mosys-seccomp-amd64.policy

https://chromium-review.googlesource.com/c/chromiumos/platform/mosys/+/1318710 has also landed in R71, it didn't reference this bug so it didn't auto-update.
Status: Fixed (was: Started)
Not sure what labels to remove, so just marking as fixed until I can verify.
Labels: -Merge-Approved-71
remove Merge-Approved-71 label ... any other hints would be greatly appreciated.

Sign in to add a comment