[Eve, Nocturne] chromeos-firmwareupdate: updater4.sh seccomp failure with readlink syscall
Reported by
asst.eve...@gmail.com,
Sep 26
|
|||||||||||||||||||||||||||||||||||
Issue descriptionIMPORTANT: Your crash has already been automatically reported to our crash system. Please file this bug only if you can provide more information about it. Chrome Version: 71.0.3562.0 Operating System: Linux 11101.0.0 URL (if applicable) where crash occurred: Can you reproduce this crash?yes What steps will reproduce this crash? (If it's not reproducible, what were you doing just before the crash?) 1.voice query"who is Martin Luther King" 2.tap on suggestion chip 3. ****DO NOT CHANGE BELOW THIS LINE**** Crash ID: crash/a9e96337ddbac17d
,
Oct 12
My chromebox just hung and had to be rebooted, and on started it apparently encountered this crash eight(!) times. I see reports as far back as September 25th, and according to Crash this signature first appeared around September ~21st.
,
Oct 16
Adding a couple people to CC that have commented. We need to find an owner for this issue as it marked as RBB. Thanks
,
Oct 16
,
Oct 17
Adding a couple more to CC to help find an owner as this is marked RBB. crash/ info link https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+stable_signature%3D%2783c89e4e_5292d683_f8ff28b9_65f9c78f_8812b52e%27
,
Oct 17
not sure this is firmware?
,
Oct 17
Agreed. Not low level firmware issue.
,
Oct 17
Adding a couple people from UI>Shell>Assistant to help find an owner.
,
Oct 17
,
Oct 17
As comment #1 mentioned, this is a low level crash in libc/coreutils. I sampled through some of the crash reports, many of them are not assistant UI related because they are from different boards that do not have assistant enabled. I believe this is a different crash from b/116173111. We need to find some platform experts to shed some light on what this stack trace means.
,
Oct 17
I dig a little bit more on this. Since the stack trace has ld in it, it must be loading some .so. Assistant feature does use a shared library and we started building libassistant.so on many boards from Sep 20th, which suspiciously correlates to the uptick of when this crash signature started to ramp up. https://screenshot.googleplex.com/H0NUFPoDUAG.png Another piece of information is the signal in the stack, SIGSYS. According to the man page it's "Bad system call" and related to secomp(2). http://man7.org/linux/man-pages/man7/signal.7.html Looking at seccomp(2) http://man7.org/linux/man-pages/man2/seccomp.2.html it seems to mean that certain sys calls can be allowed/blocked by a predefined set of rules. And it a syscall is blocked, it would likely cause a SIGSYS. I know we sandbox chrome process on ChromeOS. Maybe this is why there is a SIGSYS when ld is trying to load some .so and issued some syscall that's not on the whitelist. Again I am not an expert on this, we will need some better minds to further diagnose this issue. I am ccing people that I know might have a better idea.
,
Oct 18
fairly certain this has nothing to do with assistant. this isn't "ld" as in the linker from binutils. ld-xxx.so is simply the runtime program loader which every program uses. so libassociation.so doesn't factor in here. along those lines, pretty sure none of these crashes are related to anything UI-wise. if you want to track some UI crash, you'll want to file a new bug. i'm focusing this on the crash/ reports linked here. unfortunately, the relevant details aren't exposed in crash/. i filed b/113029256 to fix that, but they haven't made progress. so triaging this requires looking at the minidump directly. if we run `minidump_dump` on the minidump in crash/a9e96337ddbac17d, we see: Stream MD_LINUX_CMD_LINE: /usr/bin/coreutils\0 --coreutils-prog-shebang=readlink\0 /usr/bin/readlink\0 /tmp/tmp.j7rDBPf9A5/bin/flashrom\0 Stream MD_LINUX_ENVIRON: OLDPWD=/\0 TARGET_FWID=Google_Eve.9584.171.0\0 TARGET_PLATFORM=Google_Eve\0 UNIBUILD=\0 PATH=/tmp/tmp.j7rDBPf9A5/bin:/usr/bin:/usr/sbin:/sbin:/bin:/usr/local/sbin:/usr/local/bin:/tmp/tmp.j7rDBPf9A5\0 TARGET_SCRIPT=updater4.sh\0 TARGET_ECID=eve_v1.1.6585-859d2ea99\0 TARGET_RO_FWID=Google_Eve.9584.107.0\0 PWD=/tmp/tmp.j7rDBPf9A5\0 TARGET_PDID=\0 as noted, the failure is SIGSYS, and that pretty much only shows up from seccomp failures. we know from the syscall(2) man page (Architecture calling conventions) that on x86_64, the syscall # is passed through rax. http://man7.org/linux/man-pages/man2/syscall.2.html so we look at the registers from the dump: MDRawContextAMD64 ... rax = 0x59 ... on x86_64, we look at unistd_64.h to see what syscall 0x59 (89) is, and we see it's __NR_readlink. which makes sense when `readlink` is the program that's crashing. my guess is started because of changes landed in chromite's lddtree.py like: https://chromium-review.googlesource.com/1195164 which caused us to run `readlink` when we weren't before. going by the env, it seems pretty likely that this is the updater4.sh shell script running its bundled copy of flashrom. https://chromium.googlesource.com/chromiumos/platform/firmware/+/release-R70-11021.B/pack_stub what isn't clear is who ran chromeos-firmwareupdate in the first place. that tool doesn't use any minijail/seccomp filters which means whoever ran chromeos-firmwareupdate imposed a restrictive seccomp failure. unfortunately, crash reports don't show process trees, only the crashing process itself. the '/tmp/.org.chromium.Chromium.z5Qjmn/' usage is a bit weird. that path shows up from libbase's tempfile handling (TempFileName). so i think some CrOS C++ platform code is executing chromeos-firmwareupdate. my instinct is to say something mosys related because of recent crashes we've seen over the last few releases with it turning on seccomp filters and then executing random shell scripts/tools. but i'm not familiar enough with the various ways mosys gets run to say for sure. otherwise i don't have an ideas of who might be running this in a seccomp filter it shouldn't be ... the reason these crashes don't have symbols is because they're in an lddtree created env which pulls libs that are diff of what is shipped in the rootfs. prob not the best approach, but we're doing away with all of that anyways with the move to updater5/futility based update in issue 882445 . so prob just to let the symbols stay missing and not worry about it.
,
Oct 18
Mike, cr50 update scripts seem to call chromeos-firmwareupdate: src/platform2/crosh/removable.d/50-crosh.sh I can look further tomorrow. Do you know if we are minijailing scripts like that?
,
Oct 18
removable.d is only loaded when booted off a USB stick in dev mode, so it shouldn't show up on any verified system that uploads crash reports. plus that'd require a user to open up crosh and manually run something, so hopefully if they were doing that, they'd let us know. plus systems in dev-mode should tag themselves as such in the uploaded crash reports, and i don't see that tag in these.
,
Oct 18
Can we assign an owner to this issue? Beta is coming up very soon and needs a fix in asap. Thanks
,
Oct 21
BTW, it was already planned to remove bundled programs from updater, and also get updater4.sh replaced by native programs 'futility' (go/updater5). If this is a serious issue then we may consider speeding up the migration.
,
Oct 21
the problem is in m71 which is branched now, so migration is probably out. plus, we still have a scenario where someone is applying a seccomp filter to a program it doesn't maintain which is bad.
,
Oct 22
Per #15, This is a P1 RBB requiring an owner and consideration for shifting to RBS, etc. Thanks
,
Oct 22
,
Oct 22
,
Oct 22
Re #17: Mike, I tagged this RBB on the basis that eight crash reports on every boot is likely to cause problems, not least by masking other crashes due to throttling. Are there other issues that these crashes will cause, besides crash-spam? Tagging RVG and adding ivanpe@ to advise on whether the crash-spam will be an issue; if the crash spam is somehow acceptable and there's no user-facing impact then we can remove RB label.
,
Oct 22
we don't have throttles locally for creating crash reports. we do have a spool-dir max as documented: https://chromium.googlesource.com/chromiumos/platform2/+/master/crash-reporter/#crash-report-storage wrt uploading throttles, we already deal with that locally in crash-sender as documented: https://chromium.googlesource.com/chromiumos/platform2/+/master/crash-reporter/#uploading-crashes so there shouldn't be any server side issues
,
Oct 22
+fqj who recently made lots of changes to sepolicy and see if he has some ideas for how this can be fixed, or should we revert / disable some changes done in M71 to let seccomp filter work like before.
,
Oct 22
I really don't think there's anything selinux related here. seccomp is a completely unrelated subsystem. is there no one with a device reproducing the issue locally?
,
Oct 22
I hit this locally, as per comment #2, but checking chrome://crashes now, I don't see any instances of this issue any more. Perhaps it's specific to when we e.g. have firmware updates pending?
,
Oct 22
Sorry, seems like seccomp and sepolicy are different. Removed fqj@.
And regarding who called firmware updater,
the update engine will call updater if
(1) Signed AU is received;
(2) On updater3 family, call chromeos-boot-update if AU scheduled that;
(3) Every 60 seconds after each boot, call chromeos-setgoodfirmware =>
board-setgoodfirmware => on SKL & KBL family, this leads to a special ME hack
=> "cros_spi_descriptor lock" => call chromeos-firmwareupdate --sb_extract,
which does not call updater4.
So it is more possible updater4.sh is called due to AU?
Re#25 which device do you see this issue?
,
Oct 22
Re #26: I observed this issue on a Teemo device.
,
Oct 23
i've posted a CL to have crash-reporter gather parent process tree details: https://crrev.com/c/1296089 that should help figure out who is calling us here.
,
Oct 23
Following up on comment #21 - do we know if this issue is user facing? We need to make a call to delay beta due to this issue or remove RBB label. Thanks
,
Oct 24
If this was reported as long ago as Sept 25th per #2 it's likely not a M71 regression, and thus not a beta blocker for M71. Has anyone been able to bisect?
,
Oct 24
i saw it on my pixelbook, so i spent time putting the device into dev mode and see if i could reproduce. and i can. so what i did: - grab 11187.0.0 recovery image: gs://chromeos-releases/canary-channel/eve/11187.0.0/chromeos_11187.0.0_eve_recovery_canary-channel_mp.bin - put pixelbook into dev mode and enable usb boot - boot that image - login with test account - see that steady state has no crashes - go to chrome://settings/help and trigger an update (and got 11189.0.0) - see crashes eventually show up attaching my /var/log/messages and UE logs. lets look at snippets (ignoring TZ weirdness due to initial system setup). also attaching the crash report generated from there. we see it's inside the temp dir noted in the logs and from that timestamp. from messages: 2018-10-24T19:45:39.094095+00:00 NOTICE temp_logger[6784]: 00:80 02:37 03:36 04:49 05:43 06:80 07:57 2018-10-24T19:45:55.384246+00:00 INFO kernel: [ 447.433897] EXT4-fs (mmcblk0p5): mounting ext2 file system using the ext4 subsystem 2018-10-24T19:45:55.385244+00:00 INFO kernel: [ 447.434935] EXT4-fs (mmcblk0p5): mounted filesystem without journal. Opts: 2018-10-24T19:46:16.709647+00:00 WARNING mosys.elf[7243]: libminijail[7243]: not locking any securebits 2018-10-24T19:46:16.709661+00:00 WARNING mosys.elf[7243]: libminijail[7243]: SECURE_NOROOT not set, not dropping bounding set 2018-10-24T19:46:30.605403+00:00 WARNING mosys[7596]: libminijail[7596]: not locking any securebits 2018-10-24T19:46:30.605418+00:00 WARNING mosys[7596]: libminijail[7596]: SECURE_NOROOT not set, not dropping bounding set 2018-10-24T19:46:30.613870+00:00 INFO crash_reporter[7600]: libminijail[7600]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001 2018-10-24T19:46:30.627950+00:00 WARNING crash_reporter[7600]: [user] Received crash notification for coreutils[7599] sig 31, user 0 group 0 (handling) 2018-10-24T19:46:30.628875+00:00 INFO crash_reporter[7600]: State of crashed process [7599]: S (sleeping) 2018-10-24T19:46:30.629126+00:00 INFO crash_reporter[7600]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/6' 2018-10-24T19:46:30.634246+00:00 INFO crash_reporter[7600]: Stored minidump to /var/spool/crash/coreutils.20181024.154630.7599.dmp from UE, we see that crash happened right in the middle: [1024/124555:INFO:action_processor.cc(116)] ActionProcessor: finished FilesystemVerifierAction with code ErrorCode::kSuccess [1024/124555:INFO:action_processor.cc(143)] ActionProcessor: starting PostinstallRunnerAction [1024/124555:INFO:postinstall_runner_action.cc(172)] Performing postinst (postinst at /tmp/.org.chromium.Chromium.UDVWUG/postinst) installed on device /dev/mmcblk0p5 and mountable device /dev/mmcblk0p5 [1024/124555:INFO:postinstall_runner_action.cc(179)] Format file for new postinst is: data [1024/124659:INFO:subprocess.cc(157)] Subprocess output: dm:dm bht[DEBUG] Setting block_count 588800 dm:dm bht[DEBUG] Setting depth to 3. dm:dm bht[DEBUG] depth: 0 entries: 1 dm:dm bht[DEBUG] depth: 1 entries: 36 dm:dm bht[DEBUG] depth: 2 entries: 4600 PostInstall Configured: (B, /dev/mmcblk0p5, /dev/mmcblk0p4, /dev/mmcblk0p12) Current Kernel Info: sysname(Linux) nodename(localhost) release(4.4.161-15348-ge2d0d39f1235) version(#1 SMP PREEMPT Tue Oct 23 20:28:42 PDT 2018) machine(x86_64) lsb-release inside the new rootfs: CHROMEOS_ARC_ANDROID_SDK_VERSION=28 CHROMEOS_ARC_VERSION=5087515 CHROMEOS_AUSERVER=https://tools.google.com/service/update2 CHROMEOS_BOARD_APPID={01906EA2-3EB2-41F1-8F62-F0B7120EFD2E} CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1} CHROMEOS_DEVSERVER= CHROMEOS_RELEASE_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1} CHROMEOS_RELEASE_BOARD=eve-signed-mpkeys CHROMEOS_RELEASE_BRANCH_NUMBER=0 CHROMEOS_RELEASE_BUILDER_PATH=eve-release/R72-11189.0.0 CHROMEOS_RELEASE_BUILD_NUMBER=11189 CHROMEOS_RELEASE_BUILD_TYPE=Official Build CHROMEOS_RELEASE_CHROME_MILESTONE=72 CHROMEOS_RELEASE_DESCRIPTION=11189.0.0 (Official Build) canary-channel eve CHROMEOS_RELEASE_KEYSET=mp CHROMEOS_RELEASE_NAME=Chrome OS CHROMEOS_RELEASE_PATCH_NUMBER=0 CHROMEOS_RELEASE_TRACK=canary-channel CHROMEOS_RELEASE_VERSION=11189.0.0 DEVICETYPE=CHROMEBOOK GOOGLE_RELEASE=11189.0.0 Set boot target to /dev/mmcblk0p5: Partition 5, Slot B SetImage KERNEL_CONFIG: console= loglevel=7 init=/sbin/init cros_secure oops=panic panic=-1 root=/dev/dm-0 rootwait ro dm_verity.error_behavior=3 dm_verity.max_bios=-1 dm_verity.dev_wait=1 dm="1 vroot none ro 1,0 4710400 verity payload=PARTUUID=%U/PARTNROFF=1 hashtree=PARTUUID=%U/PARTNROFF=1 hashstart=4710400 alg=sha1 root_hexdigest=73ae7f3db69dfd124ad0114589ead0a1b85ec4f0 salt=a4e18be2fe283eeccfee8d38620e4ceddde314270a11f9f44cda122f2ed27b3b" noinitrd vt.global_cursor_default=0 kern_guid=%U add_efi_memmap boot=local noresume noswap i915.modeset=1 tpm_tis.force=1 tpm_tis.interrupts=0 nmi_watchdog=panic,lapic disablevmx=off Setting up verity. Finished after 20 seconds. Clearing network driver boot cache: /var/lib/preload-network-drivers. Syncing filesystems before changing boot order... Finished after 1 seconds. Updating Partition Table Attributes using CgptManager... Updated kernel 4 with Successful = 0 and NumTriesLeft = 6 Checking /mnt/stateful_partition/unencrypted permission. Permission is ok. Unlinked file /var/lib/ureadahead/pack Starting board post install script (//usr/sbin/board-postinst /) Command: //usr/sbin/board-postinst / Finished after 0 seconds. Board post install succeeded Starting firmware updater (//usr/sbin/chromeos-firmwareupdate --mode=autoupdate) Command: //usr/sbin/chromeos-firmwareupdate --mode=autoupdate Starting Google_Eve firmware updater v4 (autoupdate)... - Updater package: [RO:Google_Eve.9584.107.0 RW:Google_Eve.9584.174.0 / EC:eve_v1.1.6585-859d2ea99] - Current system: [RO:Google_Eve.9584.107.0 , ACT:Google_Eve.9584.174.0 / EC:eve_v1.1.6643-e18316ecc] - Write protection: Hardware: ON, Software: Main=off Read the current FLMSTR values... flashrom v0.9.9 : 02aea36 : Oct 05 2018 22:43:50 UTC on Linux 4.4.161-15348-ge2d0d39f1235 (x86_64) Calibrating delay loop... OK. coreboot table found at 0x7aabe000. WARNING: SPI Configuration Lockdown activated. Reading flash... SUCCESS No need to modify image file. * invoke: flashrom -p host -r _current/bios.bin Firmware update available: Google_Eve.9584.174.0. * invoke: flashrom -p host --fast-verify -w bios.bin -i RW_SECTION_B On reboot EC update may occur. Firmware update (autoupdate) completed. Finished after 18 seconds. Firmware update completed. Starting command: //usr/share/cros/cr50-set-board-id.sh unknown Command: //usr/share/cros/cr50-set-board-id.sh unknown ERROR: Board ID and flag have already been set. ignored: cr50-set-board-id failure (2). Finished after 0 seconds. Failed Command: //usr/share/cros/cr50-set-board-id.sh unknown - Exit Code 2 Starting command: //usr/share/cros/cr50-update.sh / Command: //usr/share/cros/cr50-update.sh / Finished after 15 seconds. cr50 setup complete. ChromeosChrootPostinst complete Syncing filesystem at end of postinst...
,
Oct 24
once i have 11187.0.0 on the system, it's easy to keep reproducing: - invalidate the update by corrupting the new kernel: dd if=/dev/zero of=/dev/mmcblk0p4 - clear the crashes: rm -rf /var/spool/crash/* - clear the logs: > /var/log/messages > /var/log/update_engine.log - reboot - check that you're still in 11187.0.0 - in a new shell run: cd /var/log tail -f messages update_engine.log - trigger a system update again - watch the log files. see that the timestamps line up and that the crashes start showing up in the middle of the postinst run.
,
Oct 24
here's an strace of the failing UE run. i've filtered out a lot of noise, but hopefully not too much to show the issue.
we see the first SIGSYS failure here:
5049 --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0x795e0fff76e7, si_syscall=__NR_readlink, si_arch=AUDIT_ARCH_X86_64} ---
5049 +++ killed by SIGSYS (core dumped) +++
pid 5049 is indeed readlink:
5049 execve("/usr/bin/readlink", ["readlink", "/tmp/tmp.Be4FUlcWox/bin/flashrom"], 0x5c3c642df810 /* 10 vars */) = 0
so lets look up to see where that call came from. we see:
- pid 5049 was forked (cloned) from pid 5048
- pid 5048 is the flashrom shell wrapper:
5048 execve("/tmp/tmp.Be4FUlcWox/bin/flashrom", ["flashrom", "-p", "host", "-i", "RW_NVRAM:/tmp/flashrom_r2icQw", "-r"], 0x7fff90e20f38 /* 10 vars */) = 0
- pid 5048 was forked (cloned) from pid 5046
- pid 5046 is system mosys:
5046 execve("/usr/sbin/mosys", ["/usr/sbin/mosys", "nvram", "vboot", "write", "601000000102001a00feff0000ffffe6"], 0x7ffea67a3878 /* 10 vars */) = 0
- mosys locks itself down:
5046 prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) = 0
5046 open("/usr/share/policy/mosys-seccomp.policy", O_RDONLY|O_CLOEXEC) = 4
5046 seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, {len=133, filter=0x55c5586eb000}) = 0
mosys likes to install a restrictive seccomp filter and then run a lot of programs underneath it. we know this has been problematic, but that's where we're at. it's a bug in mosys.
for posterity, the mosys call came from:
5044 execve("/tmp/tmp.Be4FUlcWox/bin/crossystem", ["crossystem", "fw_try_next=B"], 0x56a111b34960 /* 10 vars */) = 0
which i think came from the updater4.sh logic:
# Update Engine - Received Update
mode_autoupdate() {
...
if [ -n "$try_next" ]; then
cros_set_prop "fw_try_next=$try_next"
fi
but i'm done debugging this :p
,
Oct 25
vapier@; thanks for the work! I need to determine if this was already pushed to Beta or Stable for M70 or earlier for consideration of M71. Given the earlier reports I assume it was introduced prior to M71. Any way to confirm? I'd rather not block Beta on an issue already out there. Thanks.
,
Oct 25
if it's because of the CL i noted in comment #12, that landed in 11032.0.0, so it prob broke around that time
,
Oct 25
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7f854db22fd877ac891904cc8f7f743e25cd9ed2 commit 7f854db22fd877ac891904cc8f7f743e25cd9ed2 Author: Mike Frysinger <vapier@chromium.org> Date: Thu Oct 25 03:07:20 2018 crash: improve error message for unknown files The UserCrashTest._run_crasher_process_and_analyze helper goes through the crash spool directory and makes sure all files created with reports are known/expected. If it finds any it doesn't know about, it fails. Unfortunately, the "unknown file" code path is geared towards a failure mode that no longer really comes up: it assumes that if a file is new, it's because breakpad itself created a minidump and left it behind. So any time crash-reporter creates a new file, we get the confusing error: ERROR: crasher_nobreakpad did generate breakpad minidump It doesn't tell us what this new file is (could be a log file), and the grammar being slightly broken adds to the confusion. Further, if crash-reporter creates two new files, we get the error: ERROR: Breakpad wrote multiple minidumps Which doesn't make sense if crash-reporter is creating logs and not any breakpad related files. Lets nuke all this special case logic and just add a generic "unknown file" whenever we see files we don't expect. Now we'll get an explicit error about the file that we don't handle: ERROR: Crash reporter created an unknown file: crasher_nobreakpad.20181023.131230.12664.proclog BUG= chromium:889552 TEST=`test_that ... logging_UserCrash` still passes, and writes a better error message w/CL:1296089 Change-Id: Ieedd59b8b21264e669da4df18a96ac8773dace1e Reviewed-on: https://chromium-review.googlesource.com/1297191 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Commit-Ready: Mike Frysinger <vapier@chromium.org> Tested-by: Mike Frysinger <vapier@chromium.org> Reviewed-by: Ben Chan <benchan@chromium.org> [modify] https://crrev.com/7f854db22fd877ac891904cc8f7f743e25cd9ed2/client/cros/crash/user_crash_test.py
,
Oct 25
Thanks for #35... This appears to be limited to Nocturne when reviewing the crash report: https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+product.Version%3D%2711151.4.0%27+AND+stable_signature%3D%2783c89e4e_5292d683_f8ff28b9_65f9c78f_8812b52e%27 If that's the case let's add [Nocturne] to the title; if'll help tremendously for the sake of release planning.
,
Oct 25
,
Oct 25
it isn't limited to Nocturne. i reproduced on eve, and a bunch of the reports i saw above were for eve. there's nothing in the CLs listed that are board specific.
,
Oct 25
I have reproduced on my eve and working on it.
,
Oct 25
Okay, I'm seeing this for Eve as well. The earlier query must have filtered on the board. The better query is here: https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+product.Version%3D%2711151.4.0%27+AND+stable_signature%3D%2783c89e4e_5292d683_f8ff28b9_65f9c78f_8812b52e%27 This isn't a low hitter crash, but it's not esp heavy either. Right now this would be the only beta blocker so not sure if it's worth that block. Restricting the view to Googlers as we do with crash analysis.
,
Oct 25
> Restricting the view to Googlers as we do with crash analysis. i don't think that's the policy. RVG is used when the info is either known or unknown to be sensitive, or has security implications. we know in this case the info isn't sensitive and there are no security implications. so RVG is unnecessary.
,
Oct 25
Re #21, it is not clear to me what would be the net daily total increase in crashes.
,
Oct 25
crrev.com/c/1299955 in flight should fix the problem
,
Oct 25
Per #42, I did that to mimik the crash template; fine if we remove it.
,
Oct 25
FYI, my plan is to skip Eve and Nocturne on the Dev / Beta RCs but proceed with other boards; thanks.
,
Oct 25
,
Oct 26
Adding [Eve, Nocturne] to summary to assist with release strategy.
,
Oct 27
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/mosys/+/70b8fae91316e742cc928c3d1f60b075bdeb7d39 commit 70b8fae91316e742cc928c3d1f60b075bdeb7d39 Author: Gregory Meinke <gmeinke@chromium.org> Date: Sat Oct 27 04:10:42 2018 Add readlink to mosys-seccomp.policy To test added a readlink call to mosys (and then removed) to verify the seccomp policy would cause the crash. With readlink added to the policy mosys did not crash. BUG= chromium:889552 TEST=local device tests Change-Id: I20cd5e7a85a84b01feed67b5d893d83af1914019 Reviewed-on: https://chromium-review.googlesource.com/1299955 Commit-Ready: Gregory Meinke <gmeinke@chromium.org> Tested-by: Gregory Meinke <gmeinke@chromium.org> Reviewed-by: C Shapiro <shapiroc@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/70b8fae91316e742cc928c3d1f60b075bdeb7d39/seccomp/mosys-seccomp-arm64.policy [modify] https://crrev.com/70b8fae91316e742cc928c3d1f60b075bdeb7d39/seccomp/mosys-seccomp-arm.policy [modify] https://crrev.com/70b8fae91316e742cc928c3d1f60b075bdeb7d39/seccomp/mosys-seccomp-amd64.policy
,
Oct 27
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/16823561072f35fc6245b90c41e5491ba7ed6011 commit 16823561072f35fc6245b90c41e5491ba7ed6011 Author: Mike Frysinger <vapier@chromium.org> Date: Sat Oct 27 04:10:42 2018 crash: overhaul report file processing The current code hardcodes each file type that we know might be part of a crash report which makes it hard to add more -- it requires a good amount of copy & paste. Lets rewrite the logic to have a set for required & optional file types in each crash report, and have a generic loop to gather all the files before we detect missing/unknown files. Now if we want to add more required or optional files, we only have to update the two class sets. BUG= chromium:889552 TEST=`test_that ... logging_UserCrash` still passes Change-Id: I8bbebdc8be607c58e130dfdc710ed2584b793d94 Reviewed-on: https://chromium-review.googlesource.com/1297192 Commit-Ready: Mike Frysinger <vapier@chromium.org> Tested-by: Mike Frysinger <vapier@chromium.org> Reviewed-by: Ben Chan <benchan@chromium.org> [modify] https://crrev.com/16823561072f35fc6245b90c41e5491ba7ed6011/client/cros/crash/user_crash_test.py
,
Oct 27
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f47c5e62851e8d8733bd452a0688326a0a18f30e commit f47c5e62851e8d8733bd452a0688326a0a18f30e Author: Mike Frysinger <vapier@chromium.org> Date: Sat Oct 27 11:34:23 2018 crash: allow crash reports to include "proclog" files We're starting to create a "proclog" file for each user crash report. Don't flag this as an error. BUG= chromium:889552 TEST=`test_that ... logging_UserCrash` passes w/CL:1296089 Change-Id: I2dcdb18b05d436c23f5e837208592d45646dfbd9 Reviewed-on: https://chromium-review.googlesource.com/1297193 Commit-Ready: Mike Frysinger <vapier@chromium.org> Tested-by: Mike Frysinger <vapier@chromium.org> Reviewed-by: Ben Chan <benchan@chromium.org> [modify] https://crrev.com/f47c5e62851e8d8733bd452a0688326a0a18f30e/client/cros/crash/user_crash_test.py
,
Oct 29
,
Oct 29
[Auto-generated comment by a script] We noticed that this issue is targeted for M-71; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-71 label, otherwise remove Merge-TBD label. Thanks.
,
Oct 29
pretty sure you'll want to cherry pick this to R71
,
Oct 29
This bug requires manual review: M71 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 30
I need some context on testing the changes on ToT before I can merge approve on M71. Also, is the user impact limited to the crash? Working to identify risk against the blocking status. Thanks
,
Oct 30
Also, I'm seeing a big drop for this crash with last week's DEV release; but that could be because we skipped Eve. Coincidence, or anticipated result? Note the first entry https://crash.corp.google.com/browse?q=product.name%3D%27ChromeOS%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&compProp=product.Version&v1=11151.4.0&v2=11151.11.0
,
Oct 31
Ping on #24. This is a high impact P1; need to move it along. Thanks :-)
,
Oct 31
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/aecfa24bfa878cff73328a95b49f045a8c8075d6 commit aecfa24bfa878cff73328a95b49f045a8c8075d6 Author: Mike Frysinger <vapier@chromium.org> Date: Wed Oct 31 15:17:29 2018 crash: gather process tree info for every user crash When a process crashes, knowing its execution environment (who spawned it and with what caps/seccomp/uid/gid settings) can help debug. We have such a crash now where its parent sets restrictive security settings it shouldn't have and we don't know who is spawning it. I've used "proclog" as we have some places currently that are sensitive to extensions like logging_UserCrash which only sees "log" when using ".proc.log", and the upload code uses "proclog" as a key for arbitrary files on the server. There shouldn't be any PII issues here as we're gathering from each parent process: - cmdline (the same as `top`) - status (contains numeric security/signal/memory stats) Plus we run the log through the common StripSensitiveData in case any parent process includes GUIDs or similar things in its command line. BUG= chromium:889552 TEST=unittests pass TEST=crashed a proc and checked the log to see it walk back up CQ-DEPEND=CL:1297193 Change-Id: Iba8ae8cbedaf8a1be56fc7d15809c12f62d5bea2 Reviewed-on: https://chromium-review.googlesource.com/1296089 Commit-Ready: Mike Frysinger <vapier@chromium.org> Tested-by: Mike Frysinger <vapier@chromium.org> Reviewed-by: Ben Chan <benchan@chromium.org> [modify] https://crrev.com/aecfa24bfa878cff73328a95b49f045a8c8075d6/crash-reporter/user_collector_base.cc [modify] https://crrev.com/aecfa24bfa878cff73328a95b49f045a8c8075d6/crash-reporter/crash_collector_test.cc [modify] https://crrev.com/aecfa24bfa878cff73328a95b49f045a8c8075d6/crash-reporter/crash_collector.cc [modify] https://crrev.com/aecfa24bfa878cff73328a95b49f045a8c8075d6/crash-reporter/crash_collector.h
,
Oct 31
I don't know what you're pinging about on comment #24 eng has requested meeting to 71 for just the mosys seccomp changes and is waiting for approval on that.
,
Nov 1
I was pinging on #24 to ensure we had adequate test coverage, esp in those instances where we could have reproduced and we could check the delta. I'm still asking if we've been able to verify prior to the approval...
,
Nov 1
we have no unittest coverage in this area and it's pretty unlikely to happen any time soon. so it's just manual verification and seeing if the crash/ rates in this space have gone down/disappeared since his CL's landed.
,
Nov 2
Okay, cool. Thanks for the details. Approving merge to M71 Chrome OS.
,
Nov 5
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 5
Patching initial CL:1318709 in now. Will need to also patch in CL:1306534.
,
Nov 5
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/mosys/+/942f163122808f35400e8a8e7d6a5473a5700121 commit 942f163122808f35400e8a8e7d6a5473a5700121 Author: Gregory Meinke <gmeinke@chromium.org> Date: Mon Nov 05 22:50:39 2018 Add readlink to mosys-seccomp.policy To test added a readlink call to mosys (and then removed) to verify the seccomp policy would cause the crash. With readlink added to the policy mosys did not crash. BUG= chromium:889552 TEST=local device tests Change-Id: I20cd5e7a85a84b01feed67b5d893d83af1914019 Previous-Reviewed-on: https://chromium-review.googlesource.com/1299955 (cherry picked from commit 70b8fae91316e742cc928c3d1f60b075bdeb7d39) Previous-Reviewed-on: https://chromium-review.googlesource.com/1318709 (cherry picked from commit ee214757c100f1adb694344a1f9c13483669dfce) Previous-Reviewed-on: https://chromium-review.googlesource.com/1318709 (cherry picked from commit 6d7175d69564b7929fe1520e33333a851839c54b) Reviewed-on: https://chromium-review.googlesource.com/c/1318709 Reviewed-by: C Shapiro <shapiroc@chromium.org> Commit-Queue: Gregory Meinke <gmeinke@chromium.org> Tested-by: Gregory Meinke <gmeinke@chromium.org> Trybot-Ready: Gregory Meinke <gmeinke@chromium.org> [modify] https://crrev.com/942f163122808f35400e8a8e7d6a5473a5700121/seccomp/mosys-seccomp-arm64.policy [modify] https://crrev.com/942f163122808f35400e8a8e7d6a5473a5700121/seccomp/mosys-seccomp-arm.policy [modify] https://crrev.com/942f163122808f35400e8a8e7d6a5473a5700121/seccomp/mosys-seccomp-amd64.policy
,
Nov 5
https://chromium-review.googlesource.com/c/chromiumos/platform/mosys/+/1318710 has also landed in R71, it didn't reference this bug so it didn't auto-update.
,
Nov 5
Not sure what labels to remove, so just marking as fixed until I can verify.
,
Nov 5
remove Merge-Approved-71 label ... any other hints would be greatly appreciated. |
|||||||||||||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||||||||||||
Comment 1 by morlovich@chromium.org
, Sep 26