Crash - raise-33dde497, memd, __vasprintf_chk |
||||||||||||||||||||||
Issue descriptionraise-33dde497 accounting for 44% of all ChromeOS crashes on 11151.4.0. Crash/ ifno: https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+product.Version%3D%2711151.4.0%27+AND+stable_signature%3D%27raise-33dde497%27
,
Oct 18
Associated with https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1279105 ?
,
Oct 18
,
Oct 18
,
Oct 18
It's not related to that CL. We don't yet surface crash reports from sommelier, since it runs in the crostini guest. For memd, semenzato@ should take a look.
,
Oct 18
Okay, feel free to remove from cc; wasn't a lot to go on here.... Thanks for looking.
,
Oct 18
,
Oct 19
Updating to P1 as this is marked as RBB and beta is coming very soon.
,
Oct 19
Thanks. Looking now.
,
Oct 19
Probably fixed by https://crrev.com/c/1285117, pushed 10/16. I will now make sure it's the same bug.
,
Oct 19
I am not sure it's the same bug because I see 3 crashes on 11151.6.0 (all three on a mighty) with the same signature. https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+product.Version%3D%2711151.6.0%27+AND+EXISTS+%28SELECT+1+FROM+UNNEST%28productdata%29+WHERE+Key%3D%27exec_name%27+AND+Value%3D%27memd%27%29 Unless that same signature can be produced by a different failure. Have to keep looking.
,
Oct 19
I opened issue 897207 for another memd crash: 0x00007f422edcb6b0 (libminijailpreload.so -system.c:367 ) setup_mount_destination probably not very common (from version 11021.51.0) but will look later.
,
Oct 19
These are all happening on ARM platforms, so it's probably an ARM-specific syscall that's not whitelisted.
,
Oct 19
if it was a seccomp failure, i would have expected SIGSYS crash rather than an abort(). it's not like userland can catch SIGSYS signals when they come from seccomp.
,
Oct 19
#14 yes thank you Mike.
,
Oct 19
Reproduced:
2018-10-19T12:05:47.638419-07:00 ERR memd[3725]: memd: panicked at 'memd failed: StringError("missing \'pgalloc_dma\' from vmstats")', libcore/result.rs:945:5
2018-10-19T12:05:48.032157-07:00 WARNING crash_reporter[8434]: [user] Received crash notification for memd[3725] sig 6, user 0 group 0 (developer build - not testing - always dumping)
The code is supposed to look for pgalloc_dma32 on ARM. Not sure why it fails, but I'll have a CL shortly.
,
Oct 21
Two CLs for this: https://crrev.com/c/1292189 exit instead of aborting https://crrev.com/c/1292199 tolerate variations in /proc/vmstat Needs one more CL for the Rust build (added tempdir crate) I am a little worried because this could be happening in R70 as well.
,
Oct 21
Geo, are we good with memd crash rates in R70? Thanks!
,
Oct 21
#17 never mind the ebuild---the tempdir crate is already in the manifest (the right version too) so we just need those two CLs. In fact only the second one is necessary for removing the crashes. The first one improves reporting.
,
Oct 22
This is a beta blocker for M71 and will need to be merged there as well. Can you tag for a M71 merge request after the CLs are identified and tested? Sooner is better. Thanks.
,
Oct 22
The CLs are those in #17. Waiting for review, but I have been testing them, looking good so far.
,
Oct 23
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/3f640c723aea05de531f9ddb27cc5d5f280a16e5 commit 3f640c723aea05de531f9ddb27cc5d5f280a16e5 Author: Luigi Semenzato <semenzato@chromium.org> Date: Tue Oct 23 01:59:36 2018 memd: add crates to ebuild and manifest The new crates all exist on the local mirror already. BUG= chromium:896842 TEST=emerge-xxx Change-Id: I1e80e01ef8396d58a967e7b50b5b62322b77bddb Reviewed-on: https://chromium-review.googlesource.com/c/1295319 Reviewed-by: Luigi Semenzato <semenzato@chromium.org> Commit-Queue: Luigi Semenzato <semenzato@chromium.org> Tested-by: Luigi Semenzato <semenzato@chromium.org> [modify] https://crrev.com/3f640c723aea05de531f9ddb27cc5d5f280a16e5/chromeos-base/memd/memd-9999.ebuild [modify] https://crrev.com/3f640c723aea05de531f9ddb27cc5d5f280a16e5/chromeos-base/memd/Manifest
,
Oct 23
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161 commit c0f9f7063c80ca6e9ba4c56688fb3e65a5785161 Author: Luigi Semenzato <semenzato@chromium.org> Date: Tue Oct 23 02:16:26 2018 metrics: memd: tolerate variations in /proc/vmstat The old code did not account correctly for different names of memory zones depending on architecture and kernel version. This CL protects against such variations. Also accumulates the page allocation numbers across zones, which is usually how we look at these numbers. This also adds a test for reading /proc/vmstat, and, as a side effect, cleans up the test environment setup and teardown. The main change is the use of a unique temp directory for the testing root, which allows all tests to run in parallel. BUG= chromium:896842 TEST=new unit test, plus ran on minnie and cyan Change-Id: I4ad0916485df8ad11aa9176817ca48968b929392 Reviewed-on: https://chromium-review.googlesource.com/c/1294812 Reviewed-by: Luigi Semenzato <semenzato@chromium.org> Commit-Queue: Luigi Semenzato <semenzato@chromium.org> Tested-by: Luigi Semenzato <semenzato@chromium.org> [modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/Cargo.lock [modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/src/main.rs [modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/src/vmstat_content [modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/src/test.rs [modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/Cargo.toml
,
Oct 23
Please note that this was merged PRIOR to requesting merge approval. The TPMs need to review changes prior to merge submission. This was merged to M70 and M71 without our review. What's the associated risk with these changes? By the way, thanks for jumping on the issue and working to resolve it, but we need to guarantee low risk prior to the merge (esp for M70). Thanks
,
Oct 23
Ah yes totally my bad, sorry. I was distracted by M70. Please let me know if there is any further information I can provide or anything else.
,
Oct 23
Like, for the purpose of bookkeeping, should I do a merge-request now?
,
Oct 23
Let's do it.
,
Oct 23
Also, to be clear, I have been in conversation with Geo regarding the R70 merge.
,
Oct 23
For the sake of bookkeeping, can you add detail on testing prior to the merge ;-)
,
Oct 23
This was tested by unit tests and manually on ToT and R70. Memd on R71 has been tracking ToT so I didn't actually test on R71. The external dependencies for memd are: - kernel (memd reads a number of /proc and /sys entries) - anomaly collector (the anomaly collector sends D-Bus messages to memd when it detects OOM kills) - chrome (chrome sends D-Bus messages to memd upon tab discards) I would be aware of any changes to the anomaly collector, and most likely also in the relevant Chrome code if those changes impacted my code directly. Since this all works on ToT, I deemed it unrealistic that R71 was changed and then changed back w.r.t. these functions.
,
Oct 23
More testing details. Memd has a "sleepy" mode when memory pressure is deemed low, and it does very little then (checks for increased pressure every 2 seconds). When memory pressure is high, memd enters a high-speed sampling mode. I test both modes. I test the high-speed sampling mode in two ways: by passing the "always-poll-fast" flag to memd, and by running the platform_MemoryPressure test. After the second test, I verify that memd logs were collected and contain tab discard events. I test at least on one arm and one x86 platform. There are also kernel-by-kernel variations in the /proc and /sys filesystems, which have caused bugs in the past. I do NOT test all combinations of kernels and architectures. I am planning to write autotest to do that.
,
Oct 23
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/74ffc178fe2a940e5530ddb1d263e5436a85328b commit 74ffc178fe2a940e5530ddb1d263e5436a85328b Author: Luigi Semenzato <semenzato@chromium.org> Date: Tue Oct 23 20:01:46 2018 metrics: memd: tolerate variations in /proc/vmstat The old code did not account correctly for different names of memory zones depending on architecture and kernel version. This CL protects against such variations. Also accumulates the page allocation numbers across zones, which is usually how we look at these numbers. This also adds a test for reading /proc/vmstat, and, as a side effect, cleans up the test environment setup and teardown. The main change is the use of a unique temp directory for the testing root, which allows all tests to run in parallel. BUG= chromium:896842 TEST=new unit test, plus ran on minnie and cyan Change-Id: I4ad0916485df8ad11aa9176817ca48968b929392 Reviewed-on: https://chromium-review.googlesource.com/1292199 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Luigi Semenzato <semenzato@chromium.org> Reviewed-by: Sonny Rao <sonnyrao@chromium.org> [modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/Cargo.lock [modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/src/main.rs [modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/src/vmstat_content [modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/src/test.rs [modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/Cargo.toml
,
Oct 23
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/c50a87d21ab2b79a0b59e7f80460355af15c20ad commit c50a87d21ab2b79a0b59e7f80460355af15c20ad Author: Sonny Rao <sonnyrao@chromium.org> Date: Tue Oct 23 20:01:41 2018 crash: collect any memd logs from /var/log/messages memd is outputing crash information to syslog when it panics, so this looks at the last 50 lines of /var/log/messages for any lines that contain memd BUG= chromium:896842 TEST=none Change-Id: I4526703f7eea1813fef2eb2c6908e84de0a86700 Reviewed-on: https://chromium-review.googlesource.com/1295574 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/c50a87d21ab2b79a0b59e7f80460355af15c20ad/crash-reporter/crash_reporter_logs.conf
,
Oct 23
Please don't submit CLs without getting merge approval first. Granting approval for M71 since this was already merged. Please tag as fixed if verified.
,
Oct 23
Yes---my apologies again, it won't happen again. Thanks.
,
Oct 26
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/827dcf6b921d03ecdf690a711f05f8952e990905 commit 827dcf6b921d03ecdf690a711f05f8952e990905 Author: Luigi Semenzato <semenzato@chromium.org> Date: Fri Oct 26 01:19:17 2018 metrics: memd: tolerate variations in /proc/vmstat The old code did not account correctly for different names of memory zones depending on architecture and kernel version. This CL protects against such variations. Also accumulates the page allocation numbers across zones, which is usually how we look at these numbers. This also adds a test for reading /proc/vmstat, and, as a side effect, cleans up the test environment setup and teardown. The main change is the use of a unique temp directory for the testing root, which allows all tests to run in parallel. BUG= chromium:896842 TEST=new unit test, plus ran on minnie and cyan Change-Id: I4ad0916485df8ad11aa9176817ca48968b929392 Reviewed-on: https://chromium-review.googlesource.com/c/1294813 Trybot-Ready: Luigi Semenzato <semenzato@chromium.org> Reviewed-by: Luigi Semenzato <semenzato@chromium.org> Commit-Queue: Luigi Semenzato <semenzato@chromium.org> Tested-by: Luigi Semenzato <semenzato@chromium.org> [modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/Cargo.lock [modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/src/main.rs [modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/src/vmstat_content [modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/src/test.rs [modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/Cargo.toml
,
Oct 26
Please note---this wasn't actually merged on 71. It is now. Sorry for any confusion.
,
Oct 29
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 29
,
Oct 31
We should also merge the change to the crash reporter to get the memd logs to 71 https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1295574 May I merge this to 71 also?
,
Oct 31
This bug requires manual review: M71 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 1
I assume the merge request is for the CL in #40 as an add on. did this get tested? Seems safe enough..
,
Nov 1
re #42 Yes, it's working I've looked at a crashe from after it's merged on ToT and it contains the log snippet
,
Nov 1
Thanks. Approving merge to M71 Chrome OS.
,
Nov 1
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/f62dcb76f4e7fe4ff3fbafd88bdfdbbceefcc7a9 commit f62dcb76f4e7fe4ff3fbafd88bdfdbbceefcc7a9 Author: Sonny Rao <sonnyrao@chromium.org> Date: Thu Nov 01 19:35:39 2018 crash: collect any memd logs from /var/log/messages memd is outputing crash information to syslog when it panics, so this looks at the last 50 lines of /var/log/messages for any lines that contain memd BUG= chromium:896842 TEST=none Change-Id: I4526703f7eea1813fef2eb2c6908e84de0a86700 Reviewed-on: https://chromium-review.googlesource.com/1295574 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> (cherry picked from commit c50a87d21ab2b79a0b59e7f80460355af15c20ad) Reviewed-on: https://chromium-review.googlesource.com/c/1313368 Reviewed-by: Sonny Rao <sonnyrao@chromium.org> Commit-Queue: Sonny Rao <sonnyrao@chromium.org> [modify] https://crrev.com/f62dcb76f4e7fe4ff3fbafd88bdfdbbceefcc7a9/crash-reporter/crash_reporter_logs.conf
,
Nov 5
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 5
|
||||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||
Comment 1 by kbleicher@google.com
, Oct 18