New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 896842 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 23
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Crash - raise-33dde497, memd, __vasprintf_chk

Project Member Reported by dgagnon@google.com, Oct 18

Issue description

Labels: M-71
Not sure if the dump is helpful:

	0xa9a07b36	(libc-2.23.so + 0x00016b36 )	__libc_do_syscall
0xa9a165b1	(libc-2.23.so -raise.c:54 )	raise
0xa9a17379	(libc-2.23.so -abort.c:89 )	abort
0x0fa26196	(memd + 0x00034196 )	
0x0fa433b6	(memd + 0x000513b6 )	
0x0fa434aa	(memd + 0x000514aa )	
0x0fa3f16d	(memd + 0x0004d16d )	
0x0fa195e2	(memd + 0x000275e2 )	
0x0fa433b6	(memd + 0x000513b6 )	
0x0fa1b312	(memd + 0x00029312 )	
0x0fa28ab6	(memd + 0x00036ab6 )	
0x0fa434aa	(memd + 0x000514aa )	
0x0fa433b6	(memd + 0x000513b6 )	
0x0fa4336e	(memd + 0x0005136e )	
0x0fa2da22	(memd + 0x0003ba22 )	
0x0fa3f16d	(memd + 0x0004d16d )	
0x0fa4336e	(memd + 0x0005136e )	
0x0fa0beb6	(memd + 0x00019eb6 )	
0x0fa2dbd2	(memd + 0x0003bbd2 )	
0xa9a94757	(libc-2.23.so -vasprintf_chk.c:84 )	__vasprintf_chk
0x10e5c876		
0x0fa3ea96	(memd + 0x0004ca96 )	
0x0fa434aa	(memd + 0x000514aa )	
0x0fa3ea96	(memd + 0x0004ca96 )	
0x0fa3e6dc	(memd + 0x0004c6dc )	


Cc: smbar...@chromium.org reve...@chromium.org vapier@chromium.org
Associated with https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1279105 ?


Cc: djmm@chromium.org jkardatzke@chromium.org dgagnon@chromium.org
Labels: ReleaseBlock-Beta
It's not related to that CL. We don't yet surface crash reports from sommelier, since it runs in the crostini guest.

For memd, semenzato@ should take a look.
Okay, feel free to remove from cc; wasn't a lot to go on here....

Thanks for looking.
Cc: -jkardatzke@chromium.org -smbar...@chromium.org -reve...@chromium.org
Owner: semenzato@chromium.org
Labels: -Pri-3 Pri-1
Updating to P1 as this is marked as RBB and beta is coming very soon.
Thanks.  Looking now.
Probably fixed by https://crrev.com/c/1285117, pushed 10/16.  I will now make sure it's the same bug.
I am not sure it's the same bug because I see 3 crashes on 11151.6.0 (all three on a mighty) with the same signature.

https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+product.Version%3D%2711151.6.0%27+AND+EXISTS+%28SELECT+1+FROM+UNNEST%28productdata%29+WHERE+Key%3D%27exec_name%27+AND+Value%3D%27memd%27%29

Unless that same signature can be produced by a different failure.  Have to keep looking.
I opened issue 897207 for another memd crash:

	0x00007f422edcb6b0	(libminijailpreload.so -system.c:367 )	setup_mount_destination

probably not very common (from version 11021.51.0) but will look later.
These are all happening on ARM platforms, so it's probably an ARM-specific syscall that's not whitelisted.
if it was a seccomp failure, i would have expected SIGSYS crash rather than an abort().  it's not like userland can catch SIGSYS signals when they come from seccomp.
Cc: sonnyrao@chromium.org
#14 yes thank you Mike.

Status: Started (was: Untriaged)
Reproduced:

2018-10-19T12:05:47.638419-07:00 ERR memd[3725]: memd: panicked at 'memd failed: StringError("missing \'pgalloc_dma\' from vmstats")', libcore/result.rs:945:5
2018-10-19T12:05:48.032157-07:00 WARNING crash_reporter[8434]: [user] Received crash notification for memd[3725] sig 6, user 0 group 0 (developer build - not testing - always dumping)


The code is supposed to look for pgalloc_dma32 on ARM.  Not sure why it fails, but I'll have a CL shortly.
Two CLs for this:

https://crrev.com/c/1292189  exit instead of aborting
https://crrev.com/c/1292199  tolerate variations in /proc/vmstat

Needs one more CL for the Rust build (added tempdir crate)

I am a little worried because this could be happening in R70 as well.
Cc: geohsu@chromium.org
Geo, are we good with memd crash rates in R70?  Thanks!
#17 never mind the ebuild---the tempdir crate is already in the manifest (the right version too) so we just need those two CLs.  In fact only the second one is necessary for removing the crashes.  The first one improves reporting.
This is a beta blocker for M71 and will need to be merged there as well.  Can you tag for a M71 merge request after the CLs are identified and tested?  Sooner is better.  Thanks.
The CLs are those in #17.  Waiting for review, but I have been testing them, looking good so far.
Project Member

Comment 22 by bugdroid1@chromium.org, Oct 23

Labels: merge-merged-release-R70-11021.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/3f640c723aea05de531f9ddb27cc5d5f280a16e5

commit 3f640c723aea05de531f9ddb27cc5d5f280a16e5
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Tue Oct 23 01:59:36 2018

memd: add crates to ebuild and manifest

The new crates all exist on the local mirror already.

BUG= chromium:896842 
TEST=emerge-xxx

Change-Id: I1e80e01ef8396d58a967e7b50b5b62322b77bddb
Reviewed-on: https://chromium-review.googlesource.com/c/1295319
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>
Commit-Queue: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Luigi Semenzato <semenzato@chromium.org>

[modify] https://crrev.com/3f640c723aea05de531f9ddb27cc5d5f280a16e5/chromeos-base/memd/memd-9999.ebuild
[modify] https://crrev.com/3f640c723aea05de531f9ddb27cc5d5f280a16e5/chromeos-base/memd/Manifest

Project Member

Comment 23 by bugdroid1@chromium.org, Oct 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161

commit c0f9f7063c80ca6e9ba4c56688fb3e65a5785161
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Tue Oct 23 02:16:26 2018

metrics: memd: tolerate variations in /proc/vmstat

The old code did not account correctly for different
names of memory zones depending on architecture and
kernel version.  This CL protects against such variations.
Also accumulates the page allocation numbers across
zones, which is usually how we look at these numbers.

This also adds a test for reading /proc/vmstat, and,
as a side effect, cleans up the test environment setup
and teardown.  The main change is the use of a unique
temp directory for the testing root, which allows all
tests to run in parallel.

BUG= chromium:896842 
TEST=new unit test, plus ran on minnie and cyan

Change-Id: I4ad0916485df8ad11aa9176817ca48968b929392
Reviewed-on: https://chromium-review.googlesource.com/c/1294812
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>
Commit-Queue: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Luigi Semenzato <semenzato@chromium.org>

[modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/Cargo.lock
[modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/src/main.rs
[modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/src/vmstat_content
[modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/src/test.rs
[modify] https://crrev.com/c0f9f7063c80ca6e9ba4c56688fb3e65a5785161/metrics/memd/Cargo.toml

Please note that this was merged PRIOR to requesting merge approval.  The TPMs need to review changes prior to merge submission.  This was merged to M70 and M71 without our review.   What's the associated risk with these changes?

By the way, thanks for jumping on the issue and working to resolve it, but we need to guarantee low risk prior to the merge (esp for M70).  Thanks
Ah yes totally my bad, sorry.  I was distracted by M70.  Please let me know if there is any further information I can provide or anything else.
Like, for the purpose of bookkeeping, should I do a merge-request now?
Labels: Merge-Request-71
Let's do it.

Also, to be clear, I have been in conversation with Geo regarding the R70 merge.
For the sake of bookkeeping, can you add detail on testing prior to the merge ;-)
This was tested by unit tests and manually on ToT and R70.  Memd on R71 has been tracking ToT so I didn't actually test on R71.  The external dependencies for memd are:

- kernel (memd reads a number of /proc and /sys entries)
- anomaly collector (the anomaly collector sends D-Bus messages to memd when it detects OOM kills)
- chrome (chrome sends D-Bus messages to memd upon tab discards)

I would be aware of any changes to the anomaly collector, and most likely also in the relevant Chrome code if those changes impacted my code directly.  Since this all works on ToT, I deemed it unrealistic that R71 was changed and then changed back w.r.t. these functions.

More testing details.  Memd has a "sleepy" mode when memory pressure is deemed low, and it does very little then (checks for increased pressure every 2 seconds).  When memory pressure is high, memd enters a high-speed sampling mode.  I test both modes.  I test the high-speed sampling mode in two ways: by passing the "always-poll-fast" flag to memd, and by running the platform_MemoryPressure test.  After the second test, I verify that memd logs were collected and contain tab discard events.  I test at least on one arm and one x86 platform.  There are also kernel-by-kernel variations in the /proc and /sys filesystems, which have caused bugs in the past.  I do NOT test all combinations of kernels and architectures. I am planning to write autotest to do that.
Project Member

Comment 32 by bugdroid1@chromium.org, Oct 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/74ffc178fe2a940e5530ddb1d263e5436a85328b

commit 74ffc178fe2a940e5530ddb1d263e5436a85328b
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Tue Oct 23 20:01:46 2018

metrics: memd: tolerate variations in /proc/vmstat

The old code did not account correctly for different
names of memory zones depending on architecture and
kernel version.  This CL protects against such variations.
Also accumulates the page allocation numbers across
zones, which is usually how we look at these numbers.

This also adds a test for reading /proc/vmstat, and,
as a side effect, cleans up the test environment setup
and teardown.  The main change is the use of a unique
temp directory for the testing root, which allows all
tests to run in parallel.

BUG= chromium:896842 
TEST=new unit test, plus ran on minnie and cyan

Change-Id: I4ad0916485df8ad11aa9176817ca48968b929392
Reviewed-on: https://chromium-review.googlesource.com/1292199
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/Cargo.lock
[modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/src/main.rs
[modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/src/vmstat_content
[modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/src/test.rs
[modify] https://crrev.com/74ffc178fe2a940e5530ddb1d263e5436a85328b/metrics/memd/Cargo.toml

Project Member

Comment 33 by bugdroid1@chromium.org, Oct 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/c50a87d21ab2b79a0b59e7f80460355af15c20ad

commit c50a87d21ab2b79a0b59e7f80460355af15c20ad
Author: Sonny Rao <sonnyrao@chromium.org>
Date: Tue Oct 23 20:01:41 2018

crash: collect any memd logs from /var/log/messages

memd is outputing crash information to syslog when it panics, so this
looks at the last 50 lines of /var/log/messages for any lines that
contain memd

BUG= chromium:896842 
TEST=none

Change-Id: I4526703f7eea1813fef2eb2c6908e84de0a86700
Reviewed-on: https://chromium-review.googlesource.com/1295574
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/c50a87d21ab2b79a0b59e7f80460355af15c20ad/crash-reporter/crash_reporter_logs.conf

Labels: -Merge-Request-71 Merge-Approved-71
Please don't submit CLs without getting merge approval first.

Granting approval for M71 since this was already merged.  

Please tag as fixed if verified.
Status: Fixed (was: Started)
Yes---my apologies again, it won't happen again.  Thanks.
Project Member

Comment 36 by bugdroid1@chromium.org, Oct 26

Labels: merge-merged-release-R71-11151.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/827dcf6b921d03ecdf690a711f05f8952e990905

commit 827dcf6b921d03ecdf690a711f05f8952e990905
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Fri Oct 26 01:19:17 2018

metrics: memd: tolerate variations in /proc/vmstat

The old code did not account correctly for different
names of memory zones depending on architecture and
kernel version.  This CL protects against such variations.
Also accumulates the page allocation numbers across
zones, which is usually how we look at these numbers.

This also adds a test for reading /proc/vmstat, and,
as a side effect, cleans up the test environment setup
and teardown.  The main change is the use of a unique
temp directory for the testing root, which allows all
tests to run in parallel.

BUG= chromium:896842 
TEST=new unit test, plus ran on minnie and cyan

Change-Id: I4ad0916485df8ad11aa9176817ca48968b929392
Reviewed-on: https://chromium-review.googlesource.com/c/1294813
Trybot-Ready: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>
Commit-Queue: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Luigi Semenzato <semenzato@chromium.org>

[modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/Cargo.lock
[modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/src/main.rs
[modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/src/vmstat_content
[modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/src/test.rs
[modify] https://crrev.com/827dcf6b921d03ecdf690a711f05f8952e990905/metrics/memd/Cargo.toml

Please note---this wasn't actually merged on 71.  It is now.  Sorry for any confusion.
Project Member

Comment 38 by sheriffbot@chromium.org, Oct 29

Cc: kbleicher@google.com
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-71 Merge-Merged
Labels: -merge-merged-release-R71-11151.B Merge-Request-71
We should also merge the change to the crash reporter to get the memd logs to 71
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1295574

May I merge this to 71 also?
Project Member

Comment 41 by sheriffbot@chromium.org, Oct 31

Labels: -Merge-Request-71 Hotlist-Merge-Review Merge-Review-71
This bug requires manual review: M71 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
I assume the merge request is for the CL in #40 as an add on.   

did this get tested? Seems safe enough..

re #42 Yes, it's working I've looked at a crashe from after it's merged on ToT and it contains the log snippet
Labels: -Merge-Review-71 Merge-Approved-71
Thanks.  Approving merge to M71 Chrome OS.
Project Member

Comment 45 by bugdroid1@chromium.org, Nov 1

Labels: merge-merged-release-R71-11151.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/f62dcb76f4e7fe4ff3fbafd88bdfdbbceefcc7a9

commit f62dcb76f4e7fe4ff3fbafd88bdfdbbceefcc7a9
Author: Sonny Rao <sonnyrao@chromium.org>
Date: Thu Nov 01 19:35:39 2018

crash: collect any memd logs from /var/log/messages

memd is outputing crash information to syslog when it panics, so this
looks at the last 50 lines of /var/log/messages for any lines that
contain memd

BUG= chromium:896842 
TEST=none

Change-Id: I4526703f7eea1813fef2eb2c6908e84de0a86700
Reviewed-on: https://chromium-review.googlesource.com/1295574
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
(cherry picked from commit c50a87d21ab2b79a0b59e7f80460355af15c20ad)
Reviewed-on: https://chromium-review.googlesource.com/c/1313368
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
Commit-Queue: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/f62dcb76f4e7fe4ff3fbafd88bdfdbbceefcc7a9/crash-reporter/crash_reporter_logs.conf

Project Member

Comment 46 by sheriffbot@chromium.org, Nov 5

Cc: sonny...@google.com
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-71

Sign in to add a comment