
Issue 623116

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug
ker

Blocking:
issue 604267




[Tricky] OOM-induced kernel panic when running hardware_RamFio

Reported by vpalatin@chromium.org, Jun 24 2016

Issue description

Several recent x86 builds (tricky, edgar, zako?) have failed in HWTest while executing the hardware_RamFio test:
'hardware_RamFio FAIL: Autotest client terminated unexpectedly: DUT rebooted during the test run.'
e.g.
https://uberchromegw.corp.google.com/i/chromeos/builders/tricky-chrome-pfq/builds/2025/steps/HWTest%20%5Bbvt-cq%5D/logs/stdio

The kernel triggers the OOM killer after hardware_RamFio has started, but then
seems to time out waiting on page operations and dies in a hung-task panic:
https://pantheon.corp.google.com/m/cloudstorage/b/chromeos-autotest-results/o/67636120-chromeos-test/chromeos4-row2-rack3-host8/crashinfo.chromeos4-row2-rack3-host8/kernel.20160624.050808.0.kcrash

Luigi,
are you interested in looking at this?
Otherwise please re-assign to me.
 

Comment 1 by vpalatin@google.com, Jun 24 2016

Blocking: 604267
Very kind of you.  I am taking a look now.

Comment 3 by h...@chromium.org, Jun 24 2016

Cc: ihf@chromium.org h...@chromium.org
The test measures MemFree, multiplies it by 0.95, then by 0.80, and allocates files on a RAM disk using that number, presumably as the maximum size. In any case the system runs out of memory and kills a renderer. Unfortunately the memory shortage also triggers a file system bug: the shill process hangs on a write to a regular file (ext4), as does the rs:main process (whatever that is, but I don't think it matters). Both are waiting for a page. Finally loop0 causes the 2-minute hang at ext4_sync_file() -> jbd2_log_wait_commit() -> schedule().
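For reference, the sizing arithmetic described above can be sketched roughly as follows. This is not the actual hardware_RamFio.py code; the function names are made up for illustration, and only the two factors (0.95 and 0.80) come from this comment. Integer percentages are used so the result does not depend on floating-point rounding.

```python
def memfree_kb(meminfo_text):
    """Return the MemFree value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == "MemFree":
            # Lines look like "MemFree:  1000000 kB"; take the number.
            return int(rest.split()[0])
    raise KeyError("MemFree")


def ramfio_target_kb(meminfo_text, first_pct=95, second_pct=80):
    """Hypothetical helper: MemFree scaled by 0.95, then by 0.80.

    The factors are the ones quoted in this bug; how the real test
    passes the result to fio is an assumption not covered here.
    """
    free_kb = memfree_kb(meminfo_text)
    return free_kb * first_pct * second_pct // 10000
```

On a device this would be fed the contents of /proc/meminfo. The combined factor is 0.76, so the test tries to claim about three quarters of whatever is free at the moment of measurement, which leaves little slack if anything else grows in the meantime.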

Comment 5 by ihf@chromium.org, Jun 24 2016

The kernel issues should be fixed, but I really don't think this test is supposed to trigger OOM in the first place. To avoid this I am adding a 'stop ui' to it in https://chromium-review.googlesource.com/#/c/356143
(This test should not hold up revving Chrome in the pfq.)
Owner: ihf@chromium.org
Thank you Ilja.

Comment 7 by ihf@chromium.org, Jun 24 2016

But there still is a kernel crash. Do you think something can be done about it?

Comment 8 by bugdroid1@chromium.org, Jun 28 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/aca6f56640d21fbce88053c7b17903c4193a2bcc

commit aca6f56640d21fbce88053c7b17903c4193a2bcc
Author: Ilja H. Friedel <ihf@chromium.org>
Date: Fri Jun 24 18:50:31 2016

hardware_RamFio: refactor test.

Main change is to stop chrome before running the test to avoid its memory
usage as a dependency in activating OOM killer.

BUG= chromium:623116 
TEST=Ran on cyan, lars (pass), veyron_minnie (unrelated fail).

Change-Id: I71281b08b48e037dcb196a5f32d902a2ad454c18
Reviewed-on: https://chromium-review.googlesource.com/356143
Reviewed-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Vincent Palatin <vpalatin@chromium.org>
Reviewed-by: Puthikorn Voravootivat <puthik@chromium.org>
Commit-Queue: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/aca6f56640d21fbce88053c7b17903c4193a2bcc/client/site_tests/hardware_RamFio/hardware_RamFio.py

Cc: jhorwich@chromium.org achuith@chromium.org
Labels: -Pri-2 Pri-1
This failed in the PFQ today on tricky. Should this be marked Started?

https://uberchromegw.corp.google.com/i/chromeos/builders/tricky-chrome-pfq/builds/2127

Comment 10 by ihf@chromium.org, Jul 21 2016

Cc: posciak@chromium.org
Labels: -Pri-1 ker Pri-2
Owner: puthik@chromium.org
Status: Assigned (was: Untriaged)
Summary: [Tricky] OOM-induced kernel panic when running hardware_RamFio (was: OOM-induced kernel panic when running hardware_RamFio)
There was an OOM kill again
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/70348964-chromeos-test/chromeos4-row2-rack4-host3/crashinfo.chromeos4-row2-rack4-host3/

<12>[  678.602651] init: debugd main process (23443) killed by TERM signal
<4>[  687.555253] fio invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=-1000
<5>[  687.555271] Pid: 24003, comm: fio Tainted: G        WC   3.8.11 #1
<5>[  687.555281] Call Trace:
<5>[  687.555296]  [<ffffffffba4bc6b8>] dump_header.isra.11+0x94/0x1d0
<5>[  687.555311]  [<ffffffffba8c9f28>] ? _raw_spin_unlock+0xe/0x10
<5>[  687.555322]  [<ffffffffba4bcfd6>] out_of_memory+0x1bd/0x27b
<5>[  687.555335]  [<ffffffffba4c085e>] __alloc_pages_nodemask+0x602/0x736
<5>[  687.555349]  [<ffffffffba4d97c6>] handle_pte_fault+0x330/0x547
<5>[  687.555361]  [<ffffffffba4da853>] handle_mm_fault+0x16a/0x193
<5>[  687.555373]  [<ffffffffba429835>] __do_page_fault+0x1d4/0x38c
<5>[  687.555386]  [<ffffffffba4638fa>] ? set_next_entity+0x44/0x9b
<5>[  687.555398]  [<ffffffffba400c4c>] ? __switch_to+0x138/0x3b0
<5>[  687.555410]  [<ffffffffba8c9f4f>] ? _raw_spin_unlock_irq+0xe/0x11
<5>[  687.555423]  [<ffffffffba45c2ed>] ? finish_task_switch+0x69/0xa5
<5>[  687.555434]  [<ffffffffba429a1f>] do_page_fault+0xe/0x10
<5>[  687.555445]  [<ffffffffba8ca532>] page_fault+0x22/0x30

But overall the test is super stable now:
https://wmatrix.googleplex.com/unfiltered?hide_missing=True&releases=tot&tests=hardware_RamFio&days_back=100


Notice that a bunch of other tests ran before this one. Likely one of them leaked memory:
autotest runtest video_ChromeHWDecodeUsed
autotest runtest security_ASLR
autotest runtest build_RootFilesystemSize
autotest runtest video_VideoSanity
autotest runtest sound_infrastructure
autotest runtest platform_CheckCriticalProcesses
autotest runtest kernel_ProtocolCheck

My suggestion is to
a) add a check to the test that sufficient RAM is available before claiming it.
b) watch out for recent memory leaks (possibly in video).
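A minimal sketch of what suggestion (a) could look like, assuming the check is done against MemFree from /proc/meminfo. The exception class, function names, and the min_spare_kb parameter are hypothetical and not part of autotest; the real test would presumably raise one of autotest's own error types instead.

```python
class InsufficientMemoryError(Exception):
    """Hypothetical error raised when the DUT lacks enough free RAM."""


def read_meminfo_kb(meminfo_text, field):
    """Return the value (in kB) of one field from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def require_free_memory(meminfo_text, required_kb, min_spare_kb=0):
    """Fail early if claiming required_kb would leave less than min_spare_kb.

    Returns the measured MemFree value so the caller can log it.
    """
    free_kb = read_meminfo_kb(meminfo_text, "MemFree")
    if free_kb - required_kb < min_spare_kb:
        raise InsufficientMemoryError(
            "MemFree is %d kB, need %d kB plus %d kB spare"
            % (free_kb, required_kb, min_spare_kb))
    return free_kb
```

Failing before fio starts would turn a leak left behind by an earlier test into a clear test error rather than an OOM kill in the middle of the run.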
Attachment: hardware_RamFio.png (64.9 KB)
Status: Archived (was: Assigned)

Comment 12 by ketakid@google.com, Mar 18 2017

Labels: Pri-3
Status: Available (was: Archived)
Activating. Please assign to the right owner and the appropriate priority.
Status: WontFix (was: Available)
This is no longer relevant; hardware_RamFio is all green.
