
Issue 602515

Starred by 10 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocked on:
issue 475444




Devices get OOM when running kiosk apps (Rise, StratosMedia, Chrome Sign Builder)

Project Member Reported by juanra...@chromium.org, Apr 12 2016

Issue description

CHROMEOS_RELEASE_DESCRIPTION=7978.29.0 dev-channel zako test

HP Zako rebooted while running kiosk app Rise Player 15.10.7.9312 (App id: odjaaghiehpobimgdjjfofmablbaleem)
 
log-041116-172807.tar.gz
13.1 MB Download
Cc: -juanra...@chromium.org -scunning...@chromium.org sduraisamy@chromium.org blumberg@chromium.org
Components: UI>Shell>Kiosk
Labels: OS-Chrome
Cc: kaznacheev@chromium.org scunning...@chromium.org
Zako's OS keeps crashing; the device has crashed three times in less than 24 hours while running a kiosk app in a longevity test. Please see attached logs.

log-041516-092319.tar.gz
15.3 MB Download

Comment 3 by xiy...@chromium.org, Apr 15 2016

Took a quick look at the first couple of reboots in the logs attached to #2. I don't see any evidence of a Chrome crash. The reboots all happened after an ssh login.

It appears we are running the longevity_Tracker autotest. Could the reboots be coming from the test?
Thanks for checking. The longevity_Tracker test runs for 23 hrs and doesn't request any reboots. Please see the traces from the Linux terminal, captured after the test is started; a few hours later the device crashes.

https://docs.google.com/document/d/1DsrhAaBxM3coJiIdOkXPkwA_Mryj-6HIr9JhMbUzBi4/edit

Comment 5 by xiy...@chromium.org, Apr 15 2016

I cannot correlate the autotest log with the logs in #2. The timestamp in the autotest log indicates the test starts at 18:47:32, then the DUT rebooted at 11:22:27, but I could not map this back to the diagnostic logs.

And there is no crash found in the autotest log, so Chrome is not crashing. Something (the app, Chrome, or the test scripts?) must have explicitly requested the device to reboot.
I should provide more information about the test that is running. The Zako device is running a graphics-intensive kiosk app (Rise Player, App id: mfpgpdablffhbfofnhlpgmokokbahooi), and the device is also running longevity_Tracker, which collects performance data (CPU/memory utilization and temperature) for 23 hours straight and then ends. There has been a license problem with the kiosk app where it stops and returns to the sign-in screen after a few hours of running.

Comment 7 by xiy...@chromium.org, Apr 15 2016

Are we seeing the license problem here (or any problems that cause the kiosk app to exit)?

Unfortunately, we don't log restarts from kiosk apps. But that seems plausible from what I see in the logs.
Yes, the license problem makes the app stop running after a few hours of activity. After the app quits, CPU utilization drops from about 40% to about 1%, and memory drops from around 95%.

Comment 9 by xiy...@chromium.org, Apr 18 2016

In this case, I'd say this is WAI, since the kiosk code is doing what it is supposed to do.
Cc: aghuie@chromium.org
Summary: HP Zako rebooted while running Rise Vision Rise Player kiosk App (was: HP Zako rebooted while running kiosk App)
Matt, Raj, Alex: could you get us the Rise Vision Rise Player license keys we need to continue longevity testing on the Rise Player? We need at least two keys. More would be appreciated as backups, in case one or more of them fails.
From Rise:

We have compiled two schedule types, for typical and for more media-intensive presentations. You can input the following Display IDs below.


Display IDs
Google_Testing_Content_Typical1 W9ZWFXZNRT9D
Google_Testing_Content_Typical2 JRYVP9V62NQC
Google_Testing_Content_Typical3 ZNENQYRPRD3B
Google_Testing_Content_Intense1 UZ3BPGE55KK7
Google_Testing_Content_Intense2 UQ7Z4KHGVGG8



Schedule
Google_Testing_Content_Typical1 W9ZWFXZNRT9D

Google_Testing_Content_Typical2 JRYVP9V62NQC
Google_Testing_Content_Typical3 ZNENQYRPRD3B
Currently mix of Uptime and Content testing presentations

Google_Testing_Content_Intense1 UZ3BPGE55KK7
Google_Testing_Content_Intense2 UQ7Z4KHGVGG8
Uses video content testing items and three instances of a presentation with two image galleries holding 120+ images
Zako and Ninja devices crashed while running Rise Player with intense content:
Zako is running Google_Testing_Content_Intense2 UQ7Z4KHGVGG8
Ninja is running Google_Testing_Content_Intense1 UZ3BPGE55KK7
Please see attached logs.
log-040416-153534.tar.gz
0 bytes Download
log-042616-174337.tar.gz
0 bytes Download
The logs in #12 are 0 bytes.
The device is running out of space and is unable to generate logs; please see the attachment.
zako_error_generating_logs.txt
68.6 KB View Download
Could the crash be caused by running out of disk space, then? Would a reboot help?
I rebooted zako and was able to generate logs; please see attached. Thanks.
zako_log_after_runningout_memory_reboot.tar.gz
2.8 MB Download
Are you running some OOM test scripts? I saw this line in the kernel log, roughly before the 4-24 17:44 reboot.

2016-04-27T17:32:03.241903-07:00 WARNING kernel: [ 2810.596145] autotest invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=-1000


DUT seems to be rebooted due to out of memory.

There are Chrome kills due to OOM:
e.g.
2016-04-27T17:39:42.681971-07:00 ERR kernel: [ 3271.275989] Out of memory: Kill process 8224 (chrome) score 462 or sacrifice child
2016-04-27T17:39:42.681973-07:00 ERR kernel: [ 3271.276004] Killed process 8224 (chrome) total-vm:2064260kB, anon-rss:94240kB, file-rss:143396kB

and OOM kills with page allocation failures in the ext4 path:
2016-04-27T17:32:03.242422-07:00 ERR kernel: [ 2810.607633] Out of memory: Kill process 7890 (Compositor) score 497 or sacrifice child
2016-04-27T17:32:03.242424-07:00 ERR kernel: [ 2810.607646] Killed process 7890 (Compositor) total-vm:2210868kB, anon-rss:380900kB, file-rss:269276kB
2016-04-27T17:32:03.242427-07:00 WARNING kernel: [ 2810.607773] Compositor: page allocation failure: order:0, mode:0x20058
2016-04-27T17:32:03.242429-07:00 NOTICE kernel: [ 2810.607786] Pid: 7890, comm: Compositor Tainted: G        WC   3.8.11 #1
2016-04-27T17:32:03.242430-07:00 NOTICE kernel: [ 2810.607796] Call Trace:
2016-04-27T17:32:03.242432-07:00 NOTICE kernel: [ 2810.607810]  [<ffffffff978bdc7e>] warn_alloc_failed+0x135/0x15f
2016-04-27T17:32:03.242434-07:00 NOTICE kernel: [ 2810.607822]  [<ffffffff978c0393>] __alloc_pages_nodemask+0x547/0x692
2016-04-27T17:32:03.242436-07:00 NOTICE kernel: [ 2810.607837]  [<ffffffff978b9cf7>] find_or_create_page+0x49/0x91
2016-04-27T17:32:03.242437-07:00 NOTICE kernel: [ 2810.607849]  [<ffffffff97918675>] __getblk+0x171/0x26d
2016-04-27T17:32:03.242439-07:00 NOTICE kernel: [ 2810.607862]  [<ffffffff97982a0a>] ext4_get_branch+0x78/0x117
2016-04-27T17:32:03.242443-07:00 NOTICE kernel: [ 2810.607874]  [<ffffffff97982b98>] ext4_ind_map_blocks+0xef/0x513
2016-04-27T17:32:03.242445-07:00 NOTICE kernel: [ 2810.607887]  [<ffffffff9786371a>] ? set_next_entity+0x44/0x9b
2016-04-27T17:32:03.242446-07:00 NOTICE kernel: [ 2810.607899]  [<ffffffff9794f7e1>] ext4_map_blocks+0x68/0x22a
2016-04-27T17:32:03.242448-07:00 NOTICE kernel: [ 2810.607910]  [<ffffffff9795174a>] _ext4_get_block+0xd6/0x171
2016-04-27T17:32:03.242449-07:00 NOTICE kernel: [ 2810.607921]  [<ffffffff979517fb>] ext4_get_block+0x16/0x18
2016-04-27T17:32:03.242454-07:00 NOTICE kernel: [ 2810.607933]  [<ffffffff9791f4c7>] do_mpage_readpage+0x1b1/0x50c
...
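For triage, the OOM events scattered through these logs can be pulled out with a small helper. A sketch, assuming the kernel messages land in the usual Chrome OS /var/log/messages with the phrasing seen above (the helper name is mine):

```shell
# grep_oom LOGFILE: print every oom-killer invocation and kill decision in
# LOGFILE, matching the kernel phrasing shown in the traces above.
grep_oom() {
  grep -E 'invoked oom-killer|Out of memory: Kill process|Killed process' "$1"
}

# Example: grep_oom /var/log/messages
```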
It is currently running a performance test script (longevity_Tracker.py) that collects performance metrics: CPU and memory utilization, and temperature.
Ninja running Rise Player is also running out of memory and is unable to generate logs. Please see attachment.
ninja_running_out_of_memory.txt
55.3 KB View Download
When you say crash, what exactly did you see? The device rebooted? Or the screen freezes? Or the screen goes black?

And is it a consistent repro? Do we roughly know how long it runs before crashing?
Cc: xiy...@chromium.org
Issue 602430 has been merged into this issue.
Issue 602517 has been merged into this issue.
Zako also runs out of memory while running Rise Player.
zako_running_out_of_memory_running_Rise.txt
110 KB View Download
Cc: abodenha@chromium.org
Owner: afakhry@chromium.org
Summary: Devices get OOM when running kiosk apps (Rise, StratosMedia, Chrome Sign Builder) (was: HP Zako rebooted while running Rise Vision Rise Player kiosk App)
We run into an OOM condition on multiple devices with different players (Rise, StratosMedia, Chrome Sign Builder, etc.). Chrome, the app, or the video playback might have a memory leak somewhere. A closer look is needed.

The problem is probably not kiosk specific or device specific. But anyway, here are the devices that have seen the problem and were merged into this issue:
  HP Zako, Veyron-Mickey, AOPEN Sumo

Comment 26 Deleted

From #24: all those 'No space left on device' errors seem to imply running out of disk space as well?
Those 'No space left on device' errors are all for /tmp, which is a tmpfs and lives in memory. It is actually another incarnation of out of memory.
We are writing perf data to a CSV file in /usr/local/autotest/tmp/. Could this be causing the OOM?
/usr/local is fine, as that is backed by real disk. The error in #24 happens when we run generate_logs, which uses /tmp to dump and create the log tgz file. When the device is in that state, it runs out of memory and /tmp has no free space.
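Since /tmp is a tmpfs, anything written there is pinned in RAM until it is deleted. A quick sanity check on a DUT (a sketch, standard coreutils only; the helper name is mine):

```shell
# tmp_usage: show how much memory the tmpfs mounted at /tmp is currently holding.
tmp_usage() {
  df -h /tmp                        # tmpfs size vs. used, at the filesystem level
  du -sk /tmp 2>/dev/null || true   # total kB pinned by files under /tmp
}
```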

I wonder if we have any tool to ask Chrome to dump its heap so we can see where the memory went.
Any news on this?
Hi Alex, can you please notify Rise Player that their new content uses up all of the device's memory until the app crashes, and if possible get new content for our longevity testing? This may also be happening out in the field. Thanks.
Sumo running Rise Player Google_Testing_Content_Typical1 W9ZWFXZNRT9D crashes. Please see logs.
log-050916-095859.tar.gz
7.3 MB Download
Additional display ids from Rise (real world examples)

Google_Content_Testing_ClientContent1
442P4TK6C9RB

Google_Content_Testing_ClientContent2 
8A57YUGEEK37

Google_Content_Testing_ClientContent3 
5VQ94CAWC26


Rise is asking which display IDs are crashing; can you let me know which ones are causing frequent OOM issues?
We are using UQ7Z4KHGVGG8 and UZ3BPGE55KK7 (high intensity content) for the longevity test. But crash also happens when using the low-intensity content (e.g., W9ZWFXZNRT9D).
Labels: -Pri-2 Pri-1
This issue is also occurring in the M51 beta build 8172.16.0, 51.0.2704.29. This bug has to be fixed before M51 goes to stable.
@xiyuan, please try to reproduce this bug; let me know if you need any help with that.
Labels: M-51
Status: Started (was: Assigned)
Labels: Stability longevity
Latest build where this is happening: M51 8172.17.0 51.0.2704.30.
I ran the test from JRYVP9V62NQC locally on my peach_pit device and left it running for about 20 hours. Memory usage of the Webview renderer process kept increasing over time. Here are my observations of both the Webview and Browser processes:

[1]:
	Browser -> 188 MB
	Webview -> 842 MB	
[2]:
	Browser -> 189 MB
	Webview -> 850 MB
[3]:
	Browser -> 191 MB
	Webview -> 863 MB
[4]:
	Browser -> 215 MB
	Webview -> 953 MB
[5]:
	Browser -> 215 MB
	Webview -> 995 MB

The file descriptor usage of both processes remained stable.

When I tried to take a snapshot of the heap, the Webview process crashed because memory usage increased beyond 1.2 GB while taking the JS heap snapshot. So I assume that if I had left it running for a few more hours, the memory usage would have kept increasing and it would have crashed eventually.

Next I tried to repro this apparent memory leak in both desktop Chrome on Linux and the Chrome OS Linux build. The issue seems not to be reproducible! Both have been running since this morning, and the memory usage of the Webview is fluctuating around 500 MB.

I plan to leave them both running over the weekend to get a definitive answer. But so far, it looks like there is a memory leak, though it is not reproducible on Linux builds.
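A lightweight way to capture this kind of growth curve on a DUT is to sample the resident set of every chrome process periodically. A sketch (the function name and output path are my choices; `ps -C` and the `-o` field names are standard procps, so adjust if your image ships busybox ps):

```shell
# sample_rss FILE: append one timestamped line per chrome process to FILE,
# recording its RSS in kB. Run in a loop to build a growth curve over hours.
sample_rss() {
  ps -o pid=,rss=,comm= -C chrome 2>/dev/null | while read -r pid rss comm; do
    echo "$(date +%s),$pid,$rss,$comm" >> "$1"
  done
}

# On the DUT, e.g.:  while :; do sample_rss /usr/local/rss.csv; sleep 60; done
# (/usr/local is disk-backed, so the samples do not themselves consume RAM.)
```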
I left three tests running over the weekend:

- Google_Content_Testing_ClientContent1 442P4TK6C9RB
   Running on a peach_pit device. Memory usage reached 1132 MB. No crash yet.

- Google_Testing_Content_Typical2 JRYVP9V62NQC
   Running on the Chrome OS Linux build. Memory usage reached 1001 MB. No crash yet.

- Google_Testing_Content_Intense1 UZ3BPGE55KK7
   Running on stable Desktop Chrome on Linux. License was revoked during weekend. Restarted the video this morning, and memory usage reached 800 MB within 30 minutes. No crashes.

I have been unable to see any crashes due to OOM yet. How much memory do the test devices have?
This is originally from issue 602517, which is about mickey. Mickey rebooted while running StratosMedia Player 2.0.31. Mickey's info: CHROMEOS_RELEASE_DESCRIPTION=8172.17.0 (Official Build) dev-channel veyron_mickey test.
Please see logs

log-051616-140324.tar.gz
3.7 MB Download
From the posted logs in #43: there are a bunch of oom-killer events, and every time there is only around 100 MB of free memory. What is interesting is that the killed chrome process (I assume this is the Webview renderer) is not using as much memory as I see in my own tests; rather, memory runs out because other processes appear to be using a lot of memory at the same time, in particular the following:

[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[ 6237]  1000  6237   330748     3609     343    15354         -1000 chrome
[ 6322]  1000  6322   263461       75       8      110         -1000 nacl_helper_boo
[ 6351]  1000  6351   161331    80948     253     5780         -1000 chrome
[ 6679]  1000  6679   555497     4175    1100   248190           300 chrome  <<<----(The oom killed process).

How much RAM does this veyron_mickey have?
This is from veyron-mickey
mickey ~ # cat /proc/meminfo 
MemTotal:        2067728 kB
MemFree:          475468 kB
MemAvailable:    1190660 kB
Buffers:           23124 kB
Cached:           801072 kB
SwapCached:            0 kB
Active:           535736 kB
Inactive:         658036 kB
Active(anon):     370252 kB
Inactive(anon):   122084 kB
Active(file):     165484 kB
Inactive(file):   535952 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       1317804 kB
HighFree:         245860 kB
LowTotal:         749924 kB
LowFree:          229608 kB
SwapTotal:       2019264 kB
SwapFree:        2019264 kB
Dirty:               152 kB
Writeback:             0 kB
AnonPages:        369672 kB
Mapped:           399288 kB
Shmem:            122764 kB
Slab:              47992 kB
SReclaimable:      32824 kB
SUnreclaim:        15168 kB
KernelStack:        2016 kB
PageTables:         4804 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3053128 kB
Committed_AS:    1445172 kB
VmallocTotal:     245760 kB
VmallocUsed:       15500 kB
VmallocChunk:     218236 kB

Labels: -Pri-1 Pri-2
That's not surprising. The veyron-mickey has only 2 GB of RAM. That is too low to run those memory-intensive players.

By the way, I have been running three different tests since last Friday. No crashes yet. The memory usage seems to be stable around the 1 GB mark. That's already half of what the veyron-mickey has!

What about the zako device? I suspect it has too little memory as well, but please let me know.

I think we should have a minimum requirement of at least 4 GB of RAM for any device that is planned to be used with those players.

I'm lowering the priority of this bug, since we couldn't repro locally and there doesn't seem to be a memory leak, but rather not enough resources on the test devices.
Here's tricky:

tricky ~ # cat /proc/meminfo 
MemTotal:        1922916 kB
MemFree:          862244 kB
Buffers:          113792 kB
Cached:           561604 kB
SwapCached:            0 kB
Active:           388524 kB
Inactive:         509652 kB
Active(anon):     223336 kB
Inactive(anon):    94376 kB
Active(file):     165188 kB
Inactive(file):   415276 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2816768 kB
SwapFree:        2816768 kB
Dirty:                72 kB
Writeback:             0 kB
AnonPages:        222836 kB
Mapped:           150064 kB
Shmem:             94944 kB
Slab:             111060 kB
SReclaimable:      95800 kB
SUnreclaim:        15260 kB
KernelStack:        1520 kB
PageTables:         6180 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3778224 kB
Committed_AS:    1172104 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      366152 kB
VmallocChunk:   34359369788 kB
DirectMap4k:       58892 kB
DirectMap2M:     1978368 kB
DirectMap1G:           0 kB

Cc: semenzato@chromium.org
tricky also has less than 2 GB of RAM, which seems low for what those players need.

+semenzato to help understand this kernel memory accounting better. From the messages logs pasted below, it is very obvious that we are really running out of memory (avail RAM = 109492 kB, avail swap 197184 kB); however, the memory usage table of the running processes printed after that does not clearly show where all of that memory went. Could you please help clarify?

INFO kernel: [235533.976018] entering low_mem (avail RAM = 109544 kB, avail swap 235252 kB) with lowest seen anon mem: 37984 kB
INFO kernel: [235684.368333] entering low_mem (avail RAM = 109492 kB, avail swap 211952 kB) with lowest seen anon mem: 25004 kB
INFO kernel: [235701.874148] entering low_mem (avail RAM = 109492 kB, avail swap 197184 kB) with lowest seen anon mem: 12548 kB
WARNING kernel: [236260.954292] Chrome_IOThread invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=-1000
INFO kernel: [236260.954319] Chrome_IOThread cpuset=/ mems_allowed=0
NOTICE kernel: [236260.954332] CPU: 2 PID: 6340 Comm: Chrome_IOThread Tainted: G        W    3.14.0 #1
NOTICE kernel: [236260.954376] [<c010e47c>] (unwind_backtrace) from [<c010a8a0>] (show_stack+0x20/0x24)
NOTICE kernel: [236260.954396] [<c010a8a0>] (show_stack) from [<c067e834>] (dump_stack+0x7c/0xc0)
NOTICE kernel: [236260.954413] [<c067e834>] (dump_stack) from [<c067e080>] (dump_header.isra.12+0x90/0x1d8)
NOTICE kernel: [236260.954431] [<c067e080>] (dump_header.isra.12) from [<c01d9fdc>] (oom_kill_process+0x84/0x3c4)
NOTICE kernel: [236260.954447] [<c01d9fdc>] (oom_kill_process) from [<c01da7dc>] (out_of_memory+0x298/0x340)
NOTICE kernel: [236260.954461] [<c01da7dc>] (out_of_memory) from [<c01de0d0>] (__alloc_pages_nodemask+0x8c4/0x948)
NOTICE kernel: [236260.954479] [<c01de0d0>] (__alloc_pages_nodemask) from [<c020aa54>] (read_swap_cache_async+0x60/0x1e4)
NOTICE kernel: [236260.954494] [<c020aa54>] (read_swap_cache_async) from [<c020ad64>] (swapin_readahead+0x18c/0x1bc)
NOTICE kernel: [236260.954511] [<c020ad64>] (swapin_readahead) from [<c01fb4d8>] (handle_mm_fault+0x23c/0x800)
NOTICE kernel: [236260.954528] [<c01fb4d8>] (handle_mm_fault) from [<c0115084>] (do_page_fault+0x13c/0x3b0)
NOTICE kernel: [236260.954542] [<c0115084>] (do_page_fault) from [<c01001d8>] (do_DataAbort+0x50/0xcc)
NOTICE kernel: [236260.954556] [<c01001d8>] (do_DataAbort) from [<c010b458>] (__dabt_svc+0x38/0x60)
NOTICE kernel: [236260.954567] Exception stack(0xde4b9e50 to 0xde4b9e98)
NOTICE kernel: [236260.954577] 9e40:                                     00000001 d8576fc0 00000000 ffffffff
NOTICE kernel: [236260.954589] 9e60: 00000000 de4b9eec de54fb8c b7d67e00 d7114840 de4b9f58 00000000 de4b9edc
NOTICE kernel: [236260.954601] 9e80: 00000019 de4b9e98 c0607154 c0259d08 00000113 ffffffff
NOTICE kernel: [236260.954621] [<c010b458>] (__dabt_svc) from [<c0259d08>] (ep_send_events_proc+0xd4/0x1a0)
NOTICE kernel: [236260.954643] [<c0259d08>] (ep_send_events_proc) from [<c025a438>] (ep_scan_ready_list.isra.8+0xac/0x1dc)
NOTICE kernel: [236260.954661] [<c025a438>] (ep_scan_ready_list.isra.8) from [<c025b6bc>] (SyS_epoll_wait+0x25c/0x35c)
NOTICE kernel: [236260.954677] [<c025b6bc>] (SyS_epoll_wait) from [<c0106460>] (ret_fast_syscall+0x0/0x30)
NOTICE kernel: [236260.954691] Mem-info:
NOTICE kernel: [236260.954698] Normal per-cpu:
NOTICE kernel: [236260.954706] CPU    0: hi:  186, btch:  31 usd:   0
NOTICE kernel: [236260.954714] CPU    1: hi:  186, btch:  31 usd:   0
NOTICE kernel: [236260.954724] CPU    2: hi:  186, btch:  31 usd:   0
NOTICE kernel: [236260.954731] CPU    3: hi:  186, btch:  31 usd:   0
NOTICE kernel: [236260.954739] HighMem per-cpu:
NOTICE kernel: [236260.954746] CPU    0: hi:  186, btch:  31 usd:   0
NOTICE kernel: [236260.954754] CPU    1: hi:  186, btch:  31 usd:   0
NOTICE kernel: [236260.954763] CPU    2: hi:  186, btch:  31 usd:   0
NOTICE kernel: [236260.954770] CPU    3: hi:  186, btch:  31 usd:   0
NOTICE kernel: [236260.954784] active_anon:0 inactive_anon:28 isolated_anon:0
NOTICE kernel: [236260.954784]  active_file:8049 inactive_file:10104 isolated_file:0
NOTICE kernel: [236260.954784]  unevictable:0 dirty:0 writeback:0 unstable:0
NOTICE kernel: [236260.954784]  free:11163 slab_reclaimable:2411 slab_unreclaimable:3840
NOTICE kernel: [236260.954784]  mapped:89173 shmem:2 pagetables:2089 bounce:0
NOTICE kernel: [236260.954784]  free_cma:0
NOTICE kernel: [236260.954840] Normal free:44220kB min:3460kB low:4324kB high:5188kB active_anon:0kB inactive_anon:48kB active_file:12784kB inactive_file:19100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:778240kB managed:749924kB mlocked:0kB dirty:0kB writeback:0kB mapped:192476kB shmem:4kB slab_reclaimable:9644kB slab_unreclaimable:15360kB kernel_stack:1952kB pagetables:8356kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:97 all_unreclaimable? yes
NOTICE kernel: [236260.954886] lowmem_reserve[]: 0 10295 10295
NOTICE kernel: [236260.954913] HighMem free:432kB min:512kB low:2032kB high:3552kB active_anon:0kB inactive_anon:64kB active_file:19412kB inactive_file:21316kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1318828kB managed:1317804kB mlocked:0kB dirty:0kB writeback:0kB mapped:164216kB shmem:4kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
NOTICE kernel: [236260.954945] lowmem_reserve[]: 0 0 0
NOTICE kernel: [236260.954958] Normal: 10612*4kB (UEM) 16*8kB (UR) 9*16kB (R) 2*32kB (R) 2*64kB (R) 1*128kB (R) 1*256kB (R) 0*512kB 1*1024kB (R) 0*2048kB 0*4096kB = 44320kB
NOTICE kernel: [236260.955013] HighMem: 117*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 468kB
NOTICE kernel: [236260.955052] 18180 total pagecache pages
NOTICE kernel: [236260.955060] 5 pages in swap cache
NOTICE kernel: [236260.955067] Swap cache stats: add 73868478, delete 73868473, find 194347/34720597
NOTICE kernel: [236260.955076] Free swap  = 163456kB
NOTICE kernel: [236260.955081] Total swap = 2019264kB
NOTICE kernel: [236260.987517] 524267 pages of RAM
NOTICE kernel: [236260.987535] 11752 free pages
NOTICE kernel: [236260.987540] 7335 reserved pages
NOTICE kernel: [236260.987547] 3543 slab pages
NOTICE kernel: [236260.987553] 1148759 pages shared
NOTICE kernel: [236260.987559] 4 pages swap cached
INFO kernel: [236260.987566] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
INFO kernel: [236260.987600] [  125]     0   125      685       72       4      111         -1000 udevd
INFO kernel: [236260.987617] [  337]   202   337     9710      129      11      212         -1000 rsyslogd
INFO kernel: [236260.987636] [  395]   201   395      682      108       5      170         -1000 dbus-daemon
INFO kernel: [236260.987653] [  446]     0   446      371       57       3       18         -1000 agetty
INFO kernel: [236260.987678] [  510]     0   510      420       44       3       28         -1000 minijail0
INFO kernel: [236260.987703] [  514]   219   514     1319      205       5      182         -1000 wpa_supplicant
INFO kernel: [236260.987721] [  516]   229   516      392       51       3       55         -1000 daisydog
INFO kernel: [236260.987738] [  752]     0   752      420       44       4       28         -1000 minijail0
INFO kernel: [236260.987757] [  759]   228   759     4416      326       8      166         -1000 powerd
INFO kernel: [236260.987775] [ 1309]     0  1309     1536      257       6       88         -1000 firewalld
INFO kernel: [236260.987789] [ 1313]     0  1313      420       44       4       28         -1000 minijail0
INFO kernel: [236260.987805] [ 1322]   230  1322     2010      285       6      195         -1000 permission_brok
INFO kernel: [236260.987825] [ 1337]     0  1337     3154      577       8      415         -1000 shill
INFO kernel: [236260.987839] [ 1362]   202  1362      396       49       3       23         -1000 logger
INFO kernel: [236260.987857] [ 1715]     0  1715      365       44       3       49         -1000 periodic_schedu
INFO kernel: [236260.987872] [ 1722]     0  1722      365       50       4       45         -1000 periodic_schedu
INFO kernel: [236260.987887] [ 1738]     0  1738      420       52       3       29         -1000 minijail0
INFO kernel: [236260.987904] [ 1741]     0  1741      365       46       3       46         -1000 periodic_schedu
INFO kernel: [236260.987918] [ 1742]   226  1742     4015      183       6      129         -1000 mtpd
INFO kernel: [236260.987933] [ 1770]     0  1770      420       44       3       28         -1000 minijail0
INFO kernel: [236260.987947] [ 1845]   241  1845     8426       82       9      155         -1000 ModemManager
INFO kernel: [236260.987962] [ 1851]     0  1851      315        0       2       15         -1000 brcm_patchram_p
INFO kernel: [236260.987976] [ 1852]     0  1852      420       44       4       28         -1000 minijail0
INFO kernel: [236260.987993] [ 1855]     0  1855     2161      229       7      134         -1000 metrics_daemon
INFO kernel: [236260.988007] [ 1876]   218  1876      931      122       5       83         -1000 bluetoothd
INFO kernel: [236260.988022] [ 1916]     0  1916      996       49       4       86         -1000 sshd
INFO kernel: [236260.988036] [ 1927]   600  1927     3315      158       5      154         -1000 cras
INFO kernel: [236260.988052] [ 1974]     0  1974     4545      154       9      203         -1000 disks
INFO kernel: [236260.988065] [ 2131]   238  2131      639      160       5      112         -1000 avahi-daemon
INFO kernel: [236260.988081] [ 2132]   238  2132      639       26       4       51         -1000 avahi-daemon
INFO kernel: [236260.988097] [ 2151]     0  2151     1982      211       6      718         -1000 python
INFO kernel: [236260.988112] [ 2306]     0  2306     2347      288       7      153         -1000 update_engine
INFO kernel: [236260.988130] [ 2351]     0  2351     1686       61       7       89         -1000 warn_collector
INFO kernel: [236260.988145] [ 2374]     0  2374      365       37       3       15         -1000 sh
INFO kernel: [236260.988160] [ 2438]     0  2438      420       53       4       29         -1000 minijail0
INFO kernel: [236260.988175] [ 2473]   232  2473     1530      152       6      123         -1000 netfilter-queue
INFO kernel: [236260.988191] [ 2490]   234  2490      871      129       5       82         -1000 tlsdated
INFO kernel: [236260.988208] [ 2491]     0  2491      351       52       3       18         -1000 logger
INFO kernel: [236260.988225] [ 2505]     0  2505      853       14       4       60         -1000 tlsdated-setter
INFO kernel: [236260.988241] [ 2515]     0  2515      365       45       3       48         -1000 periodic_schedu
INFO kernel: [236260.988260] [ 2808]   224  2808      531      143       3       82         -1000 dhcpcd
INFO kernel: [236260.988275] [12235]   207 12235    10956       76      11      163         -1000 tcsd
INFO kernel: [236260.988291] [12238]   223 12238     8990      250      11      181         -1000 chapsd
INFO kernel: [236260.988307] [12247]     0 12247     5825      230      10      308         -1000 cryptohomed
INFO kernel: [236260.988323] [ 6205]     0  6205     2910      311       8      274         -1000 session_manager
INFO kernel: [236260.988337] [ 6223]     0  6223     1906      167       6       91         -1000 debugd
INFO kernel: [236260.988350] [ 6237]  1000  6237   330748     3609     343    15354         -1000 chrome
INFO kernel: [236260.988364] [ 6321]  1000  6321    42163      507      56     1204         -1000 chrome
INFO kernel: [236260.988377] [ 6322]  1000  6322   263461       75       8      110         -1000 nacl_helper_boo
INFO kernel: [236260.988391] [ 6324]  1000  6324      984        0       4       35         -1000 nacl_helper_non
INFO kernel: [236260.988404] [ 6327]  1000  6327    42163       84      29     1215         -1000 chrome
INFO kernel: [236260.988417] [ 6351]  1000  6351   161331    80948     253     5780         -1000 chrome
INFO kernel: [236260.988430] [ 6385]  1000  6385    25760      123      33     1027         -1000 chrome
INFO kernel: [236260.988443] [ 6679]  1000  6679   555497     4175    1100   248190           300 chrome
INFO kernel: [236260.988462] [13089]     0 13089      602       49       3       25         -1000 sleep
INFO kernel: [236260.988475] [13090]     0 13090      602       49       3       25         -1000 sleep
INFO kernel: [236260.988488] [13094]     0 13094      602       49       3       25         -1000 sleep
INFO kernel: [236260.988501] [13098]     0 13098      602       49       3       25         -1000 sleep
ERR kernel: [236260.988513] Out of memory: Kill process 6679 (chrome) score 547 or sacrifice child
ERR kernel: [236260.988530] Killed process 6679 (chrome) total-vm:2221988kB, anon-rss:0kB, file-rss:16700kB
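For reference when reading these dumps: the kernel reports total_vm, rss, and swapents in pages (4 kB on these boards), so the table can be totaled with a short script. A sketch (the helper name is mine):

```python
import re

# The oom dump's total_vm, rss, and swapents columns are in 4 kB pages.
PAGE_KB = 4

def oom_table_totals(dump: str) -> dict:
    """Sum total_vm, rss, and swapents (converted to kB) over every
    '[ pid ] uid tgid total_vm rss nr_ptes swapents ...' row in `dump`."""
    vm = rss = swap = 0
    # Match "[ pid] uid tgid total_vm rss nr_ptes swapents"; stopping before
    # oom_score_adj also handles negative values like -1000.
    row = re.compile(r'\[\s*\d+\]\s+\d+\s+\d+\s+(\d+)\s+(\d+)\s+\d+\s+(\d+)')
    for line in dump.splitlines():
        m = row.search(line)
        if m:
            vm += int(m.group(1)) * PAGE_KB
            rss += int(m.group(2)) * PAGE_KB
            swap += int(m.group(3)) * PAGE_KB
    return {"vm_kb": vm, "rss_kb": rss, "swap_kb": swap}
```

As a cross-check, the row for pid 6679 alone gives vm_kb = 555497 * 4 = 2221988 kB, matching the kernel's own "total-vm:2221988kB" in the kill message above.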
Yes, the kernel is really OOM, and it is working correctly by killing the chrome process as shown in #48. Note that the available swap number is from 550 seconds earlier than the OOM kill; however, there is little free swap left even then.

You're right that the memory usage doesn't add up. The rss and swapents fields together add up to only about 1.5 GB.

Is it possible that the test infrastructure, or something else, is writing large files in /tmp?  It's a RAM-based file system so it could end up using a lot of memory.

It would be useful to monitor /proc/meminfo during the run, see if we notice trends.
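A minimal sketch of such monitoring (the field selection and helper name are my choices; Shmem is included because tmpfs files in /tmp are accounted there):

```shell
# meminfo_snapshot FILE: append a timestamped subset of /proc/meminfo to FILE.
meminfo_snapshot() {
  { date '+%Y-%m-%dT%H:%M:%S'
    grep -E '^(MemFree|MemAvailable|SwapFree|Cached|Shmem):' /proc/meminfo
  } >> "$1"
}

# On the DUT: while :; do meminfo_snapshot /usr/local/meminfo.log; sleep 60; done
```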




juanramon@, as in #49, are we writing files in /tmp while the tests are running?
Not anymore; the files are written to the /usr/local/autotest/tmp/ folder as a CSV file. When the files were written to the /tmp folder, they would get deleted if there was an unexpected reboot.
Another possibility is that memory compression doesn't work well with this workload.

Would it be possible to log the content of these files every minute or so:

/sys/block/zram0/{compr_data_size,orig_data_size,memory_used_total,zero_pages}

Actually we just need a snapshot while the workload is running.  Normally the compressed data size is about 1/3 of the original data size.  If it's a lot worse than that, we have a problem.
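A snapshot can be taken with something like this (a sketch; zram sysfs attribute names vary by kernel version, so a few candidate spellings are tried and missing ones are skipped):

```shell
# zram_snapshot: print whichever zram0 compression counters this kernel exposes.
# A compressed size much larger than ~1/3 of the original would be the red flag.
zram_snapshot() {
  for f in compr_data_size orig_data_size mem_used_total memory_used_total zero_pages mm_stat; do
    if [ -r "/sys/block/zram0/$f" ]; then
      printf '%s=%s\n' "$f" "$(cat "/sys/block/zram0/$f")"
    fi
  done
}
```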

Can you ssh into the machine? chromeos1-row2-rack11-host6 or chromeos1-row2-rack5-host2
An IP address also works: 172.27.213.26 or 172.27.212.211
Both machines are currently running the content.
I have just ssh'ed into 172.27.213.26. That device is really running low on memory. In particular, the process with pid=9485 is using a lot of memory. Can you please open the task manager on that device (Search+Esc) and let me know the title of the task(s) running in the process with that PID?
Also, there are about 468 MB worth of files in the (backed-by-memory) /tmp/ directory! That's a lot! The question is: who writes those files?

Examples of the biggest files are:

File	                        Size
.com.google.Chrome.JCHSC2	7660186
.com.google.Chrome.VR8DB1	7660186
.com.google.Chrome.b3wNDA	7660186
.com.google.Chrome.n9OTzF	7660186
.com.google.Chrome.tq7WgY	7660186
.com.google.Chrome.N9Kk2T	6350334
.com.google.Chrome.Osv7Tw	6350334
.com.google.Chrome.j5Kck6	6350334
.com.google.Chrome.x7nw1i	6350334
.com.google.Chrome.1Snyse	5717393
.com.google.Chrome.9W6kID	5717393
.com.google.Chrome.Jlepze	5717393
.com.google.Chrome.OQLAAg	5717393
.com.google.Chrome.K8rgZU	5683421
.com.google.Chrome.LUZjGu	5683421
.com.google.Chrome.X8oA92	5683421
.com.google.Chrome.ZjSQIQ	5683421
.com.google.Chrome.ANAaVY	5210085
.com.google.Chrome.Tuwi5u	5210085
.com.google.Chrome.f8qd7H	5210085
.com.google.Chrome.xlbvq7	5210085
.com.google.Chrome.5ISDo2	3508747
.com.google.Chrome.LIGjLz	3508747
.com.google.Chrome.TRFfO1	3508747
.com.google.Chrome.UZ9eiN	3508747
.com.google.Chrome.VKQE6t	3508747
.com.google.Chrome.WEqk7Q	3508747
.com.google.Chrome.lsQgM1	3508747
.com.google.Chrome.ziGrow	3508747
.com.google.Chrome.5Agq8u	3266652
.com.google.Chrome.FTmESA	3266652
.com.google.Chrome.L6rZOH	3266652
.com.google.Chrome.xUx0nc	3266652
.com.google.Chrome.xgzLhj	3266652
.com.google.Chrome.9J1elW	3225784
.com.google.Chrome.GChzfn	3225784
.com.google.Chrome.dShaAX	3225784
.com.google.Chrome.nDTaQt	3225784
.com.google.Chrome.bSTt89	3034029
.com.google.Chrome.hDAw7z	3034029
.com.google.Chrome.pW6ZCM	3034029
.com.google.Chrome.wRCQPC	3034029
.com.google.Chrome.42g4b0	2285843
.com.google.Chrome.LXKqMv	2285843
.com.google.Chrome.lMHnjH	2285843
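For the record, a listing like the one above can be regenerated on a DUT with a small helper (GNU find assumed; the `largest` name is ours):

```shell
#!/bin/sh
# List the N largest regular files directly under a directory,
# biggest first, as "<size in bytes>\t<path>".
largest() {  # usage: largest <dir> [count]
  find "$1" -maxdepth 1 -type f -printf '%s\t%p\n' 2>/dev/null |
    sort -rn | head -n "${2:-20}"
}

largest /tmp 20
```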

Aha!  #57 is the culprit.

Re #55: sorry, I just tried this. The first host is compressing 120 MB into 40 MB, the second host is not compressing; everything else is fine.

I am fairly sure that the problem is the large amount of files in /tmp.
Does anyone have a clue where these types of files are coming from? 
Judging from the names, I believe it's Chrome. See https://code.google.com/p/chromium/codesearch#chromium/src/base/files/file_util_posix.cc&q=CreateAndOpenFdForTemporaryFile&sq=package:chromium&type=cs&l=138

There are many callers to base::CreateTemporaryFile(). It's hard to know which one exactly.
I suspect the files are created by base::CreateTemporaryFile, but there are quite a few callers of it. Can we peek at the contents of the files and see if that gives us some hints?
Most of the callers seem to be in test files, though! Where do these digital signage tests reside?
I don't think my longevity_Tracker.py (https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/site_tests/longevity_Tracker/longevity_Tracker.py) script is creating them. At least, I don't see any obvious calls to create temporary files.

I propose we delete the files, then run the script with no Kiosk App and see if the script creates them on its own. Then stop the script, run the Kiosk App on its own, and see if the App creates them on its own.
In response to comment 63, I have 100.96.49.93 running the script only, without the App; I'll check the /tmp folder to see if these files creep up.
The zako with IP 172.27.213.26, running riseplayer and the test script, crashed, but before it crashed it filled the /tmp folder with a few hundred .com.google.Chrome.xxxx files such as:

-rw-------  1 chronos chronos       0 May 19 15:53 .com.google.Chrome.00N5IV
-rw-------  1 chronos chronos  245798 May 19 15:48 .com.google.Chrome.02zbTz
-rw-------  1 chronos chronos  233903 May 19 16:20 .com.google.Chrome.04X0iY ... etc.

by contrast the Sumo with ip 100.96.49.93 running the script does not have any of these files in its /tmp folder

Try to check the contents of these temp files, maybe that can tell you something about their source.
They are still images used during the show, and they are being stored in the /tmp directory.
com_google_Chrome_zs9NpA
998 KB View Download
Oh, I saw this image while running one of the test videos! 
Ok, so in this case, the player itself is caching these images in /tmp/ ?
The player should not have access to /tmp directly. Sounds like Chrome is creating those files on behalf of the player. Do you see these files created on the dev box? If so, would closing the player make them go away?

And if we can make it happen for a debug build, maybe log the call stack to CreateTemporaryFile?

Yes, I can see the files in the /tmp directory of the dev machine (ip 100.96.49.97), which is not running the python script.
Cc: rdsmith@chromium.org
I tested again the Rise Player today. Chrome really creates tons of temp files while this app is running. 

+rdsmith, who might provide some insights.

Here's the trace:

#1 0x2b30547c807b af::FuncMarker::GetStackTrace()
#2 0x2b30547ca809 base::(anonymous namespace)::TempFileName()
#3 0x2b30547ca186 base::(anonymous namespace)::CreateAndOpenFdForTemporaryFile()
#4 0x2b30547ca27d base::CreateAndOpenTemporaryFileInDir()
#5 0x2b30547f816f base::(anonymous namespace)::CreateAnonymousSharedMemory()
#6 0x2b30547f7610 base::SharedMemory::Create()
#7 0x2b30547f9929 base::SharedMemory::CreateAnonymous()
#8 0x2b30547f73af base::SharedMemory::CreateAndMapAnonymous()
#9 0x2b3057fb8231 content::ResourceBuffer::Initialize()
#10 0x2b3057f92151 content::AsyncResourceHandler::EnsureResourceBufferIsInitialized()
#11 0x2b3057f91cf1 content::AsyncResourceHandler::OnWillRead()
#12 0x2b3057fa2078 content::MimeTypeResourceHandler::OnWillRead()
#13 0x2b3057fa0ece content::LayeredResourceHandler::OnWillRead()
#14 0x2b3057fefe3a content::ResourceLoader::ReadMore()
#15 0x2b3057fee06f content::ResourceLoader::StartReading()
#16 0x2b3057fed9f4 content::ResourceLoader::OnResponseStarted()
#17 0x2b305e46864f net::URLRequest::NotifyResponseStarted()
#18 0x2b305e49b6b7 net::URLRequestJob::NotifyHeadersComplete()
#19 0x2b305e4904a6 net::URLRequestHttpJob::NotifyHeadersComplete()
#20 0x2b305e49216e net::URLRequestHttpJob::SaveCookiesAndNotifyHeadersComplete()
#21 0x2b305e48e166 net::URLRequestHttpJob::OnStartCompleted()
#22 0x2b305dd7babb _ZN4base8internal15RunnableAdapterIMN3net18ClientSocketHandleEFviEE3RunIPS3_JiEEEvOT_DpOT0_
#23 0x2b305e49714e _ZN4base8internal12InvokeHelperILb0EvNS0_15RunnableAdapterIMN3net17URLRequestHttpJobEFviEEEE8MakeItSoIJPS4_iEEEvS7_DpOT_
#24 0x2b305e4970fd _ZN4base8internal7InvokerINS_13IndexSequenceIJLm0EEEENS0_9BindStateINS0_15RunnableAdapterIMN3net17URLRequestHttpJobEFviEEEFvPS7_iEJNS0_17UnretainedWrapperIS7_EEEEENS0_12InvokeHelperILb0EvSA_EEFviEE3RunEPNS0_13BindStateBaseEOi
#25 0x2b305dd7b532 base::Callback<>::Run()
#26 0x2b305e079fe2 net::HttpCache::Transaction::DoLoop()
#27 0x2b305e07814b net::HttpCache::Transaction::OnIOComplete()
#28 0x2b305dd7babb _ZN4base8internal15RunnableAdapterIMN3net18ClientSocketHandleEFviEE3RunIPS3_JiEEEvOT_DpOT0_
#29 0x2b305df97f45 _ZN4base8internal12InvokeHelperILb1EvNS0_15RunnableAdapterIMN10disk_cache11SimpleIndexEFviEEEE8MakeItSoINS_7WeakPtrIS4_EEJiEEEvS7_T_DpOT0_
#30 0x2b305e08a64d _ZN4base8internal7InvokerINS_13IndexSequenceIJLm0EEEENS0_9BindStateINS0_15RunnableAdapterIMN3net9HttpCache11TransactionEFviEEEFvPS8_iEJNS_7WeakPtrIS8_EEEEENS0_12InvokeHelperILb1EvSB_EEFviEE3RunEPNS0_13BindStateBaseEOi
#31 0x2b305dd7b532 base::Callback<>::Run()
#32 0x2b305e09f60e net::HttpNetworkTransaction::DoCallback()
#33 0x2b305e09b1a8 net::HttpNetworkTransaction::OnIOComplete()
#34 0x2b305dd7babb _ZN4base8internal15RunnableAdapterIMN3net18ClientSocketHandleEFviEE3RunIPS3_JiEEEvOT_DpOT0_
#35 0x2b305e0a805e _ZN4base8internal12InvokeHelperILb0EvNS0_15RunnableAdapterIMN3net22HttpNetworkTransactionEFviEEEE8MakeItSoIJPS4_iEEEvS7_DpOT_
#36 0x2b305e0a800d _ZN4base8internal7InvokerINS_13IndexSequenceIJLm0EEEENS0_9BindStateINS0_15RunnableAdapterIMN3net22HttpNetworkTransactionEFviEEEFvPS7_iEJNS0_17UnretainedWrapperIS7_EEEEENS0_12InvokeHelperILb0EvSA_EEFviEE3RunEPNS0_13BindStateBaseEOi
#37 0x2b305dd7b532 base::Callback<>::Run()
#38 0x2b305e3d5847 net::SpdyHttpStream::DoResponseCallback()
#39 0x2b305e3d56c1 net::SpdyHttpStream::OnResponseHeadersUpdated()
#40 0x2b305e425af5 net::SpdyStream::MergeWithResponseHeaders()
#41 0x2b305e4255ef net::SpdyStream::OnInitialResponseHeadersReceived()
#42 0x2b305e3f5b87 net::SpdySession::OnInitialResponseHeadersReceived()
#43 0x2b305e3f8a63 net::SpdySession::OnHeaders()
#44 0x2b305e38bc8b net::BufferedSpdyFramer::OnControlFrameHeaderData()
#45 0x2b305e3bda9a net::SpdyFramer::ProcessControlFrameHeaderBlock()
#46 0x2b305e3c59c1 net::SpdyFramer::DeliverHpackBlockAsSpdy3Block()
#47 0x2b305e3bda38 net::SpdyFramer::ProcessControlFrameHeaderBlock()
#48 0x2b305e3b9b22 net::SpdyFramer::ProcessInput()
#49 0x2b305e38ca24 net::BufferedSpdyFramer::ProcessInput()
#50 0x2b305e3ef141 net::SpdySession::DoReadComplete()
#51 0x2b305e3ee706 net::SpdySession::DoReadLoop()
#52 0x2b305e3e8e2d net::SpdySession::PumpReadLoop()
#53 0x2b305e40c230 _ZN4base8internal15RunnableAdapterIMN3net11SpdySessionEFvNS3_9ReadStateEiEE3RunIPS3_JRKS4_iEEEvOT_DpOT0_
#54 0x2b305e40c14a _ZN4base8internal12InvokeHelperILb1EvNS0_15RunnableAdapterIMN3net11SpdySessionEFvNS4_9ReadStateEiEEEE8MakeItSoINS_7WeakPtrIS4_EEJRKS5_iEEEvS8_T_DpOT0_
#55 0x2b305e40c0cd _ZN4base8internal7InvokerINS_13IndexSequenceIJLm0ELm1EEEENS0_9BindStateINS0_15RunnableAdapterIMN3net11SpdySessionEFvNS7_9ReadStateEiEEEFvPS7_S8_iEJNS_7WeakPtrIS7_EES8_EEENS0_12InvokeHelperILb1EvSB_EEFviEE3RunEPNS0_13BindStateBaseEOi
#56 0x2b305dd7b532 base::Callback<>::Run()
#57 0x2b305dd830ab net::SSLClientSocketImpl::DoReadCallback()
#58 0x2b305dd86427 net::SSLClientSocketImpl::OnRecvComplete()
#59 0x2b305dd871ee net::SSLClientSocketImpl::BufferRecvComplete()
#60 0x2b305dd7babb _ZN4base8internal15RunnableAdapterIMN3net18ClientSocketHandleEFviEE3RunIPS3_JiEEEvOT_DpOT0_
#61 0x2b305dd95cce _ZN4base8internal12InvokeHelperILb0EvNS0_15RunnableAdapterIMN3net19SSLClientSocketImplEFviEEEE8MakeItSoIJPS4_iEEEvS7_DpOT_

Cc: erikc...@chromium.org
I'm not sure why you're pinging me; if you let me know what particular expertise you're expecting of me I might be able to help more :-}.

Glancing at the last stack trace, what naively appears to be happening is that shared memory between the renderer and browser is implemented by both processes mmapping a file and scribbling into it. If that file is mmapped from a memory-backed filesystem and the kernel isn't written correctly, you might have double the needed memory used. But my guess is that that's not what's going on; my guess is that Chrome isn't sizing the shared memory segments for communication properly and/or isn't cleaning them up properly. But you should talk to an IPC/memory allocation expert about that. I've cc'd Erik Chen, since his fingerprints are on the shared memory allocation.

My apologies if you wanted my perspective on some other aspect of this bug--feel free to redirect me.
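One wrinkle worth keeping in mind while chasing this: a /tmp-backed shared-memory file is typically unlinked right after creation, so it can consume tmpfs memory without ever showing up in a directory listing. A quick self-contained illustration, using /dev/shm as a convenient tmpfs (the file name and sizes are arbitrary):

```shell
#!/bin/sh
# Show that an unlinked tmpfs file still consumes memory while any fd
# stays open; only closing the last descriptor releases the pages.
f=$(mktemp /dev/shm/demo.XXXXXX)
dd if=/dev/zero of="$f" bs=1M count=4 2>/dev/null
exec 3<"$f"           # hold a read descriptor open
rm "$f"               # unlink: the name is gone...
df /dev/shm | tail -1 # ...but tmpfs usage still includes the 4 MiB
exec 3<&-             # close the fd: now the space is reclaimed
df /dev/shm | tail -1
```

The files observed in #57 are visible in /tmp, so whatever is creating them is not unlinking them, which argues against a plain shared-memory segment and for something holding an on-disk name deliberately.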

Re #72: Sorry, I should have been more specific. I CC'd you as I saw you're an owner of both dirs where net::URLRequestHttpJob::OnStartCompleted() and content::ResourceBuffer::Initialize() are located.

I just wanted to know the reason for all these mmaps, and why they aren't being cleaned up, to the point that the app gets killed by the OOM killer. Is it a leak in Chrome itself? Or is it that the app is simply using a lot of memory?

Thanks for adding Erik Chen!
Another instance of a mickey running the StratosMedia kiosk App unexpectedly rebooting with the M51-Beta build 8172.17.0, 51.0.2704.30.
log-052416-110125.tar.gz
1.0 MB Download
Status: Assigned (was: Started)
I'm not actively investigating this issue at the moment, so moving back to "Assigned".
I believe I am seeing a similar issue in an app I've created: it slowly builds up its memory usage until it finally runs out of memory. I am using an IndexedDB and making many calls in succession to set image src tags in my HTML file. Within a 15- to 30-second timeframe the app could be setting up to 20 different image tags. The tags are all set in the success event of the get call to the database. The image tags are not created dynamically in the DOM.

I have one Chromebox that has 4 GB of RAM, and the app has gotten as high as 1.9 GB of memory usage before crashing. On other boxes with only 2 GB of RAM, the memory threshold seems to be 700-750 MB of usage before crashing.

I see the same memory usage issue when running the app on a Mac, but it doesn't crash due to the 16 GB of RAM installed.

Let me know if I can give you any more info.
Cc: afakhry@chromium.org
Owner: erikc...@chromium.org
Passing along to erikchen@ for further triage. The cause of the OOM killing of the renderer has already been identified in comments #57, #58, and #71.
 afakhry: What version of chrome were you using for #71? The stack that you posted should never occur in Chrome Canary.
It was ToT at that time. Probably M52. Sorry, I didn't record the exact version.
Owner: achuith@chromium.org
Oh, this is ChromeOS. I don't know anything about that platform. achuith: Can you find an appropriate owner?
Cc: achuith@chromium.org
Owner: jen...@chromium.org
Jenny maybe?
I've been testing the indexeddb OOM issue in version 44 of ChromeOS and the leaking I was seeing in versions 50 and 51 is not there at all. My app maintains the right amount of memory no matter how long it is running.
Per #71, it looks like some shared memory files are created but not cleaned up, and they filled up /tmp/ during the streaming. It is most likely not a ChromeOS/kiosk-specific issue, but a more general Chrome issue, perhaps with WebRTC? It shows up in kiosk mode for Rise Player since the app gets the chance to continuously play 24x7. I pinged erikchen, but he said he is not the right person for WebRTC code. Albert, do you know who we can ping on the WebRTC team for such an issue?
Cc: tommi@chromium.org brettw@chromium.org
Components: Blink>WebRTC
tommi@ or brettw@?
Could this be related to crbug.com/623175? As suspected by Erik?
Seems possible. I see that code was last touched about a year ago, which would explain comment #82 saying this wasn't an issue in M44. But looking at the diff, I see that the logic was similar even before that. I suppose the refactor could have regressed something, maybe?
crbug.com/623175 points at a possible improvement to the implementation of Posix shared memory, now that macOS uses a different implementation. The implementation for POSIX has effectively remained unchanged for a couple of years. If this is a recent regression, you'll want to look somewhere else.
We have seen the issue as well with the latest stable version of ChromeOS on Chromeboxes, and it's killing our business. We have a Chrome app with a start page of web links which load websites in a webview. One of the links goes to Sears.com, and as you shop and add things to the cart, the memory usage goes up and up. Sometimes some memory drops out and more becomes available as you browse different items and add items to the cart. After prolonged use of maybe 30 min to 1 hr of shopping, you end up with the memory full; in the worst case it crashes to a black screen with a frown, forcing you to turn off the Chromebox and restart it. In better cases it reloads the webview, but that means any items a customer had in the cart are now gone and they have to start from scratch. This is frustrating at best; some stores are reporting they can't get through the checkout process because the memory is dumped before the credit card screen.

We are using 2 GB Asus Chromeboxes. We've been using them for about 1 1/2 years, and they've been rock solid until just the last month and a half. So while 2 GB is not much memory, it worked for 1 1/2 years and something has changed. If we run on double the RAM (4 GB), the crashes go away, but there are 900 of these boxes in stores; it's hard to say all of a sudden that I have to replace them.
FYI here are the system logs for a test app I've created that crashed due to out of memory issues. It's a new app so I do not know if it worked in older OS versions but it exhibits similar behavior to #88. It runs for a couple of hours and then crashes to a black screen with the frown face.

Note this is simple HTML5 web app and is not playing back video, but is rendering a rotating 3d earth.
logs_20160707-0411.zip
150 KB Download
Re #89: the log is not very useful. Could you check whether your device has a lot of /tmp/.com.google.Chrome.* files when the OOM happens?
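Concretely, something like this on the DUT would answer that question; the glob matches the filenames seen in #57:

```shell
#!/bin/sh
# Count Chrome's hidden temp files in /tmp and report the space they use.
# An ever-growing count while the kiosk app runs points at this leak.
n=$(ls /tmp/.com.google.Chrome.* 2>/dev/null | wc -l)
echo "chrome temp files: $n"
if [ "$n" -gt 0 ]; then
  du -ch /tmp/.com.google.Chrome.* 2>/dev/null | tail -n 1
fi
```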
Cc: -xiy...@chromium.org jen...@chromium.org
Owner: xiy...@chromium.org
I can take this one back now.
Cc: -erikc...@chromium.org
Issue 333996 has similar symptoms (leaking fd, but instead of OOM, we run out of disk space).
Blockedon: 475444
Cc: dmu...@chromium.org
It looks like the /tmp files are not leaked from SharedMemory. Instead, I think we are seeing  issue 475444 . The problem is that XHR keeps a temp file when fetching a 'blob', and those files are kept around until the renderer is closed.

The sample page in #16 of  issue 475444  can repro the problem in a tab. The backing /tmp files for a blob is kept around until the tab is closed.

+dmurph, could you help to triage this or  issue 475444 ? 
 Issue 590975  tracks v8 work to trigger GC on memory pressure and would help with this issue.
the GC cleans temp files? I've never heard of that.
Let me expand my comment in #94:

The blob response of an XHR is handled by RedirectToFileResourceHandler, which uses a temp file to hold the downloaded bytes [1] and passes that to the renderer as a blob. The blob holds a reference to the temp file and is stored as a member of the XHR [2]. The temp file will be kept around until m_responseBlob of the XHR is released (which happens when the XHR is gone). So if GC does not reclaim a blob XHR, the temp file is kept around indefinitely.

[1]: https://cs.chromium.org/chromium/src/content/browser/loader/resource_dispatcher_host_impl.cc?rcl=1468927100&l=1600-1603
[2]: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/xmlhttprequest/XMLHttpRequest.cpp?rcl=1468927100&l=1437
Cc: krishna...@chromium.org
Cc: -scunning...@chromium.org
Cc: -rdsmith@chromium.org
