
Issue 649116

Starred by 14 users

Issue metadata

Status: Duplicate
Merged: issue 702707
Owner:
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug


Participants' hotlists:
Hotlist-4



Crash likely caused by trouble freeing up memory in low-memory situations (must see an OOM to qualify for this bug)

Project Member Reported by keta...@chromium.org, Sep 21 2016

Issue description

Report ID: b928b99500000000
Client ID - 5D773DF6ACD348358E66830F715F22B0
Device- Nyan


Seeing this crash on Chrome OS.

---
<4>[328793.767556] chrome invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=300
<5>[328793.767571] CPU: 2 PID: 23959 Comm: chrome Tainted: G         C   3.10.18 #1
<5>[328793.767591] [<c020cf9c>] (unwind_backtrace+0x0/0x110) from [<c020a08c>] (show_stack+0x20/0x24)
<5>[328793.767604] [<c020a08c>] (show_stack+0x20/0x24) from [<c07688d0>] (dump_stack+0x20/0x28)
<5>[328793.767615] [<c07688d0>] (dump_stack+0x20/0x28) from [<c0768090>] (dump_header.isra.12+0x88/0x1b4)
<5>[328793.767625] [<c0768090>] (dump_header.isra.12+0x88/0x1b4) from [<c02c83e8>] (oom_kill_process+0x84/0x390)
<5>[328793.767636] [<c02c83e8>] (oom_kill_process+0x84/0x390) from [<c02c8b4c>] (out_of_memory+0x230/0x2d4)
<5>[328793.767645] [<c02c8b4c>] (out_of_memory+0x230/0x2d4) from [<c02cbef8>] (__alloc_pages_nodemask+0x798/0x818)
<5>[328793.767656] [<c02cbef8>] (__alloc_pages_nodemask+0x798/0x818) from [<c02f5f00>] (read_swap_cache_async+0x60/0x138)
<5>[328793.767666] [<c02f5f00>] (read_swap_cache_async+0x60/0x138) from [<c02f605c>] (swapin_readahead+0x84/0xe0)
<5>[328793.767676] [<c02f605c>] (swapin_readahead+0x84/0xe0) from [<c02e5840>] (handle_pte_fault+0x204/0x828)
<5>[328793.767685] [<c02e5840>] (handle_pte_fault+0x204/0x828) from [<c02e6dcc>] (handle_mm_fault+0x120/0x154)
<5>[328793.767694] [<c02e6dcc>] (handle_mm_fault+0x120/0x154) from [<c0213f28>] (do_page_fault+0x12c/0x398)
<5>[328793.767703] [<c0213f28>] (do_page_fault+0x12c/0x398) from [<c02001d0>] (do_DataAbort+0x48/0xc4)
<5>[328793.767712] [<c02001d0>] (do_DataAbort+0x48/0xc4) from [<c0205cb8>] (__dabt_usr+0x38/0x40)
...
<3>[328793.793830] Out of memory: Kill process 23732 (Compositor) score 574 or sacrifice child
<0>[328802.627654] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
---

Dianders@/snanda@ - please recommend someone who this issue can be assigned to.

 
upload_file_kcrash-3f9dd84e00000000 (1).kcrash
5.5 KB Download
Cc: jcliang@chromium.org semenzato@chromium.org katierh@chromium.org
Components: OS>Kernel
Labels: Performance-Memory
Status: Available (was: Untriaged)
The uploaded kcrash is not right, it should be the one from:
https://crash.corp.google.com/browse?stbtiq=b928b99500000000

This machine was seriously out of memory.


It eventually crashes here:

<6>[328793.793261] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
<6>[328793.793282] [  142]     0   142      687       55       4      120         -1000 udevd
<6>[328793.793293] [  918]   202   918     9710       91      10      140         -1000 rsyslogd
<6>[328793.793300] [  946]   201   946      649       95       4      142         -1000 dbus-daemon
<6>[328793.793307] [  955]     0   955      764       45       4       18         -1000 agetty
<6>[328793.793314] [  960]     0   960      371       45       3       18         -1000 agetty
<6>[328793.793321] [  998]     0   998      420       43       4       28         -1000 minijail0
<6>[328793.793329] [ 1003]   219  1003     1310      113       5      108         -1000 wpa_supplicant
<6>[328793.793335] [ 1010]   229  1010      392       46       3       19         -1000 daisydog
<6>[328793.793342] [ 1240]     0  1240      420       43       3       28         -1000 minijail0
<6>[328793.793349] [ 1246]   228  1246     4431      292       8      143         -1000 powerd
<6>[328793.793356] [ 1280]     0  1280     2950      211       8      285         -1000 session_manager
<6>[328793.793364] [ 1303]     0  1303     1919      129       7       91         -1000 debugd
<6>[328793.793370] [ 1305]   231  1305    16759     1058      30     3751         -1000 X
<6>[328793.793377] [ 1307]   207  1307    10967       77      12      124         -1000 tcsd
<6>[328793.793383] [ 1308]   223  1308     8994      153       9      177         -1000 chapsd
<6>[328793.793390] [ 1324]     0  1324     5814      150      10      270         -1000 cryptohomed
<6>[328793.793397] [ 1342]     0  1342     1528      133       6       83         -1000 firewalld
<6>[328793.793404] [ 1343]     0  1343      420       43       3       28         -1000 minijail0
<6>[328793.793411] [ 1346]   230  1346     1989      137       6      117         -1000 permission_brok
<6>[328793.793418] [ 1369]     0  1369     3109      531       9      183         -1000 shill
<6>[328793.793425] [ 1426]   202  1426      396       46       3       23         -1000 logger
<6>[328793.793432] [ 1683]  1000  1683   219366    11046     282    34393             0 chrome
<6>[328793.793439] [ 2039]  1000  2039    43437      425      59     1215             0 chrome
<6>[328793.793446] [ 2092]  1000  2092   263741       54       8      116             0 nacl_helper_boo
<6>[328793.793454] [ 2098]  1000  2098     1110        0       4       36             0 nacl_helper_non
<6>[328793.793461] [ 2137]  1000  2137    43437      162      32     1220             0 chrome
<6>[328793.793468] [ 2239]  1000  2239   160442    28194     287    16909           200 chrome
<6>[328793.793475] [ 2391]  1000  2391    36146      133      38     3110           200 chrome
<6>[328793.793482] [ 3129]     0  3129     1051       62       4       69         -1000 lid_touchpad_he
<6>[328793.793489] [ 3367]     0  3367      365       45       4       19         -1000 periodic_schedu
<6>[328793.793496] [ 3377]     0  3377      365       46       3       19         -1000 periodic_schedu
<6>[328793.793504] [ 3382]     0  3382      420       43       3       29         -1000 minijail0
<6>[328793.793511] [ 3388]     0  3388      365       45       4       19         -1000 periodic_schedu
<6>[328793.793518] [ 3389]   226  3389     4281      125       7      135         -1000 mtpd
<6>[328793.793525] [ 3407]   241  3407     8441       77      11      174         -1000 ModemManager
<6>[328793.793532] [ 3412]     0  3412      420       43       3       28         -1000 minijail0
<6>[328793.793539] [ 3417]   238  3417      639      131       4       61         -1000 avahi-daemon
<6>[328793.793546] [ 3421]   238  3421      639       44       4       51         -1000 avahi-daemon
<6>[328793.793554] [ 3424]     0  3424     2166      191       7       90         -1000 metrics_daemon
<6>[328793.793561] [ 3425]     0  3425      420       43       3       28         -1000 minijail0
<6>[328793.793568] [ 3426]   600  3426     3354      141       6      235         -1000 cras
<6>[328793.793575] [ 3433]   218  3433      969      115       4       89         -1000 bluetoothd
<6>[328793.793582] [ 3444]     0  3444     2855      233       8      617         -1000 update_engine
<6>[328793.793591] [ 3450]     0  3450     4556      146       8      203         -1000 disks
<6>[328793.793602] [ 3478]     0  3478      501       56       3       40         -1000 upstart-socket-
<6>[328793.793610] [ 3898]     0  3898     1697       49       6       84         -1000 warn_collector
<6>[328793.793617] [ 3931]     0  3931      365       29       3       15         -1000 sh
<6>[328793.793624] [ 3948]     0  3948      420       43       3       29         -1000 minijail0
<6>[328793.793631] [ 4016]   232  4016     1540      115       5       67         -1000 netfilter-queue
<6>[328793.793639] [ 4023]   234  4023      872      117       4       59         -1000 tlsdated
<6>[328793.793645] [ 4024]     0  4024      351       45       3       14         -1000 logger
<6>[328793.793652] [ 4043]     0  4043      854       13       4       60         -1000 tlsdated-setter
<6>[328793.793660] [ 4049]     0  4049      365       45       4       19         -1000 periodic_schedu
<6>[328793.793667] [10037]  1000 10037   118006     5125     217    22141           300 chrome
<6>[328793.793675] [10044]  1000 10044   150264     6464     308    43793           300 chrome
<6>[328793.793682] [10057]  1000 10057   115684     4670     187    16089           300 chrome
<6>[328793.793689] [10261]  1000 10261    70473     2701     103     3107           300 chrome
<6>[328793.793696] [10273]  1000 10273   226592     6982     500    67620           300 chrome
<6>[328793.793703] [10334]  1000 10334    68171     2990     102     2648           300 chrome
<6>[328793.793710] [11460]  1000 11460   268067       70      19      860           300 nacl_helper_boo
<6>[328793.793720] [ 1665]  1000  1665    70217     2719     101     2862           300 chrome
<6>[328793.793729] [ 1676]  1000  1676   274311       70      59     8179           300 nacl_helper_boo
<6>[328793.793738] [ 8368]  1000  8368    79643     3509     132     6034           300 chrome
<6>[328793.793747] [ 9432]  1000  9432   107660     8218     168     8681           300 chrome
<6>[328793.793756] [22232]   224 22232      531      131       4       57         -1000 dhcpcd
<6>[328793.793764] [22900]  1000 22900   134957     6316     259    30445           417 chrome
<6>[328793.793770] [23053]  1000 23053   163287     5521     323    31211           358 chrome
<6>[328793.793777] [23084]  1000 23084   126568     5541     248    26618           475 chrome
<6>[328793.793787] [23850]     0 23850      602       66       3       25         -1000 sleep
<6>[328793.793794] [23851]     0 23851      602       66       3       25         -1000 sleep
<6>[328793.793802] [23855]     0 23855      602       66       3       25         -1000 sleep
<6>[328793.793809] [23860]     0 23860      602       66       3       25         -1000 sleep
<6>[328793.793816] [23902]  1000 23902   106708     7643     184    14751           300 chrome
<6>[328793.793823] [23959]  1000 23959   137050    25830     239    38071           300 chrome
<3>[328793.793830] Out of memory: Kill process 23732 (Compositor) score 574 or sacrifice child
<0>[328802.627654] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
<5>[328802.627692] CPU: 2 PID: 23959 Comm: chrome Tainted: G         C   3.10.18 #1
<5>[328802.627746] [<c020cf9c>] (unwind_backtrace+0x0/0x110) from [<c020a08c>] (show_stack+0x20/0x24)
<5>[328802.627778] [<c020a08c>] (show_stack+0x20/0x24) from [<c07688d0>] (dump_stack+0x20/0x28)
<5>[328802.627807] [<c07688d0>] (dump_stack+0x20/0x28) from [<c0767bc0>] (panic+0xa8/0x1fc)
<5>[328802.627836] [<c0767bc0>] (panic+0xa8/0x1fc) from [<c028beac>] (watchdog_timer_fn+0x234/0x26c)
<5>[328802.627867] [<c028beac>] (watchdog_timer_fn+0x234/0x26c) from [<c024cf28>] (__run_hrtimer+0xc4/0x1e0)
<5>[328802.627895] [<c024cf28>] (__run_hrtimer+0xc4/0x1e0) from [<c024db60>] (hrtimer_interrupt+0x148/0x2a0)
<5>[328802.627925] [<c024db60>] (hrtimer_interrupt+0x148/0x2a0) from [<c06182b4>] (arch_timer_handler_virt+0x38/0x48)
<5>[328802.627955] [<c06182b4>] (arch_timer_handler_virt+0x38/0x48) from [<c02902b0>] (handle_percpu_devid_irq+0x8c/0x124)
<5>[328802.627980] [<c02902b0>] (handle_percpu_devid_irq+0x8c/0x124) from [<c028c258>] (generic_handle_irq+0x30/0x40)
<5>[328802.628007] [<c028c258>] (generic_handle_irq+0x30/0x40) from [<c0206964>] (handle_IRQ+0x78/0xa0)
<5>[328802.628031] [<c0206964>] (handle_IRQ+0x78/0xa0) from [<c0200390>] (gic_handle_irq+0x48/0x6c)
<5>[328802.628054] [<c0200390>] (gic_handle_irq+0x48/0x6c) from [<c0205b80>] (__irq_svc+0x40/0x50)
upload_file_kcrash-b928b99500000000.kcrash
127 KB Download
The low memory condition might have nothing to do with the kernel crash. Looks like cpu 3 locked up while in cpuidle state:

<2>[328802.628936] CPU3: stopping
<5>[328802.628959] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G C 3.10.18 #1
<5>[328802.628990] [<c020cf9c>] (unwind_backtrace+0x0/0x110) from [<c020a08c>] (show_stack+0x20/0x24)
<5>[328802.629014] [<c020a08c>] (show_stack+0x20/0x24) from [<c07688d0>] (dump_stack+0x20/0x28)
<5>[328802.629038] [<c07688d0>] (dump_stack+0x20/0x28) from [<c020b7d4>] (handle_IPI+0xcc/0x124)
<5>[328802.629059] [<c020b7d4>] (handle_IPI+0xcc/0x124) from [<c02003ac>] (gic_handle_irq+0x64/0x6c)
<5>[328802.629081] [<c02003ac>] (gic_handle_irq+0x64/0x6c) from [<c0205b80>] (__irq_svc+0x40/0x50)
<5>[328802.629095] Exception stack(0xef2fbf00 to 0xef2fbf48)
<5>[328802.629113] bf00: ef2fbf48 000011e9 e7135d03 000011e9 00000001 c1f211c8 38a149c0 000011e8
<5>[328802.629131] bf20: c0e10100 c0e10100 c0e60be8 ef2fbf7c 00000008 ef2fbf48 c026d514 c05f8684
<5>[328802.629144] bf40: 80000113 ffffffff
<5>[328802.629167] [<c0205b80>] (__irq_svc+0x40/0x50) from [<c05f8684>] (cpuidle_enter_state+0x60/0xe8)
<5>[328802.629192] [<c05f8684>] (cpuidle_enter_state+0x60/0xe8) from [<c05f883c>] (cpuidle_idle_call+0x130/0x224)
<5>[328802.629216] [<c05f883c>] (cpuidle_idle_call+0x130/0x224) from [<c0206d04>] (arch_cpu_idle+0x18/0x48)
<5>[328802.629239] [<c0206d04>] (arch_cpu_idle+0x18/0x48) from [<c026bfc0>] (cpu_startup_entry+0x110/0x1d0)
<5>[328802.629261] [<c026bfc0>] (cpu_startup_entry+0x110/0x1d0) from [<c07648d8>] (secondary_start_kernel+0x130/0x154)
<5>[328802.629295] [<c07648d8>] (secondary_start_kernel+0x130/0x154) from [<80763e24>] (0x80763e24)
<4>[328803.822976] SMP: failed to stop secondary CPUs

Comment 3 by srcv@chromium.org, Sep 26 2016

Observed this issue on Chrome device Pit with M54 54.0.2840.39 / 8743.41.0 beta during Hangout calls.
- The Hangout call ended abruptly on the Chrome device, sometimes during screen sharing and sometimes while switching the external camera.

Crash IDs:
35a81e2d00000000
44dc3a5e00000000
346e6e2d00000000



 

Comment 4 by srcv@chromium.org, Sep 27 2016

Observed this issue on chrome device Pit with M53 53.0.2785.144 / 8530.93.0 stable during hangout calls. Hangout call ended abruptly on Pit during external camera switching

Crash IDs:
112aeb2d00000000
eb29eb2d00000000
c71cf95e00000000

Comment 5 by srcv@chromium.org, Sep 27 2016

Cc: srcv@chromium.org

Comment 6 by srcv@chromium.org, Sep 30 2016

Observed this issue on chrome device Pi with M55 55.0.2874.0 / 8848.0.0 dev 

Crash Id: 5476516d00000000
Cc: alberto@chromium.org diand...@chromium.org
Owner: snanda@chromium.org
Sameer: I haven't had any time to look into this.  Could you see if you could find someone?  Really this looks like a Chrome memory leak and the system is so full that it gives up trying to find more memory and reboots.  Probably this actually needs someone from the Chrome team, but I don't know who.  Alberto?

A few notes:
* It's possible that this memory leak is showing up elsewhere, too.  See b/31401810
* It's possible that we could make this a little better if we try to re-enable compaction (we tried in http://crosbug.com/p/45689 but that got reverted).  That would just delay the inevitable, though.  At least the machine Dan looked at was really truly out of memory.
Cc: marc...@chromium.org
How would compaction help with an OOM situation? It will only help you find contiguous areas, which user space doesn't use.
Cc: derat@chromium.org abodenha@chromium.org ericrk@chromium.org
Labels: M-54 ReleaseBlock-Stable
+few chrome folks who may be able to help/evaluate too

This is the top kernel crash in ChromeOS M54:
https://crash.corp.google.com/browse?q=product.name%3D%27ChromeOS%27%20AND%20product.version%3D%278743.76.0%27%20AND%20exec_name%3D%27kernel%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=

@8: As per above, compaction probably wouldn't help too much in the crash analyzed in @2.  ...but in general it will help avoid kernel issues in low memory situations and should cause the kernel to report out of memory less often...

Actually, though, something is peculiar here.  This is unlike previous OOM stuff I've seen before.  One thing that is weird is:

<6>[328793.777925] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
...
<6>[328793.778410] [22900]  1000 22900   134957     6316     259    30445           417 chrome
<6>[328793.778416] [23053]  1000 23053   163287     5521     323    31211           358 chrome
<6>[328793.778423] [23084]  1000 23084   126568     5541     248    26618           475 chrome
<6>[328793.778430] [23728]  1000 23728   154236     7045     286    35064           533 chrome

I don't remember seeing such weird oom_score_adj values before?

Also note that, when it dies, there are still plenty of Chrome processes to kill, like:

<6>[328793.793667] [10037]  1000 10037   118006     5125     217    22141           300 chrome
<6>[328793.793675] [10044]  1000 10044   150264     6464     308    43793           300 chrome
<6>[328793.793682] [10057]  1000 10057   115684     4670     187    16089           300 chrome
<6>[328793.793689] [10261]  1000 10261    70473     2701     103     3107           300 chrome
<6>[328793.793696] [10273]  1000 10273   226592     6982     500    67620           300 chrome
<6>[328793.793703] [10334]  1000 10334    68171     2990     102     2648           300 chrome

...that means it's not REALLY out of memory because it should just be able to kill some tabs.

Also note that it dies trying to kill the compositor:

<3>[328793.793830] Out of memory: Kill process 23732 (Compositor) score 574 or sacrifice child
<0>[328802.627654] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3

...which is one of the tasks with a really weird OOM score...
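
For anyone who wants to repeat the tally above on a full kcrash log, here is a minimal sketch in Python. The names `ROW`, `parse_task_dump` and `SAMPLE` are made up for illustration; the sample rows are copied from the task dump in this issue. It assumes the column layout shown in that dump and treats an oom_score_adj of -1000 as exempt from the OOM killer.

```python
import re
from collections import Counter

# One row of the OOM killer's task dump, for example:
# <6>[328793.793667] [10037]  1000 10037   118006     5125     217    22141           300 chrome
# The header row ("[ pid ]") and the timestamp bracket do not match this pattern.
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def parse_task_dump(text):
    """Return one dict per task row; numeric columns are converted to int."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rows.append({k: (v if k == "name" else int(v))
                         for k, v in m.groupdict().items()})
    return rows


# A few rows copied from the dump in this issue (not the whole table).
SAMPLE = """\
<6>[328793.793261] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
<6>[328793.793282] [  142]     0   142      687       55       4      120         -1000 udevd
<6>[328793.793432] [ 1683]  1000  1683   219366    11046     282    34393             0 chrome
<6>[328793.793667] [10037]  1000 10037   118006     5125     217    22141           300 chrome
<6>[328793.793675] [10044]  1000 10044   150264     6464     308    43793           300 chrome
<6>[328793.793764] [22900]  1000 22900   134957     6316     259    30445           417 chrome
<6>[328793.793777] [23084]  1000 23084   126568     5541     248    26618           475 chrome
"""

tasks = parse_task_dump(SAMPLE)

# Tasks the OOM killer is allowed to pick (anything not pinned at -1000).
killable = [t for t in tasks if t["adj"] > -1000]

# How many tasks carry each oom_score_adj value; odd values stand out here.
adj_counts = Counter(t["adj"] for t in tasks)

for adj, count in sorted(adj_counts.items()):
    print(f"oom_score_adj {adj:>6}: {count} task(s)")
print(f"{len(killable)} of {len(tasks)} sampled tasks are killable")
```

Feeding it the complete dump instead of `SAMPLE` should show how many chrome processes were still available as victims when the panic hit.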
Labels: -ReleaseBlock-Stable
Sameer/Stephane, who should be able to evaluate this further?
Owner: abodenha@chromium.org
Albert, are you aware of any recent spike in Chrome memory leaks, which dianders suggested in c#7 as a possible cause?
Owner: bccheng@chromium.org
Possibly triggered by  bug 624456  or b/32464369

bccheng@ any thoughts here?
Labels: M-56
Reproducible in 9000.50.0, 56.0.2924.53 on Kevin when closing and opening the lid with an external monitor connected via an Apple Type-C adapter.

crash id: 
15b2ae9080000000
6ce16e9080000000
@14: Why do you think your crashes have anything to do with the other crashes reported here?

15b2ae9080000000: <3>[ 600.115145] INFO: task rockchip_drm_at:151 blocked for more than 120 seconds.
6ce16e9080000000: <3>[ 960.108733] INFO: task rockchip_drm_at:161 blocked for more than 120 seconds.

Neither of those have anything to do with memory pressure that I'm aware of.  Please file a new bug.
Issue 679855 has been merged into this issue.
Labels: Hotlist-Enterprise
Could this bug be causing the crashes on the Asus Chromebook C202SA (2048 MB RAM)? We are receiving numerous crash reports from this device.

https://drive.google.com/a/google.com/file/d/0B-g52zibXA02UHNNVWF1cXBnUVU/view?usp=sharing

Comment 18 by ka...@chromium.org, Mar 10 2017

Issue 700403 has been merged into this issue.
I _might_ be debugging a similar issue in  bug #702707 .  We'll see...
Issue 700372 has been merged into this issue.
This crash has gone up from 15.62% in Stable 9202.56.1 to 29.17% in 9202.60.0 (almost doubled). https://crash.corp.google.com/browse?q=product.name%3D%27ChromeOS%27%20AND%20product.version%20in%20(%279202.56.1%27%2C%279202.60.0%27)&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=&compProp=product.Version&v1=9202.56.1&v2=9202.60.0
Ben, are you looking at this crash? If you are not the right owner, can you point to someone else who can take a look?
This is probably the same as  bug #702707 

I've got patches to address that bug and am working on porting to various kernels.  Right now I've got it in 4.4 and 3.14.
As per the analysis in  bug #702707  probably the "Compositor" process often ends up somewhere in the kernel and is blocked.  Then when we choose it as an OOM victim it doesn't die.

Before the fixes in  bug #702707  this refusal to die will wedge the whole system with a lockup, as seen here.

After the fixes in  bug #702707  this refusal to die will no longer wedge the whole system.  We'll instead pick some other task to kill.

That will probably fix this bug.
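
The before/after behaviour described above can be pictured with a toy model. This is only a sketch of the idea as stated in this comment, not the kernel code from bug #702707. The function `pick_victim`, the `fallback` flag and all task names and scores other than the Compositor's 574 are hypothetical.

```python
def pick_victim(tasks, refuses_to_die=frozenset(), fallback=True):
    """Toy model of OOM victim selection.

    tasks: list of (name, score) pairs; a higher score is killed first.
    refuses_to_die: names of tasks that are blocked and never exit.
    fallback: False models the old behaviour (wait forever on the first
              victim), True models the fixed behaviour (try the next task).
    Returns the name of the task that actually dies, or None if the
    system wedges waiting for a victim that never exits.
    """
    for name, _score in sorted(tasks, key=lambda t: t[1], reverse=True):
        if name not in refuses_to_die:
            return name
        if not fallback:
            return None  # stuck waiting on the first choice
    return None  # every candidate refused to die


# Only the Compositor score (574) comes from the log; the rest are invented.
TASKS = [("Compositor", 574), ("chrome-tab-a", 410), ("chrome-tab-b", 395)]

before_fix = pick_victim(TASKS, refuses_to_die={"Compositor"}, fallback=False)
after_fix = pick_victim(TASKS, refuses_to_die={"Compositor"}, fallback=True)

print("before fix:", before_fix)
print("after fix: ", after_fix)
```

In this model the pre-fix call returns None, standing in for the lockup seen here, while the post-fix call moves on to the next highest-scoring task.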

--

I was a bit curious about what this "Compositor" process was since it doesn't show up normally in a "ps aux".  If someone on this thread wants to comment more about it that would be interesting...

From what I can tell from quick poking the "Compositor" process seems to be a short lived process that's spawned sometimes, like when I create a new tab.

It's unclear (to me) if the OOM killer should really be killing the "Compositor".  Possibly it's getting killed because:

a) This child inherits the oom score from the parent

b) This child might share a "->mm" with the parent.  In this case the OOM killer tries to kill a child instead of the parent (in the hopes that it can save the parent, since the child cannot survive without a parent).  If this is the case then likely my patch is the best we can do and likely the parent will be killed shortly after the child fails to die.

---

NOTE that I put a delay during the spawning of the Compositor process (patching __set_task_comm) and then I straced it.  It looks like it spends a bunch of time waiting on a futex.  Here's the trace:


# strace -p 4833
strace: Process 4833 attached
gettid()                                = 5
gettid()                                = 5
clock_gettime(CLOCK_MONOTONIC, {tv_sec=123, tv_nsec=86353890}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=123, tv_nsec=86986807}) = 0
futex(0xffcf5164, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xffcf5144, 2) = 1
futex(0xffcf5144, FUTEX_WAKE_PRIVATE, 1) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=123, tv_nsec=87659974}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=123, tv_nsec=87810474}) = 0
futex(0xffcf5164, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xffcf5144, 2) = 1
futex(0xffcf5144, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xb95f94e4, FUTEX_WAKE_PRIVATE, 1) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=123, tv_nsec=92023307}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=123, tv_nsec=92366890}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=123, tv_nsec=92625307}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=123, tv_nsec=92830932}) = 0
futex(0xeb7fece4, FUTEX_WAIT_PRIVATE, 1, NULL

< long delay here >

) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=186056067}) = 0
futex(0xeb7fecc4, FUTEX_WAKE_PRIVATE, 1) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=187452275}) = 0
futex(0xffcf459c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xffcf457c, 2) = 1
futex(0xffcf457c, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xb96279c4, FUTEX_WAKE_PRIVATE, 1) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=188572275}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=188773817}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=188905067}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=189098733}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=189319233}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=189448442}) = 0
futex(0xeb7fece4, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=191274567}) = 0
futex(0xeb7fecc4, FUTEX_WAKE_PRIVATE, 1) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=191495358}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=191634192}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=191736567}) = 0
futex(0xeb7fece4, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=161, tv_nsec=662066784}) = 0
futex(0xeb7fecc4, FUTEX_WAKE_PRIVATE, 1) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=161, tv_nsec=662558534}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=161, tv_nsec=662716909}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=161, tv_nsec=662842909}) = 0
futex(0xffcf4b04, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xffcf4ae4, 2) = 1
futex(0xffcf4ae4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xb96279c4, FUTEX_WAKE_PRIVATE, 1) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=161, tv_nsec=663396492}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=161, tv_nsec=663540284}) = 0
futex(0xffcf4b04, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xffcf4ae4, 2) = 1
futex(0xffcf4ae4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xb96279c4, FUTEX_WAKE_PRIVATE, 1) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=161, tv_nsec=664300367}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=161, tv_nsec=664496367}) = 0
futex(0xeb7fece4, FUTEX_WAIT_PRIVATE, 1, NULL) = ?
+++ exited with 0 +++
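
The long waits in a trace like the one above can also be found mechanically from the CLOCK_MONOTONIC readings. The following is a minimal sketch; `TS`, `monotonic_gaps` and `SAMPLE` are illustrative names, the sample is an abridged copy of the trace with the interrupted futex line joined back together, and the 1 second threshold is an arbitrary choice.

```python
import re

# Matches the timestamp that strace prints for each clock_gettime call.
TS = re.compile(
    r"clock_gettime\(CLOCK_MONOTONIC, \{tv_sec=(\d+), tv_nsec=(\d+)\}\)"
)


def monotonic_gaps(trace, threshold_s=1.0):
    """Return (start_s, end_s, gap_s) for each pair of consecutive
    clock_gettime readings that are more than threshold_s apart."""
    stamps_ns = [int(s) * 10**9 + int(ns) for s, ns in TS.findall(trace)]
    gaps = []
    for a, b in zip(stamps_ns, stamps_ns[1:]):
        gap_s = (b - a) / 1e9
        if gap_s > threshold_s:
            gaps.append((a / 1e9, b / 1e9, gap_s))
    return gaps


# Abridged from the trace above.
SAMPLE = """\
clock_gettime(CLOCK_MONOTONIC, {tv_sec=123, tv_nsec=92830932}) = 0
futex(0xeb7fece4, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=186056067}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=143, tv_nsec=191736567}) = 0
futex(0xeb7fece4, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=161, tv_nsec=662066784}) = 0
"""

gaps = monotonic_gaps(SAMPLE)
for start, end, gap in gaps:
    print(f"{gap:6.2f} s idle between t={start:.3f} and t={end:.3f}")
```

On this sample it reports two gaps of roughly 20 and 18 seconds, both spanning a FUTEX_WAIT_PRIVATE call, which is consistent with the thread sleeping on a futex rather than spinning.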

===

If it really is that we were trying to wait on a mutex then we're all good.  The kernel looks like it will properly unblock the child once the holder of the futex is killed, so we won't end up in permanent zombie mode.

===

Leaving this open for a little bit to see if anyone from graphics might have thoughts.  ...but otherwise we can probably close it as fixed once the fixes for  bug #702707  land.
Cc: durga.behera@chromium.org bccheng@chromium.org pucchakayala@chromium.org songsuk@chromium.org ajha@chromium.org kavvaru@chromium.org dhadd...@chromium.org brajkumar@chromium.org
 Issue 709945  has been merged into this issue.
Labels: M-59
Status: Assigned (was: Available)
We're still seeing the issue on minnie in 59.0.3065.0/9448.0.0.
Owner: diand...@chromium.org
Doug, it looks like the bug merged in #25 has the same root cause. PTAL.
@26: Grace: can you please give me some pointers to crashes that are on R59?
Owner: gkihumba@chromium.org
Specifically note that I tried searching crash for minnie kernel crashes on 9448.0.0.  I found 3.

1. f2f90cb610000000: Looks like some yet unknown CPU errata.  CPU0 is wedged on PC 0x2d64ad24, which is userspace.  
                     Since CPU0 is wedged we get various errors about interrupts not being serviced, too.
2. 77ee0a8c80000000: Looks like the same CPU errata.  CPU3 wedged on PC 0x372c7d24.
3. 66fda20790000000: stuck_netdevice.  b/35578769

So if you have examples of cases where you think we're having trouble freeing up memory, please point me at them.
Forked #1 and #2 to  bug #710131 
The first 2 crashes in #29 are the R59 ones. I saw the magic signature and thought it was the same root cause. Looks like it's not.
Owner: bccheng@chromium.org
Assigning this back to Ben. Please close if the original issue was fixed.
Observed this crash on Jerry 59.0.3071.25/9460.11.0 dev while sending feedback report.  Crash ID: d0fec22e80000000
Mergedinto: 702707
Status: Duplicate (was: Assigned)
Summary: Crash likely caused by trouble freeing up memory in low-memory situations (must see an OOM to qualify for this bug) (was: Crash likely caused by trouble freeing up memory in low-memory situations)
@33: not all "watchdog" crashes are caused by low memory.  In your case, I see:

<4>[  698.956853] SMP: failed to stop secondary CPUs
<3>[  698.956868] CPU1 PC: <38506080> 0x38506080

...so you should star  bug #710131  and follow that.

---

Marking this bug as a dupe, which effectively closes this bug.

Comment 35 by ka...@chromium.org, Sep 25 2017

Issue 755938 has been merged into this issue.
