
Issue 846797

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 4
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Collect feedback reports which contain OOM kill events and analyze their cause

Project Member Reported by cylee@chromium.org, May 25 2018

Issue description

Vovo has collected some crash logs and analyzed the OOM causes from them.
To get more OOM logs, I'm turning to feedback reports.

Currently there is no way to bulk-download feedback reports and filter for the ones we're interested in. The feedback team has a stubby server through which we can list/query feedback reports and download system_logs.txt. However, the service is not designed for bulk download, so examining a large number of feedback reports (say, one month's worth) would be fairly slow.

I've contacted the feedback support team, and we're looking for ways to attach OOM labels to feedback reports so that we can search by label. In the meantime, I'm writing a script to download feedback reports in parallel (e.g., with multiple workers or a map-reduce program).
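For illustration, the parallel download step could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the actual script: `fetch` is a hypothetical callable standing in for whatever call retrieves system_logs.txt for one report (the real stubby API is not shown here), and the "Out of memory" marker used for filtering is an assumption about the kernel log text.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(report_ids, fetch, max_workers=8):
    """Fetch reports concurrently with a pool of worker threads.

    `fetch` is a hypothetical callable: report id -> log text.
    Returns (results, failures), both keyed by report id.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, rid): rid for rid in report_ids}
        for future in as_completed(futures):
            rid = futures[future]
            try:
                results[rid] = future.result()
            except Exception as exc:  # keep going if one report fails
                failures[rid] = exc
    return results, failures


def has_oom_kill(log_text, marker="Out of memory"):
    """Assumed filter: treat a log as an OOM report if it contains `marker`."""
    return marker in log_text
```

Threads are a reasonable fit here because the work is dominated by waiting on the network; failures are collected rather than raised so one bad report does not abort a whole day's download.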

After that, I'll analyze the collected reports to understand why OOM happens.


 

Comment 1 by cylee@chromium.org, May 29 2018

Cc: conradlo@chromium.org
I have a script to download feedback reports. The main problem is that there are a lot of auto-generated "Last 10000 JS log records" feedback reports, which spam the service
(e.g., https://listnr.corp.google.com/product/208/report/85471460579).
There are ~40000 feedback reports a day, and most of them are auto-generated. One system_logs.txt is ~10 MB, so one day of feedback is ~400 GB. We did our best to filter out the auto-generated reports, which leaves ~4000 feedback reports a day.
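A quick back-of-the-envelope check of the sizes above, as a sketch. It assumes 1 GB = 1000 MB and that the filtered reports have the same ~10 MB average log size as the unfiltered ones, which is not confirmed.

```python
LOG_SIZE_MB = 10  # approximate size of one system_logs.txt


def daily_volume_gb(reports_per_day, log_size_mb=LOG_SIZE_MB):
    """Estimated download volume per day in GB (1 GB = 1000 MB)."""
    return reports_per_day * log_size_mb / 1000


unfiltered_gb = daily_volume_gb(40000)  # all reports: 400.0 GB
filtered_gb = daily_volume_gb(4000)     # after filtering: 40.0 GB (assumed same size)
```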

As discussed with cywang@ and vovoy@, the OOM kill count has decreased recently, so we compared the feedback reports from 3/21 and 5/16. The statistics are at
https://docs.google.com/spreadsheets/d/1QqrJPKvOYRPnfE-bVQZX4GbOkUHwJrp4AASjK50IOcs/edit#gid=0

In summary, the number of OOM kills decreased a lot (20% -> 5%). 

We also checked the number of reports with the template "Error: Tab killed feedback". This feedback prompt is displayed to users if 2 sad tabs are rendered within 10 seconds. Users have to click "send feedback" to submit it.
The percentage of such reports dropped from 23% to 4%, which explains the drop in OOM kills. That is, the chance of a sad tab decreased a lot from March to May.

We only sampled one day each from March and May. I'll try to get more samples for analysis. Also, we're not sure what caused the drop.




Do we have any idea what caused the sudden drop?

Comment 3 by cylee@chromium.org, May 29 2018

FYI, the downloader code has been submitted to my google3 experimental directory: https://critique.corp.google.com/#review/198438683 

Sample usage:
  1. Check out the source
  2. google3$ blaze build experimental/users/cylee/fb_downloader
  3. google3$ blaze-bin/experimental/users/cylee/fb_downloader/fb_downloader --date 20180516 --download_dir ./non_automatic --min_chrome_version 65
  Alternatively, run fb_downloader --help to see all options.
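To collect more than one day of samples, the invocation above could be generated per date. The following is a minimal sketch, not part of the submitted tool: it only uses the flags shown in the sample usage, and the per-date subdirectory layout under ./non_automatic is an assumption made here for illustration.

```python
from datetime import date, timedelta

# Binary path as given in the sample usage; run from the google3 directory.
BINARY = "blaze-bin/experimental/users/cylee/fb_downloader/fb_downloader"


def commands_for_range(start, end, min_chrome_version=65):
    """Build one fb_downloader argv list per day in [start, end]."""
    commands = []
    day = start
    while day <= end:
        stamp = day.strftime("%Y%m%d")
        commands.append([
            BINARY,
            "--date", stamp,
            # Assumed layout: one subdirectory per date.
            "--download_dir", "./non_automatic/" + stamp,
            "--min_chrome_version", str(min_chrome_version),
        ])
        day += timedelta(days=1)
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) to run the days one after another.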

Comment 4 by cylee@chromium.org, May 30 2018

Here are some statistics from analyzing the OOM logs from 20180516:

anon < 100MB and anon < swap_free: 36 (20.81%)
swap_free < 100MB and swap_free < anon: 76 (43.93%)
others: 61 (35.26%)
Total: 173

Details are in the attached file.

Vovo has plans to address OOMs caused by the first case (anon ~= 0) and the second case (swap free ~= 0). However, there are still other cases we need to investigate.
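The three buckets above can be expressed as a small classifier. This is a sketch of the bucketing rule only, assuming the anon and swap_free values have already been extracted from each OOM log and converted to MB; the function and bucket names are made up for illustration and this is not the analysis script itself.

```python
from collections import Counter

THRESHOLD_MB = 100


def classify(anon_mb, swap_free_mb, threshold_mb=THRESHOLD_MB):
    """Assign one OOM event to a bucket using the rules listed above."""
    if anon_mb < threshold_mb and anon_mb < swap_free_mb:
        return "low_anon"        # first case: anon ~= 0
    if swap_free_mb < threshold_mb and swap_free_mb < anon_mb:
        return "low_swap_free"   # second case: swap free ~= 0
    return "others"


def summarize(samples):
    """samples: iterable of (anon_mb, swap_free_mb) pairs.

    Returns {bucket: (count, percentage of total)}.
    """
    counts = Counter(classify(anon, swap) for anon, swap in samples)
    total = sum(counts.values())
    return {name: (n, round(100.0 * n / total, 2)) for name, n in counts.items()}
```

Note that an event where both values are equal falls into "others", since neither strict comparison holds.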
20180516.report (11.9 KB)
Status: Fixed (was: Started)
