
Issue 636755

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 589726
issue 629108
issue 636420

Bisect numbers completely different from perf dashboard

Project Member Reported by petrcermak@chromium.org, Aug 11 2016

Issue description

Context: https://bugs.chromium.org/p/chromium/issues/detail?id=636420

According to the dashboard (https://chromeperf.appspot.com/group_report?bug_id=636420), the metric increased from 550,228 to 619,856 bytes.

However, according to the bisect (https://bugs.chromium.org/p/chromium/issues/detail?id=636420#c3), the metric stayed at 16,384 bytes.

What confuses me is not that the bisect did not reproduce the regression (that happens quite often), but that the bisect results differ from the dashboard by more than an order of magnitude.

How is that possible?
Do we use completely different hardware on the bisect bots?
Is there a bug in the code that sets up the revisions during a bisect?
Any ideas?
 
Correction for #1:

Dashboard: 31,007,400 → 31,421,700
Blocking: 629108
Blocking: 636420
Wow, thanks for the detailed report!

I am working off of this spreadsheet about our hardware: https://docs.google.com/spreadsheets/d/1LTOSY9y1_sdDiL94XQTZrXmSzokn4p44hlgHQvLnEt4/edit#gid=62496670

Comment #1:
Builder: Win 7 Perf (3) (build187-m1)
windows	2008 R2		PowerEdge R210 II	x64	1	Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz	15.97 GB	2.4.4	2.8.3.windows.1
          "values": [
            619856
          ]

Bisector: win_perf_bisect (build242-m4)
windows	2008 R2		PowerEdge R220	x64	1	Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz	23.83 GB	2.4.4	2.8.3.windows.1
bisector.lkgr: RevisionState(rev=chromium@410353, values=[16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384], mean_value=16384.0, std_dev=0.0)
bisector.fkbr: RevisionState(rev=chromium@410387, values=[16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384], mean_value=16384.0, std_dev=0.0)
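The bisector's RevisionState debug lines can be parsed to compare them with the dashboard value. Below is a minimal Python sketch. It assumes the line layout is exactly the one quoted above, which is not a documented interface. The example line has a shortened values list.

```python
import re
import statistics

# Layout copied from the bisector debug lines above; treat it as an assumption.
REVISION_STATE = re.compile(
    r"RevisionState\(rev=(?P<rev>[^,]+), "
    r"values=\[(?P<values>[^\]]*)\], "
    r"mean_value=(?P<mean>[\d.]+), "
    r"std_dev=(?P<std>[\d.]+)\)"
)


def parse_revision_state(line):
    """Return (revision, values, mean_value, std_dev) from one debug line."""
    m = REVISION_STATE.search(line)
    if m is None:
        raise ValueError("no RevisionState found in line")
    values = [float(v) for v in m.group("values").split(",") if v.strip()]
    return m.group("rev"), values, float(m.group("mean")), float(m.group("std"))


def ratio_to_dashboard(line, dashboard_value):
    """How many times larger the dashboard value is than the bisect mean."""
    _, values, _, _ = parse_revision_state(line)
    return dashboard_value / statistics.mean(values)


# Shortened copy of the bisector.lkgr line (3 of the 12 values).
lkgr = ("bisector.lkgr: RevisionState(rev=chromium@410353, "
        "values=[16384, 16384, 16384], mean_value=16384.0, std_dev=0.0)")
print(parse_revision_state(lkgr)[0])              # chromium@410353
print(round(ratio_to_dashboard(lkgr, 619856), 1))  # 37.8
```

A ratio of roughly 38x means the gap is well beyond normal run-to-run noise.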

Dave, it looks like the hardware doesn't match! Did we bisect against the correct bisector, or do we have a hardware mismatch in the lab? (Also, you might want to regenerate the spreadsheet to double-check.) Should this have gone to win_x64_perf_bisect, which has the same hardware? It looks like we could swap out Windows bisectors with Android bisector Linux hosts to update the configs.

Petr, could that difference account for such a huge change in results?
The bots seem to be very similar (3.10 vs. 3.30 GHz and 15.97 vs 23.83 GB). I don't see how this could bring down V8's memory allocated by malloc from 605 KiB to 16 KiB.

Strange coincidence: I've just looked through the charts on the dashboard (https://chromeperf.appspot.com/group_report?bug_id=636420) and I can see that 16,384 was reported on the dashboard by a completely different bot, chromium-rel-mac10 (brown chart at the top).
Owner: petrcermak@chromium.org
Following up on the case in #1: all the perfbots and bisectors are on the exact same hardware/OS (MacBookPro11,2 with OS X 10.11.6).

I think there is something strange happening with the metrics, Petr. Assigning to you to triage.
I definitely think either the metric or the dashboard is having some problem:

Before & after trace for "ChromiumPerf/chromium-rel-mac10/system_health.memory_desktop / memory:chrome:all_processes:reported_by_chrome:v8:allocated_by_malloc:effective_size_avg / load_tools /" in https://chromeperf.appspot.com/group_report?bug_id=636420:

Before:
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_32-2016-08-07_08-11-19-33994.html

After:
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_34-2016-08-09_08-44-44-82743.html


Clicking on the memory dumps in both traces shows that the memory data are very similar, so I don't expect a regression as large as the one seen on the graph.
Note that in #5 I pulled the numbers we were seeing from the dashboard out of the chartjson from the perfbot; you could go back and check all the json.output links on the bots to verify, but I'm pretty sure it's the metric and not the dashboard.
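Checking every json.output file by hand is tedious. Here is a small sketch that pulls one trace's values out of a chartjson blob. The "values" list matches the excerpt in #1, but the surrounding {"charts": {chart: {trace: ...}}} nesting is an assumption. The chart and trace names in the example are hypothetical placeholders.

```python
import json


def chart_values(chartjson_text, chart, trace):
    """Return the "values" list for one chart/trace in a chartjson blob.

    Assumes the nesting {"charts": {chart: {trace: {"values": [...]}}}};
    adjust the keys if the perfbot output is laid out differently.
    """
    data = json.loads(chartjson_text)
    return data["charts"][chart][trace]["values"]


# Hypothetical chart/trace names; 619856 is the value quoted in #1.
sample = json.dumps({
    "charts": {
        "v8_malloc": {
            "example_story": {"type": "list_of_scalar_values", "values": [619856]}
        }
    }
})
print(chart_values(sample, "v8_malloc", "example_story"))  # [619856]
```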
Cc: hpayer@chromium.org u...@chromium.org
(The regression that's mentioned in the original post is now actually in https://bugs.chromium.org/p/chromium/issues/detail?id=637269).

#8: The traces you are referring to are from chromium-rel-mac10, but this issue is on Windows.

The situation is really strange. Here are the traces:

DASHBOARD (https://chromeperf.appspot.com/group_report?bug_id=637269)
  r410353:
    LOAD:social:twitter: 2 isolates (16+521 KiB allocated by malloc)
      https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_35-2016-08-08_10-40-38-7040.html
    BROWSE:social:twitter: 2 isolates (16+521 KiB allocated by malloc)
      https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_44-2016-08-08_10-40-52-44621.html
  r410387:
    LOAD:social:twitter: 2 isolates (16+589 KiB allocated by malloc)
      https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_35-2016-08-08_13-19-28-36286.html
    BROWSE:social:twitter: 2 isolates (16+589 KiB allocated by malloc)
      https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_44-2016-08-08_13-19-42-86279.html

BISECT (https://build.chromium.org/p/tryserver.chromium.perf/builders/win_perf_bisect/builds/6830)
  r410353:
    LOAD:social:twitter: 2 isolates (16+521 KiB allocated by malloc)
      https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_35-2016-08-12_07-36-54-50217.html
    BROWSE:social:twitter: ONLY 1 ISOLATE (16 KiB allocated by malloc) !!!!!
      https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_44-2016-08-12_07-37-07-72625.html
  r410387:
    LOAD:social:twitter: 2 isolates (16+589 KiB allocated by malloc)
      https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_35-2016-08-12_09-05-40-32766.html
    BROWSE:social:twitter: ONLY 1 ISOLATE (16 KiB allocated by malloc) !!!!!
      https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_44-2016-08-12_09-05-53-16135.html

The strange thing here is this:

DASHBOARD:
  LOAD:social:twitter: Always 2 isolates
  BROWSE:social:twitter: Always 2 isolates
BISECT
  LOAD:social:twitter: Always 2 isolates
  BROWSE:social:twitter: Always 1 isolate !!!!!
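The arithmetic below shows why a missing isolate would explain the numbers. It assumes the metric sums malloc usage over every V8 isolate in the dump, which has not been confirmed. The per-isolate KiB figures are read off the BROWSE:social:twitter traces above and are rounded, so the byte totals are approximate.

```python
def summarize(isolates_kib):
    """Return (isolate count, total KiB, total bytes) for one memory dump."""
    total = sum(isolates_kib)
    return len(isolates_kib), total, total * 1024


dashboard = summarize([16, 589])  # r410387, BROWSE:social:twitter on the dashboard
bisect = summarize([16])          # r410387, BROWSE:social:twitter in the bisect

if dashboard[0] != bisect[0]:
    print(f"isolate count differs: {dashboard[0]} vs {bisect[0]}")
print(dashboard, bisect)  # (2, 605, 619520) (1, 16, 16384)
```

Dropping the second isolate alone takes the total from about 619,520 bytes down to exactly 16,384. The first figure is close to the 619,856 on the dashboard, and the second equals the value the bisect reported.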

ulan,hpayer: This is a very V8-specific thing. Do you guys have any idea why this would happen?
Cc: reve...@chromium.org ericrk@chromium.org
As for #1, again the numbers reported on dashboard and by bisect match the numbers in their traces:

Trace from dashboard (r405900): https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_9-2016-07-15_18-42-37-40510.html
Trace from bisect (r405900): https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_9-2016-08-08_07-23-20-71724.html

Trace from dashboard (r405918): https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_9-2016-07-15_20-44-16-59995.html
Trace from bisect (r405918): https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_9-2016-08-08_08-15-59-55929.html

There's a huge difference between the size of cc on the dashboard (15.3-17.3 MiB) and in the bisect (56.1 MiB). I dug into the traces and it turns out that on the dashboard cc holds resources/textures of at most 1 MiB each, whereas in the bisect it holds quite a lot of 3-5 MiB resources/textures.

+cc ericrk, reveman: Do you have any idea what could lead to such a huge difference in cc/gpu (e.g. different screen resolution or pixel density)?

Both devices (the dashboard buildbot and the bisect trybot) seem to have exactly the same configuration.
Is it possible GPU is misconfigured on some bots?
For #11: when the original regression occurred, the Mac Retina bots were running OS X 10.9. These bots appear to have been upgraded to 10.11 on June 27/28. This upgrade caused significant differences in many metrics, including these.

Unfortunately, when we compare the perf dashboard to the bisect bot, we are comparing the original (10.9) metric to the new (10.11) bisect-bot metric. I believe this accounts for the difference.

The follow-up question is why upgrading changed this metric so substantially. From looking at the size of the tiles being used (for lack of a direct indicator), it appears that our largest (full-window) tile size went from 1024 KB to ~3520 KB.

This makes me suspect that the pre-upgrade Retina bots were being forced into a non-Retina resolution. Looking at the tile sizes (the largest tile is 1024 KB), it seems very unlikely that we were rendering Retina content.
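The tile-size reasoning can be sanity-checked with a Python sketch. It assumes the bytes-per-pixel format is the same in both runs, so it cancels out of the ratio. It also assumes a full-window tile covers the same window area in both runs. Under those assumptions, tile bytes grow with the square of the device scale factor. The candidate scale factors are illustrative.

```python
def expected_growth(device_scale_factor):
    """Same window area at scale factor s needs s*s times as many pixels."""
    return device_scale_factor ** 2


def nearest_scale_factor(before_kb, after_kb, candidates=(1.0, 1.5, 2.0)):
    """Pick the candidate whose squared value best matches the observed ratio."""
    ratio = after_kb / before_kb
    return min(candidates, key=lambda s: abs(expected_growth(s) - ratio))


print(round(3520 / 1024, 2))             # 3.44
print(nearest_scale_factor(1024, 3520))  # 2.0
```

The observed ratio of about 3.44 is much closer to 4 (scale factor 2) than to 1, which fits the idea that the 10.9 bots rendered at scale factor 1. The ratio is not exactly 4, possibly because tile dimensions are rounded or clamped, but that part is a guess.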
ericrk: Your explanation seems very likely. I also suspected this could have something to do with Retina.

sullivan: Is there any way to confirm/reject this hypothesis? Are changes like this tracked anywhere?
I dug into this some more. You can check the OS in the following roundabout way:
1) Click the stdio link on the graph (you may need to scroll in the tooltip). You'll probably get a 404 since it's expired.
2) Trim the end of the stdio URL until it ends in the build number.
3) On the buildbot status page that results, click the [stdout] link, which is stored for longer in logdog.
4) In the log, there will be a "_LogBrowserInfo" line that starts with either "OS: mac mavericks" or "OS: mac elcapitan".
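Step 4 can be automated once the stdout text has been downloaded. Below is a sketch that assumes the OS line follows the same "browser._LogBrowserInfo:<lineno>" prefix as the other _LogBrowserInfo lines quoted later in this thread. The sample log line is hypothetical, built from that pattern.

```python
import re

# Assumed layout: "... browser._LogBrowserInfo:<lineno>    OS: mac <name> ..."
OS_LINE = re.compile(r"_LogBrowserInfo:\d+\s+OS:\s*(mac \w+)")


def browser_os(stdout_text):
    """Return e.g. 'mac mavericks' or 'mac elcapitan', or None if not logged."""
    match = OS_LINE.search(stdout_text)
    return match.group(1) if match else None


# Hypothetical line in the format of the _LogBrowserInfo output quoted below.
sample = "(INFO) 2016-07-15 20:34:45,795 browser._LogBrowserInfo:124    OS: mac elcapitan"
print(browser_os(sample))  # mac elcapitan
```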

Through these steps, I found that the bot in question was upgraded to elcapitan on build 3391, which corresponds to the r407939-r407983 range on the perf dashboard:
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.perf%2FMac_Retina_Perf__3_%2F3390%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.events%2F0%2Fstdout
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.perf%2FMac_Retina_Perf__3_%2F3391%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.events.reference%2F0%2Fstdout
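When the upgrade build is not known in advance, a binary search over build numbers needs only about log2(N) log checks instead of one per build. This is a sketch, not the procedure used here. `os_for_build` is a hypothetical callable, for example one that fetches a build's stdout and reads its OS line. The search assumes the OS switched once and never switched back.

```python
def first_upgraded_build(lo, hi, os_for_build, new_os="mac elcapitan"):
    """Return the first build in [lo, hi] whose log reports new_os.

    Assumes a single permanent switch: every build before it reports the
    old OS, every build from it onward reports new_os, and build `hi` is
    already upgraded.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if os_for_build(mid) == new_os:
            hi = mid  # upgrade happened at or before mid
        else:
            lo = mid + 1  # still on the old OS at mid
    return lo


# Hypothetical stand-in for fetching and parsing each build's log.
def fake_os(build):
    return "mac elcapitan" if build >= 3391 else "mac mavericks"


print(first_upgraded_build(3200, 3500, fake_os))  # 3391
```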

Eric, it looks like you were looking into the graphs linked in comment 1, https://chromeperf.appspot.com/group_report?bug_id=629108. That spike is at build 3279 (r405901 - r405918), before the upgrade.

But it still sounds like there is a problem with retina being properly enabled. Here are the logs from before/after the alert on that page:
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.perf%2FMac_Retina_Perf__3_%2F3278%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.memory_desktop%2F0%2Fstdout
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.perf%2FMac_Retina_Perf__3_%2F3279%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.memory_desktop%2F0%2Fstdout
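The logdog links above share a regular pattern: a percent-encoded stream path built from the master, builder, build number and step. The sketch below rebuilds such links. The builder-name rule is inferred only from "Mac Retina Perf (3)" becoming "Mac_Retina_Perf__3_" in these URLs. Treat it as a hypothetical helper, not a documented LogDog naming scheme.

```python
import re
from urllib.parse import quote


def logdog_stdout_url(master, builder, build_number, step):
    # Assumption: spaces and parentheses in the builder name become "_".
    builder_id = re.sub(r"[ ()]", "_", builder)
    stream = (f"chromium/bb/{master}/{builder_id}/{build_number}"
              f"/+/recipes/steps/{step}/0/stdout")
    # safe="" also encodes "/" and "+", matching the %2F and %2B in the links.
    return "https://luci-logdog.appspot.com/v/?s=" + quote(stream, safe="")


print(logdog_stdout_url("chromium.perf", "Mac Retina Perf (3)", 3278,
                        "system_health.memory_desktop"))
```

For build 3278 this reproduces the first of the two links above.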

On both of them I see:

(INFO) 2016-07-15 20:34:45,795 browser._LogBrowserInfo:124    max_resolution_height: 2160
(INFO) 2016-07-15 20:34:45,795 browser._LogBrowserInfo:124    max_resolution_width: 4096

I'm not very accustomed to reading this output, but one difference sticks out:

3278: (INFO) 2016-07-15 18:33:05,231 browser._LogBrowserInfo:128    rasterization       : enabled
3279: (INFO) 2016-07-15 20:34:45,796 browser._LogBrowserInfo:128    rasterization       : unavailable_software
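Comparing two long stdout logs by eye is error-prone. This sketch collects every _LogBrowserInfo key/value pair from each log and reports only the keys whose values changed. It assumes all such lines follow the "key : value" layout quoted above. A key logged more than once keeps its last value.

```python
import re

# Assumed layout: "<prefix> browser._LogBrowserInfo:<lineno>    <key> : <value>"
INFO_LINE = re.compile(r"_LogBrowserInfo:\d+\s+(?P<key>[^:]+?)\s*:\s*(?P<value>.*)$")


def browser_info(stdout_text):
    """Map each key logged by _LogBrowserInfo to its (last) value."""
    info = {}
    for line in stdout_text.splitlines():
        m = INFO_LINE.search(line)
        if m:
            info[m.group("key").strip()] = m.group("value").strip()
    return info


def diff_browser_info(before_text, after_text):
    """Return {key: (before, after)} for keys whose values differ."""
    before, after = browser_info(before_text), browser_info(after_text)
    return {k: (before.get(k), after.get(k))
            for k in sorted(before.keys() | after.keys())
            if before.get(k) != after.get(k)}


log_3278 = ("(INFO) 2016-07-15 18:33:05,231 browser._LogBrowserInfo:124    max_resolution_width: 4096\n"
            "(INFO) 2016-07-15 18:33:05,231 browser._LogBrowserInfo:128    rasterization       : enabled")
log_3279 = ("(INFO) 2016-07-15 20:34:45,795 browser._LogBrowserInfo:124    max_resolution_width: 4096\n"
            "(INFO) 2016-07-15 20:34:45,796 browser._LogBrowserInfo:128    rasterization       : unavailable_software")
print(diff_browser_info(log_3278, log_3279))
# {'rasterization': ('enabled', 'unavailable_software')}
```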

Could that be the issue?
Thanks for the additional info! Looked into this more.

There was a "regression" in the original graph; it was caused by https://codereview.chromium.org/2151393002, which disabled GPU rasterization on 10.9 systems. There's not much we can do about this regression, as GPU rasterization was disabled to prevent visual corruption. The change to enable GPU rasterization never made it to a stable release on 10.9, so no stable users will experience this "regression".

This would also explain why we had such a hard time bisecting: the bisect bot was on 10.11, so it was completely unaffected by the change.

So I think we've explained the original regression. I'm still unclear on why 10.11 and 10.9 bots are so different.

The "max resolution" numbers cited above just indicate the GPU's capabilities, not the resolution we're running at. I'm still guessing that Retina was not correctly enabled on the old 10.9 bot, but I'm not sure how to confirm it; I've looked at the logs some more, but I don't think we log the actual screen or window resolution.

The rasterization values you mention in #15 do explain the original (small) regression, but not the huge difference between 10.9/10.11.
Status: Fixed (was: Untriaged)
I'm marking this as "Fixed" because we figured out what the problem was. Please re-open if you feel this is not appropriate.
Labels: need-labs-startup