100.2% regression in loading.desktop.network_service at 623494:623545
Issue description: Network service got slower after "Per-message dispatch tasks" (https://chromium.googlesource.com/chromium/src/+/471aa794bbb9e3a98542ee2135ea61ddf4bf87a3).
Jan 18 (4 days ago)
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/15e50a74540000
Jan 18 (4 days ago)
+altimin FYI. This regression is clearly the result of changing Mojo bindings dispatch to schedule one task per message rather than flushing available messages in a single task per endpoint, as we discussed doing like... forever ago. I know that's the behavior we want since it gives the scheduler more task granularity and it also prevents a busy interface from starving its sequence, but we should be aware that it's going to cause real performance regressions in the wild. I am disabling the behavior temporarily until M73 branch, but we'll probably have some perf trouble on our hands once I flip it back on in a week.
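For context, here is a minimal standalone sketch of the two dispatch strategies being compared. It is illustrative only; the names Endpoint, DispatchBatch, and DispatchPerMessage are made up and this is not the actual mojo::Connector code.

#include <functional>
#include <queue>
#include <utility>

using Message = std::function<void()>;  // stand-in for a deserialized message
using Task = std::function<void()>;

struct Endpoint {
  std::queue<Message> pending;

  // Old behavior: flush every message currently available in one task.
  // One scheduler task per pipe wakeup, but a busy endpoint can hold its
  // sequence for the whole flush.
  void DispatchBatch() {
    while (!pending.empty()) {
      pending.front()();
      pending.pop();
    }
  }

  // New behavior: hand each message to the scheduler as its own task, so
  // other work on the sequence can interleave and the scheduler sees
  // per-message granularity, at the cost of one task post per message.
  void DispatchPerMessage(const std::function<void(Task)>& post_task) {
    while (!pending.empty()) {
      post_task(std::move(pending.front()));
      pending.pop();
    }
  }
};

The extra per-message task-posting overhead is presumably what a task-heavy benchmark like this one picks up.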
Jan 18 (4 days ago)
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/15e50a74540000 All of the runs failed. The most common error (12/20 runs) was: Run at /b/s/w/ir/third… Traceback (most recent call last): …
Jan 18 (4 days ago)
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/492f7c2f6f96f59f8aab79317faeeaa5136de83c

commit 492f7c2f6f96f59f8aab79317faeeaa5136de83c
Author: Ken Rockot <rockot@google.com>
Date: Fri Jan 18 20:09:26 2019

Temporarily revert Mojo back to batch dispatch

This is to avoid any surprises in M73 since we're so close to branch.

Bug: 866708, 923437
Change-Id: Ia063bd130f9124e598651c9a385ac6871cfbe729
Reviewed-on: https://chromium-review.googlesource.com/c/1422663
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Commit-Queue: Ken Rockot <rockot@google.com>
Cr-Commit-Position: refs/heads/master@{#624251}

[modify] https://crrev.com/492f7c2f6f96f59f8aab79317faeeaa5136de83c/mojo/public/cpp/bindings/connector.h
Jan 18 (4 days ago)
Given that this is controlled by a boolean now, can it be Finched so that we can see the performance impact in the field? If the scheduling team wants this functionality because it'll give them more flexibility, that's great, but we should also ensure it doesn't regress performance in the meantime.
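A minimal sketch of what such a gate could look like, assuming a base::Feature checked at the dispatch decision point. The feature name and helper below are invented for illustration and are not from the actual patch.

#include "base/feature_list.h"

namespace features {

// Disabled by default so a Finch config could ramp it up and compare field
// metrics for the two dispatch modes.
const base::Feature kMojoPerMessageDispatch{"MojoPerMessageDispatch",
                                            base::FEATURE_DISABLED_BY_DEFAULT};

}  // namespace features

bool UsePerMessageDispatch() {
  return base::FeatureList::IsEnabled(features::kMojoPerMessageDispatch);
}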
Jan 18 (4 days ago)
I really don't think this is something we should Finch. Also I would not treat this as something the "scheduling team" wants. It's something we should want in Chrome in general for better scheduling granularity. It makes a lot of sense and I consider it a bug that we don't do it already. We should be willing to tolerate some benchmark regressions to land this. If there are any major and obvious regressions, we can hook up some particularly sensitive bindings (e.g. network ones) to higher-priority task runners.
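A rough sketch of the "higher-priority task runners" idea, assuming a dedicated elevated-priority sequence for sensitive endpoints; the headers and signatures below are from memory and may not match the current tree exactly.

#include "base/task/post_task.h"
#include "base/task/task_traits.h"

// A dedicated, higher-priority sequence for latency-sensitive endpoints
// (e.g. network service control interfaces).
scoped_refptr<base::SequencedTaskRunner> high_priority_runner =
    base::CreateSequencedTaskRunnerWithTraits(
        {base::TaskPriority::USER_BLOCKING});

// Mojo bindings can take a task runner at bind time, so a sensitive endpoint
// could be bound on this runner instead of the default sequence, roughly:
//   binding_.Bind(std::move(request), high_priority_runner);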
Jan 18 (4 days ago)
Looking at the history of the metric, it seems that the distribution is bimodal and sometimes switches back and forth. Most likely there are two scenarios (e.g. the "test end" event being triggered at two different points), and a scheduling change can shift the probability distribution between them, which may look like a regression. If I'm correct, that's a problem with the benchmark (unfortunately, almost all benchmarks are prone to this). I think jam@'s suggestion is a good one: I would expect to see no change (if not an improvement) in top-level metrics. If that is the case, it will be much easier to agree that the problem lies with the benchmark rather than with the patch. WDYT?
Jan 18 (4 days ago)
@Ken: I don't think anyone cares about benchmark regressions if they're not representative of the field. But how do we know that this is only going to cause benchmark regressions, and not real-world regressions, without Finching it?
Jan 19 (4 days ago)
What about landing it on ToT and Finching the old behavior for comparison data? My concern with leaving it off ToT for too long is that nearly every week new broken test code lands, which I then have to either fix or disable before trying to land this again. Also, do we have a reliable set of UMA metrics that we can actually monitor for meaningful performance data in the wild? I remember doing the same thing when switching over to Mojo IPC in general: all of our metrics turned out to be really noisy, and the whole effort kind of felt like a waste of time.
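For reference, the kind of field metric that could be watched here is a simple dispatch-delay histogram, something like the sketch below. The histogram name is invented for illustration; whether an equivalent low-noise metric already exists is exactly the open question above.

#include "base/metrics/histogram_macros.h"
#include "base/time/time.h"

// Hypothetical metric: time from message arrival on the pipe to dispatch on
// the target sequence.
void RecordDispatchDelay(base::TimeTicks receive_time) {
  UMA_HISTOGRAM_TIMES("Mojo.Connector.QueueingDelay",
                      base::TimeTicks::Now() - receive_time);
}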
Yesterday (24 hours ago)
@Ken: that sounds like a good idea; so it would land with the feature disabled before the M73 branch, and then next Thursday it would be flipped to enabled by default? That way you'd get data from M73 stable, but it would be on trunk sooner. For stats, you can look at the heartbeat metrics. These and other high-level metrics are listed in https://docs.google.com/document/d/1GQLrftTK2xSseb030-xkWQdB5LJlo2dSm4lpF2Be0Us/edit?pli=1# which we're tracking for the network service.
Today (18 hours ago)
The Speed Launch Metrics Survey contains the high-level metrics that we should care about: https://docs.google.com/document/d/1Ww487ZskJ-xBmJGwPO-XPz_QcJvw-kSNffm0nPhVpj8