New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 816470 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



Sign in to add a comment

BattOr agent is doing a soft flush before recording a clock sync marker

Project Member Reported by tdres...@chromium.org, Feb 26 2018

Issue description

See the link to graphs below.
 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Feb 26 2018

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=816470

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=c5e9cca10807ca52ad1b3fc31fd529b92c56043a8612cea65dc7aa3bb1640b22


Bot(s) for this bug's original alert(s):

chromium-rel-mac11-air
chromium-rel-mac11-pro
chromium-rel-mac12
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Feb 26 2018

Cc: dpranke@chromium.org charliea@chromium.org tandrii@chromium.org
Owner: charliea@chromium.org
Status: Assigned (was: Untriaged)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/1493bb70440000

Revert "[test_env.py] Reland: Warm up vpython virtualenv cache on swarming task shards" by charliea@chromium.org
https://chromium.googlesource.com/chromium/src/+/5908c5d2269594aff9c2454c4732360a311c27ca

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
Cc: -tandrii@chromium.org -charliea@chromium.org -dpranke@chromium.org -tdres...@chromium.org aschulman@chromium.org
Summary: BattOr agent is doing a soft flush before recording a clock sync marker (was: 440688%-2285324.6% regression in system_health.memory_desktop at 538765:538923)
I'm not sure how pinpoint landed on that change, but it's not the right one. 

I did figure out the problem: the recent BattOr agent update does a soft flush, waiting 50ms, before issuing the clock sync marker. This 50ms soft flush is what's showing up as a clock sync regression. We should only be doing a soft flush when actually opening the connection, so I'm going to look into how we can fix this - it shouldn't be too hard.
Project Member

Comment 5 by bugdroid1@chromium.org, Mar 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c27595f5c34ecd0d00e4638832fcf8f478eea048

commit c27595f5c34ecd0d00e4638832fcf8f478eea048
Author: Charlie Andrews <charliea@chromium.org>
Date: Mon Mar 05 21:10:57 2018

Adds the capability to track whether the BattOr connection is open

This is the first part of a fix aimed to prevent us from slow flushing
if the BattOr connection is already open. This used to be how it worked,
but recent improvements we made to the retry protocol caused a
regression in clock sync latency.

TBR=nednguyen@google.com

Bug:  816470 
Change-Id: Ibfceb02e62c333ff3d2b32c8bd52f5a1277bbe70
Reviewed-on: https://chromium-review.googlesource.com/946515
Reviewed-by: Charlie Andrews <charliea@chromium.org>
Commit-Queue: Charlie Andrews <charliea@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540936}
[modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_agent_unittest.cc
[modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_connection.h
[modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_connection_impl.cc
[modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_connection_impl.h
[modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_connection_impl_unittest.cc

Project Member

Comment 6 by bugdroid1@chromium.org, Mar 6 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fe367a2f23b529ca6c4ece813c87678316394783

commit fe367a2f23b529ca6c4ece813c87678316394783
Author: Charlie Andrews <charliea@chromium.org>
Date: Tue Mar 06 22:00:28 2018

Make the BattOr agent avoid needlessly reopening a connection

Because we do a slow flush when we reopen a connection, this prevents
a ~50ms startup cost if a connection is already open, which is important
especially when recording clock sync markers.

I've verified locally that this fixes a ~50ms clock sync latency
regression.

TBR=nednguyen@google.com

Bug:  816470 
Change-Id: Icab77977faa61755945883cf50613bfe1e575182
Reviewed-on: https://chromium-review.googlesource.com/946674
Commit-Queue: Charlie Andrews <charliea@chromium.org>
Reviewed-by: Charlie Andrews <charliea@chromium.org>
Cr-Commit-Position: refs/heads/master@{#541211}
[modify] https://crrev.com/fe367a2f23b529ca6c4ece813c87678316394783/tools/battor_agent/battor_agent.cc
[modify] https://crrev.com/fe367a2f23b529ca6c4ece813c87678316394783/tools/battor_agent/battor_agent.h
[modify] https://crrev.com/fe367a2f23b529ca6c4ece813c87678316394783/tools/battor_agent/battor_agent_unittest.cc

Project Member

Comment 7 by bugdroid1@chromium.org, Mar 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/7b53f088f8e81ed262a051e6a3dfe5ec078037f4

commit 7b53f088f8e81ed262a051e6a3dfe5ec078037f4
Author: Charlie Andrews <charliea@chromium.org>
Date: Tue Mar 13 21:48:37 2018

Upload the BattOr agent binary to latest (fixing clock sync bug)

In this CL, I also update the platform strings that I forgot to update
when I fixed the architecture string bug in  https://crbug.com/816443 .
This is necessary to upload new versions of the firmware.

TBR=nednguyen@google.com
Bug:  chromium:816470 
Change-Id: I65a2415ff5f5331439ae7f0abb34dcd7b1546053
Reviewed-on: https://chromium-review.googlesource.com/961278
Reviewed-by: Charlie Andrews <charliea@chromium.org>
Commit-Queue: Charlie Andrews <charliea@chromium.org>

[modify] https://crrev.com/7b53f088f8e81ed262a051e6a3dfe5ec078037f4/common/battor/battor/battor_binary_dependencies.json
[modify] https://crrev.com/7b53f088f8e81ed262a051e6a3dfe5ec078037f4/common/battor/bin/upload_battor_binaries.py

Cc: sullivan@chromium.org
I don't understand what all the memory regressions linked to this bug (https://chromeperf.appspot.com/group_report?bug_id=816470) have to do with BattOr flushes? Also, should the graphs improve after the CLs that landed?

Note that the reason bisect is pointing to the revert in r538843 is because that fixes the test, so it's a "change". Unfortunately the build seems broken at the memory regression.
I think something might be wrong on the dashboard side here: those weren't the original alerts at the time that this bug was filed. The original alerts had to do with a clock sync latency regression, which makes a heck of a lot more sense.
Cc: simonhatch@chromium.org
+simonhatch any idea what could have happened here? "Original alerts" are all memory alerts.

I don't know where the clock sync alerts are, but filing a new bug for the memory alerts, which are quite severe.
Status: Fixed (was: Assigned)
I think it's possible that I screwed this up, but I'm not 100% sure. I found the original alerts that made me file this bug based on an email that Ned sent me, but that email linked to the benchmarking team triage queue and, after I associated them with this bug (which I thought was the right thing to do?), they now seem to have disappeared. It also said that it would post a message and kick off a bisect on this bug, but neither of those appear to have happened (the bug wasn't really necessary, as the graph has already recoverdd). I am going to close this bug for now, though, because  the fix (https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgICQooSntAoM) was effective.
Cc: charliea@google.com charliea@chromium.org catapult...@skia-buildbots.google.com.iam.gserviceaccount.com mlippautz@chromium.org hbos@chromium.org
 Issue 817952  has been merged into this issue.
Question: how were you able to get back to those? I accidentally attached them to the wrong bug - they should have been attached to this one, not that one, which I've now fixed. But after I did that, I had no idea how to get back to them: just clicking "Show triaged" seemed to show _all_ bugs for the rotation.
I just used "show triaged", and looked through all bugs--those were the only clock sync related alerts.
Got it. Thanks!

Sign in to add a comment