Issue metadata
Sign in to add a comment
|
BattOr agent is doing a soft flush before recording a clock sync marker |
||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Feb 26 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1493bb70440000
,
Feb 26 2018
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/1493bb70440000 Revert "[test_env.py] Reland: Warm up vpython virtualenv cache on swarming task shards" by charliea@chromium.org https://chromium.googlesource.com/chromium/src/+/5908c5d2269594aff9c2454c4732360a311c27ca Understanding performance regressions: http://g.co/ChromePerformanceRegressions
,
Mar 2 2018
I'm not sure how pinpoint landed on that change, but it's not the right one. I did figure out the problem: the recent BattOr agent update does a soft flush, waiting 50ms, before issuing the clock sync marker. This 50ms soft flush is what's showing up as a clock sync regression. We should only be doing a soft flush when actually opening the connection, so I'm going to look into how we can fix this - it shouldn't be too hard.
,
Mar 5 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c27595f5c34ecd0d00e4638832fcf8f478eea048 commit c27595f5c34ecd0d00e4638832fcf8f478eea048 Author: Charlie Andrews <charliea@chromium.org> Date: Mon Mar 05 21:10:57 2018 Adds the capability to track whether the BattOr connection is open This is the first part of a fix aimed to prevent us from slow flushing if the BattOr connection is already open. This used to be how it worked, but recent improvements we made to the retry protocol caused a regression in clock sync latency. TBR=nednguyen@google.com Bug: 816470 Change-Id: Ibfceb02e62c333ff3d2b32c8bd52f5a1277bbe70 Reviewed-on: https://chromium-review.googlesource.com/946515 Reviewed-by: Charlie Andrews <charliea@chromium.org> Commit-Queue: Charlie Andrews <charliea@chromium.org> Cr-Commit-Position: refs/heads/master@{#540936} [modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_agent_unittest.cc [modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_connection.h [modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_connection_impl.cc [modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_connection_impl.h [modify] https://crrev.com/c27595f5c34ecd0d00e4638832fcf8f478eea048/tools/battor_agent/battor_connection_impl_unittest.cc
,
Mar 6 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fe367a2f23b529ca6c4ece813c87678316394783 commit fe367a2f23b529ca6c4ece813c87678316394783 Author: Charlie Andrews <charliea@chromium.org> Date: Tue Mar 06 22:00:28 2018 Make the BattOr agent avoid needlessly reopening a connection Because we do a slow flush when we reopen a connection, this prevents a ~50ms startup cost if a connection is already open, which is important especially when recording clock sync markers. I've verified locally that this fixes a ~50ms clock sync latency regression. TBR=nednguyen@google.com Bug: 816470 Change-Id: Icab77977faa61755945883cf50613bfe1e575182 Reviewed-on: https://chromium-review.googlesource.com/946674 Commit-Queue: Charlie Andrews <charliea@chromium.org> Reviewed-by: Charlie Andrews <charliea@chromium.org> Cr-Commit-Position: refs/heads/master@{#541211} [modify] https://crrev.com/fe367a2f23b529ca6c4ece813c87678316394783/tools/battor_agent/battor_agent.cc [modify] https://crrev.com/fe367a2f23b529ca6c4ece813c87678316394783/tools/battor_agent/battor_agent.h [modify] https://crrev.com/fe367a2f23b529ca6c4ece813c87678316394783/tools/battor_agent/battor_agent_unittest.cc
,
Mar 13 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/7b53f088f8e81ed262a051e6a3dfe5ec078037f4 commit 7b53f088f8e81ed262a051e6a3dfe5ec078037f4 Author: Charlie Andrews <charliea@chromium.org> Date: Tue Mar 13 21:48:37 2018 Upload the BattOr agent binary to latest (fixing clock sync bug) In this CL, I also update the platform strings that I forgot to update when I fixed the architecture string bug in https://crbug.com/816443 . This is necessary to upload new versions of the firmware. TBR=nednguyen@google.com Bug: chromium:816470 Change-Id: I65a2415ff5f5331439ae7f0abb34dcd7b1546053 Reviewed-on: https://chromium-review.googlesource.com/961278 Reviewed-by: Charlie Andrews <charliea@chromium.org> Commit-Queue: Charlie Andrews <charliea@chromium.org> [modify] https://crrev.com/7b53f088f8e81ed262a051e6a3dfe5ec078037f4/common/battor/battor/battor_binary_dependencies.json [modify] https://crrev.com/7b53f088f8e81ed262a051e6a3dfe5ec078037f4/common/battor/bin/upload_battor_binaries.py
,
Mar 16 2018
I don't understand what all the memory regressions linked to this bug (https://chromeperf.appspot.com/group_report?bug_id=816470) have to do with BattOr flushes? Also, should the graphs improve after the CLs that landed? Note that the reason bisect is pointing to the revert in r538843 is because that fixes the test, so it's a "change". Unfortunately the build seems broken at the memory regression.
,
Mar 16 2018
I think something might be wrong on the dashboard side here: those weren't the original alerts at the time that this bug was filed. The original alerts had to do with a clock sync latency regression, which makes a heck of a lot more sense.
,
Mar 16 2018
+simonhatch any idea what could have happened here? "Original alerts" are all memory alerts. I don't know where the clock sync alerts are, but filing a new bug for the memory alerts, which are quite severe.
,
Mar 16 2018
I think it's possible that I screwed this up, but I'm not 100% sure. I found the original alerts that made me file this bug based on an email that Ned sent me, but that email linked to the benchmarking team triage queue and, after I associated them with this bug (which I thought was the right thing to do?), they now seem to have disappeared. It also said that it would post a message and kick off a bisect on this bug, but neither of those appear to have happened (the bug wasn't really necessary, as the graph has already recoverdd). I am going to close this bug for now, though, because the fix (https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgICQooSntAoM) was effective.
,
Mar 16 2018
Issue 817952 has been merged into this issue.
,
Mar 16 2018
All the clock sync latency regressions I see on https://chromeperf.appspot.com/alerts?sheriff=Benchmark%20Team&triaged=true&sortby=end_revision&sortdirection=down are linked to bug 800814 .
,
Mar 16 2018
Question: how were you able to get back to those? I accidentally attached them to the wrong bug - they should have been attached to this one, not that one, which I've now fixed. But after I did that, I had no idea how to get back to them: just clicking "Show triaged" seemed to show _all_ bugs for the rotation.
,
Mar 16 2018
I just used "show triaged", and looked through all bugs--those were the only clock sync related alerts.
,
Mar 16 2018
Got it. Thanks! |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Feb 26 2018