Check failed: commit_time_ > navigation_start_ in metrics_web_contents_observer.cc on Win10 |
||||||||
Issue descriptionVersion: 51.0.2687.0 git rev 9861e39f90174b134366db95471a64e0a474d874 OS: Windows 10.0.10586 on GCE What steps will reproduce the problem? (1) Run components_unittests --gtest_filter=MetricsWebContentsObserverTest.* https://chromium-swarm-dev.appspot.com/user/task/2daf6575e9d70a10 FATAL:metrics_web_contents_observer.cc(346)] Check failed: commit_time_ > navigation_start_ (486765000 bogo-microseconds vs. 486765000 bogo-microseconds)
,
Mar 21 2016
This is not just a clock skew issue on the GCE, something else is going wrong. For the tests that don't "commit" e.g. MetricsWebContentsObserverTest.LogAbortChainsNoCommit there is a clearer failure: [ RUN ] MetricsWebContentsObserverTest.LogAbortChainsNoCommit d:\src\gclient\src\base\test\histogram_tester.cc(166): error: Value of: actual_count Actual: 2 Expected: expected_count Which is: 1 Histogram "PageLoad.Internal.ProvisionalAbortChainSize.NoCommit" does not have the right total number of samples (1). It has (2). [ FAILED ] MetricsWebContentsObserverTest.LogAbortChainsNoCommit (2420 ms)
,
Mar 21 2016
Huh this is really odd. Can you list all of the tests that fail?
,
Mar 21 2016
Failing:
MetricsWebContentsObserverTest.LogAbortChainsNoCommit
Crashing (due to the DCHECK):
MetricsWebContentsObserverTest.BadIPC
MetricsWebContentsObserverTest.DontLogAbortChains
MetricsWebContentsObserverTest.DontLogIrrelevantNavigation
MetricsWebContentsObserverTest.LogAbortChains
MetricsWebContentsObserverTest.LogAbortChainsSameURL
MetricsWebContentsObserverTest.NotInMainError
MetricsWebContentsObserverTest.NotInMainFrame
MetricsWebContentsObserverTest.SamePageNoTrigger
,
Mar 21 2016
Sorry I did not notice the link. My hunch is that this is all caused by a stopped clock used for testing. I'm fairly sure we'd see these types of errors (i.e. the AbortChain failures) if timestamps are not incrementing. These tests definitely rely on a clock that at least does not stand still.
,
Mar 21 2016
Interesting... I added a really simple test:
TEST(TimeTicks, TimeIncrements) {
TimeTicks ticks(TimeTicks::Now());
Sleep(0);
TimeTicks ticks2(TimeTicks::Now());
ASSERT_GT(ticks2, ticks);
}
this passes locally on my Win8 machine, and my Win10 VM (VMWare).
I just ran this on the Win10 GCE bot and get:
https://chromium-swarm-dev.appspot.com/user/task/2db3fd071e1b9a10
[ RUN ] TimeTicks.TimeIncrements
d:\src\gclient\src\base\time\time_win_unittest.cc(245): error: Expected: (ticks2) > (ticks), actual: 1411312000 bogo-microseconds vs 1411312000 bogo-microseconds
[ FAILED ] TimeTicks.TimeIncrements (1 ms)
,
Mar 21 2016
+bruce, the QPC-meister
,
Mar 21 2016
I don't think it's valid to assume that two calls to TimeTicks::Now() won't return the same value. It's certainly valid to assume that the second will not be lower than the first. I changed the DCHECK_GT to a DCHECK_GE but it still doesn't work because there's other parts in the code that assume that seem to check time_to_commit is non zero. I could hack into PageLoadTracker::GetPageLoadMetricsInfo to return a very small time_to_commit if the two times are equal, but that seems somehow wrong.
,
Mar 21 2016
I suspect we'll just have to go through the code and remove these assumptions. Hopefully we can mock a timer in the tests to guarantee ticks at least for the abort cases. cc bmcquade, shivanisha who may have to pick this up as I'm OOO this week. I'll see if I can get to it tomorrow. Bryan, what do you think the best solution to this is? I think we can add a bool to the PageLoadTracker instead of using the proxy |time_to_commit|.
,
Mar 21 2016
The timer frequency that we get is highly variable. Common QPC frequencies are ~3.6 MHz or ~3.0 GHz (CPU speed) - so that's three orders of magnitude range. On some machines we decide not to use QPC and the frequency could be much lower - as low as 64 Hz. I would hope that none of our test machines fall back to that low a frequency. Since Sleep(0) returns immediately it seems possible that the test code in comment #6 could give the same value for both calls to TimeTicks::Now() if the CPU is running fast enough and QPC is 'only' running at 3.6 MHz.
,
Mar 21 2016
Sorry, yes, we should be allowing equality as a valid value. It is totally valid for two calls to get the current time to return the same value. Our assertions should only be checking that time didn't go backwards. Most of our assertions do this correctly but it sounds like we made mistakes in a few places.
,
Mar 21 2016
Regarding assuming a zero time_to_commit indicates that the page did not commit, that is unfortunately an assumption we've made in the code, that I see now is invalid. We'll need to add some additional state to distinguish between a commit having happened or not, rather than checking for a non-zero value. Is there a Chromium Optional<T> class that can be used in our case as an Optional<base::TimeDelta>, which allows us to additionally capture whether or not a value is set?
,
Mar 22 2016
I tried adding a bool and passing it around, but I still get failures so I probably did it wrong https://chromium-swarm-dev.appspot.com/user/task/2db46b620742fe10 my CL (which doesn't work) is here: https://codereview.chromium.org/1817263003
,
Mar 22 2016
#12 - reg: Optional<T> it's being discussed / attempted for some time now, it looks getting promising but might take a little more time. See: https://codereview.chromium.org/1245163002/
,
Mar 22 2016
I'm taking a look at this now. Thanks for writing up that patch wfh@, I think it gets us most of the way there. bmcquade@, I think passing a bool around is conceptually the best method.
,
Mar 22 2016
Sounds good, thanks! Can you add a TODO to consider using Optional<> when it's available?
,
Mar 22 2016
To clarify, the tests that are still failing rely on time advancing in more subtle ways (i.e. the code uses time advancing to make judgements about causality). Those cases can probably be fixed pretty easily.
,
Mar 22 2016
we decided to shelve the current CL and move towards using base::Optional once https://codereview.chromium.org/1245163002/ lands and use this to determine whether the TimeTicks is valid or not. to repro this locally, change IsBuggyAthlon() in base/time/time_win.cc to return true. This causes DCHECK failures and test failures.
,
Mar 24 2016
Starting work on this using the almost stable work on Optional in https://codereview.chromium.org/1245163002/(though not landed yet)
,
Mar 25 2016
The changes to the TimeDelta values in PageLoadExtraInfo are working and I will floating a patch soon. There are also some TimeDelta values in PageLoadTiming but at the moment we will not be able to use an Optional in the IPC between renderer and browser until the serialization framework is enhanced to handle Optionals. I will put a TODO comment regarding that.
,
Mar 29 2016
,
Jun 21 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/271f0336503725749c86df213f6710c6423f9b5f commit 271f0336503725749c86df213f6710c6423f9b5f Author: shivanisha <shivanisha@chromium.org> Date: Tue Jun 21 17:17:33 2016 This fix uses Optional TimeDeltas since they may be non existent instead of using zero as a non-existent value. It uses base::Optional class (https://codereview.chromium.org/1245163002/) BUG= 596367 Apart from the initial fix for this change which involves TimeDelta values in PageLoadExtraInfo to be converted to use base::Optional, this CL has the following 3 more changes: 1. Converted the foreground event condition from event_time < background_time to event_time <= background_time. This made all tests that have an event occuring before calling WasHidden (which fills the background_time as Now()) to be deterministic in assuming that the event will be logged in a foreground histogram irrespective of the clock resolution. This is also a better logical choice since the event has not been subjected to any background-specific functionality if it occurred at the same time as backgrounding. This fixed the first failure reported on this CL :CorePageLoadMetricsObserverTest.OnlyBackgroundLaterEvent. The tests that have event occurring after background will be non deterministic as they cannot be guaranteed to have event time strictly greater than background time. So I have added an interface in the test framework that lets the test retrieve the value of PageLoadExtraInfo for a committed load. I have not added the same for provisional loads/aborted loads but could be added in future if need be. 2. For the failure resulting from inter process time tick skew, conditional clamping is introduced only if the clock is not high resolution. The testing strategy for that is that the existing unit tests should all pass on a low resolution clock system. Failures from this should result in a DCHECK getting hit. (I have not added specific tests for testing this because it will be difficult to test all possible code paths in specific tests and mocking inter process time tick skew would require mocking isHighResolution at the least. The complexity is not worth it since all code paths will not get covered in a few tests anyways.) Review-Url: https://codereview.chromium.org/1837233002 Cr-Commit-Position: refs/heads/master@{#401022} [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/aborts_page_load_metrics_observer.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/aborts_page_load_metrics_observer_unittest.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/core_page_load_metrics_observer.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/core_page_load_metrics_observer.h [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/core_page_load_metrics_observer_unittest.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/document_write_page_load_metrics_observer.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/from_gws_page_load_metrics_observer.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/from_gws_page_load_metrics_observer_unittest.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/https_engagement_page_load_metrics_observer.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/page_load_metrics_observer_test_harness.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/page_load_metrics_observer_test_harness.h [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/metrics_web_contents_observer.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/metrics_web_contents_observer.h [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/page_load_metrics_observer.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/page_load_metrics_observer.h [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/page_load_metrics_util.cc [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/page_load_metrics_util.h [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/common/page_load_timing.h [modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/tools/metrics/histograms/histograms.xml
,
Jun 23 2016
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by csharrison@chromium.org
, Mar 21 2016