New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 596367 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug

Blocking:
issue 596344



Sign in to add a comment

Check failed: commit_time_ > navigation_start_ in metrics_web_contents_observer.cc on Win10

Project Member Reported by wfh@chromium.org, Mar 21 2016

Issue description

Version: 51.0.2687.0 git rev 9861e39f90174b134366db95471a64e0a474d874
OS: Windows 10.0.10586 on GCE

What steps will reproduce the problem?
(1) Run components_unittests --gtest_filter=MetricsWebContentsObserverTest.*

https://chromium-swarm-dev.appspot.com/user/task/2daf6575e9d70a10

FATAL:metrics_web_contents_observer.cc(346)] Check failed: commit_time_ > navigation_start_ (486765000 bogo-microseconds vs. 486765000 bogo-microseconds)
 
Cc: csharrison@chromium.org

Comment 2 by wfh@chromium.org, Mar 21 2016

Cc: kinuko@chromium.org wfh@chromium.org
Owner: ----
Status: Available (was: Started)
This is not just a clock skew issue on the GCE, something else is going wrong.  For the tests that don't "commit" e.g. MetricsWebContentsObserverTest.LogAbortChainsNoCommit there is a clearer failure:

[ RUN      ] MetricsWebContentsObserverTest.LogAbortChainsNoCommit

d:\src\gclient\src\base\test\histogram_tester.cc(166): error: Value of: actual_count

  Actual: 2

Expected: expected_count

Which is: 1

Histogram "PageLoad.Internal.ProvisionalAbortChainSize.NoCommit" does not have the right total number of samples (1). It has (2).

[  FAILED  ] MetricsWebContentsObserverTest.LogAbortChainsNoCommit (2420 ms)
Huh this is really odd. Can you list all of the tests that fail?

Comment 4 by wfh@chromium.org, Mar 21 2016

Failing:
    MetricsWebContentsObserverTest.LogAbortChainsNoCommit

Crashing (due to the DCHECK):
    MetricsWebContentsObserverTest.BadIPC
    MetricsWebContentsObserverTest.DontLogAbortChains
    MetricsWebContentsObserverTest.DontLogIrrelevantNavigation
    MetricsWebContentsObserverTest.LogAbortChains
    MetricsWebContentsObserverTest.LogAbortChainsSameURL
    MetricsWebContentsObserverTest.NotInMainError
    MetricsWebContentsObserverTest.NotInMainFrame
    MetricsWebContentsObserverTest.SamePageNoTrigger
Sorry I did not notice the link. My hunch is that this is all caused by a stopped clock used for testing. I'm fairly sure we'd see these types of errors (i.e. the AbortChain failures) if timestamps are not incrementing.

These tests definitely rely on a clock that at least does not stand still.

Comment 6 by wfh@chromium.org, Mar 21 2016

Cc: mar...@chromium.org
Interesting... I added a really simple test:

TEST(TimeTicks, TimeIncrements) {
  TimeTicks ticks(TimeTicks::Now());

  Sleep(0);

  TimeTicks ticks2(TimeTicks::Now());

  ASSERT_GT(ticks2, ticks);
}

this passes locally on my Win8 machine, and my Win10 VM (VMWare).

I just ran this on the Win10 GCE bot and get:

https://chromium-swarm-dev.appspot.com/user/task/2db3fd071e1b9a10

[ RUN      ] TimeTicks.TimeIncrements

d:\src\gclient\src\base\time\time_win_unittest.cc(245): error: Expected: (ticks2) > (ticks), actual: 1411312000 bogo-microseconds vs 1411312000 bogo-microseconds

[  FAILED  ] TimeTicks.TimeIncrements (1 ms)

Comment 7 by wfh@chromium.org, Mar 21 2016

Cc: brucedaw...@chromium.org
+bruce, the QPC-meister

Comment 8 by wfh@chromium.org, Mar 21 2016

I don't think it's valid to assume that two calls to TimeTicks::Now() won't return the same value. It's certainly valid to assume that the second will not be lower than the first.

I changed the DCHECK_GT to a DCHECK_GE but it still doesn't work because there's other parts in the code that assume that seem to check time_to_commit is non zero.

I could hack into PageLoadTracker::GetPageLoadMetricsInfo to return a very small time_to_commit if the two times are equal, but that seems somehow wrong.
Cc: shivanisha@chromium.org bmcquade@chromium.org
I suspect we'll just have to go through the code and remove these assumptions.

Hopefully we can mock a timer in the tests to guarantee ticks at least for the abort cases.

cc bmcquade, shivanisha who may have to pick this up as I'm OOO this week. I'll see if I can get to it tomorrow. Bryan, what do you think the best solution to this is? I think we can add a bool to the PageLoadTracker instead of using the proxy |time_to_commit|. 
The timer frequency that we get is highly variable. Common QPC frequencies are ~3.6 MHz or ~3.0 GHz (CPU speed) - so that's three orders of magnitude range.

On some machines we decide not to use QPC and the frequency could be much lower - as low as 64 Hz. I would hope that none of our test machines fall back to that low a frequency.

Since Sleep(0) returns immediately it seems possible that the test code in comment #6 could give the same value for both calls to TimeTicks::Now() if the CPU is running fast enough and QPC is 'only' running at 3.6 MHz.
Sorry, yes, we should be allowing equality as a valid value. It is totally valid for two calls to get the current time to return the same value. Our assertions should only be checking that time didn't go backwards. Most of our assertions do this correctly but it sounds like we made mistakes in a few places.
Regarding assuming a zero time_to_commit indicates that the page did not commit, that is unfortunately an assumption we've made in the code, that I see now is invalid. We'll need to add some additional state to distinguish between a commit having happened or not, rather than checking for a non-zero value.

Is there a Chromium Optional<T> class that can be used in our case as an Optional<base::TimeDelta>, which allows us to additionally capture whether or not a value is set?

Comment 13 by wfh@chromium.org, Mar 22 2016

I tried adding a bool and passing it around, but I still get failures so I probably did it wrong

https://chromium-swarm-dev.appspot.com/user/task/2db46b620742fe10

my CL (which doesn't work) is here:

https://codereview.chromium.org/1817263003
#12 - reg: Optional<T> it's being discussed / attempted for some time now, it looks getting promising but might take a little more time. See: https://codereview.chromium.org/1245163002/
Owner: csharrison@chromium.org
Status: Assigned (was: Available)
I'm taking a look at this now. Thanks for writing up that patch wfh@, I think it gets us most of the way there.

bmcquade@, I think passing a bool around is conceptually the best method.
Sounds good, thanks! Can you add a TODO to consider using Optional<> when it's available?
To clarify, the tests that are still failing rely on time advancing in more subtle ways (i.e. the code uses time advancing to make judgements about causality). Those cases can probably be fixed pretty easily.

Comment 18 by wfh@chromium.org, Mar 22 2016

we decided to shelve the current CL and move towards using base::Optional once https://codereview.chromium.org/1245163002/ lands and use this to determine whether the TimeTicks is valid or not.

to repro this locally, change IsBuggyAthlon() in base/time/time_win.cc to return true. This causes DCHECK failures and test failures.
Starting work on this using the almost stable work on Optional in https://codereview.chromium.org/1245163002/(though not landed yet)
The changes to the TimeDelta values in PageLoadExtraInfo are working and I will floating a patch soon. There are also some TimeDelta values in PageLoadTiming but at the moment we will not be able to use an Optional in the IPC between renderer and browser until the serialization framework is enhanced to handle Optionals.
I will put a TODO comment regarding that.
Owner: shivanisha@chromium.org
Project Member

Comment 22 by bugdroid1@chromium.org, Jun 21 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/271f0336503725749c86df213f6710c6423f9b5f

commit 271f0336503725749c86df213f6710c6423f9b5f
Author: shivanisha <shivanisha@chromium.org>
Date: Tue Jun 21 17:17:33 2016

This fix uses Optional TimeDeltas since they may be non existent instead
of using zero as a non-existent value.
It uses base::Optional class (https://codereview.chromium.org/1245163002/)
BUG= 596367 

Apart from the initial fix for this change which involves TimeDelta values in
PageLoadExtraInfo to be converted to use base::Optional, this CL has the following
3 more changes:
1. Converted the foreground event condition from event_time < background_time to
event_time <= background_time. This made all tests that have an event occuring
before calling WasHidden (which fills the background_time as Now()) to be
deterministic in assuming that the event will be logged in a foreground histogram
irrespective of the clock resolution. This is also a better logical choice since
the event has not been subjected to any background-specific functionality if it
occurred at the same time as backgrounding. This fixed the first failure reported
on this CL :CorePageLoadMetricsObserverTest.OnlyBackgroundLaterEvent.

The tests that have event occurring after background will be non
deterministic as they cannot be guaranteed to have event time strictly greater
than background time.
So I have added an interface in the test
framework that lets the test retrieve the value of PageLoadExtraInfo for a
committed load. I have not added the same for provisional loads/aborted loads but
could be added in future if need be.

2. For the failure resulting from inter process time tick skew, conditional
clamping is introduced only if the clock is not high resolution. The testing
strategy for that is that the existing unit tests should all pass on a low
resolution clock system.  Failures from this should result in a DCHECK getting
hit.
(I have not added specific tests for testing this because
it will be difficult to test all possible code paths in specific tests and mocking
inter process time tick skew would require mocking isHighResolution at the least.
The complexity is not worth it since all code paths will not get covered in a few
tests anyways.)

Review-Url: https://codereview.chromium.org/1837233002
Cr-Commit-Position: refs/heads/master@{#401022}

[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/aborts_page_load_metrics_observer.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/aborts_page_load_metrics_observer_unittest.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/core_page_load_metrics_observer.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/core_page_load_metrics_observer.h
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/core_page_load_metrics_observer_unittest.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/document_write_page_load_metrics_observer.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/from_gws_page_load_metrics_observer.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/from_gws_page_load_metrics_observer_unittest.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/https_engagement_page_load_metrics_observer.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/page_load_metrics_observer_test_harness.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/chrome/browser/page_load_metrics/observers/page_load_metrics_observer_test_harness.h
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/metrics_web_contents_observer.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/metrics_web_contents_observer.h
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/page_load_metrics_observer.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/page_load_metrics_observer.h
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/page_load_metrics_util.cc
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/browser/page_load_metrics_util.h
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/components/page_load_metrics/common/page_load_timing.h
[modify] https://crrev.com/271f0336503725749c86df213f6710c6423f9b5f/tools/metrics/histograms/histograms.xml

Status: Verified (was: Assigned)

Sign in to add a comment