New issue
Advanced search Search tips

Issue 907379 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 3
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug
Flaky-Test: RlzLibTest.DelayedInitOnlyNoFirstRun



Sign in to add a comment

RlzLibTest.DelayedInitOnlyNoFirstRun is flaky

Project Member Reported by Findit, Nov 21

Issue description

Reverting Culprit (70.0% confidence): r609428 at https://chromium-review.googlesource.com/c/chromium/src/+/1345707
Components: Internals>Core
Owner: wzang@chromium.org
Status: Assigned (was: Untriaged)
Cc: rogerta@chromium.org
Hi Roger, do you have suggestions on how to fix the test? The error message is: 
Expected: (event_name) in (cgi), actual: 'CAI' not in 'events=CBI,CCI,CBF'

https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8929278016104041520/+/steps/components_unittests/0/logs/RlzLibTest.DelayedInitOnlyNoFirstRun/0

My "guess" for the reason of flakiness is: |RecordProductEvent| may fail to acquire the |ScopedRlzValueStoreLock| when recording the CAI event, because the lock is being taken by |UpdateExistingAccessPointRlz|. In tests, the delay for |ScheduleDelayedInit| is set to zero, so it's possible for this race condition to happen.

I don't think is's reproducible in production, because it will be using a non-zero delay? 
Can you test your theory by using a very short delay instead of zero?  Like 1 second?
Cc: hayato@chromium.org
I'm not able to reproduce the failure locally using the same build args of the bots. I've tried 100+ times and they all passed.

It's hard to tell if the CL is really the culprit.
hayato@, if we can't reproduce the flakiness, is it okay to reland the CL as is? 

Otherwise, we'll have to apply a speculative fix, for example, add a 1-second delay.
> hayato@, if we can't reproduce the flakiness, is it okay to reland the CL as is? 

As chromium sheriff, I don't think it is a good idea, however, it might be okay if this flakiness is cased by other reasons than the CL itself. We are not sure.

> Otherwise, we'll have to apply a speculative fix, for example, add a 1-second delay.

Yeah, that sounds worth trying. Could you try that? If the flakiness happens again, FindIt bot will tell us, and we can notice that.
Labels: -Sheriff-Chromium
Project Member

Comment 9 by bugdroid1@chromium.org, Dec 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1d497322e9fd8d9ab1da00767e30c618186151c6

commit 1d497322e9fd8d9ab1da00767e30c618186151c6
Author: Wenzhao Zang <wzang@chromium.org>
Date: Mon Dec 03 18:43:40 2018

cros: Fix DCHECK for rlz_lib::UpdateExistingAccessPointRlz

In addition to fixing the DCHECK, this CL contains a speculative fix
for flaky test ( crbug.com/907379 ). The guess for the flakiness is
(not able to repro the failure locally though):

|RecordProductEvent| may fail to acquire the |ScopedRlzValueStoreLock|
when recording the CAI event, because the lock is being taken by
|UpdateExistingAccessPointRlz|. In tests, the delay for
|ScheduleDelayedInit| is set to zero, so it's possible for this race
condition to happen. But in production, the delay is non-zero.

I don't think we're losing test coverage by disabling
|UpdateExistingAccessPointRlz| here because the call is a no-op in
tests, and another test (RlzLibTest.UpdateExistingAccessPointRlz) is
testing its implementation.

Bug:  906293 ,  907379 
Change-Id: I208919f22edc09fc1a739a1efe1909a71e5a2730
Reviewed-on: https://chromium-review.googlesource.com/c/1353990
Reviewed-by: Roger Tawa <rogerta@chromium.org>
Commit-Queue: Wenzhao (Colin) Zang <wzang@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613172}
[modify] https://crrev.com/1d497322e9fd8d9ab1da00767e30c618186151c6/chrome/browser/rlz/chrome_rlz_tracker_delegate.cc
[modify] https://crrev.com/1d497322e9fd8d9ab1da00767e30c618186151c6/chrome/browser/rlz/chrome_rlz_tracker_delegate.h
[modify] https://crrev.com/1d497322e9fd8d9ab1da00767e30c618186151c6/components/rlz/rlz_tracker.cc
[modify] https://crrev.com/1d497322e9fd8d9ab1da00767e30c618186151c6/components/rlz/rlz_tracker_delegate.h
[modify] https://crrev.com/1d497322e9fd8d9ab1da00767e30c618186151c6/components/rlz/rlz_tracker_unittest.cc

Status: Fixed (was: Assigned)
Project Member

Comment 11 by bugdroid1@chromium.org, Dec 4

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/52f966107c7d77bc6121cfb843bcd8649801cf0f

commit 52f966107c7d77bc6121cfb843bcd8649801cf0f
Author: Jérôme Lebel <jlebel@chromium.org>
Date: Tue Dec 04 06:19:46 2018

Revert "cros: Fix DCHECK for rlz_lib::UpdateExistingAccessPointRlz"

This reverts commit 1d497322e9fd8d9ab1da00767e30c618186151c6.

Reason for revert: This patch breaks Chrome iOS
../../ios/chrome/browser/ios_chrome_main_parts.mm:175:56: error: allocating an object of abstract class type 'RLZTrackerDelegateImpl'
  rlz::RLZTracker::SetRlzDelegate(base::WrapUnique(new RLZTrackerDelegateImpl));


Original change's description:
> cros: Fix DCHECK for rlz_lib::UpdateExistingAccessPointRlz
> 
> In addition to fixing the DCHECK, this CL contains a speculative fix
> for flaky test ( crbug.com/907379 ). The guess for the flakiness is
> (not able to repro the failure locally though):
> 
> |RecordProductEvent| may fail to acquire the |ScopedRlzValueStoreLock|
> when recording the CAI event, because the lock is being taken by
> |UpdateExistingAccessPointRlz|. In tests, the delay for
> |ScheduleDelayedInit| is set to zero, so it's possible for this race
> condition to happen. But in production, the delay is non-zero.
> 
> I don't think we're losing test coverage by disabling
> |UpdateExistingAccessPointRlz| here because the call is a no-op in
> tests, and another test (RlzLibTest.UpdateExistingAccessPointRlz) is
> testing its implementation.
> 
> Bug:  906293 ,  907379 
> Change-Id: I208919f22edc09fc1a739a1efe1909a71e5a2730
> Reviewed-on: https://chromium-review.googlesource.com/c/1353990
> Reviewed-by: Roger Tawa <rogerta@chromium.org>
> Commit-Queue: Wenzhao (Colin) Zang <wzang@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#613172}

TBR=rogerta@chromium.org,wzang@chromium.org

Change-Id: Ie7888c8175087b921972a86e475415032c58db07
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  906293 ,  907379 
Reviewed-on: https://chromium-review.googlesource.com/c/1360274
Reviewed-by: Jérôme Lebel <jlebel@chromium.org>
Commit-Queue: Jérôme Lebel <jlebel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613447}
[modify] https://crrev.com/52f966107c7d77bc6121cfb843bcd8649801cf0f/chrome/browser/rlz/chrome_rlz_tracker_delegate.cc
[modify] https://crrev.com/52f966107c7d77bc6121cfb843bcd8649801cf0f/chrome/browser/rlz/chrome_rlz_tracker_delegate.h
[modify] https://crrev.com/52f966107c7d77bc6121cfb843bcd8649801cf0f/components/rlz/rlz_tracker.cc
[modify] https://crrev.com/52f966107c7d77bc6121cfb843bcd8649801cf0f/components/rlz/rlz_tracker_delegate.h
[modify] https://crrev.com/52f966107c7d77bc6121cfb843bcd8649801cf0f/components/rlz/rlz_tracker_unittest.cc

Project Member

Comment 12 by bugdroid1@chromium.org, Dec 6

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0d2a5a88a940d3df25c6a074824adb6a901e099f

commit 0d2a5a88a940d3df25c6a074824adb6a901e099f
Author: Jérôme Lebel <jlebel@chromium.org>
Date: Thu Dec 06 10:13:37 2018

Reland "cros: Fix DCHECK for rlz_lib::UpdateExistingAccessPointRlz"

This is a reland of 1d497322e9fd8d9ab1da00767e30c618186151c6

This patch also includes implementation for:
RLZTrackerDelegateImpl::ShouldUpdateExistingAccessPointRlz()

Original change's description:
> cros: Fix DCHECK for rlz_lib::UpdateExistingAccessPointRlz
>
> In addition to fixing the DCHECK, this CL contains a speculative fix
> for flaky test ( crbug.com/907379 ). The guess for the flakiness is
> (not able to repro the failure locally though):
>
> |RecordProductEvent| may fail to acquire the |ScopedRlzValueStoreLock|
> when recording the CAI event, because the lock is being taken by
> |UpdateExistingAccessPointRlz|. In tests, the delay for
> |ScheduleDelayedInit| is set to zero, so it's possible for this race
> condition to happen. But in production, the delay is non-zero.
>
> I don't think we're losing test coverage by disabling
> |UpdateExistingAccessPointRlz| here because the call is a no-op in
> tests, and another test (RlzLibTest.UpdateExistingAccessPointRlz) is
> testing its implementation.
>
> Bug:  906293 ,  907379 
> Change-Id: I208919f22edc09fc1a739a1efe1909a71e5a2730
> Reviewed-on: https://chromium-review.googlesource.com/c/1353990
> Reviewed-by: Roger Tawa <rogerta@chromium.org>
> Commit-Queue: Wenzhao (Colin) Zang <wzang@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#613172}

Bug:  906293 ,  907379 ,  911439 
Change-Id: Ic1d9f85bc3714f890d2dc7fcc1d070d0711baec0
Reviewed-on: https://chromium-review.googlesource.com/c/1360692
Commit-Queue: Jérôme Lebel <jlebel@chromium.org>
Reviewed-by: Wenzhao (Colin) Zang <wzang@chromium.org>
Reviewed-by: Rohit Rao <rohitrao@chromium.org>
Reviewed-by: Roger Tawa <rogerta@chromium.org>
Reviewed-by: Jérôme Lebel <jlebel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#614303}
[modify] https://crrev.com/0d2a5a88a940d3df25c6a074824adb6a901e099f/chrome/browser/rlz/chrome_rlz_tracker_delegate.cc
[modify] https://crrev.com/0d2a5a88a940d3df25c6a074824adb6a901e099f/chrome/browser/rlz/chrome_rlz_tracker_delegate.h
[modify] https://crrev.com/0d2a5a88a940d3df25c6a074824adb6a901e099f/components/rlz/rlz_tracker.cc
[modify] https://crrev.com/0d2a5a88a940d3df25c6a074824adb6a901e099f/components/rlz/rlz_tracker_delegate.h
[modify] https://crrev.com/0d2a5a88a940d3df25c6a074824adb6a901e099f/components/rlz/rlz_tracker_unittest.cc
[modify] https://crrev.com/0d2a5a88a940d3df25c6a074824adb6a901e099f/ios/chrome/browser/rlz/rlz_tracker_delegate_impl.cc
[modify] https://crrev.com/0d2a5a88a940d3df25c6a074824adb6a901e099f/ios/chrome/browser/rlz/rlz_tracker_delegate_impl.h

Sign in to add a comment