New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 645137 link

Starred by 6 users

Issue metadata

Status: Started
Owner:
Last visit > 30 days ago
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Feature



Sign in to add a comment

Add tests to prevent cross-version crashes

Project Member Reported by abodenha@chromium.org, Sep 8 2016

Issue description

 Bug 643408  was triggered when incompatible data from 53 was synced to devices running 52.

We need to figure out what tests need to be added to prevent a repeat of this sort of failure.

In general, unexpected data from a server should not lead to a crash. Perhaps some sort of fuzz testing would be appropriate here?
 

Comment 1 by zea@chromium.org, Sep 8 2016

Could you explain a bit more about how the crash happened? In general, it's possible for sync data from any version to sync to any other version. As such, any consumer of sync data needs to think about forwards and backwards compatibility.

Fuzz testing certainly might be an option. We haven't used it in the past for Sync though (yet), although we've discussed it.

Comment 2 by x...@chromium.org, Sep 8 2016

This is the crash report: https://crash.corp.google.com/browse?stbtiq=3b6cc1a500000000. And then click into the "Source" to see the code here: https://screenshot.googleplex.com/8ze8mHz2kre. It's possible that |sync_item|'s ordinal is invalid so it might crash here. 

It's still possible that |sync_item|'s ordinal is invalid on ToT but we have if-clause to guard that to ensure no crash happens, see code here: https://cs.chromium.org/chromium/src/chrome/browser/ui/app_list/app_list_syncable_service.cc?rcl=0&l=898

The reason |sync_item|'s ordinal might be invalid is because in this CL: https://codereview.chromium.org/2055553004, it refactors the shelf pin model to be sync-based model while before that it was done via prefs. So when user switches to a older version it might cause problem. 

Comment 3 by x...@chromium.org, Sep 8 2016

So actually it's not caused by sync data, but because there is no guard for a situation that theoretically would never happen on M52. But switching from a newer version to an older version triggered this...
Cc: khmel@chromium.org
 bug 692802  was a result of the same sort of thing.

If we're getting data from an external source and letting it drive code there should be nothing that can be expressed in that data that could lead to a crash.

The answer to situations like this is fuzz testing.  Any of our code that acts on untrusted data should be tested against a wide range of "impossible" scenarios to make sure it fails in safe ways.

xdai@ & khmel@, minch@ just joined our team and is currently in orientation.  Can you work with them to try to move this forward?

Comment 5 by dymp...@gmail.com, Mar 8 2017

Abodenha@, should we start warning users?

comment 3: But switching from a newer version to an older version triggered this..
Comment 3 is about  Bug 643408  which should be dealt with by forcing updates during OOBE.   Bug 692802  seems much less common. We'll monitor and see if we need to take further action there.

In general, migrating between versions or using multiple devices on different versions SHOULD be a safe thing to do.  I don't want to warn users away from supported uses. I want to fix our testing so that we don't encounter problems like this in the future.
Owner: minch@chromium.org
Owner: abodenha@chromium.org
This task is too big/deep for a noogler. I'll need to free up someone more senior to dig into it.
Owner: rcui@chromium.org

Comment 10 by rcui@chromium.org, May 31 2017

Looking at this, I think we'd want to:

1. Add some unit tests for the particular regression we saw with  Bug 643408 .
2. Add unit tests that test with fuzz/invalid data.
3. Look into adding some sync integration tests to the autotest test suite.
Sounds like a good plan. 2 was what I was specifically thinking of when I filed this.
Project Member

Comment 12 by sheriffbot@chromium.org, Jun 1 2017

Labels: Hotlist-Recharge-BouncingOwner
Owner: ----
Status: Untriaged (was: Assigned)
The assigned owner "rcui@chromium.org" is not able to receive e-mails, please re-triage.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: abodenha@chromium.org
Status: Assigned (was: Untriaged)
rcui@ can you look into what's up with your chromium.org address? I'll take this back for now.

Comment 14 by rcui@chromium.org, Jun 1 2017

Owner: rcui@chromium.org

Comment 15 by rcui@chromium.org, Jun 1 2017

Labels: -Hotlist-Recharge-BouncingOwner

Comment 16 by rcui@chromium.org, Jun 1 2017

The emails that bounced were sent before I had reinstated my @chromium.org account.  I've cleared the bounced-email status on my account and am assigning this to myself again.
Cool. Thanks!

Comment 18 by rcui@chromium.org, Jun 6 2017

Met with khmel@, and thanks to that I'm in a much better position to start digging into the code.  From the looks of it, we'd want to write some unit tests to validate the robustness of the code in app_list_syncable_service.cc which is where the original crash occurred and is what provides the interface to main sync mechanism.  I've pulled down the chromium source and am starting to poke around ..

Comment 19 by rcui@chromium.org, Jun 6 2017

Status: Started (was: Assigned)

Comment 20 by rcui@chromium.org, Jun 12 2017

I'm close to wrapping up some unit tests that validate how the basic sync (merge/process) functionality handles bad data.  However I have concerns about both the scalability and maintainability of this approach as well as its effectiveness in detecting the regressions we care about.

To actually prevent the issue we saw here (running N and N-1 consecutively causes a crash) from happening again I think what we really need are integration-level tests that run older builds against data generated by new builds (up to the last X versions, say).

I'll be looking at our integration test systems (i.e., autotest, browser test) this wee k in order to come up with a test proposal.
Project Member

Comment 21 by bugdroid1@chromium.org, Jun 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a2a65e1fa1e5c88feb43b1d0b3264c3764d0481b

commit a2a65e1fa1e5c88feb43b1d0b3264c3764d0481b
Author: rcui <rcui@chromium.org>
Date: Sat Jun 24 00:37:07 2017

Tests that validate the merge and process functionality of
AppListSyncableService with various kinds of good and bad data.

BUG=645137

Review-Url: https://codereview.chromium.org/2952463002
Cr-Commit-Position: refs/heads/master@{#482092}

[modify] https://crrev.com/a2a65e1fa1e5c88feb43b1d0b3264c3764d0481b/chrome/browser/ui/app_list/app_list_syncable_service_unittest.cc

Labels: Not-Touch-Friendly-Launcher

Comment 23 by s...@chromium.org, Jan 17 2018

Labels: SyncHandoff2018 Sync-Triaged
What is the latest status on this? Is this still relevant? Do we need another owner?
Components: -UI>Shell>Shelf
This doesn't look like a shelf bug per se, I'll just leave the "sync" component.
Components: UI>Shell>Launcher
Still relevant. 

The issue that triggered this was because version 53 made changes to the shelf and launcher(folders IIRC). If the user logged into 53 and used the new features and then signed into a device on M52 the new data would sync.

Sync worked correctly, but when it handed that data to the shelf, kaboom.

So I'd argue that this ISN'T a sync issue. It's more a matter of how we sanitize the data we get from sync.

This seems like a perfect job for fuzz tests.

Sign in to add a comment