New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 165171 link

Starred by 71 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Mar 2013
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug

Restricted
  • Only users with EditIssue permission may comment.



Sign in to add a comment

Incorrect SyncDataType parsing for throttled types causes chrome to crash

Project Member Reported by a...@chromium.org, Dec 10 2012

Issue description

Gmail is down; everyone's Chrome is now crashing in the sync thread.

My crash is

https://crash.corp.google.com/reportdetail?reportid=e89d2ec29d56fac9

but it hasn't symbolized yet.
 
Full crash log from a user: https://gist.github.com/4251938

Comment 2 by palmer@chromium.org, Dec 10 2012

Cc: palmer@chromium.org

Comment 3 by rsesek@chromium.org, Dec 10 2012

Thread 23 Crashed:: Chrome_SyncThread
0   libsystem_kernel.dylib        	0x997db9c6 __pthread_kill + 10
1   libsystem_c.dylib             	0x9bb6df78 pthread_kill + 106
2   libsystem_c.dylib             	0x9bb5ebdd abort + 167
3   libc++abi.dylib               	0x9cc5c921 abort_message + 94
4   libc++abi.dylib               	0x9cc5a1bc default_terminate() + 36
5   libc++abi.dylib               	0x9cc5a1fe safe_handler_caller(void (*)()) + 15
6   libc++abi.dylib               	0x9cc5a268 std::terminate() + 23
7   libc++abi.dylib               	0x9cc5b2a0 __cxa_throw + 112
8   libstdc++.6.dylib             	0x962a485a std::__throw_out_of_range(char const*) + 121
9   com.google.Chrome.framework   	0x02a3ba8b syncer::SyncerProtoUtil::PostClientToServerMessage (sync_protocol_error.h:65)
10  com.google.Chrome.framework   	0x02a2e001 syncer::BuildAndPostCommits (commit.cc:103)
11  com.google.Chrome.framework   	0x02a3a959 syncer::Syncer::SyncShare (syncer.cc:174)
12  com.google.Chrome.framework   	0x02a365cd syncer::SyncSchedulerImpl::DoSyncSessionJob (linked_ptr.h:115)
13  com.google.Chrome.framework   	0x02a3a225 base::internal::Invoker<2, base::internal::BindState<base::internal::RunnableAdapter<void (syncer::SyncSchedulerImpl::*)(const syncer::SyncSchedulerImpl::SyncSessionJob &)>, void (syncer::SyncSchedulerImpl *, const syncer::SyncSchedulerImpl::SyncSessionJob &), void (base::WeakPtr<syncer::SyncSchedulerImpl>, syncer::SyncSchedulerImpl::SyncSessionJob)>, void (syncer::SyncSchedulerImpl *, const syncer::SyncSchedulerImpl::SyncSessionJob &)>::Run (bind_internal.h:1256)
14  com.google.Chrome.framework   	0x00adab36 MessageLoop::RunTask (stl_vector.h:400)
15  com.google.Chrome.framework   	0x00adb110 MessageLoop::DoDelayedWork (message_loop.cc:482)
16  com.google.Chrome.framework   	0x00add69d base::MessagePumpDefault::Run (message_pump_default.cc:33)
17  com.google.Chrome.framework   	0x00ada600 MessageLoop::RunHandler (message_loop.cc:401)
18  com.google.Chrome.framework   	0x00aec3e1 base::RunLoop::Run (run_loop.cc:84)
19  com.google.Chrome.framework   	0x00ada37a MessageLoop::Run (message_loop.cc:308)
20  com.google.Chrome.framework   	0x00b01e11 base::Thread::Run (thread.cc:134)
21  com.google.Chrome.framework   	0x00b01e9b base::Thread::ThreadMain (thread.cc:170)
22  com.google.Chrome.framework   	0x00afe8e9 base::::ThreadFunc (platform_thread_posix.cc:65)
23  libsystem_c.dylib             	0x9bb6bed9 _pthread_start + 335
24  libsystem_c.dylib             	0x9bb6f6de thread_start + 34

Related to bug 163267?

Comment 4 by rsesek@chromium.org, Dec 10 2012

Issue 165176 has been merged into this issue.

Comment 5 by rsesek@chromium.org, Dec 10 2012

Issue 165161 has been merged into this issue.

Comment 6 by rsesek@chromium.org, Dec 10 2012

Issue 165163 has been merged into this issue.

Comment 7 by k...@google.com, Dec 10 2012

Owner: tim@chromium.org
Status: Assigned

Comment 8 by k...@google.com, Dec 10 2012

Cc: tmc...@chromium.org
Labels: ReleaseBlock-Stable Mstone-24
I think this affected everything, but we should get a fix ASAP.

Comment 9 by akalin@chromium.org, Dec 10 2012

avi, your crash is now symbolized, but it seems unrelated.
Issue 165166 has been merged into this issue.
I noticed that my crash (with same trace as comment 3) didn't have a chrome://crashes entry. (I had to see it in Console.app).

Was that the case for others too? If so, we should probably have a separate bug to investigate that. (Unless the Google outage also affected the crash servers...)

Comment 12 by a...@chromium.org, Dec 10 2012

akalin: yeah. chrome://crash and chrome://crashes are too close :(

Comment 13 Deleted

There's  bug 165168 , which is an independent bug that is masking this one.
Labels: Restrict-AddIssueComment-Commit
Issue 165156 has been merged into this issue.
Issue 165152 has been merged into this issue.
 Issue 165157  has been merged into this issue.
 Issue 165162  has been merged into this issue.

Comment 20 by tim@chromium.org, Dec 10 2012

https://crash.corp.google.com/reportdetail?reportid=13a4c173601caec6 might be more interesting... still trying to get a handle on what's happening.  
So I think I figured out a problem, at least for the stack for 13a4c173601caec6.


syncer::`anonymous namespace'::ConvertErrorPBToLocalType() calls GetModelTypeFromSpecificsFieldNumber, which tries to convert a protobuf field number to an enum value.  If the field number is unknown, it returns UNSPECIFIED == 0.  But we use an EnumSet template instantiation which is defined only from FIRST_REAL_MODEL_TYPE (non-zero), i.e. enums in the set are converted to offsets from FIRST_REAL_MODEL_TYPE and stored in a bitset.  So if UNSPECIFIED is passed to the set, it calculates a negative offset and tries to store that in a bitset, resulting in the out_of_range error.

So a few problems:

1) That code path has NOTREACHED(), so we don't have unit test coverage for this.
2) We should use a separate function for "untrusted" input and handle the case appropriately.
3) Why is the server sending down throttled types that are unknown to the client?  Possibly because these are new types that aren't known to the client yet (esp. stable ones).

Hopefully, we can figure out a server-side fix for #3 so we can stop triggering this code path on the client, and then we can fix the client after that.

Comment 22 by tim@chromium.org, Dec 10 2012

We still don't know (3).  It's interesting that the crash *stopped* happening - something decided to change the set of throttled datatypes sent down to clients specifically when the outage occurred.

Comment 23 by zea@chromium.org, Dec 10 2012

We've confirmed the server is (for some reason) throttling all types in certain cases, including ones the stable client does not yet know about. So we should be able to at least do something server-side to prevent this from occurring.

Fred's point that the client isn't handling the responses correctly though still stands.
Owner: akalin@chromium.org
Status: Started
We've narrowed down the server-side problem.  I'll repurpose this bug for the client-side fix.
akalin: Per the internal email thread, this crash signature has been present for at least a few days before this outage. Is this crash possible even if the server is in fact reachable (sounds to be so).

Comment 26 by tim@chromium.org, Dec 10 2012

Yes, in fact this crash would *not* happen if the sync server itself was unreachable.  It's due to a backend service that sync servers depend on becoming overwhelmed, and sync servers responding to that by telling all clients to throttle all data types (including data types that the client may not understand yet).

Comment 27 by tim@chromium.org, Dec 11 2012

Summary: Incorrect SyncDataType parsing for throttled types causes chrome to crash (was: When Gmail is down, Chrome Sync crashes Chrome)
Renaming this bug as it really has nothing to do with GMail specifically.
To clarify:

- Chrome Sync Server relies on a backend infrastructure component to enforce quotas on per-datatype sync traffic.
- That quota service experienced traffic problems today due to a faulty load balancing configuration change.
- That change was to a core piece of infrastructure that many services at Google depend on. This means other services may have been affected at the same time, leading to the confounding original title of this bug.
- Because of the quota service failure, Chrome Sync Servers reacted too conservatively by telling clients to throttle "all" data types, without accounting for the fact that not all client versions support all data types.

The crash is due to faulty logic responsible for handling "throttled" data types on the client when the data types are unrecognized.
Project Member

Comment 28 by bugdroid1@chromium.org, Dec 11 2012

The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=172232

------------------------------------------------------------------------
r172232 | akalin@chromium.org | 2012-12-11T02:05:07.754724Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/engine/syncer_proto_util.cc?r1=172232&r2=172231&pathrev=172232
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/internal_api/public/base/model_type.h?r1=172232&r2=172231&pathrev=172232
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/syncable/model_type_unittest.cc?r1=172232&r2=172231&pathrev=172232
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/syncable/model_type.cc?r1=172232&r2=172231&pathrev=172232
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/engine/store_timestamps_command.cc?r1=172232&r2=172231&pathrev=172232

[Sync] Handle invalid specifics field numbers gracefully

Change GetModelTypeFromSpecificsFieldNumber() to not NOTREACHED() on an
unknown field number.  Instead, have callers compare the return value
to UNSPECIFIED and handle that case.

BUG= 165171 


Review URL: https://chromiumcodereview.appspot.com/11490018
------------------------------------------------------------------------
We'll make sure that this behaves in tomorrow's canary and then we'll discuss whether to merge this to M24.
Issue 163267 has been merged into this issue.

Comment 31 by dharani@google.com, Dec 11 2012

akalin: r172232 is in 25.0.1357.0 canary. could you please verify it?
Labels: Merge-Requested
I can't find a crash with 1357, and I can find some for 1354, so I think we're ready to merge this!
Cc: timsteele@google.com apps-tses-bugs@chromium.org kareng@google.com
Issue 165308 has been merged into this issue.

Comment 34 by dharani@google.com, Dec 11 2012

Labels: -Merge-Requested Merge-Approved
Project Member

Comment 35 by bugdroid1@chromium.org, Dec 11 2012

Labels: -Merge-Approved merge-merged-1312
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=172389

------------------------------------------------------------------------
r172389 | akalin@chromium.org | 2012-12-11T20:20:02.994460Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/branches/1312/src/sync/internal_api/public/base/model_type.h?r1=172389&r2=172388&pathrev=172389
   M http://src.chromium.org/viewvc/chrome/branches/1312/src/sync/syncable/model_type_unittest.cc?r1=172389&r2=172388&pathrev=172389
   M http://src.chromium.org/viewvc/chrome/branches/1312/src/sync/syncable/model_type.cc?r1=172389&r2=172388&pathrev=172389
   M http://src.chromium.org/viewvc/chrome/branches/1312/src/sync/engine/store_timestamps_command.cc?r1=172389&r2=172388&pathrev=172389
   M http://src.chromium.org/viewvc/chrome/branches/1312/src/sync/engine/syncer_proto_util.cc?r1=172389&r2=172388&pathrev=172389

Merge 172232
> [Sync] Handle invalid specifics field numbers gracefully
> 
> Change GetModelTypeFromSpecificsFieldNumber() to not NOTREACHED() on an
> unknown field number.  Instead, have callers compare the return value
> to UNSPECIFIED and handle that case.
> 
> BUG= 165171 
> 
> 
> Review URL: https://chromiumcodereview.appspot.com/11490018

TBR=akalin@chromium.org
Review URL: https://codereview.chromium.org/11533014
------------------------------------------------------------------------
Given that we fixed this server-side, do we need to merge this onto M23, too? (My gut feeling is 'no'.)
Project Member

Comment 37 by bugdroid1@chromium.org, Dec 13 2012

The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=172816

------------------------------------------------------------------------
r172816 | akalin@chromium.org | 2012-12-13T04:28:33.853046Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/sync.gyp?r1=172816&r2=172815&pathrev=172816
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/engine/store_timestamps_command.h?r1=172816&r2=172815&pathrev=172816
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/engine/syncer_proto_util_unittest.cc?r1=172816&r2=172815&pathrev=172816
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/engine/syncer_proto_util.cc?r1=172816&r2=172815&pathrev=172816
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/engine/syncer_proto_util.h?r1=172816&r2=172815&pathrev=172816
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/engine/DEPS?r1=172816&r2=172815&pathrev=172816
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/internal_api/public/base/model_type.h?r1=172816&r2=172815&pathrev=172816
   A http://src.chromium.org/viewvc/chrome/trunk/src/sync/engine/store_timestamps_command_unittest.cc?r1=172816&r2=172815&pathrev=172816
   M http://src.chromium.org/viewvc/chrome/trunk/src/sync/engine/store_timestamps_command.cc?r1=172816&r2=172815&pathrev=172816

[Sync] Add tests for invalid specifics field number handling

This is a follow up to r172232.

Change NOTREACHED() to DLOG(WARNING), since the server sending down
unknown/invalid field numbers is a valid event.

Add tests for the code that uses GetModelTypeFromSpecificsFieldNumber().

Clean up some code in sync/engine/ a bit.

BUG= 165171 


Review URL: https://chromiumcodereview.appspot.com/11485019
------------------------------------------------------------------------

Comment 38 by dharani@google.com, Dec 13 2012

Labels: -ReleaseBlock-Stable
Removing release blocker as we have a possible fix merged in M24.
Cc: -palmer@chromium.org
Project Member

Comment 40 by bugdroid1@chromium.org, Mar 10 2013

Labels: -Area-Internals -Feature-Sync -Mstone-24 Cr-Services-Sync Cr-Internals M-24

Comment 41 by laforge@google.com, Mar 13 2013

Labels: -Restrict-AddIssueComment-Commit Restrict-AddIssueComment-EditIssue
Status: Fixed
This was resolved a while ago.

Sign in to add a comment