New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 661597 link

Starred by 3 users

Issue metadata

Status: Verified
Owner:
Last visit > 30 days ago
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug-Regression



Sign in to add a comment

VPN framework getting in broken state with 55.x/56.x

Reported by pdavis2...@gmail.com, Nov 2 2016

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36
Platform: 8941.0.0 (Official Build) dev-channel swanky

Example URL:

Steps to reproduce the problem:
1. Use Cisco AnyConnect application, create VPN connection to a host by IP address. 
2. When on untrusted cert screen, prior to clicking accept, disable WiFi
3. Click accept
4. Enable WiFi
5. Attempt another VPN connection to same IP address
6. VPN framework is now in messed up state, this also yields a: org.chromium.flimflam.Error.InvalidArguments: Unexpected call.
7. Next connection attempt hangs at Connecting
8. Other connection attempts fail and sometimes are not even going to the correct connection entry

What is the expected behavior?
VPN Framework should properly connect again to this IP address.

What went wrong?
VPN Framework seems to get in messed up state following these instructions. Seeing this on both the 55.x and 56.x builds that are current in the Beta and Dev channels, but not on the 54.x build on the Stable Channel.

On my Toshiba Chromebook I am also seeing the Chrome browser "crash" (screen goes black for a few seconds) instead of the flimflam error at times.

Did this work before? Yes 54.x (Current version on Stable channel)

Chrome version: 56.0.2903.0  Channel: dev
OS Version: 56.0.2903.0 dev (64-bit)
Flash Version:
 

Comment 1 by dskaram@google.com, Nov 2 2016

Labels: ReleaseBlock-Stable
Owner: cernekee@chromium.org
+Kevin to assign

Comment 2 by dskaram@google.com, Nov 3 2016

Labels: M-55
Components: -Internals>Network Internals>Network>VPN
Any update on this? This is marked as a 55 stable blocker, and we are nearing 55 stable. 
No updates yet.  It looks like this only happens when somebody toggles the wifi state while a third party VPN is in the middle of connecting?
There are probably other ways to reproduce it as I was seeing it intermittently. I just happened to find this way as a reliable way to reproduce the broken behavior 100%. Please get this regression fixed before releasing this to Stable.
Any chance we can get a fix in the next two days? If so we can make the targeted RC, if not we may have to punt or delay. 

Has this been reproduced locally?
I am inclined to say "punt."

I'm running M56 on one of my "daily drivers" with a third-party VPN always connected, and never ran into it.

Trying the repro steps listed in the OP, I do see a spurious disconnect notification on the initial connection, but the connection attempt resumes when the LAN connection is re-enabled.  It would be helpful to see the net.log / messages / ui.LATEST / chrome logs from a failing system to better understand what is happening.

I also see a weird MTU error connecting to ocserv.  The AnyConnect client claims that the server is passing an MTU less than 1280, but I think the actual MTU is ~1341.

FWIW I'm seeing frequent Chrome browser crashes running ToT built Saturday morning.  Most of the time these seem to happen when I click the system menu, but one time it happened when I tried to select a VPN to connect to.  Will resync my tree and investigate if the problem continues.
Kevin - I would ask again that you look to fix this regression prior to updating Stable. This problem can be easily reproduced and is not present in the current Stable build. I will be glad to walk you through it if you have any trouble.  
If you tell me exactly how you want me to capture additional logs, I will be glad to send you those.

> If you tell me exactly how you want me to capture additional logs, I will be glad to send you those.

Could you please reproduce the bug, then hit alt-shift-I to file feedback right afterward?  Let me know what feedback comment (or Google account) to search for.

Comment 12 Deleted

Let me know if you need anything else. I put your name in the comments and emailed you the email address I submitted it from.
Cc: ejcaruso@chromium.org kirtika@chromium.org
Thanks, here is what I see:

Service 0 = wifi
Service 18 = vpn

Wifi comes up:

2016-11-29T10:26:14.063199-05:00 INFO shill[879]: [INFO:service.cc(404)] Service 0: state Idle -> Associating
2016-11-29T10:26:14.063256-05:00 INFO shill[879]: [INFO:service.cc(404)] Service 3: state Associating -> Idle
2016-11-29T10:26:16.933700-05:00 INFO shill[879]: [INFO:service.cc(404)] Service 0: state Associating -> Configuring
2016-11-29T10:26:17.583885-05:00 INFO shill[879]: [INFO:service.cc(404)] Service 0: state Configuring -> Connected
2016-11-29T10:26:17.741867-05:00 INFO shill[879]: [INFO:portal_detector.cc(130)] Portal detection completed attempt 1 with phase==Content, status==Success, failures in content==0
2016-11-29T10:26:17.742067-05:00 INFO shill[879]: [INFO:service.cc(404)] Service 0: state Connected -> Online

VPN connection starts, but setParameters isn't called yet:

2016-11-29T10:26:57.894325-05:00 INFO shill[879]: [INFO:vpn_driver.cc(277)] Schedule VPN connect timeout: 300 seconds.
2016-11-29T10:26:57.894352-05:00 INFO shill[879]: [INFO:service.cc(404)] Service 18: state Idle -> Configuring
2016-11-29T10:26:57.894375-05:00 INFO shill[879]: [INFO:manager.cc(1487)] Service 18 updated; state: Configuring failure Unknown

Wifi is disabled by the user:

2016-11-29T10:27:05.331141-05:00 INFO shill[879]: [INFO:service.cc(404)] Service 0: state Online -> Idle

VPN aborts:

2016-11-29T10:27:05.429579-05:00 INFO shill[879]: [INFO:manager.cc(1731)] Default physical service: 0 (not connected)
2016-11-29T10:27:05.430298-05:00 INFO shill[879]: [INFO:device.cc(1901)] Already in desired enable state.
2016-11-29T10:27:05.430479-05:00 INFO shill[879]: [INFO:device.cc(245)] Device destructed: tun0 index 3
2016-11-29T10:27:05.438697-05:00 INFO shill[879]: [INFO:service.cc(404)] Service 18: state Configuring -> Failure
2016-11-29T10:27:05.438727-05:00 INFO shill[879]: [INFO:service.cc(989)] Noting an unexpected failure to connect.
2016-11-29T10:27:05.438738-05:00 INFO shill[879]: [INFO:manager.cc(1487)] Service 18 updated; state: Failure failure internal-error

This looks like a shill restart:

2016-11-29T10:27:05.593184-05:00 INFO shill[3804]: [INFO:manager.cc(279)] Manager started.
2016-11-29T10:27:05.594754-05:00 INFO shill[3804]: [INFO:chromeos_power_manager_proxy.cc(352)] OnSignalConnected interface: org.chromium.PowerManager signal: SuspendImminentsuccess: 1
2016-11-29T10:27:05.595318-05:00 INFO shill[3804]: [INFO:chromeos_power_manager_proxy.cc(352)] OnSignalConnected interface: org.chromium.PowerManager signal: SuspendDonesuccess: 1
2016-11-29T10:27:05.595811-05:00 INFO shill[3804]: [INFO:chromeos_power_manager_proxy.cc(352)] OnSignalConnected interface: org.chromium.PowerManager signal: DarkSuspendImminentsuccess: 1
2016-11-29T10:27:05.619828-05:00 INFO shill[3804]: [INFO:service.cc(296)] wifi service 0 constructed.
2016-11-29T10:27:05.619857-05:00 INFO shill[3804]: [INFO:wifi_service.cc(187)] Constructed WiFi service 0 name: [SSID=6]
2016-11-29T10:27:05.619874-05:00 INFO shill[3804]: [INFO:manager.cc(530)] PushProfileInternal finished; 1 profile(s) now present.
2016-11-29T10:27:05.619884-05:00 INFO shill[3804]: [INFO:service.cc(296)] wifi service 1 constructed.
2016-11-29T10:27:05.619892-05:00 INFO shill[3804]: [INFO:wifi_service.cc(187)] Constructed WiFi service 1 name: [SSID=7]
2016-11-29T10:27:05.619902-05:00 INFO shill[3804]: [INFO:service.cc(296)] vpn service 2 constructed.
2016-11-29T10:27:05.621779-05:00 INFO shill[3804]: [INFO:service.cc(296)] vpn service 3 constructed.
2016-11-29T10:27:05.623308-05:00 INFO shill[3804]: [INFO:service.cc(296)] vpn service 4 constructed.
[...]

/var/log/messages shows a crash, but crash reporting was disabled:

2016-11-29T10:27:05.439055-05:00 INFO kernel: [   88.026656] traps: shill[879] general protection ip:7f501f50a5d3 sp:7ffe2a29c368 error:0 in libstdc++.so.6.0.20[7f501f493000+f6000]
2016-11-29T10:27:05.478837-05:00 WARNING crash_reporter[3788]: [user] Received crash notification for shill[879] sig 11, user 0 (ignoring - no consent)
2016-11-29T10:27:05.482864-05:00 WARNING kernel: [   88.070595] init: shill main process (879) killed by SEGV signal
2016-11-29T10:27:05.482885-05:00 WARNING kernel: [   88.070644] init: shill main process ended, respawning

Is there a set of crash report files under /var/spool/crash that you could send over?  (If not, try opting in to crash reporting via https://support.google.com/chrome/answer/96817 before reproducing the failure.  If they're still missing, enter developer mode and `touch /var/lib/crash_sender_paused`)

FWIW, I found crash dumps on my machine from my tests yesterday, although I didn't see the same symptoms that you did.  I'll continue investigating from this end, but getting crash dumps from your system would be useful in confirming whether we hit the same crash.
I just turned on to send crash reports and this time I ran in to the situation where it appears the whole browser crashes and restarted. Is there any way to check if you got a crash report?
Crashed it a couple more times for you, the first was the full browser crash/restart, the other two were the unstable VPN framework, so theoretically if the crash reporter works you should see 3 of them.
Unfortunately the crash reports do not contain any information that would let me trace it back to your machine.  Could you please grab the files directly from /var/spool/crash and email them to me?

(FYI, the opt-in to crash reporting is what tells the crash_reporter daemon to collect the core/minidump files under /var/spool/crash instead of discarding them immediately.)
Ok, will be a little bit because I need to put this Chromebook in to Developer Mode which appears it's going to wipe out all of my prior crashes too.
Just sent you one shill crash where browser died.  Will try to repro another one where it doesn't fully die and will send that too incase they are different.
Appears to also be a shill crash when the browser isn't resetting itself as well. Let me know if that dump is good enough to root cause the regression or you need more.
For future reference (this may not work now, since I am not sure if wiping the machine resets the client ID) -
- chrome://crashes will get you the crash reports sent. 
- chrome://system has the CLIENT-ID field, if we get that, we can filter crash reports on our internal crash database site to show crashes from that client-ID.

Can you provide your current client ID? 


Sent it to you via a private email.  Also sent Kevin my shill dmp file directly.
The minidump received via email says:

Thread 0 (crashed)
 0  libstdc++.so.6.0.20 + 0x775d3
    rax = 0x2073736563637553   rdx = 0x00007fcc20cabd80
    rcx = 0x0000000000000001   rbx = 0x00007fcc20d04830
    rsi = 0x0000000000000000   rdi = 0x00007fcc20d04830
    rbp = 0x00007ffe7b5d86f0   rsp = 0x00007ffe7b5d8248
     r8 = 0x0000000000000000    r9 = 0x0000000000000000
    r10 = 0x000000000000002f   r11 = 0x0000000000000246
    r12 = 0x00007fcc20cb3620   r13 = 0x00007fcc1ec91220
    r14 = 0x00007ffe7b5d8860   r15 = 0x00007fcc20cb3df8
    rip = 0x00007fcc1e9f45d3
    Found by: given as instruction pointer in context
 1  shill + 0x108416
    rbp = 0x00007ffe7b5d8a50   rsp = 0x00007ffe7b5d8700
    rip = 0x00007fcc1f503416
    Found by: previous frame's frame pointer
 2  shill + 0x111278
    rbp = 0x00007ffe7b5d8aa0   rsp = 0x00007ffe7b5d8a60
    rip = 0x00007fcc1f50c278
    Found by: previous frame's frame pointer
 3  shill + 0xd68e8
    rbp = 0x00007ffe7b5d8af0   rsp = 0x00007ffe7b5d8ab0
    rip = 0x00007fcc1f4d18e8

(1) indicates that (0) probably happened in a call to UpdateDefaultServices():

/build/swanky/var/cache/portage/chromeos-base/shill/out/Default/../../../../../../../tmp/portage/chromeos-base/shill-0.0.3-r243/work/shill-0.0.3/aosp/system/connectivity/shill/manager.cc:1959
  108400:       48 8d b5 30 fe ff ff    lea    -0x1d0(%rbp),%rsi
  108407:       48 8d 95 10 fe ff ff    lea    -0x1f0(%rbp),%rdx
  10840e:       4c 89 e7                mov    %r12,%rdi
  108411:       e8 9a ef ff ff          callq  1073b0 <_ZN5shill7Manager21UpdateDefaultServicesERK13scoped_refptrINS_7ServiceEES5_>
/build/swanky/var/cache/portage/chromeos-base/shill/out/Default/../../../../../../../tmp/portage/chromeos-bas
e/shill-0.0.3-r243/work/shill-0.0.3/aosp/system/connectivity/shill/manager.cc:1960
  108416:       4c 89 e7                mov    %r12,%rdi
  108419:       e8 c2 06 00 00          callq  108ae0 <_ZN5shill7Manager22RefreshConnectionStateEv>
/build/swanky/var/cache/portage/chromeos-base/shill/out/Default/../../../../../../../tmp/portage/chromeos-base/shill-0.0.3-r243/work/shill-0.0.3/aosp/system/connectivity/shill/manager.cc:1961
  10841e:       4c 89 e7                mov    %r12,%rdi
  108421:       e8 9a 09 00 00          callq  108dc0 <_ZN5shill7Manager23DetectMultiHomedDevicesEv>

For (0), it looks like 0x18(%rdi) contained " Success" rather than a pointer, so the branch at 775c7 was taken and the dereference at 775d3 faulted:

00000000000775c0 <_ZSt18_Rb_tree_incrementPSt18_Rb_tree_node_base>:
local_Rb_tree_increment():
/mnt/host/source/src/third_party/gcc/libstdc++-v3/src/c++98/tree.cc:62
   775c0:       48 8b 47 18             mov    0x18(%rdi),%rax
   775c4:       48 85 c0                test   %rax,%rax
   775c7:       75 0a                   jne    775d3 <_ZSt18_Rb_tree_incrementPSt18_Rb_tree_node_base+0x13>
   775c9:       eb 15                   jmp    775e0 <_ZSt18_Rb_tree_incrementPSt18_Rb_tree_node_base+0x20>
   775cb:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
/mnt/host/source/src/third_party/gcc/libstdc++-v3/src/c++98/tree.cc:65
   775d0:       48 89 d0                mov    %rdx,%rax
   775d3:       48 8b 50 10             mov    0x10(%rax),%rdx
   775d7:       48 85 d2                test   %rdx,%rdx
   775da:       75 f4                   jne    775d0 <_ZSt18_Rb_tree_incrementPSt18_Rb_tree_node_base+0x10>
   775dc:       f3 c3                   repz retq 

I am wondering whether we have a "stale" registered DefaultServiceCallback that references a freed object.  I see that DeregisterDefaultServiceCallback() is being called right before the failure but maybe there is more to the story.

It looks like this VPN app does not yet support reconnections, so any change in connectivity will trigger a disconnection.  Pete, can you confirm?

BTW, Kirtika, do you know if anyone has successfully run the "live" shill binary (not just unit tests) with valgrind and/or ASAN?  I would be particularly interested in checking out the VPN flows.

The VPN app does support reconnect and it works well, but it's not going to do a reconnect when you're at this part of connection establishment since you're not even logged in yet.
Ah, right, reconnect_supported_ will always be false before SetParameters is invoked, regardless of the app's support for the feature.

Tracing through the call sequence, I see that Manager::UpdateDefaultServices() does this:

  for (const auto& callback : default_service_callbacks_) {
    callback.second.Run(physical_service);
  }

This invokes a callback to ThirdPartyVpnDriver::OnDefaultServiceChanged(), which calls ThirdPartyVpnDriver::Cleanup(), which calls:

  void Manager::DeregisterDefaultServiceCallback(int tag) {
    default_service_callbacks_.erase(tag);
  }

So the bug is basically:
http://stackoverflow.com/questions/8234779/how-to-remove-from-a-map-while-iterating-it

In the Linux kernel we have a list_for_each_safe() macro to handle this sort of thing.  It works by using an extra temporary variable to avoid touching the (potentially deleted) element after it has been operated on.

The std::map documentation says "References and iterators to the *erased elements* are invalidated. Other references and iterators are not affected."  Therefore I think a similar approach could work for shill:

  for (auto it = default_service_callbacks_.cbegin();
       it != default_service_callbacks_.cend();
       /* no increment */) {
    const ServiceCallback& callback = (*it++).second;
    callback.Run(physical_service);
  }

Any thoughts/suggestions before I send a CL?
Labels: Merge-Request-56 Merge-Request-55
> Any chance we can get a fix in the next two days? If so we can make the targeted RC

Bernie: I'm putting in the merge requests early to try to get this into M55 Wednesday.  Will that work?
Project Member

Comment 27 by bugdroid1@chromium.org, Nov 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/c645a4451726b1c37e02c8fcaed71a18bf63ef52

commit c645a4451726b1c37e02c8fcaed71a18bf63ef52
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Tue Nov 29 21:18:59 2016

shill: Fix crash on VPN connection failure

If ThirdPartyVpnDriver::OnDefaultServiceChanged() is invoked before
the VPN app is able to handle reconnections, it will call Cleanup()
which calls Manager::DeregisterDefaultServiceCallback() and erases
the current element from the std::map.  This invalidates the reference
to the element, causing the loop to segfault.

So, switch back to an iterator, and advance the iterator before invoking
the callback so that it is safe for the callback to erase the current
element.

BUG= chromium:661597 
TEST=`FEATURES=test emerge-link shill`
TEST=manually reproduce the problem described in the bug, and verify
     that shill no longer crashes

Change-Id: I47e1ffdaa8685467e288f6b29e8396d7dced4caa
Reviewed-on: https://chromium-review.googlesource.com/414612
Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
Tested-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Ben Chan <benchan@chromium.org>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>

[modify] https://crrev.com/c645a4451726b1c37e02c8fcaed71a18bf63ef52/manager.cc

Project Member

Comment 28 by bugdroid1@chromium.org, Nov 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/e4a788887d3c9d594412d8730057fa94027dc0e1

commit e4a788887d3c9d594412d8730057fa94027dc0e1
Author: Ben Chan <benchan@chromium.org>
Date: Tue Nov 29 23:01:42 2016

shill: add test to verify safe iteration of default service callbacks

Manager::UpdateDefaultServices() iterates through registered default
service callbacks held in a std::map and invokes each of the callbacks
in sequence. As revealed in chromium:661597, a callback upon invocation
may call Manager::DeregisterDefaultServiceCallback() to remove itself
from the std::map, which leads to unsafe access of an invalidated
std::map::iterator. CL:414612 fixes that issue. This CL adds a unit test
to expose the original issue and verify that CL:414612 has addressed the
issue.

BUG= chromium:661597 
CQ-DEPEND=CL:414612
TEST=Run unit tests.

Change-Id: I4f42fa5fd3995fbb6bed78467d8b952366cfd914
Reviewed-on: https://chromium-review.googlesource.com/414616
Commit-Ready: Ben Chan <benchan@chromium.org>
Tested-by: Ben Chan <benchan@chromium.org>
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/e4a788887d3c9d594412d8730057fa94027dc0e1/manager_unittest.cc
[modify] https://crrev.com/e4a788887d3c9d594412d8730057fa94027dc0e1/manager.h

Thanks for getting this regression fixed before M55 goes to Stable. When will I be able to test this on M55 Dev?
err 56 Dev
Guessing it will land in M57 canary 9037.0.0 later today.  Currently waiting on approval to merge back to M55/M56.
Labels: -Merge-Request-55 -Merge-Request-56 Merge-Approved-56 Merge-Approved-55
Project Member

Comment 33 by bugdroid1@chromium.org, Dec 1 2016

Labels: merge-merged-release-R55-8872.B
The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/56c118b188f75d59158f48ce29d348434c4279a3

commit 56c118b188f75d59158f48ce29d348434c4279a3
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Tue Nov 29 21:18:59 2016

shill: Fix crash on VPN connection failure

If ThirdPartyVpnDriver::OnDefaultServiceChanged() is invoked before
the VPN app is able to handle reconnections, it will call Cleanup()
which calls Manager::DeregisterDefaultServiceCallback() and erases
the current element from the std::map.  This invalidates the reference
to the element, causing the loop to segfault.

So, switch back to an iterator, and advance the iterator before invoking
the callback so that it is safe for the callback to erase the current
element.

BUG= chromium:661597 
TEST=`FEATURES=test emerge-link shill`
TEST=manually reproduce the problem described in the bug, and verify
     that shill no longer crashes

Change-Id: I47e1ffdaa8685467e288f6b29e8396d7dced4caa
Reviewed-on: https://chromium-review.googlesource.com/414612
Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
Tested-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Ben Chan <benchan@chromium.org>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>
(cherry picked from commit c645a4451726b1c37e02c8fcaed71a18bf63ef52)

[modify] https://crrev.com/56c118b188f75d59158f48ce29d348434c4279a3/manager.cc

Project Member

Comment 34 by bugdroid1@chromium.org, Dec 1 2016

Labels: merge-merged-release-R56-9000.B
The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/595393b80ba22cfbe42efe23a3c0e1e61c1638de

commit 595393b80ba22cfbe42efe23a3c0e1e61c1638de
Author: Ben Chan <benchan@chromium.org>
Date: Tue Nov 29 23:01:42 2016

shill: add test to verify safe iteration of default service callbacks

Manager::UpdateDefaultServices() iterates through registered default
service callbacks held in a std::map and invokes each of the callbacks
in sequence. As revealed in chromium:661597, a callback upon invocation
may call Manager::DeregisterDefaultServiceCallback() to remove itself
from the std::map, which leads to unsafe access of an invalidated
std::map::iterator. CL:414612 fixes that issue. This CL adds a unit test
to expose the original issue and verify that CL:414612 has addressed the
issue.

BUG= chromium:661597 
CQ-DEPEND=CL:414612
TEST=Run unit tests.

Change-Id: I4f42fa5fd3995fbb6bed78467d8b952366cfd914
Reviewed-on: https://chromium-review.googlesource.com/414616
Commit-Ready: Ben Chan <benchan@chromium.org>
Tested-by: Ben Chan <benchan@chromium.org>
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>
(cherry picked from commit e4a788887d3c9d594412d8730057fa94027dc0e1)

[modify] https://crrev.com/595393b80ba22cfbe42efe23a3c0e1e61c1638de/manager_unittest.cc
[modify] https://crrev.com/595393b80ba22cfbe42efe23a3c0e1e61c1638de/manager.h

Labels: -Merge-Approved-55 -Merge-Approved-56
Status: Fixed (was: Unconfirmed)
Project Member

Comment 36 by bugdroid1@chromium.org, Dec 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/7a7487f4505332ba5d534f81d213e04743a64385

commit 7a7487f4505332ba5d534f81d213e04743a64385
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Tue Nov 29 21:18:59 2016

shill: Fix crash on VPN connection failure

If ThirdPartyVpnDriver::OnDefaultServiceChanged() is invoked before
the VPN app is able to handle reconnections, it will call Cleanup()
which calls Manager::DeregisterDefaultServiceCallback() and erases
the current element from the std::map.  This invalidates the reference
to the element, causing the loop to segfault.

So, switch back to an iterator, and advance the iterator before invoking
the callback so that it is safe for the callback to erase the current
element.

BUG= chromium:661597 
TEST=`FEATURES=test emerge-link shill`
TEST=manually reproduce the problem described in the bug, and verify
     that shill no longer crashes

Change-Id: I47e1ffdaa8685467e288f6b29e8396d7dced4caa
Reviewed-on: https://chromium-review.googlesource.com/414612
Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
Tested-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Ben Chan <benchan@chromium.org>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>
(cherry picked from commit c645a4451726b1c37e02c8fcaed71a18bf63ef52)

[modify] https://crrev.com/7a7487f4505332ba5d534f81d213e04743a64385/manager.cc

Project Member

Comment 37 by bugdroid1@chromium.org, Dec 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/e0092e46f924c98656b2fefa1b877eafa5f86202

commit e0092e46f924c98656b2fefa1b877eafa5f86202
Author: Ben Chan <benchan@chromium.org>
Date: Tue Nov 29 23:01:42 2016

shill: add test to verify safe iteration of default service callbacks

Manager::UpdateDefaultServices() iterates through registered default
service callbacks held in a std::map and invokes each of the callbacks
in sequence. As revealed in chromium:661597, a callback upon invocation
may call Manager::DeregisterDefaultServiceCallback() to remove itself
from the std::map, which leads to unsafe access of an invalidated
std::map::iterator. CL:414612 fixes that issue. This CL adds a unit test
to expose the original issue and verify that CL:414612 has addressed the
issue.

BUG= chromium:661597 
CQ-DEPEND=CL:414612
TEST=Run unit tests.

Change-Id: I4f42fa5fd3995fbb6bed78467d8b952366cfd914
Reviewed-on: https://chromium-review.googlesource.com/414616
Commit-Ready: Ben Chan <benchan@chromium.org>
Tested-by: Ben Chan <benchan@chromium.org>
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>
(cherry picked from commit e4a788887d3c9d594412d8730057fa94027dc0e1)

[modify] https://crrev.com/e0092e46f924c98656b2fefa1b877eafa5f86202/manager_unittest.cc
[modify] https://crrev.com/e0092e46f924c98656b2fefa1b877eafa5f86202/manager.h

Received update to 56.0.2924.12 dev (64-bit) and looks like the issue is resolved in this build.  OS now reports Internal error - Server message: Underlying network disconnected and VPN framework does not get in messed up state.
Great, thanks for the update.

For M56 the change landed in OS version 9000.13.0.  The last dev channel push was 9000.15.0 on 12/1.

(Note that several OS builds contained Chrome [browser] version 56.0.2924.12 so that isn't always a 100%-precise descriptor for a release.)

For M55 there has not been a build since this CL landed in the tree, not sure if we made the cutoff.
Good to know. Platform is 9000.15.0 (official build) dev-channel.
Status: Verified (was: Fixed)
No issues on Cyan R55-8872.67.0 plus Comment#38.

Sign in to add a comment