bluetoothd crash in prim_attr_cmp() |
||||||||||
Issue description
OS: CHROMEOS_RELEASE_BUILDER_PATH=eve-release/R66-10452.22.0
with latest bluez package on Apr 16 2018.
What steps will reproduce the problem?
Run magic tethering on Eve with 1 pixel phone.
What is the expected result?
It should run without bluetoothd crash.
What happens instead?
Bluetoothd crash is observed daily or every other day.
In the attached tarball of logs, bluetoothd segfault occurred twice in one hour.
2018-04-16T13:03:54.586304+08:00 INFO kernel: [ 809.057963] bluetoothd[2634]: segfault at 2c ip 00005bd6b0dc416c sp 00007ffd546ebe80 error 4 in bluetoothd[5bd6b0d47000+c9000]
2018-04-16T13:07:54.612540+08:00 INFO kernel: [ 1049.092709] bluetoothd[6940]: segfault at 2c ip 00005cb257d3a16c sp 00007ffca66563b0 error 4 in bluetoothd[5cb257cbd000+c9000]
The backtrace of the core dump
#0 prim_attr_cmp (a=0x0, b=<optimized out>) at src/device.c:4005
#1 0x00007b0009778de2 in g_slist_find_custom (list=0x5bd6b128d710, data=0x5bd6b127d5f0, func=0x5bd6b0dc4140 <prim_attr_cmp>) at ../../glib-2.50.3/glib/gslist.c:746
#2 0x00005bd6b0dc190b in gatt_service_removed (attr=0x5bd6b127d5f0, user_data=0x5bd6b128a660) at src/device.c:4045
#3 0x00005bd6b0de08bc in queue_foreach (queue=0x5bd6b12792a0, function=0x5bd6b0deeae0 <handle_notify>, user_data=0x7ffd546ebfa0) at src/shared/queue.c:220
#4 0x00005bd6b0decb60 in notify_service_changed (db=0x5bd6b12862c0, service=0x5bd6b127c100, added=false) at src/shared/gatt-db.c:268
#5 gatt_db_service_destroy (data=0x5bd6b127c100) at src/shared/gatt-db.c:279
#6 0x00005bd6b0de069f in queue_remove_all (queue=<optimized out>, function=<optimized out>, user_data=0x0, destroy=0x5bd6b0decb00 <gatt_db_service_destroy>)
at src/shared/queue.c:351
#7 0x00005bd6b0decc62 in gatt_db_clear_range (db=0x5bd6b12862c0, start_handle=1, end_handle=65535) at src/shared/gatt-db.c:453
#8 gatt_db_clear (db=0x5bd6b12862c0) at src/shared/gatt-db.c:413
#9 0x00005bd6b0dc4860 in gatt_cache_cleanup (device=<optimized out>) at src/device.c:573
#10 gatt_client_cleanup (device=0x5bd6b128a660) at src/device.c:581
#11 0x00005bd6b0dbe994 in attio_cleanup (device=0x5bd6b128a660) at src/device.c:609
#12 0x00005bd6b0dc1673 in device_free (user_data=0x5bd6b128a660) at src/device.c:667
#13 0x00005bd6b0ddba4f in remove_interface (data=0x5bd6b128ec00, name=0x5bd6b0e08604 "org.bluez.Device1") at gdbus/object.c:667
#14 0x00005bd6b0ddb980 in g_dbus_unregister_interface (connection=0x5bd6b121a1b0, path=<optimized out>, name=0x5bd6b0e08604 "org.bluez.Device1") at gdbus/object.c:1391
#15 0x00005bd6b0dbd2f0 in btd_device_unref (device=<optimized out>) at src/device.c:6573
#16 device_remove (device=0x5bd6b128a660, remove_stored=<optimized out>) at src/device.c:4515
#17 0x00005bd6b0da35c8 in btd_adapter_remove_device (adapter=0x5bd6b1233660, dev=0x5bd6b128a660) at src/adapter.c:1248
#18 0x00005bd6b0da846c in dev_disconnected (adapter=0x5bd6b1233660, addr=0x5bd6b121cae9, reason=<optimized out>) at src/adapter.c:7456
#19 0x00005bd6b0de1236 in request_complete (mgmt=<optimized out>, status=0 '\000', opcode=20, length=7, param=0x5bd6b121cae9, index=<optimized out>)
at src/shared/mgmt.c:261
#20 can_read_data (io=<optimized out>, user_data=0x5bd6b121ca60) at src/shared/mgmt.c:353
#21 0x00005bd6b0deefc4 in watch_callback (channel=<optimized out>, cond=<optimized out>, user_data=0x7ffd546ebe8c) at src/shared/io-glib.c:170
#22 0x00007b00097596b6 in g_main_dispatch (context=<optimized out>) at ../../glib-2.50.3/glib/gmain.c:3203
#23 g_main_context_dispatch (context=<optimized out>) at ../../glib-2.50.3/glib/gmain.c:3856
#24 0x00007b0009759a5e in g_main_context_iterate (context=<optimized out>, block=<optimized out>, dispatch=<optimized out>, self=<optimized out>)
at ../../glib-2.50.3/glib/gmain.c:3929
#25 0x00007b0009759d8f in g_main_loop_run (loop=0x5bd6b1211ed0) at ../../glib-2.50.3/glib/gmain.c:4125
#26 0x00005bd6b0dc9334 in main (argc=1, argv=0x7ffd546ed608) at src/main.c:796
The corresponding source code
3997 static gint prim_attr_cmp(gconstpointer a, gconstpointer b)
3998 {
3999 const struct gatt_primary *prim = a;
4000 const struct gatt_db_attribute *attr = b;
4001 uint16_t start, end;
4002
4003 gatt_db_attribute_get_service_handles(attr, &start, &end);
4004
4005 return !(prim->range.start == start && prim->range.end == end);
Sometimes the pointer prim is NULL. The null pointer dereference at line 4005 caused the segmentation fault.
,
Apr 21 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/bluez/+/d8ee0901515c465d618fe3ab12ceec85dbf50972 commit d8ee0901515c465d618fe3ab12ceec85dbf50972 Author: Joseph Hwang <josephsih@chromium.org> Date: Sat Apr 21 10:52:53 2018 CHROMIUM: device.c: sanity check of NULL pointers in prim_attr_cmp() In some cases, the gatt_primary service structure may be NULL. Do not dereference the NULL pointer when this occurs. BUG= chromium:833389 ,b:78181184 TEST=Run magic tethering for a few hours. Observe that no segfault occurs in prim_attr_cmp(). Change-Id: I26ba52233f720f8d6f28d4948bf6d9658333f251 Reviewed-on: https://chromium-review.googlesource.com/1016441 Commit-Ready: Shyh-In Hwang <josephsih@chromium.org> Tested-by: Shyh-In Hwang <josephsih@chromium.org> Reviewed-by: Dmitry Grinberg <dmitrygr@google.com> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> [modify] https://crrev.com/d8ee0901515c465d618fe3ab12ceec85dbf50972/src/device.c
,
Apr 26 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/bluez/+/d933f7ed8dd8cf2d29bae7a32e58e38c11cd89aa commit d933f7ed8dd8cf2d29bae7a32e58e38c11cd89aa Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Date: Thu Apr 26 06:00:40 2018 BACKPORT: device: Fix crash freeing device Calling gatt_db_register with NULL pointers makes no sense since it does nothing when the callbacks are NULL so the callback are still reachable causing invalid memory to accessed: Invalid read of size 8 at 0x50EAFDC: g_slist_find_custom (in /usr/lib64/libglib-2.0.so.0.5000.3) by 0x46CDA1: gatt_service_removed (device.c:3563) by 0x4896F8: queue_foreach (queue.c:220) by 0x4951FB: notify_service_changed (gatt-db.c:268) by 0x4951FB: gatt_db_service_destroy (gatt-db.c:279) by 0x4898F5: queue_remove_all (queue.c:336) by 0x4952E2: gatt_db_clear_range (gatt-db.c:461) by 0x48F32B: discovery_op_unref (gatt-client.c:447) by 0x4979AA: bt_gatt_request_unref (gatt-helpers.c:594) by 0x490489: bt_gatt_client_cancel_all (gatt-client.c:2083) by 0x4904D8: bt_gatt_client_free (gatt-client.c:1752) by 0x46CF70: gatt_client_cleanup (device.c:561) by 0x46D01A: attio_cleanup (device.c:586) Address 0x86cb940 is 0 bytes inside a block of size 16 free'd at 0x4C2ED4A: free (vg_replace_malloc.c:530) by 0x50D16CD: g_free (in /usr/lib64/libglib-2.0.so.0.5000.3) by 0x50EA743: g_slice_free_chain_with_offset (in /usr/lib64/libglib-2.0.so.0.5000.3) by 0x46D18C: device_free (device.c:638) by 0x485B05: remove_interface (object.c:667) by 0x485FF9: g_dbus_unregister_interface (object.c:1391) by 0x45EFA9: btd_adapter_remove_device (adapter.c:1200) by 0x45FBC3: dev_disconnected (adapter.c:6800) by 0x48A1A5: request_complete (mgmt.c:261) by 0x48AC0B: can_read_data (mgmt.c:353) by 0x496954: watch_callback (io-glib.c:170) by 0x50CBE51: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5000.3) BUG= chromium:833389 ,b:78181184 TEST=Enable debug option of bluetoothd. Run magic tethering for a few hours. Observe that there are no warning messages as below in /var/log/messages. WARNING bluetoothd[2657]: prim_attr_cmp: null gatt_primary Change-Id: I90963d7eb84f62d123d5644f4ac66763568d47ec Signed-off-by: Joseph Hwang <josephsih@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1023498 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Commit-Ready: Shyh-In Hwang <josephsih@chromium.org> Tested-by: Shyh-In Hwang <josephsih@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> [modify] https://crrev.com/d933f7ed8dd8cf2d29bae7a32e58e38c11cd89aa/src/device.c
,
Apr 26 2018
,
May 14 2018
It is desirable to merge the 2 patches in C#2 and C#3 back to M-67 in order to fix bluetoothd crashes we saw frequently. Thanks!
,
May 14 2018
This bug requires manual review: M67 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 14 2018
,
May 14 2018
Has this change been extensively tested and confirmed working / low risk? We're pretty far long with M67 so need to mitigate risk. Esp given that this could have broad board impact. This wasn't a M67 regression, so prefer to wait for M68.
,
May 16 2018
Here below was the test result from Pramod: "I have tested on Eve on 10682.0.0 build, and the manual tests are all passed. No Regressions were observed." I think it would be less risky by accepting the patches than postponing to next milestone. I personally used to experience the crash several times a day when running magic tethering. Thanks!
,
May 16 2018
Approving merge to M67 Chrome OS.
,
May 17 2018
Thanks, Kevin! Cherry-picking the two patches to M67.
,
May 20 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/bluez/+/e8a45ca64f97ccb0fac07b38829dd47d86de5326 commit e8a45ca64f97ccb0fac07b38829dd47d86de5326 Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Date: Sun May 20 14:46:54 2018 BACKPORT: device: Fix crash freeing device Calling gatt_db_register with NULL pointers makes no sense since it does nothing when the callbacks are NULL so the callback are still reachable causing invalid memory to accessed: Invalid read of size 8 at 0x50EAFDC: g_slist_find_custom (in /usr/lib64/libglib-2.0.so.0.5000.3) by 0x46CDA1: gatt_service_removed (device.c:3563) by 0x4896F8: queue_foreach (queue.c:220) by 0x4951FB: notify_service_changed (gatt-db.c:268) by 0x4951FB: gatt_db_service_destroy (gatt-db.c:279) by 0x4898F5: queue_remove_all (queue.c:336) by 0x4952E2: gatt_db_clear_range (gatt-db.c:461) by 0x48F32B: discovery_op_unref (gatt-client.c:447) by 0x4979AA: bt_gatt_request_unref (gatt-helpers.c:594) by 0x490489: bt_gatt_client_cancel_all (gatt-client.c:2083) by 0x4904D8: bt_gatt_client_free (gatt-client.c:1752) by 0x46CF70: gatt_client_cleanup (device.c:561) by 0x46D01A: attio_cleanup (device.c:586) Address 0x86cb940 is 0 bytes inside a block of size 16 free'd at 0x4C2ED4A: free (vg_replace_malloc.c:530) by 0x50D16CD: g_free (in /usr/lib64/libglib-2.0.so.0.5000.3) by 0x50EA743: g_slice_free_chain_with_offset (in /usr/lib64/libglib-2.0.so.0.5000.3) by 0x46D18C: device_free (device.c:638) by 0x485B05: remove_interface (object.c:667) by 0x485FF9: g_dbus_unregister_interface (object.c:1391) by 0x45EFA9: btd_adapter_remove_device (adapter.c:1200) by 0x45FBC3: dev_disconnected (adapter.c:6800) by 0x48A1A5: request_complete (mgmt.c:261) by 0x48AC0B: can_read_data (mgmt.c:353) by 0x496954: watch_callback (io-glib.c:170) by 0x50CBE51: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5000.3) BUG= chromium:833389 ,b:78181184 TEST=Enable debug option of bluetoothd. Run magic tethering for a few hours. Observe that there are no warning messages as below in /var/log/messages. WARNING bluetoothd[2657]: prim_attr_cmp: null gatt_primary Change-Id: I90963d7eb84f62d123d5644f4ac66763568d47ec Signed-off-by: Joseph Hwang <josephsih@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1023498 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Commit-Ready: Shyh-In Hwang <josephsih@chromium.org> Tested-by: Shyh-In Hwang <josephsih@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> (cherry picked from commit d933f7ed8dd8cf2d29bae7a32e58e38c11cd89aa) Reviewed-on: https://chromium-review.googlesource.com/1063051 Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> [modify] https://crrev.com/e8a45ca64f97ccb0fac07b38829dd47d86de5326/src/device.c
,
May 20 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/bluez/+/4abb61ce31f253e6a91d29e75c9eb2eeda6e2dcc commit 4abb61ce31f253e6a91d29e75c9eb2eeda6e2dcc Author: Joseph Hwang <josephsih@chromium.org> Date: Sun May 20 14:46:55 2018 CHROMIUM: device.c: sanity check of NULL pointers in prim_attr_cmp() In some cases, the gatt_primary service structure may be NULL. Do not dereference the NULL pointer when this occurs. BUG= chromium:833389 ,b:78181184 TEST=Run magic tethering for a few hours. Observe that no segfault occurs in prim_attr_cmp(). Change-Id: I26ba52233f720f8d6f28d4948bf6d9658333f251 Reviewed-on: https://chromium-review.googlesource.com/1016441 Commit-Ready: Shyh-In Hwang <josephsih@chromium.org> Tested-by: Shyh-In Hwang <josephsih@chromium.org> Reviewed-by: Dmitry Grinberg <dmitrygr@google.com> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> (cherry picked from commit d8ee0901515c465d618fe3ab12ceec85dbf50972) Reviewed-on: https://chromium-review.googlesource.com/1063050 Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Commit-Queue: Shyh-In Hwang <josephsih@chromium.org> Trybot-Ready: Shyh-In Hwang <josephsih@chromium.org> [modify] https://crrev.com/4abb61ce31f253e6a91d29e75c9eb2eeda6e2dcc/src/device.c
,
May 21 2018
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 22 2018
Thanks for reminding! Marked it as Verified since Pramod has conducted manual tests. |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by josephsih@chromium.org
, Apr 16 20183.9 MB
3.9 MB Download