libusb built w/clang fails w/NULL pointer access |
||||||||||||||||
Issue description
Starting from last week, I have problem connecting with servo, the error message was,
~/trunk/src/scripts $ sudo servod -b asuka
2017-05-16 15:09:30,922 - servod - INFO - Start
Traceback (most recent call last):
File "/usr/lib/python-exec/python2.7/servod", line 9, in <module>
load_entry_point('servo===0.0.1-027193a', 'console_scripts', 'servod')()
File "/usr/lib64/python2.7/site-packages/servo/servod.py", line 529, in main
main_function()
File "/usr/lib64/python2.7/site-packages/servo/servod.py", line 446, in main_function
multiservo.parse_rc(logger, options.rcfile))
File "/usr/lib64/python2.7/site-packages/servo/servod.py", line 326, in discover_servo
all_servos.extend(usb_find(vid, pid, serialname))
File "/usr/lib64/python2.7/site-packages/servo/servod.py", line 167, in usb_find
for bus in usb.busses():
File "/usr/lib64/python2.7/site-packages/usb/legacy.py", line 343, in <genexpr>
return (Bus(g) for k, g in _interop._groupby(
File "/usr/lib64/python2.7/site-packages/usb/legacy.py", line 339, in __init__
self.devices = [Device(d) for d in devices]
File "/usr/lib64/python2.7/site-packages/usb/legacy.py", line 324, in __init__
self.configurations = [Configuration(c) for c in dev]
File "/usr/lib64/python2.7/site-packages/usb/legacy.py", line 127, in __init__
[Interface(i) for i in cfg],
File "/usr/lib64/python2.7/site-packages/usb/legacy.py", line 113, in __init__
self.endpoints = [Endpoint(e) for e in intf]
File "/usr/lib64/python2.7/site-packages/usb/core.py", line 376, in __iter__
self.configuration
File "/usr/lib64/python2.7/site-packages/usb/core.py", line 258, in __init__
configuration
File "/usr/lib64/python2.7/site-packages/usb/backend/libusb1.py", line 729, in get_endpoint_descriptor
return _WrapDescriptor(i.endpoint[ep], i)
ValueError: NULL pointer access
After bisected various builds, I found the problem happened starting from 9535.0.0. Among the changes, this (https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/95789da27d19d74d7032b36d1c1872860ce77ffd%5E%21/) looks to be the suspicious. After I reverted the change, the problem is gone.
Just wondering if any change in SDK and/or toolchain causing this problem?
Note that our partner reported the same issue as well.
Thanks.
,
May 17 2017
before debugging too much, the easiest thing would be to just try some upgrades. looks like there are newer versions of both pyusb and libusb1 available.
,
May 17 2017
Apparently a problem for everyone, partners are complaining, and partners are blocked. I'm going to revert to the old sdk to unblock unless anyone had a better approach to get this fixed today.
,
May 17 2017
,
May 17 2017
err, hold up. don't go reverting SDKs. that has a number of ramifications and will simply get reverted as soon as the chromiumos-sdk bot passes again (which it normally does at least once a day).
,
May 17 2017
also, that sdk version is only used as the base sdk version when installing a fresh chroot ... it would have no impact on existing users, and anyone running update_chroot would update past it.
,
May 17 2017
The existing sdk build has apparently been broken for like a week, so no worries about a revert getting clobbered. Also, YH has confirmed that reverting the sdk does in fact fix the problem.
,
May 17 2017
that isn't terribly precise. i could revert that CL locally, run update_chroot, and i'm sure it'll continue to fail. you'd have to revert the CL, download a completely new chroot, and then not update things inside it. still seems like the easiest thing to try out would be to grab some package updates. note: i've never used servod, nor do i know how, so i don't think i'm the best owner here.
,
May 17 2017
Unfortunately it seems like you are the only one who actually knows how the SDK works, unless you can suggest someone better. easy repro steps: cros_sdk --delete cros_sdk --create cros_sdk --no-ns-pid $ sudo servod 2017-05-17 13:40:52,759 - servod - INFO - Start 2017-05-17 13:40:52,812 - servod - ERROR - No servos found Passing. cros_sdk --delete cros_sdk --create cros_sdk --no-ns-pid $ sudo servod ... ValueError: NULL pointer access Failing. No knowledge of servod or actual servo is necessary. Do you think you can fix today or prefer to back out to the old sdk and get a fix in later?
,
May 17 2017
you should insert `./update_chroot` before running servod and see if it still fails. again, "backing out the sdk" isn't going to work the way you think it is. that sdk version is simply the base tarball that gets downloaded. then update_chroot runs and updates all the sdk packages and gets things back to ToT behavior. and any attempts to revert or otherwise modify that file will be automatically clobbered by our bots.
,
May 17 2017
update chroot doesn't change anything. Please try it yourself.
,
May 17 2017
it will eventually. all sdks are "eventually consistent". so whatever broken package is in the new sdk will make it out to all developers when it is upgraded or rebuilt. so to reiterate: reverting the SDK version (like the referenced one) would not help. as you can see in that commit, it was created by a bot. so you can either try an upgrade of these packages as i suggested, or take a working chroot and merge in the changes from the failing chroot to try and narrow down where it went wrong. pretty sure if you take your first example and just run: $ sudo emerge -G libusb:1 it'll start failing too. rebuild it w/gcc and it'll work fine again. the SDK environment is owned by everyone. we are all responsible for keeping it healthy.
,
May 17 2017
,
May 17 2017
,
May 17 2017
this isn't a pri-0 as you have a trivial workaround: https://chromium-review.googlesource.com/507988
,
May 17 2017
It's not a fix unless you submit it. So either commit or leave p0
,
May 17 2017
RE: #16 Just wondering why https://chromium-review.googlesource.com/507988 is being marked as HACK? If libusb does not work when built by clang and building it with gcc addresses the problem, then the CL should be considered a proper fix? Thanks.
,
May 17 2017
,
May 17 2017
The proper fix is probably something in clang so it aligns some obscure behavior more with what gcc does, or possibly something in libusb to have it not break on some obscure compiler behavior. Hack or not, using gcc while we figure out why clang built libusb is borked seems reasonable. With the CL in comment 16 I can see the issue fixed: 0. Start with old chroot that works ok. 1. sudo emerge libusb 2. sudo servod #Bad error :( 3. cherry-pick in the patch... 4. sudo emerge libusb 5. sudo servod #Good error (no servo detected) :) So with this CL, any partners whom have been impacted should be instructed to run the following commands: cd ~/trunk/src/third_party/chromiumos-overlay git fetch https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay refs/changes/88/507988/1 && git cherry-pick FETCH_HEAD sudo emerge libusb
,
May 17 2017
In answer to comment 18, no that's not a real fix because the long-term plan is to pretty much stop supporting GCC and have everything building with clang. So the correct 'real' fix would be to figure out what llvm/clang is doing that is causing the package to break, and to fix that. In the short term, building your package with GCC is an OK workaround.
,
May 17 2017
This repro's for me outside of servod, python -c 'import usb;usb.busses().next()'
,
May 17 2017
RE: #21 That also means all partners need to go through this workaround if I understand it correctly. That can be lots of head-scratching, IMHO. In this case, why not supplying the gcc-built libusb (until we fix the real issue) to save their hassles? My 2 cents.
,
May 17 2017
There is a CL in the process that will build libusb with GCC until the real issue is fixed.
,
May 17 2017
After this comment 16 CL lands, and the sdk gets updated (pending https://bugs.chromium.org/p/chromium/issues/detail?id=722063 ) we will be supplying the gcc-built libusb, at which point any newly created chroot will be fine. If partners (or impacted Googlers for that matter) want it to be working now, they can run the three commands listed in comment 20 in their chroot. This is probably PSA worthy if we believe many engineers are having servo trouble (perhaps silently).
,
May 17 2017
Lowering priority as TOT should still function while root cause is being investigated.
,
May 18 2017
,
May 18 2017
,
May 18 2017
RE: #24 What is the CL being referred, as https://chromium-review.googlesource.com/507988 has been abandoned? Thanks.
,
May 18 2017
It's now https://chromium-review.googlesource.com/#/c/508253/. Thanks Bernie.
,
May 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/2840f9a2d1e9941ef7bf9b92940997ad40fad037 commit 2840f9a2d1e9941ef7bf9b92940997ad40fad037 Author: Nick Sanders <nsanders@chromium.org> Date: Fri May 19 20:57:47 2017 [HACK] libusb: build w/gcc Clang build causes libusb to break, add a build exception for libusb to use gcc. TODO(cmtrice): fix libusb under clang and revert this change. BUG=chromium:723145 TEST=rebuild libusb in sdk and `sudo servod` no longer crashes Change-Id: I1d3f4c1d863721fbcff009dde0429043c85b27ea Reviewed-on: https://chromium-review.googlesource.com/508253 Commit-Ready: YH Lin <yueherngl@chromium.org> Tested-by: Nick Sanders <nsanders@chromium.org> Tested-by: YH Lin <yueherngl@chromium.org> Tested-by: Bernie Thompson <bhthompson@chromium.org> Reviewed-by: Caroline Tice <cmtice@chromium.org> Reviewed-by: YH Lin <yueherngl@chromium.org> Reviewed-by: Bernie Thompson <bhthompson@chromium.org> [add] https://crrev.com/2840f9a2d1e9941ef7bf9b92940997ad40fad037/chromeos/config/env/dev-libs/libusb
,
May 22 2017
With ToT this morning the problem is gone. Thanks.
,
May 22 2017
it is good we have a workaround. But this is only a workaround. But can anyone try to figure out the problem with LLVM. This is likely to point to a real problem in the libusb. Vapier suggested you tried newer versions of libusb. Can you try that?
,
May 22 2017
,
May 23 2017
I tried the newer libusb (1.0.21) and found similar problem.
,
May 23 2017
if you have a CL to update packages, please post it so we can get it merged. we want to update regardless of this bug.
,
May 23 2017
Oh sure, please have a look, https://chromium-review.googlesource.com/#/c/513502/ Hopefully I'm updating it the right way... Thanks.
,
May 23 2017
Ok, thanks. It will be good to get this updated. Does any of you have any expertisse with this package so that you can do more triaging? Otherwise, we will just have to do automatic bisection of object files to triage the issue.
,
May 26 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/portage-stable/+/bd16cba511f10d99cc1b3149ea1f7349fed2372b commit bd16cba511f10d99cc1b3149ea1f7349fed2372b Author: YH Lin <yueherngl@google.com> Date: Fri May 26 00:15:59 2017 libusb: upgraded package to upstream Upgraded dev-libs/libusb to version 1.0.21 on amd64 Updating libusb from 1.0.20 to 1.0.21 (amd64), even though similar problem can still be found (gcc build works but not llvm build). BUG=chromium:723145 TEST=sudo emerge libusb Change-Id: I80187b39f3ab5dbaecce87588b1c0a6c8a193a83 Reviewed-on: https://chromium-review.googlesource.com/513502 Commit-Ready: YH Lin <yueherngl@chromium.org> Tested-by: YH Lin <yueherngl@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/bd16cba511f10d99cc1b3149ea1f7349fed2372b/dev-libs/libusb/Manifest [delete] https://crrev.com/1d1d6042659c26a8fef0fddb516e4e97930672cd/metadata/md5-cache/dev-libs/libusb-1.0.19 [rename] https://crrev.com/bd16cba511f10d99cc1b3149ea1f7349fed2372b/dev-libs/libusb/libusb-1.0.21.ebuild [modify] https://crrev.com/bd16cba511f10d99cc1b3149ea1f7349fed2372b/dev-libs/libusb/metadata.xml [add] https://crrev.com/bd16cba511f10d99cc1b3149ea1f7349fed2372b/metadata/md5-cache/dev-libs/libusb-1.0.21
,
Feb 5 2018
,
Mar 9 2018
,
Aug 1
|
||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||
Comment 1 by nsanders@chromium.org
, May 17 2017Summary: servo connection problem / libusb failure (was: servo connection problem)