Various "ap" daemons crashed and burned at startup
Reported by
jrbarnette@chromium.org,
Sep 7 2017
|
||||||||
Issue description
This whirlwind-paladin run failed:
https://luci-milo.appspot.com/buildbot/chromeos/whirlwind-paladin/9166
The relevant issue seems to be provision failures on one
DUT. Here's the history:
chromeos4-row10-jetstream-host4
2017-09-07 14:14:43 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-jetstream-host4/1184279-repair/
2017-09-07 13:52:23 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/
The provision log shows a really large number of crashes of
daemons with "ap" in the name:
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054931.2021.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054942.2150.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054954.2363.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054948.2235.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054905.1507.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054949.2245.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054943.2159.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054937.2071.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054936.2064.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054913.1744.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054919.1813.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054948.2089.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054926.1787.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054916.1659.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054955.2375.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054919.1821.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054937.1962.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054907.1680.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054907.1690.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054849.1260.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054901.1612.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054855.1488.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054855.1219.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054925.1929.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054849.1408.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054901.1621.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054913.1752.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054855.1506.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054925.1936.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054931.2014.dmp
INFO ---- New Crash Dump timestamp=1504818469 localtime=Sep 07 14:07:49 /usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/sysinfo/var/spool/crash/ap_controller.20170729.061319.8887.dmp
In particular, the provision itself seems to have failed because
ap_controller was crashing.
,
Sep 7 2017
Srini and Kishan, ap-controller crashes a lot on DUT. Strangely, no CL in this failed run is related to jetstream code. Is it possible this happpend to ToT?
,
Sep 7 2017
+ Laurence The failed HWTest log shows "provision" test failed, and all other tests with normal names PASSED. What is "provision" test about? Is it b/c ap-controller service is not started in this test script? chromeos-server31-93: 3876ece384a87310 3 Autotest instance created: cautotest 09-07-2017 [13:50:02] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=140395156 @@@STEP_LINK@Link to suite@http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=140395156@@@ The suite job has another 2:29:51.077322 till timeout. 09-07-2017 [14:23:34] Suite job is finished. 09-07-2017 [14:23:34] Start collecting test results and dump them to json. Suite job [ PASSED ] jetstream_GuestInterfaces [ PASSED ] provision [ FAILED ] provision FAIL: ap-controller service not found, completed successfully jetstream_ApiServerAttestation [ PASSED ] jetstream_DiagnosticReport [ PASSED ] jetstream_LocalApi [ PASSED ] jetstream_ApiServerDeveloperConfiguration [ PASSED ] jetstream_GcdCommands [ PASSED ] jetstream_PrioritizedDevice [ PASSED ] jetstream_NetworkInterfaces [ PASSED ] jetstream_WanCustomDns [ PASSED ] jetstream_GuestFirewall [ PASSED ]
,
Sep 7 2017
Tao, provisioning is not a test as such; it is the step where the build is installed on the device prior to testing. If the ap-controller service is not started successfully, the provisioning is considered to have failed. We've had some issues with this device in the past. I'm going to lock it so tests will not run on it.
,
Sep 7 2017
grepping around on the device I see: root@chromeos4-row10-jetstream-host4 $ grep "ERR ap" /var/log/messages | head -1 1970-01-01T00:00:22.737860+00:00 ERR ap-av[491]: [ERROR:object_proxy.cc(582)] Failed to call method: org.chromium.flimflam.Manager.GetProperties: object_path= /: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.flimflam was not provided by any .service files root@chromeos4-row10-jetstream-host4 $ grep "ERR at" /var/log/messages | head -1 1970-01-01T00:00:27.225858+00:00 ERR attestationd[994]: TPM error 0x21 (Decryption error): Unseal: Error calling Tspi_Data_Unseal looks alot like debug that occurred here: crbug.com/652565
,
Sep 7 2017
,
Sep 7 2017
This CL may address, https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/654297 Assigning to Andrey to confirm TPM as source of this failure and hopefully his CL as the fix.
,
Sep 8 2017
> Assigning to Andrey to confirm TPM as source of this failure and hopefully his CL as the fix. TPM problems? Ai ya. There've been changes in that space, see bug 762334 .
,
Sep 8 2017
The CL from comment #7 won't help here - it's for a different subj.
,
Sep 8 2017
The test fails because ap-controller is not there when it checks, and ap-controller was clearly crashing and restarting a lot. Unfortunately, the log from the bug descr is trimmed, so no info in the log why it was crashing. Will need to check the coredumps tomorrow. It may be related to the issues from comment #8 (see https://crbug.com/652565#c59 for the cases when ap-controller was crashing a lot after DA lockout - similar situation with issue 762334 ). But not 100% sure at this point. It may still be something else. Here's one alternative lead. The survived part if the log contains a lot of errors from ap family of daemons, like: 2017-07-29T05:51:44.637412+00:00 ERR ap-update-manager[4081]: [ERROR:object_proxy.cc(582)] Failed to call method: org.chromium.flimflam.Manager.GetProperties: object_path= /: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.flimflam was not provided by any .service files ... 2017-07-29T05:51:44.865962+00:00 WARNING ap-rodizio[4134]: [WARNING:object_manager.cc(369)] org.chromium.ap.VorlonClient /org/chromium/ap/VorlonClient: Failed to get managed objects 2017-07-29T05:51:44.877636+00:00 ERR ap-cloud-actions[4157]: [ERROR:cloud_action_manager.cc(286)] Failed to read file: /var/lib/cloud_action/allowed_endpoints 2017-07-29T05:51:44.884229+00:00 ERR ap-update-manager[4081]: [ERROR:object_proxy.cc(582)] Failed to call method: org.freedesktop.DBus.Properties.GetAll: object_path= /org/chromium/ap/GroupMonitor: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.ap.GroupMonitor was not provided by any .service files Though, no crashes.
,
Sep 8 2017
There are some recent changes with some keywords from these logged errors from comment #10 (vorlon, rodizio), may be they have affected something. I.e.: https://chrome-internal-review.googlesource.com/c/chromeos/ap-daemons/+/448563 https://chrome-internal-review.googlesource.com/c/chromeos/ap/wireless/+/448732 https://chrome-internal-review.googlesource.com/c/chromeos/ap-daemons/+/445257 https://chrome-internal-review.googlesource.com/c/chromeos/ap/+/446193 https://chrome-internal-review.googlesource.com/c/chromeos/ap-daemons/+/446198
,
Sep 8 2017
mtyin@, kkunduru@, sduvvuri@: to cover all bases, could you please take a look what's causing ap-controller crashes, and if some recent changes in ap-daemons (e.g. something from comment #11) could have caused what we see in this bug?
,
Sep 8 2017
Looking into this. apronin@, is this tot?
,
Sep 8 2017
Yes, ToT (as it was at the time of that build). The paladin failure from the bug descr (https://luci-milo.appspot.com/buildbot/chromeos/whirlwind-paladin/9166) is for 9920.
,
Sep 8 2017
Ok, thx. Will dig into this issue.
,
Sep 8 2017
First update: flash 9921 and daemons are normal on my test. Between 9920 and 9921, we have several changes landing, will take a look at whether the issue has been fixed.
,
Sep 8 2017
ap-controller did crash at boot up time but eventually started. This might due to recent job controller change. According to the log, RestartServices failed consistently at bootup time and resulted into LoadAndApplyStableConfiguration check failure. jintao@, could you take a look at this?
,
Sep 8 2017
thanks for the info. I will look into this.
,
Sep 8 2017
apronin@, the ap-controller crash during boot time seems an old issue that's been around for a while. We have created a separate ticket to track that issue. https://buganizer.corp.google.com/issues/65469492 I am not sure if the ap-controller crash is the cause of this provisioning failure, but it is possible. The provisioning check may be "lucky" to catch the timing when ap-controller just crashed. Is this a flaky issue? Is this specific to whirlwind?
,
Sep 8 2017
It seems to be flaky. Since the 2 crashes due to ap-controller issues, whirlwind-paladin (https://luci-milo.appspot.com/buildbot/chromeos/whirlwind-paladin/) had green builds, and the red builds since then are due to other issues. So, may be a dup of b/65469492 indeed. As I understand, you are looking for potential unwanted side-effects of CL:*420793 jintao@, assigning to you for further analysis. For TPM, so far, no evidence that it caused those ap-controller crashes (which doesn't mean it's ruled out).
,
Sep 8 2017
apronin@, I will look into this issue or pass it along to someone in our team. It's been verified this failure is not related to CL:*420793, as the same crash was observed in some log from 2 months ago. See b/65469492
,
Sep 8 2017
It is a known issue that ap-controller crash once during boot. We have assigned b/65469492 for shamanthakumar@ to look into this.
,
Sep 11 2017
,
Sep 12 2017
It turns out that GeneratePsk failed when ap-get-setup-setting --psk during ap-controller try to call generate_setup_network. This later leads to ap-controller ReadConfiguration failure. This seems to be not related to the bug mentioned above. Potential reason: Could not retrieve TPM endorsement info OR TPM response missing endorsement public key. Can it be simply cryptohomed did not started yet by the time ap-controller attempted to GeneratePsk?
,
Sep 12 2017
It looks like either TPM is in a bad state or there is issue for the communication between ap-dameons and cryptohome. Is it possible to get those debug information: >ap-attestation-client --endpoint=get-endorsement-information And try this command to see if it can fix ap-daemon crush: rm /var/cache/ap/setup-network; crossystem clear_tpm_owner_request=1; reboot
,
Sep 12 2017
File bug b/65561595 tracking this issue.
,
Sep 12 2017
> It looks like either TPM is in a bad state or [ ... ] bug 762334 will be of interest. There's a new behavior that's known to put the TPM in a bad state. That said, if the TPM were in a bad state, I'd expect it to be reported in addition to any failure in ap-controller.
,
Feb 23 2018
presumed obsolete
,
Feb 23 2018
it might be worth escalating this issue - the child issue b/65561595 is not resolved. The Jetstream team may not be properly responding to issues raised by the Chromium team
,
Feb 23 2018
I updated the child issue b/65561595 with an extremely strongly worded statement, marked it p0, and assigned it to the director level engineer of the Jetstream project |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by jrbarnette@chromium.org
, Sep 7 2017