New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 763152 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Last visit > 30 days ago
Closed: Feb 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Various "ap" daemons crashed and burned at startup

Reported by jrbarnette@chromium.org, Sep 7 2017

Issue description

This whirlwind-paladin run failed:
    https://luci-milo.appspot.com/buildbot/chromeos/whirlwind-paladin/9166

The relevant issue seems to be provision failures on one
DUT.  Here's the history:

chromeos4-row10-jetstream-host4
    2017-09-07 14:14:43  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-jetstream-host4/1184279-repair/
    2017-09-07 13:52:23  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/
 
The provision log shows a really large number of crashes of
daemons with "ap" in the name:

INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054931.2021.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054942.2150.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054954.2363.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054948.2235.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054905.1507.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054949.2245.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054943.2159.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054937.2071.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054936.2064.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054913.1744.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054919.1813.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054948.2089.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054926.1787.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054916.1659.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054955.2375.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054919.1821.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054937.1962.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054907.1680.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054907.1690.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054849.1260.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054901.1612.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054855.1488.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_group_manager.20170729.054855.1219.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054925.1929.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054849.1408.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054901.1621.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054913.1752.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054855.1506.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_controller.20170729.054925.1936.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/crashinfo.chromeos4-row10-jetstream-host4/ap_get_setup_setting.20170729.054931.2014.dmp
INFO	----	New Crash Dump	timestamp=1504818469	localtime=Sep 07 14:07:49	/usr/local/autotest/results/hosts/chromeos4-row10-jetstream-host4/1184219-provision/20170709135222/sysinfo/var/spool/crash/ap_controller.20170729.061319.8887.dmp

In particular, the provision itself seems to have failed because
ap_controller was crashing.

 
Cc: jintao@chromium.org tbroch@chromium.org yamaguchi@chromium.org
Cc: sduvvuri@chromium.org kkunduru@chromium.org
Srini and Kishan, ap-controller crashes a lot on DUT. 

Strangely, no CL in this failed run is related to jetstream code.  Is it possible this happpend to ToT?
Cc: lgoo...@chromium.org
+ Laurence

The failed HWTest log shows "provision" test failed, and all other tests with normal names PASSED.  What is "provision" test about? Is it b/c ap-controller service is not started in this test script?



chromeos-server31-93: 3876ece384a87310 3
  Autotest instance created: cautotest
  09-07-2017 [13:50:02] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=140395156
  @@@STEP_LINK@Link to suite@http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=140395156@@@
  The suite job has another 2:29:51.077322 till timeout.
  09-07-2017 [14:23:34] Suite job is finished.
  09-07-2017 [14:23:34] Start collecting test results and dump them to json.
  Suite job                                   [ PASSED ]
  jetstream_GuestInterfaces                   [ PASSED ]
  provision                                   [ FAILED ]
  provision                                     FAIL: ap-controller service not found, completed successfully
  jetstream_ApiServerAttestation              [ PASSED ]
  jetstream_DiagnosticReport                  [ PASSED ]
  jetstream_LocalApi                          [ PASSED ]
  jetstream_ApiServerDeveloperConfiguration   [ PASSED ]
  jetstream_GcdCommands                       [ PASSED ]
  jetstream_PrioritizedDevice                 [ PASSED ]
  jetstream_NetworkInterfaces                 [ PASSED ]
  jetstream_WanCustomDns                      [ PASSED ]
  jetstream_GuestFirewall                     [ PASSED ]

Comment 4 by lgoo...@google.com, Sep 7 2017

Tao, provisioning is not a test as such; it is the step where the build is installed on the device prior to testing. If the ap-controller service is not started successfully, the provisioning is considered to have failed.

We've had some issues with this device in the past. I'm going to lock it so tests will not run on it.

grepping around on the device I see:

root@chromeos4-row10-jetstream-host4 $ grep "ERR ap" /var/log/messages | head -1
1970-01-01T00:00:22.737860+00:00 ERR ap-av[491]: [ERROR:object_proxy.cc(582)] Failed to call method: org.chromium.flimflam.Manager.GetProperties: object_path= /: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.flimflam was not provided by any .service files
root@chromeos4-row10-jetstream-host4 $ grep "ERR at" /var/log/messages | head -1
1970-01-01T00:00:27.225858+00:00 ERR attestationd[994]: TPM error 0x21 (Decryption error): Unseal: Error calling Tspi_Data_Unseal

looks alot like debug that occurred here: crbug.com/652565

Cc: apronin@chromium.org
Cc: -apronin@chromium.org
Owner: apronin@chromium.org
Status: Assigned (was: Untriaged)
This CL may address,

https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/654297

Assigning to Andrey to confirm TPM as source of this failure and hopefully his CL as the fix.
> Assigning to Andrey to confirm TPM as source of this failure and hopefully his CL as the fix.

TPM problems?  Ai ya.  There've been changes in that space, see  bug 762334 .

The CL from comment #7 won't help here - it's for a different subj.
The test fails because ap-controller is not there when it checks, and ap-controller was clearly crashing and restarting a lot. Unfortunately, the log from the bug descr is trimmed, so no info in the log why it was crashing. Will need to check the coredumps tomorrow.

It may be related to the issues from comment #8 (see https://crbug.com/652565#c59 for the cases when ap-controller was crashing a lot after DA lockout - similar situation with  issue 762334 ). But not 100% sure at this point.

It may still be something else. Here's one alternative lead.
The survived part if the log contains a lot of errors from ap family of daemons, like:
2017-07-29T05:51:44.637412+00:00 ERR ap-update-manager[4081]: [ERROR:object_proxy.cc(582)] Failed to call method: org.chromium.flimflam.Manager.GetProperties: object_path= /: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.flimflam was not provided by any .service files
...
2017-07-29T05:51:44.865962+00:00 WARNING ap-rodizio[4134]: [WARNING:object_manager.cc(369)] org.chromium.ap.VorlonClient /org/chromium/ap/VorlonClient: Failed to get managed objects
2017-07-29T05:51:44.877636+00:00 ERR ap-cloud-actions[4157]: [ERROR:cloud_action_manager.cc(286)] Failed to read file: /var/lib/cloud_action/allowed_endpoints
2017-07-29T05:51:44.884229+00:00 ERR ap-update-manager[4081]: [ERROR:object_proxy.cc(582)] Failed to call method: org.freedesktop.DBus.Properties.GetAll: object_path= /org/chromium/ap/GroupMonitor: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.ap.GroupMonitor was not provided by any .service files

Though, no crashes.
Cc: mtyin@chromium.org
mtyin@, kkunduru@, sduvvuri@: to cover all bases, could you please take a look what's causing ap-controller crashes, and if some recent changes in ap-daemons (e.g. something from comment #11) could have caused what we see in this bug?

Comment 13 by mtyin@google.com, Sep 8 2017

Looking into this. apronin@, is this tot?
Yes, ToT (as it was at the time of that build). The paladin failure from the bug descr (https://luci-milo.appspot.com/buildbot/chromeos/whirlwind-paladin/9166) is for 9920.

Comment 15 by mtyin@google.com, Sep 8 2017

Ok, thx. Will dig into this issue.
First update: flash 9921 and daemons are normal on my test. Between 9920 and 9921, we have several changes landing, will take a look at whether the issue has been fixed.
ap-controller did crash at boot up time but eventually started. This might due to recent job controller change. According to the log, RestartServices failed consistently at bootup time and resulted into LoadAndApplyStableConfiguration check failure. jintao@, could you take a look at this?
thanks for the info.  I will look into this. 
apronin@, the ap-controller crash during boot time seems an old issue that's been around for a while.  We have created a separate ticket to track that issue. https://buganizer.corp.google.com/issues/65469492

I am not sure if the ap-controller crash is the cause of this provisioning failure, but it is possible. The provisioning check may be "lucky" to catch the timing when ap-controller just crashed. 

Is this a flaky issue? Is this specific to whirlwind? 


Cc: -jintao@chromium.org apronin@chromium.org
Owner: jintao@chromium.org
It seems to be flaky. Since the 2 crashes due to ap-controller issues, whirlwind-paladin (https://luci-milo.appspot.com/buildbot/chromeos/whirlwind-paladin/) had green builds, and the red builds since then are due to other issues.
So, may be a dup of b/65469492 indeed. As I understand, you are looking for potential unwanted side-effects of CL:*420793

jintao@, assigning to you for further analysis.
For TPM, so far, no evidence that it caused those ap-controller crashes (which doesn't mean it's ruled out).
apronin@, I will look into this issue or pass it along to someone in our team. 

It's been verified this failure is not related to CL:*420793, as the same crash was observed in some log from 2 months ago. See b/65469492
It is a known issue that ap-controller crash once during boot.  We have assigned b/65469492 for shamanthakumar@ to look into this. 


Comment 24 by mtyin@google.com, Sep 12 2017

It turns out that

GeneratePsk failed when ap-get-setup-setting --psk during ap-controller try to call generate_setup_network. This later leads to ap-controller ReadConfiguration failure. This seems to be not related to the bug mentioned above.

Potential reason:
Could not retrieve TPM endorsement info OR
TPM response missing endorsement public key.

Can it be simply cryptohomed did not started yet by the time ap-controller attempted to GeneratePsk?

Comment 25 by kyan@google.com, Sep 12 2017

It looks like either TPM is in a bad state or there is issue for the communication between ap-dameons and cryptohome.

Is it possible to get those debug information:
>ap-attestation-client --endpoint=get-endorsement-information

And try this command to see if it can fix ap-daemon crush:
rm /var/cache/ap/setup-network; crossystem clear_tpm_owner_request=1; reboot

Comment 26 by mtyin@google.com, Sep 12 2017

File bug b/65561595 tracking this issue.
> It looks like either TPM is in a bad state or [ ... ]

 bug 762334  will be of interest.  There's a new behavior that's
known to put the TPM in a bad state.

That said, if the TPM were in a bad state, I'd expect it to
be reported in addition to any failure in ap-controller.
Status: Archived (was: Assigned)
presumed obsolete

Comment 29 by ra...@google.com, Feb 23 2018

it might be worth escalating this issue - the child issue b/65561595 is not resolved. The Jetstream team may not be properly responding to issues raised by the Chromium team

Comment 30 by ra...@google.com, Feb 23 2018

I updated the child issue b/65561595 with an extremely strongly worded statement, marked it p0, and assigned it to the director level engineer of the Jetstream project

Sign in to add a comment