
Issue 873687


Issue metadata

Status: WontFix
Owner:
Closed: Sep 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: ----




Bob-paladin: logging_CrashSender fails in policy signature verification

Project Member Reported by xixuan@chromium.org, Aug 13

Issue description

It happens on 3 successive builds:
https://uberchromegw.corp.google.com/i/chromeos/builders/bob-paladin/builds/3797
https://uberchromegw.corp.google.com/i/chromeos/builders/bob-paladin/builds/3798
https://uberchromegw.corp.google.com/i/chromeos/builders/bob-paladin/builds/3799

In both 3798 and 3799, logging_CrashSender failed, and the second retry ran too long and was aborted.

Assigning to the sheriff to investigate whether recent changes are making it flaky. Cc'ing gardener and ARC Constable.


If it continues to fail logging_CrashSender, bob will be moved to experimental.
 
It continues with flaky login_* tests and one failure, so I have marked bob-paladin as experimental.

https://uberchromegw.corp.google.com/i/chromeos/builders/bob-paladin/builds/3800
Owner: derat@chromium.org
Status: Available (was: Untriaged)
Summary: Bob-paladin: logging_CrashSender fails in policy signature verification (was: Bob-paladin: login_* test becomes flaky)
Let's keep this bug about logging_CrashSender; the login_RetrieveActiveSessions failure may be something completely different.

For logging_CrashSender, we find this in the log when the test tries to mock sending a crash:

20:56:37 DEBUG| crash_sender stdout/stderr: [0812/205607:ERROR:device_policy_impl.cc(712)] Signature does not match the data or can not be verified!
[0812/205607:ERROR:device_policy_impl.cc(752)] Policy signature verification failed!
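
For scanning a longer test log for the same failure, a small parser along these lines could help. This is only a sketch: the regex is inferred from the two lines quoted above, and `find_policy_errors` is a hypothetical helper, not part of autotest or libpolicy.

```python
import re

# Log prefix as seen in the quoted lines: [MMDD/HHMMSS:LEVEL:file(line)] message
# (assumption: other lines in the log follow the same layout).
LOG_RE = re.compile(
    r"\[(?P<date>\d{4})/(?P<time>\d{6}):(?P<level>[A-Z]+):"
    r"(?P<file>[^()\]]+)\((?P<line>\d+)\)\]\s*(?P<msg>.*)"
)


def find_policy_errors(text):
    """Return (file, line, message) for each ERROR logged from device_policy_impl.cc."""
    hits = []
    for raw in text.splitlines():
        # search() rather than match(): autotest may prepend its own prefix to the line.
        m = LOG_RE.search(raw)
        if m and m.group("level") == "ERROR" and m.group("file") == "device_policy_impl.cc":
            hits.append((m.group("file"), int(m.group("line")), m.group("msg")))
    return hits
```

Feeding it the two lines above would return one entry for line 712 and one for line 752.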

There's also a ~30-50 sec delay every time, which is not normally there for the test, and it ultimately accumulates enough to time out the suite. I think I narrowed this down to crash_sender calling 'metrics_library -c', which tries to check device policy (via libpolicy) for consent to send stats.
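
One way to check whether the delay really comes from the consent check would be to time that command on its own. A minimal sketch, assuming the command can be run directly on the device and that Python 3 is available there; `time_command` is a hypothetical helper, not something from the test.

```python
import subprocess
import time


def time_command(argv, timeout=120):
    """Run argv and return (exit_code, elapsed_seconds, combined_output)."""
    start = time.monotonic()
    # subprocess.run raises subprocess.TimeoutExpired if the command exceeds `timeout`.
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so error lines are captured too
        timeout=timeout,
    )
    elapsed = time.monotonic() - start
    return proc.returncode, elapsed, proc.stdout.decode("utf-8", errors="replace")


# Example (assumption: run on the device, with the command as named in the comment above):
# code, secs, out = time_command(["metrics_library", "-c"])
```

If the elapsed time is consistently in the 30-50 sec range and the output contains the signature errors, that would point at the policy check rather than at crash_sender itself.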

As far as I can tell, this message appears when /var/lib/whitelist/owner.key and /var/lib/whitelist/policy do not match. It looks like the test is installing its own versions of these files in https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/cros/crash/crash_test.py?g=0&l=195 to make sure that doesn't happen. Apparently something in that setup is breaking here, but I found no recent changes in the test, metrics_library or libpolicy that would be an obvious culprit.

This policy stuff is really way out of my area and I don't really know how any of it works or who's even working on it. Dan, looks like you have been reviewing most libpolicy changes recently. Can you find a suitable owner for this?
Note that there also seems to be a disk space exhaustion issue due to a kernel warning firing 10 times a second on Bob (and Kevin, but Kevin has bigger disks) right now. I think it's quite possible that these weird failures are fallout from that (e.g. some daemon not being able to write something it wants to and then freaking out about it...). I filed it as issue 873822, so if we can't get any further here maybe we should wait and see if fixing that magically resolves things.
Cc: derat@chromium.org tnagel@chromium.org igorcov@chromium.org adokar@google.com
Owner: mnissler@chromium.org
Sorry, I don't know much about this code. Adding some people who probably do.
The failure turned into a timeout in the same test:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2859620

However, I can see the symptoms of issue 873687 in /var/log/messages, so possibly related too.
^^ I mean the symptoms of issue 873822.
Status: WontFix (was: Available)
logging_CrashSender looks green right now, closing WontFix.
