New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 921679 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 18
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

logging_UserCrash: Fails on peach_pit-chrome-pfq and veyron_minnie-tot-chrome-pfq-informational

Project Member Reported by afakhry@chromium.org, Jan 14

Issue description

P1 as it will prevent the PFQ from uprevving Chrome.

- peach_pit-chrome-pfq: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924376480584237008
- veyron_minnie-tot-chrome-pfq-informational: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924366477814896480

Both failures produce: 

01/14 02:13:51.791 WARNI|              test:0606| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 600, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 800, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 464, in execute
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 371, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/logging_UserCrash/logging_UserCrash.py", line 322, in run_once
    initialize_crash_reporter=True)
  File "/usr/local/autotest/cros/crash/crash_test.py", line 731, in run_crash_tests
    getattr(self, '_test_' + test_name)()
  File "/usr/local/autotest/tests/logging_UserCrash/logging_UserCrash.py", line 64, in _test_chronos_crasher
    self._check_crashing_process('chronos')
  File "/usr/local/autotest/cros/crash/user_crash_test.py", line 510, in _check_crashing_process
    'minidump')
  File "/usr/local/autotest/cros/crash/user_crash_test.py", line 368, in _check_generated_report_sending
    raise error.TestFail('Report should not have signature')
TestFail: Report should not have signature


- Example full log is here: 

https://00e9e64bacad4d2b3471ab9aaaeaf335caa99033ed0b4d237e-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/276800607-chromeos-test%2Fchromeos6-row2-rack11-host2%2Flogging_UserCrash%2Fdebug%2Flogging_UserCrash.DEBUG?qk=AD5uMEv7W19-ZC_lUdobN-kG9mxBfMcBlmEHP3KZydoFcpkwW83MinMYXyxUcOPTTyQu3jk3z8nwOPJjJ5x5QdDI5xz40hHcgsFGxSfnkhdF94NSM3D1fNzJUYQq_m-t8KzZPPsOXoRHLfvI8HjOXgx1A4SXofW_B5BHoMH41if1hPUVJSqgfafebD5ZPbAq6rbHdzNUoIQYEx7vSRMrjp-Ae8ZRL4zcWaMRgSPTMTnLIqL5zGsk1hVc3VpRTT5XoU6y3Wln1LMJgPbOm6A8l66Nre7AMnWsZQu74SVMfB53m-z-Plij2jqCdZX67jBCD89byDU4kIRa6Zn0lVBppE2meDYMJD5yRqxUI2d_VLbqrIB9_1TwhPEdhPrtfAqyTR4w-yjW9LbcI49G7aoVmo6XEuSF7uqtkzhP2Ty89P70dZKHPFPR8pZ8VXlu7XDSk6baXE3RKJPQ_iVEGjnPc3lplvuDx_SMgQDtOnHJYVBQo7nbjEWOfO3rr-kpknSUUzAvbcjmtc5T-ph6ugyO8G3G_jthiVVtaFVzwOWiUKJfndCMoOBhCev-QF11nzhEmHdSTHDmSe_cQOHD-F20PwGe0AukWnjXTwcPSC78dOG8lyE2FHbAVBdOg9p-L7G8msyUaJSfong4JknxOqIc8Up3YI7c0VTEfhbH-25tuGPa7H0JNjfYWfRQ9whzzxFVaYN72USKeALTOBwfadehr42DIHmxm0yZ7ZKKvIS4NBa2BLDtYi41s6LxVWv-ZpsnhUEzlresO52jvCHUAMmysuhlMFu5Liru551YtRoxbF8FmqqaoFZsaDfTpxdYpBCfiSU7vE3gSILDVlHHnSM4b7el41JKx-88qsdf-a-s98wZB9jB_F0059Q0SeK8cN5mKl_Ctsxb0SL8WJ_KlHHf1twf8jzpwCZvGA


 
Cc: semenzato@chromium.org
Failed again in: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924242727641122208, seems like a flaky issue.

Comment 2 by afakhry@chromium.org, Jan 16 (6 days ago)

Cc: benchan@chromium.org
vapier@ or benchan@ Could you please take a look and see if this something you can fix?

This same error ("Report should not have signature") caused:
- Last night's PFQ build to fail: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924180808655505632
- Most recent peach_pit-tot-chrome-pfq-informational: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924178119767498400

Comment 3 by rsesek@chromium.org, Jan 16 (6 days ago)

Cc: -rsesek@chromium.org

Comment 4 by vapier@chromium.org, Jan 16 (6 days ago)

Components: -Internals>CrashReporting
i stopped paying attention to flaky failures as they pretty much were all issue 740422.  this one might be a "real" flake which is due to the test framework.

Comment 5 by afakhry@chromium.org, Jan 16 (6 days ago)

Cc: deanliao@chromium.org
vapier@, issue 740422 seems to be related to a permission denied error when calling getcwd. This one however seems to be specific to crash reporting. Are they really related?

I'm not sure which signature that is that shouldn't be in the report, because this line [1] in the test code produces the following log in the test logs:

01/14 02:13:47.690 DEBUG|   user_crash_test:0311| Crasher process result: {'crashed': True, 'returncode': 139, 'output': 'pid=20731\nCrashing as requested.\n', 'crash_reporter_caught': True}

As you can see, no 'sig' key in the result. Yet, this [2] later complains about its existence:

TestFail: Report should not have signature

[1]: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/cros/crash/user_crash_test.py?type=cs&q=%22Report+should+not+have+signature%22+p:chromeos_&g=0&l=311

[2]: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/cros/crash/user_crash_test.py?type=cs&q=%22Report+should+not+have+signature%22+p:chromeos_&g=0&l=368

Comment 6 by semenzato@chromium.org, Jan 17 (5 days ago)

Cc: evanhernandez@chromium.org
We have a similar failure in the CQ

TestFail: Report signature mismatch: -66, vs crash_reporter-user-collection

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924077289994238816

I am wondering if it's the same flake.  Should I file a separate bug?  Or maybe whoever looks at this can decide.

BTW we need an owner for this.  Suggestions are welcomed.  Volunteers are even more welcomed!

Comment 7 by semenzato@chromium.org, Jan 17 (5 days ago)

Oh wait.  

kernel | dawid.kurek-AT-displaylink.com | 1411588 | drm/evdi: Log process executable name along with PID

this might change the crash signature.


Comment 8 by semenzato@chromium.org, Jan 17 (5 days ago)

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1411588

Seems very specific though.  Not sure this can happen in the autotest.

Comment 9 by briannorris@chromium.org, Jan 17 (5 days ago)

Cc: cmtm@chromium.org
LOL, this has some terrible test code. It's basically grepping for 'sig:' in our logs. So we're triggering on stuff like this:

Jan 14 05:51:03 shill[1579]: [INFO:wifi_service.cc(786)] Rep ep updated for 65. [SSID=GoogleGuest-Legacy], bssid: 94:b4:0f:ff:d0:e0, sig: -70, sec: none, freq: 2412


But...that's not normally in syslog. Oh, wait, we're starting to use journalctl, which merges in shill logs? I bet that caused this then.

Comment 10 by semenzato@chromium.org, Jan 17 (5 days ago)

Thank you Brian!
Let's disable this test until it's fixed.

Comment 11 by briannorris@chromium.org, Jan 17 (5 days ago)

Owner: semenzato@chromium.org
Status: Assigned (was: Untriaged)
(BTW, I came here because I saw a paladin failing on this.)

@10: Sure, assigning to you disabling :) If I'm reading correctly, then that's the only test directly looking for 'sig:' in this way. But anything using the 'crash_test' / 'user_crash_test' modules could be subject to similar errors, because of this in cros_logging:

    if os.path.exists("/var/log/journal"):
        return JournalLogReader()
    else:
        return LogReader()

which suddenly means there are a lot more logs going into this log-parsing code. The surest fix would be to back out some of b/120686750 I expect. But that's really not the main problem, so do what you prefer.

I might also take a stab at tightening up the crash-log parsing in the meantime, to fix this properly.

Comment 12 by semenzato@chromium.org, Jan 17 (5 days ago)

I have two CLs.  This one disables the test:
https://crrev.com/c/1419220

This one fixes it.
https://crrev.com/c/1418986

We should probably just fix it because it's simple.

Comment 13 by cmtm@chromium.org, Jan 18 (5 days ago)

shill normally sends it's logs to syslog. We have syslog configured to put log messages from shill and certain other sources into  /var/log/net.log. Note that messages of ERR or more important also go into /var/log/messages. /var/log/messages is just the default location for log messages from sources that don't have their own file. journald shouldn't change any of this since it just forwards messages to syslog. As mentioned in comment#11, this seems to be due to existing code added by the lakitu folks to read journal logs when they exist. I had no idea that it was there, but it'll make some of the work to moving to journald easier :)

It was recommended that I back out the journald change and add it back in after the 73 branch next week in order to give it more bake time. I think it's a good idea, so I'm going to go ahead and do it. The underlying tests should still be fixed though.

Comment 14 by semenzato@chromium.org, Jan 18 (5 days ago)

The underlying test is so broken and the fix is (I think) so easy that I suggest we go that way instead.

Comment 15 by briannorris@chromium.org, Jan 18 (5 days ago)

> journald shouldn't change any of this since it just forwards messages to syslog.

The effective change is because shill's INFO logs now show up in journalctl (and, as noted, the existing test logic for lakitu is picking this up instead of /var/log/messages).

Comment 16 by vapier@chromium.org, Jan 18 (5 days ago)

re-comment #5: i'm not saying this is issue 740422.  because of that issue, i've ignored flakes in any of these tests because they've pretty much all been bugs in the autotest framework.
Project Member

Comment 17 by bugdroid1@chromium.org, Jan 18 (5 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f486553a988458f5ccae9089b84a0c76988a345d

commit f486553a988458f5ccae9089b84a0c76988a345d
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Fri Jan 18 02:06:03 2019

crash_test: make RE searches more precise.

The immediate bug is caused by searching for "sig: (\S+)", which
incorrectly matches log lines from shill.  Other patterns were
at risk too.

BUG= chromium:921679 
TEST=ran the test

Change-Id: I30c7b35e02fb1b9d6e66dc4b010197a6f495f7b9
Reviewed-on: https://chromium-review.googlesource.com/c/1418986
Reviewed-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Commit-Queue: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>

[modify] https://crrev.com/f486553a988458f5ccae9089b84a0c76988a345d/client/cros/crash/crash_test.py

Comment 18 by briannorris@chromium.org, Jan 18 (5 days ago)

Status: Fixed (was: Assigned)

Comment 19 by vapier@chromium.org, Jan 18 (5 days ago)

thanks guys for tracking it down

Sign in to add a comment