logging_UserCrash: Fails on peach_pit-chrome-pfq and veyron_minnie-tot-chrome-pfq-informational |
|||||||||
Issue descriptionP1 as it will prevent the PFQ from uprevving Chrome. - peach_pit-chrome-pfq: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924376480584237008 - veyron_minnie-tot-chrome-pfq-informational: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924366477814896480 Both failures produce: 01/14 02:13:51.791 WARNI| test:0606| The test failed with the following exception Traceback (most recent call last): File "/usr/local/autotest/common_lib/test.py", line 600, in _exec _call_test_function(self.execute, *p_args, **p_dargs) File "/usr/local/autotest/common_lib/test.py", line 800, in _call_test_function return func(*args, **dargs) File "/usr/local/autotest/common_lib/test.py", line 464, in execute postprocess_profiled_run, args, dargs) File "/usr/local/autotest/common_lib/test.py", line 371, in _call_run_once self.run_once(*args, **dargs) File "/usr/local/autotest/tests/logging_UserCrash/logging_UserCrash.py", line 322, in run_once initialize_crash_reporter=True) File "/usr/local/autotest/cros/crash/crash_test.py", line 731, in run_crash_tests getattr(self, '_test_' + test_name)() File "/usr/local/autotest/tests/logging_UserCrash/logging_UserCrash.py", line 64, in _test_chronos_crasher self._check_crashing_process('chronos') File "/usr/local/autotest/cros/crash/user_crash_test.py", line 510, in _check_crashing_process 'minidump') File "/usr/local/autotest/cros/crash/user_crash_test.py", line 368, in _check_generated_report_sending raise error.TestFail('Report should not have signature') TestFail: Report should not have signature - Example full log is here: https://00e9e64bacad4d2b3471ab9aaaeaf335caa99033ed0b4d237e-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/276800607-chromeos-test%2Fchromeos6-row2-rack11-host2%2Flogging_UserCrash%2Fdebug%2Flogging_UserCrash.DEBUG?qk=AD5uMEv7W19-ZC_lUdobN-kG9mxBfMcBlmEHP3KZydoFcpkwW83MinMYXyxUcOPTTyQu3jk3z8nwOPJjJ5x5QdDI5xz40hHcgsFGxSfnkhdF94NSM3D1fNzJUYQq_m-t8KzZPPsOXoRHLfvI8HjOXgx1A4SXofW_B5BHoMH41if1hPUVJSqgfafebD5ZPbAq6rbHdzNUoIQYEx7vSRMrjp-Ae8ZRL4zcWaMRgSPTMTnLIqL5zGsk1hVc3VpRTT5XoU6y3Wln1LMJgPbOm6A8l66Nre7AMnWsZQu74SVMfB53m-z-Plij2jqCdZX67jBCD89byDU4kIRa6Zn0lVBppE2meDYMJD5yRqxUI2d_VLbqrIB9_1TwhPEdhPrtfAqyTR4w-yjW9LbcI49G7aoVmo6XEuSF7uqtkzhP2Ty89P70dZKHPFPR8pZ8VXlu7XDSk6baXE3RKJPQ_iVEGjnPc3lplvuDx_SMgQDtOnHJYVBQo7nbjEWOfO3rr-kpknSUUzAvbcjmtc5T-ph6ugyO8G3G_jthiVVtaFVzwOWiUKJfndCMoOBhCev-QF11nzhEmHdSTHDmSe_cQOHD-F20PwGe0AukWnjXTwcPSC78dOG8lyE2FHbAVBdOg9p-L7G8msyUaJSfong4JknxOqIc8Up3YI7c0VTEfhbH-25tuGPa7H0JNjfYWfRQ9whzzxFVaYN72USKeALTOBwfadehr42DIHmxm0yZ7ZKKvIS4NBa2BLDtYi41s6LxVWv-ZpsnhUEzlresO52jvCHUAMmysuhlMFu5Liru551YtRoxbF8FmqqaoFZsaDfTpxdYpBCfiSU7vE3gSILDVlHHnSM4b7el41JKx-88qsdf-a-s98wZB9jB_F0059Q0SeK8cN5mKl_Ctsxb0SL8WJ_KlHHf1twf8jzpwCZvGA
,
Jan 16
(6 days ago)
vapier@ or benchan@ Could you please take a look and see if this something you can fix?
This same error ("Report should not have signature") caused:
- Last night's PFQ build to fail: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924180808655505632
- Most recent peach_pit-tot-chrome-pfq-informational: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924178119767498400
,
Jan 16
(6 days ago)
,
Jan 16
(6 days ago)
i stopped paying attention to flaky failures as they pretty much were all issue 740422. this one might be a "real" flake which is due to the test framework.
,
Jan 16
(6 days ago)
vapier@, issue 740422 seems to be related to a permission denied error when calling getcwd. This one however seems to be specific to crash reporting. Are they really related? I'm not sure which signature that is that shouldn't be in the report, because this line [1] in the test code produces the following log in the test logs: 01/14 02:13:47.690 DEBUG| user_crash_test:0311| Crasher process result: {'crashed': True, 'returncode': 139, 'output': 'pid=20731\nCrashing as requested.\n', 'crash_reporter_caught': True} As you can see, no 'sig' key in the result. Yet, this [2] later complains about its existence: TestFail: Report should not have signature [1]: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/cros/crash/user_crash_test.py?type=cs&q=%22Report+should+not+have+signature%22+p:chromeos_&g=0&l=311 [2]: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/cros/crash/user_crash_test.py?type=cs&q=%22Report+should+not+have+signature%22+p:chromeos_&g=0&l=368
,
Jan 17
(5 days ago)
We have a similar failure in the CQ TestFail: Report signature mismatch: -66, vs crash_reporter-user-collection https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924077289994238816 I am wondering if it's the same flake. Should I file a separate bug? Or maybe whoever looks at this can decide. BTW we need an owner for this. Suggestions are welcomed. Volunteers are even more welcomed!
,
Jan 17
(5 days ago)
Oh wait. kernel | dawid.kurek-AT-displaylink.com | 1411588 | drm/evdi: Log process executable name along with PID this might change the crash signature.
,
Jan 17
(5 days ago)
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1411588 Seems very specific though. Not sure this can happen in the autotest.
,
Jan 17
(5 days ago)
LOL, this has some terrible test code. It's basically grepping for 'sig:' in our logs. So we're triggering on stuff like this: Jan 14 05:51:03 shill[1579]: [INFO:wifi_service.cc(786)] Rep ep updated for 65. [SSID=GoogleGuest-Legacy], bssid: 94:b4:0f:ff:d0:e0, sig: -70, sec: none, freq: 2412 But...that's not normally in syslog. Oh, wait, we're starting to use journalctl, which merges in shill logs? I bet that caused this then.
,
Jan 17
(5 days ago)
Thank you Brian! Let's disable this test until it's fixed.
,
Jan 17
(5 days ago)
(BTW, I came here because I saw a paladin failing on this.)
@10: Sure, assigning to you disabling :) If I'm reading correctly, then that's the only test directly looking for 'sig:' in this way. But anything using the 'crash_test' / 'user_crash_test' modules could be subject to similar errors, because of this in cros_logging:
if os.path.exists("/var/log/journal"):
return JournalLogReader()
else:
return LogReader()
which suddenly means there are a lot more logs going into this log-parsing code. The surest fix would be to back out some of b/120686750 I expect. But that's really not the main problem, so do what you prefer.
I might also take a stab at tightening up the crash-log parsing in the meantime, to fix this properly.
,
Jan 17
(5 days ago)
I have two CLs. This one disables the test: https://crrev.com/c/1419220 This one fixes it. https://crrev.com/c/1418986 We should probably just fix it because it's simple.
,
Jan 18
(5 days ago)
shill normally sends it's logs to syslog. We have syslog configured to put log messages from shill and certain other sources into /var/log/net.log. Note that messages of ERR or more important also go into /var/log/messages. /var/log/messages is just the default location for log messages from sources that don't have their own file. journald shouldn't change any of this since it just forwards messages to syslog. As mentioned in comment#11, this seems to be due to existing code added by the lakitu folks to read journal logs when they exist. I had no idea that it was there, but it'll make some of the work to moving to journald easier :) It was recommended that I back out the journald change and add it back in after the 73 branch next week in order to give it more bake time. I think it's a good idea, so I'm going to go ahead and do it. The underlying tests should still be fixed though.
,
Jan 18
(5 days ago)
The underlying test is so broken and the fix is (I think) so easy that I suggest we go that way instead.
,
Jan 18
(5 days ago)
> journald shouldn't change any of this since it just forwards messages to syslog. The effective change is because shill's INFO logs now show up in journalctl (and, as noted, the existing test logic for lakitu is picking this up instead of /var/log/messages).
,
Jan 18
(5 days ago)
re-comment #5: i'm not saying this is issue 740422. because of that issue, i've ignored flakes in any of these tests because they've pretty much all been bugs in the autotest framework.
,
Jan 18
(5 days ago)
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f486553a988458f5ccae9089b84a0c76988a345d commit f486553a988458f5ccae9089b84a0c76988a345d Author: Luigi Semenzato <semenzato@chromium.org> Date: Fri Jan 18 02:06:03 2019 crash_test: make RE searches more precise. The immediate bug is caused by searching for "sig: (\S+)", which incorrectly matches log lines from shill. Other patterns were at risk too. BUG= chromium:921679 TEST=ran the test Change-Id: I30c7b35e02fb1b9d6e66dc4b010197a6f495f7b9 Reviewed-on: https://chromium-review.googlesource.com/c/1418986 Reviewed-by: Mike Frysinger <vapier@chromium.org> Reviewed-by: Luigi Semenzato <semenzato@chromium.org> Reviewed-by: Brian Norris <briannorris@chromium.org> Commit-Queue: Luigi Semenzato <semenzato@chromium.org> Tested-by: Luigi Semenzato <semenzato@chromium.org> Tested-by: Brian Norris <briannorris@chromium.org> [modify] https://crrev.com/f486553a988458f5ccae9089b84a0c76988a345d/client/cros/crash/crash_test.py
,
Jan 18
(5 days ago)
,
Jan 18
(5 days ago)
thanks guys for tracking it down |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by afakhry@chromium.org
, Jan 15