unexpected indent regression, aborting |
|||||||||||||
Issue descriptionIt's a one-off so I am not sure any action is required now, but I'd like to keep track of it, just in case. login_Cryptohome: retry_count: 2, 71 reports, FAIL: Unhandled LoginException: Timed out going through login screen. Cryptohome not mounted. OOBE not dismissed. http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=80699117 https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/80699117-chromeos-test/chromeos4-row8-rack3-host7/ autoserv.DEBUG has this, which is confusing---I think that the "unexpected indent regression" may be from the logging manager, which in this case is not helping at all---is it aborting and thus losing some more useful information? If that is so it would be sad. 10/13 06:08:48.754 ERROR| logging_manager:0626| STATUS: START login_Cryptohome login_Cryptohome timestamp=1476362072 localtime=Oct 13 05:34:32 10/13 06:08:48.754 ERROR| logging_manager:0626| parsing partial test login_Cryptohome login_Cryptohome 10/13 06:08:48.754 ERROR| logging_manager:0626| RUNNING: RUNNING 10/13 06:08:48.755 ERROR| logging_manager:0626| Subdir: login_Cryptohome 10/13 06:08:48.755 ERROR| logging_manager:0626| Testname: login_Cryptohome 10/13 06:08:48.755 ERROR| logging_manager:0626| 10/13 06:08:48.756 ERROR| logging_manager:0626| 10/13 06:08:48.756 ERROR| logging_manager:0626| STATUS: INFO ---- ---- Job aborted by autotest_system on 2016-10-13 06:08:41 10/13 06:08:48.756 ERROR| logging_manager:0626| Unexpected indent regression, aborting 10/13 06:08:48.756 ERROR| logging_manager:0626| 10/13 06:08:48.757 ERROR| logging_manager:0626| STATUS: END ABORT login_Cryptohome login_Cryptohome None 10/13 06:08:48.757 ERROR| logging_manager:0626| parsing test login_Cryptohome login_Cryptohome 10/13 06:08:48.757 ERROR| logging_manager:0626| ADD: ABORT
,
Jan 11 2017
It's really not clear what goes on. Is the "logging manager" parsing error messages? That is a very dangerous thing to do, because: 1. is the "unexpected indent" causing the failure? 2. is the "unexpected indent" possibly hiding the failure cause? Can we get rid of this "logging manager" completely?
,
Jan 11 2017
I found another. This is causing flaky failures of a test I added, desktopui_MashLogin: http://chromegw.corp.google.com/i/chromeos/builders/x86-zgb-release/builds/5358 https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/95684338-chromeos-test/chromeos6-row2-rack7-host8/debug/ 01/11 09:49:01.496 ERROR| logging_manager:0626| MACHINE NAME: chromeos6-row2-rack7-host8 01/11 09:49:01.496 ERROR| logging_manager:0626| MACHINE GROUP: x86-zgb 01/11 09:49:01.499 ERROR| logging_manager:0626| parsing partial test ---- SERVER_JOB 01/11 09:49:01.499 ERROR| logging_manager:0626| 01/11 09:49:01.499 ERROR| logging_manager:0626| STATUS: INFO ---- ---- kernel=3.8.11 localtime=Jan 11 09:28:08 timestamp=1484155688 01/11 09:49:01.499 ERROR| logging_manager:0626| 01/11 09:49:01.499 ERROR| logging_manager:0626| STATUS: START ---- ---- timestamp=1484155705 localtime=Jan 11 09:28:25 01/11 09:49:01.500 ERROR| logging_manager:0626| parsing partial test None CLIENT_JOB.0 01/11 09:49:01.500 ERROR| logging_manager:0626| RUNNING: RUNNING 01/11 09:49:01.500 ERROR| logging_manager:0626| CLIENT_JOB.0 01/11 09:49:01.500 ERROR| logging_manager:0626| 01/11 09:49:01.500 ERROR| logging_manager:0626| 01/11 09:49:01.501 ERROR| logging_manager:0626| STATUS: START desktopui_MashLogin desktopui_MashLogin timestamp=1484155705 localtime=Jan 11 09:28:25 01/11 09:49:01.501 ERROR| logging_manager:0626| parsing partial test desktopui_MashLogin desktopui_MashLogin 01/11 09:49:01.506 ERROR| logging_manager:0626| RUNNING: RUNNING 01/11 09:49:01.506 ERROR| logging_manager:0626| Subdir: desktopui_MashLogin 01/11 09:49:01.506 ERROR| logging_manager:0626| Testname: desktopui_MashLogin 01/11 09:49:01.507 ERROR| logging_manager:0626| 01/11 09:49:01.507 ERROR| logging_manager:0626| 01/11 09:49:01.507 ERROR| logging_manager:0626| STATUS: GOOD desktopui_MashLogin desktopui_MashLogin timestamp=1484156106 localtime=Jan 11 09:35:06 completed successfully 01/11 09:49:01.507 ERROR| logging_manager:0626| update RUNNING reason: completed successfully 01/11 09:49:01.507 ERROR| logging_manager:0626| 01/11 09:49:01.508 ERROR| logging_manager:0626| STATUS: END GOOD desktopui_MashLogin desktopui_MashLogin timestamp=1484156106 localtime=Jan 11 09:35:06 01/11 09:49:01.508 ERROR| logging_manager:0626| parsing test desktopui_MashLogin desktopui_MashLogin 01/11 09:49:01.508 ERROR| logging_manager:0626| ADD: GOOD 01/11 09:49:01.508 ERROR| logging_manager:0626| Subdir: desktopui_MashLogin 01/11 09:49:01.508 ERROR| logging_manager:0626| Testname: desktopui_MashLogin 01/11 09:49:01.508 ERROR| logging_manager:0626| completed successfully 01/11 09:49:01.509 ERROR| logging_manager:0626| 01/11 09:49:01.509 ERROR| logging_manager:0626| STATUS: INFO ---- ---- Job aborted by autotest_system on 2017-01-11 09:48:46 01/11 09:49:01.509 ERROR| logging_manager:0626| Unexpected indent regression, aborting 01/11 09:49:01.518 ERROR| logging_manager:0626| 01/11 09:49:01.518 ERROR| logging_manager:0626| STATUS: END ABORT desktopui_MashLogin desktopui_MashLogin None 01/11 09:49:01.518 ERROR| logging_manager:0626| parsing test desktopui_MashLogin desktopui_MashLogin 01/11 09:49:01.519 ERROR| logging_manager:0626| ADD: ABORT 01/11 09:49:01.519 ERROR| logging_manager:0626| Subdir: desktopui_MashLogin 01/11 09:49:01.519 ERROR| logging_manager:0626| Testname: desktopui_MashLogin 01/11 09:49:01.519 ERROR| logging_manager:0626| None 01/11 09:49:01.519 ERROR| logging_manager:0626| 01/11 09:49:01.521 ERROR| logging_manager:0626| STATUS: INFO ---- ---- Job aborted by autotest_system on 2017-01-11 09:48:46 01/11 09:49:01.521 ERROR| logging_manager:0626| parsing test ---- SERVER_JOB 01/11 09:49:06.831 ERROR| dev_server:0353| Devserver call failed: "http://100.108.1.152:8082/check_health?", timeout: 2.0 seconds, Error: retry exception (label="make_call"), timeout = 2s Also https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/95505854-chromeos-test/chromeos2-row6-rack6-host7/debug/ Issue 679213 is the MashLogin test. Here is the code that prints the message. It's some kind of result parser: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/tko/parsers/version_1.py?q=%22Unexpected+indent+regression,+aborting%22&sq=package:%5Echromeos_(public%7Cinternal)$&l=286 Assigning to today's sheriff since it's causing bvt-cq failures.
,
Jan 11 2017
Reassigning to the Infra deputy: This is a bug in Autotest,
apparently caused by aborting the job. The abort is (I
presume) happening due to a timeout.
The relevant evidence is the status.log file. Here's the
important stuff:
====
INFO ---- ---- kernel=3.8.11 localtime=Jan 11 09:28:08 timestamp=1484155688
START ---- ---- timestamp=1484155705 localtime=Jan 11 09:28:25
START desktopui_MashLogin desktopui_MashLogin timestamp=1484155705 localtime=Jan 11 09:28:25
GOOD desktopui_MashLogin desktopui_MashLogin timestamp=1484156106 localtime=Jan 11 09:35:06 completed successfully
END GOOD desktopui_MashLogin desktopui_MashLogin timestamp=1484156106 localtime=Jan 11 09:35:06
INFO ---- ---- Job aborted by autotest_system on 2017-01-11 09:48:46
====
The error message means there's no 'END' in the first column
(corresponding to the first 'START') after the indented 'END GOOD'.
I'm not sure why the "aborted" message is labeled "INFO". I think
there's supposed to be separate syntax to indicate aborts. I don't
know why that wasn't done right.
For triage purposes, note that really, there are two bugs here:
* Something is causing periodic timeouts.
* The results from the aborted tests aren't being reported/parsed
correctly.
,
Jan 11 2017
More details in issue 679213 -- that test was generating a lot of core files, despite it passing. This filled the disk and causing an rsync timeout. Perhaps that's related to the lack of END.
,
Jan 18 2017
,
Jan 18 2017
Pass to deputy
,
Jan 18 2017
shuqianz@, Don was trying to find someone working on Autotest to fix the actual bug. I will let Aviv to decide who can work on this bug. Assign to Aviv.
,
Jan 30 2017
,
Feb 2 2017
Will take a stab at this.
,
Feb 3 2017
There is a choice of (at least) two directions here. 1. make sure that we only generate "correct" logs (is this even possible?) 2. make the tko parser more tolerant. What was the main purpose for making the parser picky? Consistency checks? Also, IIRC, when I started writing the log parser script for classifying build failures, it was remarked that such script would not be robust. I agreed with that, but indicated that I considered it a temporary tool (and in fact it is not committed anywhere). What are the thoughts regarding this particular tool, which does something similar, yet appears to be used as a long-term solution? Should it stay or go away completely? I noticed that unfortunately its authors have moved on. Is anybody claiming ownership? Thanks.
,
Feb 3 2017
Also look at issue 658549: "non-status line, ignoring". This is when status.log contains a python stack trace.
,
Feb 3 2017
FYI: There is a related issue 684311 . I think bad indent means that some form of concurrency is broken. In this case the job was aborted (which sounds like a reasonable event), but this interference from outside was not expected, which I think it should always be. So fixing parsing in the case of aborts seems reasonable. In issue 684311 maybe one could fix parsing, but we don't understand why the prints happened, which means likely something with the concurrency model is broken. Everything needs to be done on a case by case basis.
,
Feb 3 2017
#13 thank you. What do you mean by "some form of concurrency is broken"? Are multiple processes or threads writing to this file, which is then used as a legitimate data structure? The fact that python stack traces can also be dumped into this file makes one wonder about the entire approach. Are my worries possibly unreasonable/unfounded?
,
Feb 3 2017
I think your worries are my worries as well.
,
Feb 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/e70648162260bfd46a56fd1a1d1d7f86ec01e600 commit e70648162260bfd46a56fd1a1d1d7f86ec01e600 Author: Luigi Semenzato <semenzato@chromium.org> Date: Sat Feb 11 08:35:23 2017 TKO parser: improve some error messages in debug log These changes try to make it clearer that the errors aren't critical to the execution of the test, and also slightly reduce the spew to improve log readability. BUG=chromium:659741 TEST=none Change-Id: I1a2332becb6d6d1fbd5f23f39dbbc502a631d37c Reviewed-on: https://chromium-review.googlesource.com/437707 Commit-Ready: Luigi Semenzato <semenzato@chromium.org> Tested-by: Luigi Semenzato <semenzato@chromium.org> Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Ilja H. Friedel <ihf@chromium.org> [modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parsers/version_1.py [modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/server/server_job.py [modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/status_lib_unittest.py [modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parse.py [modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parsers/test/scenario_base.py [modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/status_lib.py [add] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parser_lib.py [modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/utils.py [modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parsers/version_0.py
,
Mar 22 2017
,
Jul 17 2017
ChromeOS Infra P1 Bugscrub. P1 Bugs in this component should be important enough to get weekly status updates. Is this already fixed? -> Fixed Is this no longer relevant? -> Archived or WontFix Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority. Is this a Feature Request rather than a bug? Type -> Feature Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign. Does this bug have the wrong owner? -> reassign. Bugs that remain in this state next week will be downgraded to P2.
,
Jul 24 2017
ChromeOS Infra P1 Bugscrub. Issue untouched in a week after previous message. Downgrading to P2.
,
Jun 8 2018
Hi, this bug has not been updated recently. Please acknowledge the bug and provide status within two weeks (6/22/2018), or the bug will be archived. Thank you.
,
Jun 8 2018
|
|||||||||||||
►
Sign in to add a comment |
|||||||||||||
Comment 1 by semenzato@chromium.org
, Jan 6 2017Components: -OS>Kernel Infra>Client>ChromeOS
Labels: -Pri-3 OS-Chrome Pri-2
Owner: ----
Summary: unexpected indent regression, aborting (was: login_Cryptohome failed in carndy-release build 479)