Project: chromium Issues People Development process History Sign in
New issue
Advanced search Search tips
Starred by 1 user
Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment
unexpected indent regression, aborting
Project Member Reported by semenzato@chromium.org, Oct 26 2016 Back to list
It's a one-off so I am not sure any action is required now, but I'd like to keep track of it, just in case.


login_Cryptohome: retry_count: 2, 71 reports, FAIL: Unhandled LoginException: Timed out going through login screen. Cryptohome not mounted. OOBE not dismissed.

http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=80699117
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/80699117-chromeos-test/chromeos4-row8-rack3-host7/

autoserv.DEBUG has this, which is confusing---I think that the "unexpected indent regression" may be from the logging manager, which in this case is not helping at all---is it aborting and thus losing some more useful information?  If that is so it would be sad.


10/13 06:08:48.754 ERROR|   logging_manager:0626| STATUS: START	login_Cryptohome	login_Cryptohome	timestamp=1476362072	localtime=Oct 13 05:34:32
10/13 06:08:48.754 ERROR|   logging_manager:0626| parsing partial test login_Cryptohome login_Cryptohome
10/13 06:08:48.754 ERROR|   logging_manager:0626| RUNNING: RUNNING
10/13 06:08:48.755 ERROR|   logging_manager:0626| Subdir: login_Cryptohome
10/13 06:08:48.755 ERROR|   logging_manager:0626| Testname: login_Cryptohome
10/13 06:08:48.755 ERROR|   logging_manager:0626| 
10/13 06:08:48.756 ERROR|   logging_manager:0626| 
10/13 06:08:48.756 ERROR|   logging_manager:0626| STATUS: INFO	----	----	Job aborted by autotest_system on 2016-10-13 06:08:41
10/13 06:08:48.756 ERROR|   logging_manager:0626| Unexpected indent regression, aborting
10/13 06:08:48.756 ERROR|   logging_manager:0626| 
10/13 06:08:48.757 ERROR|   logging_manager:0626| STATUS: END ABORT	login_Cryptohome	login_Cryptohome	None
10/13 06:08:48.757 ERROR|   logging_manager:0626| parsing test login_Cryptohome login_Cryptohome
10/13 06:08:48.757 ERROR|   logging_manager:0626| ADD: ABORT
 
Cc: nxia@chromium.org dgarr...@chromium.org jrbarnette@chromium.org
Components: -OS>Kernel Infra>Client>ChromeOS
Labels: -Pri-3 OS-Chrome Pri-2
Owner: ----
Summary: unexpected indent regression, aborting (was: login_Cryptohome failed in carndy-release build 479)
I found another:

https://uberchromegw.corp.google.com/i/chromeos/builders/stumpy-paladin/builds/27076

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/94722720-chromeos-test/chromeos4-row2-rack9-host1/debug/

01/06 11:00:18.783 ERROR|   logging_manager:0626| {'aborted_by': 'autotest_system', 'job_started': 1483724240, 'offload_failures_only': 'True', 'parent_job_id': 94722714, 'user': 'chromeos-test', 'builds': "{'cros-version': 'stumpy-paladin/R57-9156.0.0-rc3'}", 'aborted_on': 1483729207, 'job_finished': 1483729207, 'hostname': 'chromeos4-row2-rack9-host1', 'status_version': 1, 'label': 'stumpy-paladin/R57-9156.0.0-rc3/suite_attr_wrapper/audio_Microphone', 'drone': 'chromeos-server12.cbf.corp.google.com', 'build': 'stumpy-paladin/R57-9156.0.0-rc3', 'suite': 'suite_attr_wrapper', 'experimental': 'False', 'job_queued': 1483723652}
01/06 11:00:18.786 ERROR|   logging_manager:0626| MACHINE NAME: chromeos4-row2-rack9-host1
01/06 11:00:18.786 ERROR|   logging_manager:0626| MACHINE GROUP: stumpy
01/06 11:00:18.787 ERROR|   logging_manager:0626| parsing partial test ---- SERVER_JOB
01/06 11:00:18.787 ERROR|   logging_manager:0626| 
01/06 11:00:18.787 ERROR|   logging_manager:0626| STATUS: INFO	----	----	kernel=3.8.11	localtime=Jan 06 09:37:26	timestamp=1483724246
01/06 11:00:18.788 ERROR|   logging_manager:0626| 
01/06 11:00:18.788 ERROR|   logging_manager:0626| STATUS: START	----	----	timestamp=1483724259	localtime=Jan 06 09:37:39
01/06 11:00:18.789 ERROR|   logging_manager:0626| parsing partial test None CLIENT_JOB.0
01/06 11:00:18.789 ERROR|   logging_manager:0626| RUNNING: RUNNING
01/06 11:00:18.790 ERROR|   logging_manager:0626| CLIENT_JOB.0
01/06 11:00:18.790 DEBUG|          ssh_host:0177| Running (ssh) 'grep -q CHROMEOS /etc/lsb-release && ! test -f /mnt/stateful_partition/.android_tester && ! grep -q moblab /etc/lsb-release'
01/06 11:00:18.790 ERROR|   logging_manager:0626| 
01/06 11:00:18.790 ERROR|   logging_manager:0626| 
01/06 11:00:18.791 ERROR|   logging_manager:0626| STATUS: START	audio_Microphone	audio_Microphone	timestamp=1483724259	localtime=Jan 06 09:37:39
01/06 11:00:18.791 ERROR|   logging_manager:0626| parsing partial test audio_Microphone audio_Microphone
01/06 11:00:18.792 ERROR|   logging_manager:0626| RUNNING: RUNNING
01/06 11:00:18.792 ERROR|   logging_manager:0626| Subdir: audio_Microphone
01/06 11:00:18.793 ERROR|   logging_manager:0626| Testname: audio_Microphone
01/06 11:00:18.793 ERROR|   logging_manager:0626| 
01/06 11:00:18.794 ERROR|   logging_manager:0626| 
01/06 11:00:18.794 ERROR|   logging_manager:0626| STATUS: INFO	----	----	Job aborted by autotest_system on 2017-01-06 11:00:07
01/06 11:00:18.795 ERROR|   logging_manager:0626| Unexpected indent regression, aborting
01/06 11:00:18.795 ERROR|   logging_manager:0626| 
01/06 11:00:18.795 ERROR|   logging_manager:0626| STATUS: END ABORT	audio_Microphone	audio_Microphone	None

What's the deal with all the output from the logging manager?  This is very confusing.


Labels: Hotlist-Fixit
It's really not clear what goes on.  Is the "logging manager" parsing error messages?  That is a very dangerous thing to do, because:

1. is the "unexpected indent" causing the failure?
2. is the "unexpected indent" possibly hiding the failure cause?

Can we get rid of this "logging manager" completely?

Cc: rspangler@chromium.org ihf@chromium.org
Labels: -Pri-2 Pri-1
Owner: snanda@chromium.org
Status: Assigned
I found another. This is causing flaky failures of a test I added, desktopui_MashLogin:

http://chromegw.corp.google.com/i/chromeos/builders/x86-zgb-release/builds/5358

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/95684338-chromeos-test/chromeos6-row2-rack7-host8/debug/

01/11 09:49:01.496 ERROR|   logging_manager:0626| MACHINE NAME: chromeos6-row2-rack7-host8
01/11 09:49:01.496 ERROR|   logging_manager:0626| MACHINE GROUP: x86-zgb
01/11 09:49:01.499 ERROR|   logging_manager:0626| parsing partial test ---- SERVER_JOB
01/11 09:49:01.499 ERROR|   logging_manager:0626| 
01/11 09:49:01.499 ERROR|   logging_manager:0626| STATUS: INFO	----	----	kernel=3.8.11	localtime=Jan 11 09:28:08	timestamp=1484155688
01/11 09:49:01.499 ERROR|   logging_manager:0626| 
01/11 09:49:01.499 ERROR|   logging_manager:0626| STATUS: START	----	----	timestamp=1484155705	localtime=Jan 11 09:28:25
01/11 09:49:01.500 ERROR|   logging_manager:0626| parsing partial test None CLIENT_JOB.0
01/11 09:49:01.500 ERROR|   logging_manager:0626| RUNNING: RUNNING
01/11 09:49:01.500 ERROR|   logging_manager:0626| CLIENT_JOB.0
01/11 09:49:01.500 ERROR|   logging_manager:0626| 
01/11 09:49:01.500 ERROR|   logging_manager:0626| 
01/11 09:49:01.501 ERROR|   logging_manager:0626| STATUS: START	desktopui_MashLogin	desktopui_MashLogin	timestamp=1484155705	localtime=Jan 11 09:28:25
01/11 09:49:01.501 ERROR|   logging_manager:0626| parsing partial test desktopui_MashLogin desktopui_MashLogin
01/11 09:49:01.506 ERROR|   logging_manager:0626| RUNNING: RUNNING
01/11 09:49:01.506 ERROR|   logging_manager:0626| Subdir: desktopui_MashLogin
01/11 09:49:01.506 ERROR|   logging_manager:0626| Testname: desktopui_MashLogin
01/11 09:49:01.507 ERROR|   logging_manager:0626| 
01/11 09:49:01.507 ERROR|   logging_manager:0626| 
01/11 09:49:01.507 ERROR|   logging_manager:0626| STATUS: GOOD	desktopui_MashLogin	desktopui_MashLogin	timestamp=1484156106	localtime=Jan 11 09:35:06	completed successfully
01/11 09:49:01.507 ERROR|   logging_manager:0626| update RUNNING reason: completed successfully
01/11 09:49:01.507 ERROR|   logging_manager:0626| 
01/11 09:49:01.508 ERROR|   logging_manager:0626| STATUS: END GOOD	desktopui_MashLogin	desktopui_MashLogin	timestamp=1484156106	localtime=Jan 11 09:35:06
01/11 09:49:01.508 ERROR|   logging_manager:0626| parsing test desktopui_MashLogin desktopui_MashLogin
01/11 09:49:01.508 ERROR|   logging_manager:0626| ADD: GOOD
01/11 09:49:01.508 ERROR|   logging_manager:0626| Subdir: desktopui_MashLogin
01/11 09:49:01.508 ERROR|   logging_manager:0626| Testname: desktopui_MashLogin
01/11 09:49:01.508 ERROR|   logging_manager:0626| completed successfully
01/11 09:49:01.509 ERROR|   logging_manager:0626| 
01/11 09:49:01.509 ERROR|   logging_manager:0626| STATUS: INFO	----	----	Job aborted by autotest_system on 2017-01-11 09:48:46
01/11 09:49:01.509 ERROR|   logging_manager:0626| Unexpected indent regression, aborting
01/11 09:49:01.518 ERROR|   logging_manager:0626| 
01/11 09:49:01.518 ERROR|   logging_manager:0626| STATUS: END ABORT	desktopui_MashLogin	desktopui_MashLogin	None
01/11 09:49:01.518 ERROR|   logging_manager:0626| parsing test desktopui_MashLogin desktopui_MashLogin
01/11 09:49:01.519 ERROR|   logging_manager:0626| ADD: ABORT
01/11 09:49:01.519 ERROR|   logging_manager:0626| Subdir: desktopui_MashLogin
01/11 09:49:01.519 ERROR|   logging_manager:0626| Testname: desktopui_MashLogin
01/11 09:49:01.519 ERROR|   logging_manager:0626| None
01/11 09:49:01.519 ERROR|   logging_manager:0626| 
01/11 09:49:01.521 ERROR|   logging_manager:0626| STATUS: INFO	----	----	Job aborted by autotest_system on 2017-01-11 09:48:46
01/11 09:49:01.521 ERROR|   logging_manager:0626| parsing test ---- SERVER_JOB
01/11 09:49:06.831 ERROR|        dev_server:0353| Devserver call failed: "http://100.108.1.152:8082/check_health?", timeout: 2.0 seconds, Error: retry exception (label="make_call"), timeout = 2s

Also https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/95505854-chromeos-test/chromeos2-row6-rack6-host7/debug/

Issue 679213 is the MashLogin test.

Here is the code that prints the message. It's some kind of result parser:

https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/tko/parsers/version_1.py?q=%22Unexpected+indent+regression,+aborting%22&sq=package:%5Echromeos_(public%7Cinternal)$&l=286

Assigning to today's sheriff since it's causing bvt-cq failures.


Cc: dshi@chromium.org
Owner: dgarr...@chromium.org
Reassigning to the Infra deputy:  This is a bug in Autotest,
apparently caused by aborting the job.  The abort is (I
presume) happening due to a timeout.

The relevant evidence is the status.log file.  Here's the
important stuff:

====
INFO	----	----	kernel=3.8.11	localtime=Jan 11 09:28:08	timestamp=1484155688	
START	----	----	timestamp=1484155705	localtime=Jan 11 09:28:25	
	START	desktopui_MashLogin	desktopui_MashLogin	timestamp=1484155705	localtime=Jan 11 09:28:25	
		GOOD	desktopui_MashLogin	desktopui_MashLogin	timestamp=1484156106	localtime=Jan 11 09:35:06	completed successfully
	END GOOD	desktopui_MashLogin	desktopui_MashLogin	timestamp=1484156106	localtime=Jan 11 09:35:06	
INFO	----	----	Job aborted by autotest_system on 2017-01-11 09:48:46
====

The error message means there's no 'END' in the first column
(corresponding to the first 'START') after the indented 'END GOOD'.

I'm not sure why the "aborted" message is labeled "INFO".  I think
there's supposed to be separate syntax to indicate aborts.  I don't
know why that wasn't done right.

For triage purposes, note that really, there are two bugs here:
  * Something is causing periodic timeouts.
  * The results from the aborted tests aren't being reported/parsed
    correctly.

More details in issue 679213 -- that test was generating a lot of core files, despite it passing. This filled the disk and causing an rsync timeout. Perhaps that's related to the lack of END.

Owner: shuqianz@chromium.org
Owner: dshi@chromium.org
Pass to deputy
Comment 8 by dshi@chromium.org, Jan 18 2017
Owner: akes...@chromium.org
shuqianz@,
Don was trying to find someone working on Autotest to fix the actual bug.

I will let Aviv to decide who can work on this bug.

Assign to Aviv.
Owner: ----
Status: Available
Owner: semenzato@chromium.org
Status: Started
Will take a stab at this.

There is a choice of (at least) two directions here.

1. make sure that we only generate "correct" logs (is this even possible?)

2. make the tko parser more tolerant.

What was the main purpose for making the parser picky?  Consistency checks?

Also, IIRC, when I started writing the log parser script for classifying build failures, it was remarked that such script would not be robust.  I agreed with that, but indicated that I considered it a temporary tool (and in fact it is not committed anywhere).  What are the thoughts regarding this particular tool, which does something similar, yet appears to be used as a long-term solution?  Should it stay or go away completely?

I noticed that unfortunately its authors have moved on.  Is anybody claiming ownership?

Thanks.


Also look at issue 658549: "non-status line, ignoring".  This is when status.log contains a python stack trace.

Comment 13 by ihf@chromium.org, Feb 3 2017
FYI: There is a related issue 684311.

I think bad indent means that some form of concurrency is broken. In this case the job was aborted (which sounds like a reasonable event), but this interference from outside was not expected, which I think it should always be. So fixing parsing in the case of aborts seems reasonable.

In issue 684311 maybe one could fix parsing, but we don't understand why the prints happened, which means likely something with the concurrency model is broken. Everything needs to be done on a case by case basis.
#13 thank you.  What do you mean by "some form of concurrency is broken"?  Are multiple processes or threads writing to this file, which is then used as a legitimate data structure?

The fact that python stack traces can also be dumped into this file makes one wonder about the entire approach.  Are my worries possibly unreasonable/unfounded?
Comment 15 by ihf@chromium.org, Feb 3 2017
I think your worries are my worries as well.
Project Member Comment 16 by bugdroid1@chromium.org, Feb 11 2017
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/e70648162260bfd46a56fd1a1d1d7f86ec01e600

commit e70648162260bfd46a56fd1a1d1d7f86ec01e600
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Sat Feb 11 08:35:23 2017

TKO parser: improve some error messages in debug log

These changes try to make it clearer that the errors
aren't critical to the execution of the test, and also
slightly reduce the spew to improve log readability.

BUG=chromium:659741
TEST=none

Change-Id: I1a2332becb6d6d1fbd5f23f39dbbc502a631d37c
Reviewed-on: https://chromium-review.googlesource.com/437707
Commit-Ready: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parsers/version_1.py
[modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/server/server_job.py
[modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/status_lib_unittest.py
[modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parse.py
[modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parsers/test/scenario_base.py
[modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/status_lib.py
[add] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parser_lib.py
[modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/utils.py
[modify] https://crrev.com/e70648162260bfd46a56fd1a1d1d7f86ec01e600/tko/parsers/version_0.py

Labels: Logging-improv
Labels: akeshet-pending-downgrade
ChromeOS Infra P1 Bugscrub.

P1 Bugs in this component should be important enough to get weekly status updates.

Is this already fixed?  -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.

Bugs that remain in this state next week will be downgraded to P2.
Labels: -akeshet-pending-downgrade Pri-2
ChromeOS Infra P1 Bugscrub.

Issue untouched in a week after previous message. Downgrading to P2.
Sign in to add a comment