New issue
Advanced search Search tips

Issue 682417 link

Starred by 1 user

Issue metadata

Status: Archived
Owner: ----
Closed: Sep 12
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Task



Sign in to add a comment

Tracker bug: Make privision, reset, verify errors easier to debug

Project Member Reported by pprabhu@chromium.org, Jan 18 2017

Issue description

By auditing our code for two kinds of errors:
- Incorrect logging levels (things that ought to be error/warning/info/debug levels, reassign)
- Multiple stack traces where later trace often doesn't even include information from the earlier one.
- "Expected to fail" / "retry loop" shelled out commands that spew out a whole truckload of stuff.

I will collect anecdotal complaints below to be handled via this bug.
 
provision fails due to no devserver:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/96306324-chromeos-test/chromeos4-row11-rack7-host3/debug/

Issues with autoserv.ERROR:
- The error log has 441 lines.
- Most of them are devserver retry failures. Move to warning, or maybe even info.
- The stack trace is thrown twice. Unify.
- The stack trace doesn't contain any actual information about the error. Fixit.
- Logging manager dumps the whole status.log marker lines into error log. Remove.

autoserv.WARNING issues:
- The actual error "DevServerException: CrOS auto-update failed for host chromeos4-row11-rack7-host3: SSHConnectionError: ssh: connect to host chromeos4-row11-rack7-host3 port 22: Connection timed out" is only available in autoserv.WARNING. Move and unify with the exceptions in autoserv.ERROR
- many many logs of the form "Failed to generate stack trace for /usr/local/autotest/results/hosts..." unify into one warning with the number of failed crash collections. autoserv.INFO can contain the names of all the failed files (in one log line / as a block rather than as annoying repeated log lines).
- Same for "Crash detection failed with: [Errno 2] No such file or director..."
- Same for "Failed to copy /usr/local/autotest/results/hosts/chromeos4-row1..."


autoserv.INFO issues:
- Look for CmdError, we fail 100s of times because minidump_stackwalk isn't available. If shelling out fails with error 127, or some such error, we should just bail on the operation for the whole batch of files. This will save a lot of time and reduce a lot of clutter from the logs.
- Lots of messages about "copying file..." "collection crashinfo for..." are repeated 100s of times, so I'm sure no one can make any sense of them. Instead, summarize the operation in the end (how many copied, how many failed, name a few and say "...and 104 others etc).

autoupdate test error (host did not return from reboot):
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/87549244-chromeos-test/

Only listing issues beyond those mentioned already:

autoserv.ERROR issues:
- base_utils dumps to error log when it fails to cat some checksum files. This seems to be benign, so isn't an error. Also, once again, why complain 20 times about the same issue?

autoserv.WARNING issues:
- The actual stack trace (AutoservRebootError) is here. Should be in autoserv.ERROR

autoserv.INFO issues:
- The same exception is printed 3 times in this file.  

The autoupdate log can also be cleaned up quite a bit:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row7-rack8-host5/1088404-provision/20171601130541/autoupdate_logs/

tl;dr: Currently, the log looks like a wall of SSH command arguments and "unknown host" warnings.


- cros_build_lib.RunCommand log lines contains the full SSH command line in the log. This makes it hard to see the actual command we were trying to run. Perhaps split this log into INFO/DEBUG. And dump the full command only in DEBUG. (we can do the splitting by log level later, when we upload the logs to GS).
- There is no need to dump the full incremental file list being sent over by rsync.
- we check if we can run devserver by running 'devserver --help', and dump the whole help output in the log. Only dump any errors, if they occur (A CL for this is already up: https://chromium-review.googlesource.com/#/c/427971/)
- The command "update_engine_client --status" is run very often. Perhaps we should only dump it to the logs every once in a while (every 5th attempt say). So you get some idea of the progress if your eyeballing the output, but half of your log file is not just waiting for the update to complete.
- "Warning: Permanently added yadayada to the list of known hosts." appears 100s of times in the log.
- Relegate all RunCommand's stdout/stderr to DEBUG level. So that when I read INFO, I know what is being done, without being distracted by weird newlines and such due to the output from the commands being run.

Comment 4 Deleted

Labels: -Pri-2 Pri-1
Merging part of the stream from  issue 656829 

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/80585806-chromeos-test/chromeos4-row6-rack9-host17/

status.log issues:
- Contains the same exception multiple times. Which one do I trust? which do I ignore? Are they all the same?
  - There is even a badly formatted stringified exception in there. Ugh.

Finally, while you're at it, take a look at https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/80585806-chromeos-test/chromeos4-row6-rack9-host17/debug/

autoserv.INFO issues:
This is a 1118 line file.
- About 30% of the log is just one command trying to check if devserver has staged an artifact (search for is_staged?)
- Search for "trying scp, rsync failed" -- these errors seem to be ignored (especially the scp one, we claim we're trying rsync). Then why show me the stack trace?
- Search for 'File "/usr/local': > 30% of the log is simply repeating the exception stack trace!
Can we follow the standard practices for the logging levels?

https://docs.python.org/3/howto/logging.html#when-to-use-logging

Error for swallowing fatal exceptions only.
Yes, and this would be a good time to do it, since we get to audit a large chunk of the logs in one fell swoop.
Status: Started (was: Available)
This is going to be lots of tiny CLs, and we get to declare victory any time we want ;)

Project Member

Comment 9 by bugdroid1@chromium.org, Feb 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/99583e5c18b5c2a4ae4c9d54296f0e32f5c8b0c9

commit 99583e5c18b5c2a4ae4c9d54296f0e32f5c8b0c9
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Mon Feb 06 18:37:46 2017

[tko] Fix imports, path mangling

BUG= chromium:682417 
TEST=None

Change-Id: I50e086b01045370605f9c0c77e4a245d8be630db
Reviewed-on: https://chromium-review.googlesource.com/437673
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/99583e5c18b5c2a4ae4c9d54296f0e32f5c8b0c9/tko/query_lib.py

Project Member

Comment 10 by bugdroid1@chromium.org, Feb 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/97fe9cbb37496658519a070f6d00895d1d549572

commit 97fe9cbb37496658519a070f6d00895d1d549572
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Mon Feb 06 18:37:45 2017

[tko] Sanitize imports

BUG= chromium:682417 
TEST=None

Change-Id: Ib819a655f5047455d314cdc902e868e9ff577351
Reviewed-on: https://chromium-review.googlesource.com/437683
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/97fe9cbb37496658519a070f6d00895d1d549572/tko/perf_upload/perf_uploader.py

Project Member

Comment 11 by bugdroid1@chromium.org, Feb 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/5ef88d1d1455f5677ab3da0f71fea22a7e2492ec

commit 5ef88d1d1455f5677ab3da0f71fea22a7e2492ec
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Mon Feb 06 18:37:46 2017

[tko] Fix import order

BUG= chromium:682417 
TEST=None

Change-Id: I7062223852cbdc000b4cf7efd9d8df0a0e034d5f
Reviewed-on: https://chromium-review.googlesource.com/437907
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/5ef88d1d1455f5677ab3da0f71fea22a7e2492ec/tko/utils.py

Project Member

Comment 12 by bugdroid1@chromium.org, Feb 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/81e138bb5ed29d0b9447b57f7b5933862c27d4d5

commit 81e138bb5ed29d0b9447b57f7b5933862c27d4d5
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Mon Feb 06 20:47:03 2017

[tko] Delete dead code.

Some code in tko/query_lib.py has been commented out since 2009. I think
it is safe to delete it now.

BUG= chromium:682417 
TEST=Local job. Parsing still works.

Change-Id: I89c3328eb95ea03db815d707ab854f955397975a
Reviewed-on: https://chromium-review.googlesource.com/437675
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/81e138bb5ed29d0b9447b57f7b5933862c27d4d5/tko/query_lib.py

Project Member

Comment 13 by bugdroid1@chromium.org, May 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/0afec4251b91fbbe55223325055ba1d88f857598

commit 0afec4251b91fbbe55223325055ba1d88f857598
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri May 26 00:15:43 2017

server_job: Use new style except statements

BUG= chromium:682417 
TEST=None

Change-Id: I47465b5163db7d74f7e0afce4ed801bd7e4271aa
Reviewed-on: https://chromium-review.googlesource.com/437607
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/0afec4251b91fbbe55223325055ba1d88f857598/server/server_job.py

Project Member

Comment 14 by bugdroid1@chromium.org, May 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/271962fa1dd0e9f690f164d672fb3ddad8968ef6

commit 271962fa1dd0e9f690f164d672fb3ddad8968ef6
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri May 26 00:15:44 2017

[autotest] Handle exceptions directly in run_test

Instead of obtain sys.exc_info() and then recreating an exception
immediately from it.

BUG= chromium:682417 
TEST=None

Change-Id: Iccc0ff4c1b7a84e22ef8b3f812a244e6b0446254
Reviewed-on: https://chromium-review.googlesource.com/437609
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/271962fa1dd0e9f690f164d672fb3ddad8968ef6/server/server_job.py

Project Member

Comment 15 by bugdroid1@chromium.org, May 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/2396d5683cbb90e4dabeb17fd6680b4e368fd243

commit 2396d5683cbb90e4dabeb17fd6680b4e368fd243
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri May 26 00:15:44 2017

[autotest] Delete redundant call to parent's __str__

BUG= chromium:682417 
TEST=None

Change-Id: I2f01023ffaedb9ab572e6110849805399876bc39
Reviewed-on: https://chromium-review.googlesource.com/437611
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/2396d5683cbb90e4dabeb17fd6680b4e368fd243/client/common_lib/error.py

Project Member

Comment 16 by bugdroid1@chromium.org, May 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b9d993b7ccefb291752107d0f7249940ab16d09d

commit b9d993b7ccefb291752107d0f7249940ab16d09d
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri May 26 00:15:59 2017

[tko] Sanitize imports

BUG= chromium:682417 
TEST=python tko/site_parse.py doesn't blow up.

Change-Id: Id581e98fe49e06b3426c623390d45030f04a2f1b
Reviewed-on: https://chromium-review.googlesource.com/437676
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/b9d993b7ccefb291752107d0f7249940ab16d09d/tko/site_parse.py

Project Member

Comment 17 by bugdroid1@chromium.org, May 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/84a85956f1c44de3645dd8eea255bbd2148a3e55

commit 84a85956f1c44de3645dd8eea255bbd2148a3e55
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri May 26 00:15:52 2017

[tko] Sanitize imports

BUG= chromium:682417 
TEST=None

Change-Id: I9d1ce06054e7d5cddcc3e09c5543c376362cc0a3
Reviewed-on: https://chromium-review.googlesource.com/437680
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/84a85956f1c44de3645dd8eea255bbd2148a3e55/tko/parsers/version_1.py

Project Member

Comment 18 by bugdroid1@chromium.org, May 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d8561576404bcb7c3c56d1c691129ba285693d03

commit d8561576404bcb7c3c56d1c691129ba285693d03
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri May 26 00:15:52 2017

[tko] Sanitize imports

BUG= chromium:682417 
TEST=None

Change-Id: I876698190f052d5673083bbe91729866457e5af6
Reviewed-on: https://chromium-review.googlesource.com/437909
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/d8561576404bcb7c3c56d1c691129ba285693d03/tko/parsers/version_0.py

Labels: -Pri-1 -Type-Bug Pri-2 Type-Feature
Owner: pprabhu@chromium.org
What's left on this particular trek?
Labels: -Pri-2 -Type-Feature Pri-3 Type-Task
Lots actually.
This is an undead, low priority cleanup task.
Marking it as such.
Hi, this bug has not been updated recently. Please acknowledge the bug and provide status within two weeks (6/22/2018), or the bug will be archived. Thank you.
Owner: ----
Status: Archived (was: Started)
Really an undead bug...

Sign in to add a comment