Issue 598122

Starred by 3 users

Issue metadata

Status: Archived
Owner: ----
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



telemetry benchmarks failing on pre-flight branch

Project Member Reported by llozano@chromium.org, Mar 26 2016

Issue description

Note that this will affect all telemetry benchmarks. 

For the last couple of days, the AFDO profile tests have been failing with the following error:

https://uberchromegw.corp.google.com/i/chromiumos.release/builders/lumpy-pre-flight-branch%20release-R50-7978.B/builds/101
see:
https://uberchromegw.corp.google.com/i/chromiumos.release/builders/lumpy-pre-flight-branch%20release-R50-7978.B/builds/101/steps/HWTest%20%5BAFDO_record%5D/logs/stdio

telemetry_AFDOGenerate: FAIL: Error occurred while sending DEPs to dut.@http://cautotest/tko/retrieve_logs.cgi?job=/results/57865374-chromeos-test/@@@

If you open the detailed log:
https://00e9e64bac7601dc47ea3cbf35a76a45a612b1f9a0bdf395e5-apidata.googleusercontent.com/download/storage/v1_internal/b/chromeos-autotest-results/o/57865374-chromeos-test%2Fchromeos2-row5-rack6-host10%2Ftelemetry_AFDOGenerate%2Fdebug%2Ftelemetry_AFDOGenerate.DEBUG?qk=AD5uMEuM8f6c81nM85ARmZQXDEnDE3rfv-Ufsc0op3zttfK3LQDye-xx9l4IJgYP5oXYqud_GkMJ5zFly7CGYus-8YtHqo2W1ZJ-RqWstgpllaOecZAEYj7Jc3SzdPfyLtZbmlu51GaWP8HyH2d3BZxPZc6Ax0VIO1dcLwfHUP2sDpu_0gqJK-5PwC-IijDcduBlV8dlvTuifBt_Z7m_0q_VIAWH__NwofbUK3YvI-Zva9bD8scA1Dsrs90lbXzLz5mB5YP4O04jg49NvCzlwPNQ05gHmLstizdGPmZKyEJFPP1fn-E7aUTMzxHT-rSpSvr8EG2spLFMJGbe0Ls2L7Q5niZ_8waGTcMgrkAZVIylxEhFFkHGTfFLVQxQmW1i5ApzeQkyEg-NoJMDE0vtskN2OlH0xoUgxlIv3WM-ozkEUJv_ztDGI04VxDJ9s1kddysF7pCJK8HKjjhpgonvaoMY7T_hWDgXcZTzhEuOP6r6IFkiPDxEFqzQJTvhfgVeuj6CPjtYLJhwklcpkpJmEC4DkuP3Z-dvfvJHJdF7Ltq7-Ik2C1yo_7d-oCJWZzpvtOz6Fm2r92xO8Ge6z5LD6X_cIRYK5o8EsyZIjZUshJ9dcd4BjnJfF_BvSlJxQwNufr5h9k6zfMzXzERYG7zspsUs8QoL6_DHLs22azBd2zrwb3UbznwR4hychx8-PMKeLKKTXBgaiF5Ce9s9MLpWAByC2CuHbpIPEzHWeV8_6D_GSZglyoHr-UZcZfp26EjVnRC3IUCzJSSV3PxwtFTiSFl6ExTBaQmpfSaROZRk8WXWmWL4efv24wzAYfNE9zP4iTgu3HZBfchGfu1V5M31xA9bZCA0bJL4WMYZiXz2Nhv9BFQCYv4M7NcR8ozuFvUZBWrwanuByxhijVVPHEySMvVA7jQnSB5Czjv2RySs7DlT2DVaGgeYLzue8U4GQBhRyLOHWp8A7qUVH0uSfNd5Aj2lwAp9491tzw

you will see the following error:

03/25 14:22:35.428 INFO |  telemetry_runner:0427| Copying: /home/chromeos-test/images/lumpy-pre-flight-branch/R50-7978.31.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr -> /usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr
03/25 14:22:35.429 DEBUG|        base_utils:0177| Running 'ssh 172.17.40.27 rsync /home/chromeos-test/images/lumpy-pre-flight-branch/R50-7978.31.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr chromeos2-row5-rack6-host10:/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr'
03/25 14:22:42.452 INFO |telemetry_AFDOGene:0200| Got exception from Telemetry benchmark page_cycler.typical_25 after 16.523780 seconds. Exception: Error occurred while sending DEPs to dut.

I saw a similar problem a few weeks ago, which was fixed here:

https://bugs.chromium.org/p/chromium/issues/detail?id=581863

So, is it possible that autotest got updated in the pre-flight branch and no longer matches the Telemetry changes in Chrome?

It started happening on the 23rd with this build:

https://uberchromegw.corp.google.com/i/chromiumos.release/builders/lumpy-pre-flight-branch%20release-R50-7978.B/builds/95


 
Cc: bccheng@chromium.org llozano@chromium.org laszio@chromium.org

Comment 2 by laszio@chromium.org, Mar 28 2016

The problematic rsync took 7s to fail. Failures other than out-of-space usually happen quickly (path not found, permission problems, etc.) or only after a timeout.

May I have access to the machine, chromeos2-row5-rack6-host10, to examine the problem closely?

Logs:

03/25 14:22:35.429 DEBUG|        base_utils:0177| Running 'ssh 172.17.40.27 rsync /home/chromeos-test/images/lumpy-pre-flight-branch/R50-7978.31.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr chromeos2-row5-rack6-host10:/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr'
03/25 14:22:42.452 INFO |telemetry_AFDOGene:0200| Got exception from Telemetry benchmark page_cycler.typical_25 after 16.523780 seconds. Exception: Error occurred while sending DEPs to dut.
......
03/25 14:22:44.265 DEBUG|        base_utils:0177| Running 'ssh 172.17.40.27 rsync /home/chromeos-test/images/lumpy-pre-flight-branch/R50-7978.31.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr chromeos2-row5-rack6-host10:/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr'
03/25 14:22:51.164 INFO |telemetry_AFDOGene:0200| Got exception from Telemetry benchmark page_cycler.typical_25 after 8.711011 seconds. Exception: Error occurred while sending DEPs to dut.
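
The 7s figure can be read off the log timestamps. A minimal sketch of that arithmetic, assuming the first 18 characters of each debug line are a "MM/DD HH:MM:SS.mmm" stamp; the helper names are hypothetical, the year 2016 is an assumption taken from the report date (the log does not record it), and the log lines are shortened here for readability:

```python
from datetime import datetime

# The debug lines start with "MM/DD HH:MM:SS.mmm"; the year is not logged,
# so it is supplied separately (2016 is assumed from the report date).
STAMP_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def parse_stamp(line, year=2016):
    """Return the timestamp at the start of a debug log line."""
    # The stamp is the first 18 characters, e.g. "03/25 14:22:35.429".
    return datetime.strptime("%d/%s" % (year, line[:18]), STAMP_FORMAT)


def elapsed_seconds(start_line, end_line):
    """Seconds between two log lines from the same year."""
    return (parse_stamp(end_line) - parse_stamp(start_line)).total_seconds()


# Both rsync attempts quoted above (lines shortened for readability).
first = elapsed_seconds(
    "03/25 14:22:35.429 DEBUG|        base_utils:0177| Running 'ssh (shortened)'",
    "03/25 14:22:42.452 INFO |telemetry_AFDOGene:0200| Got exception (shortened)",
)
second = elapsed_seconds(
    "03/25 14:22:44.265 DEBUG|        base_utils:0177| Running 'ssh (shortened)'",
    "03/25 14:22:51.164 INFO |telemetry_AFDOGene:0200| Got exception (shortened)",
)
print(round(first, 3), round(second, 3))  # 7.023 6.899
```

Both attempts fail after roughly the same interval, which is what makes an immediate error such as a missing path look unlikely.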

Comment 3 by laszio@chromium.org, Mar 28 2016

I logged into the DUTs and found that there is plenty of space left (several GB). Keith also mentioned that this problem occurs on every board. It seems that my previous guess was wrong.

On my local machine, I individually ran the commands that are generated to run on the devserver, and they work fine. It seems hard to reproduce the problem without a real devserver. The second suspect is the difference between the devserver and my local machine; a major one is probably .ssh/config. I removed most of the config except the private key for auth, and ssh, scp and rsync then blocked, although they did not fail as quickly as within a few seconds. Anyway, here is an attempt to align the ssh, scp and rsync options with CrosHost.make_ssh_command:

https://chromium-review.googlesource.com/335534
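
For illustration only, a minimal sketch of what passing explicit ssh options to rsync can look like. This is not the code from the CL: the function name and the option set are assumptions, and the options actually produced by CrosHost.make_ssh_command may differ. The `-e` flag of rsync and the `-o`/`-l` flags of ssh are standard.

```python
import shlex

# Assumed option set for non-interactive use; the options that
# CrosHost.make_ssh_command really emits may be different.
ASSUMED_SSH_OPTIONS = (
    "-o", "BatchMode=yes",
    "-o", "StrictHostKeyChecking=no",
    "-o", "UserKnownHostsFile=/dev/null",
    "-o", "ConnectTimeout=30",
)


def build_rsync_command(src, host, dest, user="root"):
    """Return an rsync argv that reaches host as user via an explicit ssh command."""
    # rsync takes the remote shell as a single string after -e.
    remote_shell = shlex.join(("ssh",) + ASSUMED_SSH_OPTIONS + ("-l", user))
    return ["rsync", "-e", remote_shell, src, "%s:%s" % (host, dest)]


argv = build_rsync_command(
    "/tmp/example.wpr",            # hypothetical source path
    "chromeos2-row5-rack6-host10",
    "/usr/local/telemetry/example.wpr",  # hypothetical destination path
)
# Quote once more when the whole command is itself passed to "ssh <devserver>".
print(shlex.join(argv))
```

Keeping the login user and the batch-mode options in the command itself means the result no longer depends on whatever .ssh/config happens to exist on the machine that runs it.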

The best way would be to debug and test on a devserver directly. May I know (if possible) how to get access to one of them? It seems that the ssh server on my local machine does not allow public key auth and can't be used in batch mode. I'm also not sure how to fully replicate the whole production environment.

If the above change is still too risky, we should disable telemetry_on_dut first:

https://chromium-review.googlesource.com/335385

or revert the change altogether:

https://chromium-review.googlesource.com/323026/

Project Member

Comment 4 by bugdroid1@chromium.org, Mar 28 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/a6898c6a37d4894364dffd0885c4e480a5c03ff5

commit a6898c6a37d4894364dffd0885c4e480a5c03ff5
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Mon Mar 28 10:13:01 2016

Disable telemetry_on_dut by default

telemetry_on_dut is buggy on devserver.

BUG= chromium:598122 
TEST=test_that --board=squawks [dut] telemetry_Benchmarks.* \
     --args=" local=True telemetry_on_dut=[|True|False] "

Change-Id: If09e8aa1e4f1d0c7a282ffc1075ff5c7930372a9
Reviewed-on: https://chromium-review.googlesource.com/335385
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Luis Lozano <llozano@chromium.org>

[modify] https://crrev.com/a6898c6a37d4894364dffd0885c4e480a5c03ff5/server/cros/telemetry_runner.py
[modify] https://crrev.com/a6898c6a37d4894364dffd0885c4e480a5c03ff5/server/site_tests/telemetry_Benchmarks/telemetry_Benchmarks.py

Comment 5 by hadd...@google.com, Mar 28 2016

Initial results show that after telemetry_on_dut was switched off the tests pass, confirming the suspicion that it was the source of the test failures.
I still got this failure around 1 PM. Is this expected? How long does it take for this fix to propagate?

https://uberchromegw.corp.google.com/i/chromiumos.release/builders/lumpy-pre-flight-branch%20release-R50-7978.B/builds/110


Ting-Yuan,

Is your DUT directly connected to your workstation with the DHCP hack? If so you might want to try to connect it to a port on the wall to see if the problem can be reproduced.

Comment 8 by laszio@chromium.org, Mar 29 2016

The failure happens before a devserver talks to a DUT; the script tries to ssh into a devserver to which I have no access.

I also used Luis's machine in the lab but still couldn't reproduce it, for the same reason.
Cc: fdeng@chromium.org dshi@chromium.org sbasi@chromium.org
The problem is definitely still happening:

https://uberchromegw.corp.google.com/i/chromeos/builders/lumpy-chrome-pfq/builds/8461

Ting-Yuan, 

Can you please provide as much information as you have? What is the command that is failing? On which machine?

Adding simran, fdeng, dshi.

How can we reproduce this problem? It only seems to reproduce when the devserver is used.
Cc: achuith@chromium.org
Also adding achuith, since he was the original author of this functionality.
Ting-Yuan is blocked because he has no way to reproduce this problem.

Comment 11 by sbasi@chromium.org, Mar 29 2016

Test logs here: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/58216014-chromeos-test/chromeos2-row5-rack6-host10/debug/?project=chromeos-bot&debugUI=DEVELOPERS

So it's launching the telemetry script from devserver 172.17.40.27 against DUT chromeos2-row5-rack6-host10.

What exactly is the sending DEPs step doing in the telemetry script?
Project Member

Comment 12 by bugdroid1@chromium.org, Mar 29 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/8a2c7f7fecd0dbc1759425cb57163af0a8d6002d

commit 8a2c7f7fecd0dbc1759425cb57163af0a8d6002d
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Mon Mar 28 14:01:07 2016

Insert ssh, scp and rsync options when invoking telemetry on dut

Also removed unnecessary exception handlers to expose more info.

BUG= chromium:598122 
TEST=Tested with local=True
     Also tested every commands which are generated to run on
     devserver on a local machine individually.

Change-Id: Id06ea36bb4205dbceb927c0ffbf12fa25e2f79ca
Reviewed-on: https://chromium-review.googlesource.com/335534
Commit-Ready: Ting-Yuan Huang <laszio@chromium.org>
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@chromium.org>

[modify] https://crrev.com/8a2c7f7fecd0dbc1759425cb57163af0a8d6002d/server/cros/telemetry_runner.py

Project Member

Comment 13 by bugdroid1@chromium.org, Mar 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/8a2c7f7fecd0dbc1759425cb57163af0a8d6002d

commit 8a2c7f7fecd0dbc1759425cb57163af0a8d6002d
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Mon Mar 28 14:01:07 2016

Insert ssh, scp and rsync options when invoking telemetry on dut

Also removed unnecessary exception handlers to expose more info.

BUG= chromium:598122 
TEST=Tested with local=True
     Also tested every commands which are generated to run on
     devserver on a local machine individually.

Change-Id: Id06ea36bb4205dbceb927c0ffbf12fa25e2f79ca
Reviewed-on: https://chromium-review.googlesource.com/335534
Commit-Ready: Ting-Yuan Huang <laszio@chromium.org>
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@chromium.org>

[modify] https://crrev.com/8a2c7f7fecd0dbc1759425cb57163af0a8d6002d/server/cros/telemetry_runner.py

The failing command is:

ssh 172.17.40.27 rsync /home/chromeos-test/images/lumpy-chrome-pfq/R51-8124.0.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr chromeos2-row5-rack6-host10:/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr

where 172.17.40.27 is a devserver on which chromeos-test is logged in. It looks like rsync failed when trying to access chromeos2-row5-rack6-host10 as chromeos-test (it should be root). The above CL added the required ssh login info and options. Does the conjecture make sense?
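
The conjecture rests on how ssh picks a login name: with no `user@` prefix, no `-l` option and no `User` entry in the ssh config, it falls back to the name of the invoking local user. A small sketch of that rule, assuming none of those overrides are in effect on the devserver; the helper name is hypothetical:

```python
def remote_login_user(dest_spec, invoking_user):
    """Return the user rsync-over-ssh would log in as for a host:path spec.

    Assumes no User entry in ssh_config and no -l option, in which case ssh
    falls back to the name of the invoking local user.
    """
    host_part = dest_spec.split(":", 1)[0]
    if "@" in host_part:
        # An explicit "user@host" prefix wins.
        return host_part.rsplit("@", 1)[0]
    return invoking_user


dest = ("chromeos2-row5-rack6-host10:"
        "/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr")
print(remote_login_user(dest, "chromeos-test"))            # chromeos-test
print(remote_login_user("root@" + dest, "chromeos-test"))  # root
```

Under that assumption the failing command would attempt to log in to the DUT as chromeos-test, which is consistent with the conjecture but does not by itself confirm it.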

Components: Infra>Client>ChromeOS
Hi, this bug has not been updated recently and remains untriaged. Please acknowledge the bug and provide status within two weeks (6/8/2018), or the bug will be closed. Thank you.
Status: Archived (was: Untriaged)
