Telemetry benchmarks failing on pre-flight branch
Issue description

Note that this will affect all telemetry benchmarks.

For the last couple of days, the AFDO profile tests have been failing with the following error:
https://uberchromegw.corp.google.com/i/chromiumos.release/builders/lumpy-pre-flight-branch%20release-R50-7978.B/builds/101

See: https://uberchromegw.corp.google.com/i/chromiumos.release/builders/lumpy-pre-flight-branch%20release-R50-7978.B/builds/101/steps/HWTest%20%5BAFDO_record%5D/logs/stdio

telemetry_AFDOGenerate: FAIL: Error occurred while sending DEPs to dut.@http://cautotest/tko/retrieve_logs.cgi?job=/results/57865374-chromeos-test/@@@

If you look at the detailed log:
https://00e9e64bac7601dc47ea3cbf35a76a45a612b1f9a0bdf395e5-apidata.googleusercontent.com/download/storage/v1_internal/b/chromeos-autotest-results/o/57865374-chromeos-test%2Fchromeos2-row5-rack6-host10%2Ftelemetry_AFDOGenerate%2Fdebug%2Ftelemetry_AFDOGenerate.DEBUG?qk=AD5uMEuM8f6c81nM85ARmZQXDEnDE3rfv-Ufsc0op3zttfK3LQDye-xx9l4IJgYP5oXYqud_GkMJ5zFly7CGYus-8YtHqo2W1ZJ-RqWstgpllaOecZAEYj7Jc3SzdPfyLtZbmlu51GaWP8HyH2d3BZxPZc6Ax0VIO1dcLwfHUP2sDpu_0gqJK-5PwC-IijDcduBlV8dlvTuifBt_Z7m_0q_VIAWH__NwofbUK3YvI-Zva9bD8scA1Dsrs90lbXzLz5mB5YP4O04jg49NvCzlwPNQ05gHmLstizdGPmZKyEJFPP1fn-E7aUTMzxHT-rSpSvr8EG2spLFMJGbe0Ls2L7Q5niZ_8waGTcMgrkAZVIylxEhFFkHGTfFLVQxQmW1i5ApzeQkyEg-NoJMDE0vtskN2OlH0xoUgxlIv3WM-ozkEUJv_ztDGI04VxDJ9s1kddysF7pCJK8HKjjhpgonvaoMY7T_hWDgXcZTzhEuOP6r6IFkiPDxEFqzQJTvhfgVeuj6CPjtYLJhwklcpkpJmEC4DkuP3Z-dvfvJHJdF7Ltq7-Ik2C1yo_7d-oCJWZzpvtOz6Fm2r92xO8Ge6z5LD6X_cIRYK5o8EsyZIjZUshJ9dcd4BjnJfF_BvSlJxQwNufr5h9k6zfMzXzERYG7zspsUs8QoL6_DHLs22azBd2zrwb3UbznwR4hychx8-PMKeLKKTXBgaiF5Ce9s9MLpWAByC2CuHbpIPEzHWeV8_6D_GSZglyoHr-UZcZfp26EjVnRC3IUCzJSSV3PxwtFTiSFl6ExTBaQmpfSaROZRk8WXWmWL4efv24wzAYfNE9zP4iTgu3HZBfchGfu1V5M31xA9bZCA0bJL4WMYZiXz2Nhv9BFQCYv4M7NcR8ozuFvUZBWrwanuByxhijVVPHEySMvVA7jQnSB5Czjv2RySs7DlT2DVaGgeYLzue8U4GQBhRyLOHWp8A7qUVH0uSfNd5Aj2lwAp9491tzw
you will see the following error:

03/25 14:22:35.428 INFO | telemetry_runner:0427| Copying: /home/chromeos-test/images/lumpy-pre-flight-branch/R50-7978.31.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr -> /usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr
03/25 14:22:35.429 DEBUG| base_utils:0177| Running 'ssh 172.17.40.27 rsync /home/chromeos-test/images/lumpy-pre-flight-branch/R50-7978.31.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr chromeos2-row5-rack6-host10:/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr'
03/25 14:22:42.452 INFO |telemetry_AFDOGene:0200| Got exception from Telemetry benchmark page_cycler.typical_25 after 16.523780 seconds. Exception: Error occurred while sending DEPs to dut.

I had seen a similar problem a few weeks ago that was fixed here: https://bugs.chromium.org/p/chromium/issues/detail?id=581863
So, is it possible that autotest got updated in the pre-flight branch and no longer matches the changes in Telemetry (Chrome)?

It started happening on the 23rd with this build: https://uberchromegw.corp.google.com/i/chromiumos.release/builders/lumpy-pre-flight-branch%20release-R50-7978.B/builds/95
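The logged command is a two-hop copy: the autotest server ssh-es into the devserver, and rsync running there then opens its own connection to the DUT. A minimal sketch of how such a command line could be assembled; the function name is hypothetical and this is not the actual code in telemetry_runner.py:

```python
import shlex


def build_devserver_rsync(devserver, src, dut, dst):
    """Return argv that copies src (a path on the devserver) to dst on the DUT.

    Hypothetical helper for illustration. The outer ssh hops to the devserver;
    rsync then runs *on the devserver* and opens its own ssh connection to the
    DUT, so that second hop uses the devserver's user and ssh configuration.
    """
    inner = ["rsync", src, f"{dut}:{dst}"]
    # ssh hands its trailing arguments to the remote shell as one string,
    # so quote each word to survive that shell's parsing.
    return ["ssh", devserver, shlex.join(inner)]


argv = build_devserver_rsync(
    "172.17.40.27",
    "/home/chromeos-test/images/lumpy-pre-flight-branch/R50-7978.31.0-rc2/"
    "telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr",
    "chromeos2-row5-rack6-host10",
    "/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr",
)
print(" ".join(argv))
```

With these inputs the printed line is the same command as the DEBUG log entry, because none of the paths need shell quoting.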
Mar 28 2016
The problematic rsync took about 7 s to fail. Causes other than running out of space usually fail immediately (path not found, permission problems, etc.) or only after a timeout, so the 7 s failure suggested the DUT might be out of space. May I have access to the machine, chromeos2-row5-rack6-host10, to examine the problem more closely?

Logs:
03/25 14:22:35.429 DEBUG| base_utils:0177| Running 'ssh 172.17.40.27 rsync /home/chromeos-test/images/lumpy-pre-flight-branch/R50-7978.31.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr chromeos2-row5-rack6-host10:/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr'
03/25 14:22:42.452 INFO |telemetry_AFDOGene:0200| Got exception from Telemetry benchmark page_cycler.typical_25 after 16.523780 seconds. Exception: Error occurred while sending DEPs to dut.
......
03/25 14:22:44.265 DEBUG| base_utils:0177| Running 'ssh 172.17.40.27 rsync /home/chromeos-test/images/lumpy-pre-flight-branch/R50-7978.31.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr chromeos2-row5-rack6-host10:/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr'
03/25 14:22:51.164 INFO |telemetry_AFDOGene:0200| Got exception from Telemetry benchmark page_cycler.typical_25 after 8.711011 seconds. Exception: Error occurred while sending DEPs to dut.
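The 7 s figure is the gap between each `Running 'ssh ... rsync ...'` line and the following exception line (the "after N seconds" in the exception covers the whole benchmark run). A small sketch to compute such gaps, assuming the `MM/DD HH:MM:SS.mmm` stamp format seen in these logs and taking the year 2016 from the thread dates:

```python
from datetime import datetime

# Autotest DEBUG lines carry "MM/DD HH:MM:SS.mmm" stamps without a year;
# 2016 is assumed from the dates in this thread.
YEAR = 2016


def parse_stamp(stamp):
    """Parse a stamp such as "03/25 14:22:35.429" into a datetime."""
    return datetime.strptime(f"{YEAR}/{stamp}", "%Y/%m/%d %H:%M:%S.%f")


def elapsed_seconds(start, end):
    """Seconds between two log stamps."""
    return (parse_stamp(end) - parse_stamp(start)).total_seconds()


# "Running 'ssh ... rsync ...'" -> "Got exception ..." for the two attempts.
print(round(elapsed_seconds("03/25 14:22:35.429", "03/25 14:22:42.452"), 3))  # 7.023
print(round(elapsed_seconds("03/25 14:22:44.265", "03/25 14:22:51.164"), 3))  # 6.899
```

Both attempts fail after roughly the same 7 s, so the delay is reproducible rather than a one-off.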
Mar 28 2016
I logged into the DUTs and found that there is plenty of space left (several GBs). Keith also mentioned that this problem occurs on every board, so my previous guess seems to be wrong.

I individually ran each of the commands that are generated to run on the devserver on my local machine, and they work fine. It seems hard to reproduce the problem without a real devserver.

The second suspect is the difference between the devserver and my local machine. A likely major one is .ssh/config. When I removed most of the config except the private key for authentication, ssh, scp and rsync blocked, although they did not fail within a few seconds. Anyway, here is an attempt to align the ssh, scp and rsync options with CrosHost.make_ssh_command: https://chromium-review.googlesource.com/335534

The best way would be to debug and test on a devserver directly. May I know (if possible) how to get access to one of them? It seems that the ssh server on my local machine does not allow public key authentication and can't be used in batch mode. I'm also not sure how to fully replicate the whole production environment.

If the above change is still too risky, we should disable telemetry_on_dut first: https://chromium-review.googlesource.com/335385 or revert the change altogether: https://chromium-review.googlesource.com/323026/
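One way to keep ssh, scp and rsync consistent is to build all three from a single option list. The sketch below assumes a typical lab option set (batch mode, host-key checking disabled, root login); it is an illustration only, not the actual contents of CrosHost.make_ssh_command or of CL 335534:

```python
import shlex

# Hypothetical option set for illustration; it is NOT copied from
# CrosHost.make_ssh_command or CL 335534.
SSH_OPTIONS = [
    "-o", "BatchMode=yes",                 # fail instead of prompting for a password
    "-o", "StrictHostKeyChecking=no",      # don't stop on unknown host keys
    "-o", "UserKnownHostsFile=/dev/null",  # don't record reimaged DUTs' keys
    "-o", "ConnectTimeout=30",             # give up on unreachable hosts
    "-o", "User=root",                     # log into the DUT as root
]


def ssh_cmd(host, command):
    return ["ssh", *SSH_OPTIONS, host, command]


def scp_cmd(src, host, dst):
    # scp takes the same -o options. Its -l flag is a bandwidth limit, not a
    # login name, so the user is passed as -o User=... instead.
    return ["scp", *SSH_OPTIONS, src, f"{host}:{dst}"]


def rsync_cmd(src, host, dst):
    # rsync has no ssh options of its own; -e names the remote shell to run.
    return ["rsync", "-e", shlex.join(["ssh", *SSH_OPTIONS]), src, f"{host}:{dst}"]
```

`BatchMode=yes` is relevant to the blocking observed above: with it, ssh fails when no usable key is available instead of waiting for a password prompt.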
Mar 28 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/a6898c6a37d4894364dffd0885c4e480a5c03ff5

commit a6898c6a37d4894364dffd0885c4e480a5c03ff5
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Mon Mar 28 10:13:01 2016

Disable telemetry_on_dut by default

telemetry_on_dut is buggy on devserver.

BUG= chromium:598122
TEST=test_that --board=squawks [dut] telemetry_Benchmarks.* \
--args=" local=True telemetry_on_dut=[|True|False] "

Change-Id: If09e8aa1e4f1d0c7a282ffc1075ff5c7930372a9
Reviewed-on: https://chromium-review.googlesource.com/335385
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Luis Lozano <llozano@chromium.org>

[modify] https://crrev.com/a6898c6a37d4894364dffd0885c4e480a5c03ff5/server/cros/telemetry_runner.py
[modify] https://crrev.com/a6898c6a37d4894364dffd0885c4e480a5c03ff5/server/site_tests/telemetry_Benchmarks/telemetry_Benchmarks.py
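The TEST line appears to exercise three cases: the flag omitted, set to True, and set to False. A sketch of default-off flag handling for a `key=value` args string; both helpers are hypothetical, and the real parsing in telemetry_Benchmarks.py may differ:

```python
def parse_test_args(args):
    """Split a test_that --args string such as "local=True telemetry_on_dut=False"
    into a dict. Hypothetical helper; autotest's own parsing may differ."""
    parsed = {}
    for token in args.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without "="
            parsed[key] = value
    return parsed


def telemetry_on_dut_enabled(args):
    # Off unless explicitly requested, matching "Disable telemetry_on_dut by default".
    return parse_test_args(args).get("telemetry_on_dut", "False").lower() == "true"


print(telemetry_on_dut_enabled(" local=True "))                        # False
print(telemetry_on_dut_enabled(" local=True telemetry_on_dut=True "))  # True
```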
Mar 28 2016
Initial results show that after telemetry_on_dut was switched off the tests pass, confirming the suspicion that it was the source of the test failures.
Mar 28 2016
I still got this failure around 1 PM. Is this expected? How long does it take for this fix to propagate? https://uberchromegw.corp.google.com/i/chromiumos.release/builders/lumpy-pre-flight-branch%20release-R50-7978.B/builds/110
Mar 29 2016
Ting-Yuan, is your DUT directly connected to your workstation with the DHCP hack? If so, you might want to connect it to a port on the wall instead to see if the problem can be reproduced.
Mar 29 2016
The failure happens before a devserver talks to a DUT; the script tries to ssh into a devserver, to which I have no access. I also tried Luis's machine in the lab but still couldn't reproduce the problem, for the same reason.
Mar 29 2016
The problem is definitely still happening: https://uberchromegw.corp.google.com/i/chromeos/builders/lumpy-chrome-pfq/builds/8461

Ting-Yuan, can you please provide as much information as you have? What is the command that is failing? Against which machine?

Adding simran, fdeng, dshi. How can we reproduce this problem? It only seems to reproduce when the devserver is used.
Mar 29 2016
Also adding achuith, since he was the original author of this functionality. Ting-Yuan is blocked because he has no way to reproduce this problem.
Mar 29 2016
Test logs are here: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/58216014-chromeos-test/chromeos2-row5-rack6-host10/debug/?project=chromeos-bot&debugUI=DEVELOPERS

So it's launching the telemetry script from devserver 172.17.40.27 against DUT chromeos2-row5-rack6-host10. What exactly is the "sending DEPs" step doing in the telemetry script?
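The logs show the outline of the step: a `Copying: src -> dst` line, the rsync command, then the generic exception. A minimal sketch of such a loop, assuming each dependency is copied with one command and any non-zero exit aborts the transfer; all names are hypothetical, not Telemetry's actual code. Chaining the original error keeps the exit status and stderr that the generic message otherwise hides:

```python
import logging
import subprocess


class DepsTransferError(Exception):
    """Hypothetical error type; the logs only show its message text."""


def send_deps(deps, build_copy_cmd, run=subprocess.run):
    """Copy each (src, dst) pair to the DUT, stopping at the first failure.

    build_copy_cmd(src, dst) returns an argv list, for example the two-hop
    "ssh <devserver> rsync <src> <dut>:<dst>" command from the logs. `run` is
    injectable so the loop can be exercised without a devserver.
    """
    for src, dst in deps:
        logging.info("Copying: %s -> %s", src, dst)
        argv = build_copy_cmd(src, dst)
        try:
            run(argv, check=True, capture_output=True, text=True)
        except subprocess.CalledProcessError as exc:
            # Keep the exit status and stderr; the generic message alone
            # does not say why the copy failed.
            detail = (exc.stderr or "").strip()
            raise DepsTransferError(
                "Error occurred while sending DEPs to dut. "
                f"{' '.join(argv)} exited with status {exc.returncode}: {detail}"
            ) from exc
```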
Mar 29 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/8a2c7f7fecd0dbc1759425cb57163af0a8d6002d

commit 8a2c7f7fecd0dbc1759425cb57163af0a8d6002d
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Mon Mar 28 14:01:07 2016

Insert ssh, scp and rsync options when invoking telemetry on dut

Also removed unnecessary exception handlers to expose more info.

BUG= chromium:598122
TEST=Tested with local=True
Also tested every commands which are generated to run on devserver on a local machine individually.

Change-Id: Id06ea36bb4205dbceb927c0ffbf12fa25e2f79ca
Reviewed-on: https://chromium-review.googlesource.com/335534
Commit-Ready: Ting-Yuan Huang <laszio@chromium.org>
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@chromium.org>

[modify] https://crrev.com/8a2c7f7fecd0dbc1759425cb57163af0a8d6002d/server/cros/telemetry_runner.py
Mar 30 2016
The failing command is:
ssh 172.17.40.27 rsync /home/chromeos-test/images/lumpy-chrome-pfq/R51-8124.0.0-rc2/telemetry_src/src/tools/perf/page_sets/data/typical_25_002.wpr chromeos2-row5-rack6-host10:/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr
where 172.17.40.27 is a devserver on which chromeos-test is logged in. It looks like rsync failed when trying to access chromeos2-row5-rack6-host10 as user chromeos-test (it should be root). The above CL added the required ssh login info and options. Does this conjecture make sense?
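The conjecture follows from how ssh picks a login name: when the destination has no `user@` prefix and nothing in the ssh configuration supplies one, ssh (and rsync, which uses ssh as its remote shell by default) logs in as the local user, here chromeos-test on the devserver. A simplified sketch of that rule; the function is hypothetical and deliberately ignores the config-file case:

```python
def remote_login(dest, local_user):
    """Return the login name ssh would use for an rsync/scp "host:path" target.

    Simplified, hypothetical helper: it ignores ~/.ssh/config and -o User=...,
    either of which can also supply a user.
    """
    host = dest.split(":", 1)[0]
    if "@" in host:
        return host.split("@", 1)[0]  # explicit "user@host"
    return local_user  # no user given: ssh falls back to the local login name


dut_path = "/usr/local/telemetry/src/tools/perf/page_sets/data/typical_25_002.wpr"
print(remote_login(f"chromeos2-row5-rack6-host10:{dut_path}", "chromeos-test"))       # chromeos-test
print(remote_login(f"root@chromeos2-row5-rack6-host10:{dut_path}", "chromeos-test"))  # root
```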
Nov 10 2017
May 31 2018
Hi, this bug has not been updated recently and remains untriaged. Please acknowledge the bug and provide status within two weeks (6/8/2018), or the bug will be closed. Thank you.
Jun 1 2018
Comment 1 by bccheng@chromium.org, Mar 28 2016