autotest_SyncCount test failing to run
Issue description
This test broke at R61-9745.0.0.
The server job fails with:
07/14 22:39:37.013 ERROR| server_job:0932| Exception escaped control file, job aborting:
Traceback (most recent call last):
  File "/usr/local/autotest/server/server_job.py", line 884, in run
    self._execute_code(GET_NETWORK_STATS_CONTROL_FILE, namespace)
  File "/usr/local/autotest/server/server_job.py", line 1425, in _execute_code
    execfile(code_file, namespace, namespace)
  File "/usr/local/autotest/server/control_segments/get_network_stats", line 57, in <module>
    job.parallel_simple(get_network_stats, machines)
  File "/usr/local/autotest/server/server_job.py", line 657, in parallel_simple
    return_results=return_results)
  File "/usr/local/autotest/server/subcommand.py", line 103, in parallel_simple
    subcommands.append(subcommand(function, args, subdir))
  File "/usr/local/autotest/server/subcommand.py", line 116, in __init__
    os.mkdir(self.subdir)
OSError: [Errno 36] File name too long: "/usr/local/autotest/results/128553426-chromeos-test/{'host_info_store': <autotest_lib.server.hosts.afe_store.AfeStore object at 0x7f827443d290>, 'hostname': 'chromeos6-row2-rack4-host21', 'connection_pool': <autotest_lib.server.hosts.ssh_multiplex.ConnectionPool object at 0x7f827443d1d0>, 'afe_host': HOST OBJECT: chromeos6-row2-rack4-host21}"
It looks like the failure occurs when parallel_simple creates a subdirectory for each machine involved in this multi-DUT test. A host dictionary is passed instead of a machine name, so the directory name ends up being the string representation of the host dictionary.
The test apparently stopped running once this string became too long, after connection_pool info was added to the host dictionary in this change:
https://chromium-review.googlesource.com/c/547077/7/server/server_job.py
The subdirectory name is set here:
https://chromium.git.corp.google.com/chromiumos/third_party/autotest/+/5a2cac10377d910d6d7f2553b99b595cb570dae2/server/subcommand.py#100
Example failure:
http://cautotest/afe/#tab_id=view_job&object_id=128553426
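To make the length problem concrete, here is a small sketch that measures the directory name reported in the OSError above. The 255-byte limit is an assumption (it is the usual per-component limit on Linux filesystems such as ext4; the actual filesystem of the results directory is not stated in this bug), and `fits` is a hypothetical helper, not autotest code.

```python
# Typical per-component filename limit on Linux filesystems such as ext4.
# Assumption: the results directory lives on a filesystem with this limit.
NAME_MAX = 255

# Directory name copied from the OSError message in the log above.
subdir = (
    "{'host_info_store': <autotest_lib.server.hosts.afe_store.AfeStore "
    "object at 0x7f827443d290>, 'hostname': 'chromeos6-row2-rack4-host21', "
    "'connection_pool': <autotest_lib.server.hosts.ssh_multiplex."
    "ConnectionPool object at 0x7f827443d1d0>, 'afe_host': HOST OBJECT: "
    "chromeos6-row2-rack4-host21}"
)

# The same name with the 'connection_pool' entry removed, approximating
# what the dict's string form looked like before that key was added.
pool_entry = (
    "'connection_pool': <autotest_lib.server.hosts.ssh_multiplex."
    "ConnectionPool object at 0x7f827443d1d0>, "
)
subdir_without_pool = subdir.replace(pool_entry, "")


def fits(name, limit=NAME_MAX):
    """Return True if name is short enough to be one path component."""
    return len(name.encode("utf-8")) <= limit


print(len(subdir), fits(subdir))
print(len(subdir_without_pool), fits(subdir_without_pool))
```

Under that assumed limit, the name with connection_pool is too long while the name without it still fits, which is consistent with the test breaking only after that key was added.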
,
Jul 20 2017
I have had a CL up for this, but it didn't seem to fix everything:
https://chromium-review.googlesource.com/c/521706/
Now that this _is_ broken, we can try fixing it ;)
,
Jul 20 2017
Adding an observation: if this change is moved up a level (before the call to _make_parallel_wrapper), it corrects the problem:
https://chromium-review.googlesource.com/c/317579/2/server/server_job.py
Insert it before this line:
https://chromium.git.corp.google.com/chromiumos/third_party/autotest/+/5a2cac10377d910d6d7f2553b99b595cb570dae2/server/server_job.py#654
However, some functions may rely on being passed a machine dictionary rather than just a machine name, so this approach might have undesirable side effects.
,
Jul 21 2017
Any thoughts on a solution approach? Our mesh tests are currently down due to this issue.
,
Jul 21 2017
Locally repro'ed after some hiccups: http://pprabhu.mtv.corp.google.com/results/70-pprabhu/group0/debug/autoserv.DEBUG
,
Jul 25 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d08c86bd1b0371b6c7b77aae3b6c92366a18e887
commit d08c86bd1b0371b6c7b77aae3b6c92366a18e887
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Tue Jul 25 05:54:03 2017
[autotest] Respect machine dict in parallel_simple
A long time ago, the machines list passed in to server_job was converted from a list of strings to possibly a list of dicts. Some places didn't get this update. Fix another one.
BUG= chromium:746751
TEST=(1) (new) unittests (2) Ran a local autotest_SyncCount job and verified that subdirs names are correctly inferred.
Change-Id: I7479241d97155e639c85a6b5469d134c1f48eba8
Reviewed-on: https://chromium-review.googlesource.com/582234
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>
Reviewed-by: Laurence Goodby <lgoodby@chromium.org>
[modify] https://crrev.com/d08c86bd1b0371b6c7b77aae3b6c92366a18e887/server/server_job.py
[modify] https://crrev.com/d08c86bd1b0371b6c7b77aae3b6c92366a18e887/server/subcommand_unittest.py
[modify] https://crrev.com/d08c86bd1b0371b6c7b77aae3b6c92366a18e887/server/subcommand.py
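For readers who do not want to open the CL, the idea of respecting a machine dict can be sketched roughly as follows. This is not the code from the commit: `subdir_name_for` and `run_per_machine` are hypothetical names, and the only thing taken from this bug is that a machine entry may be either a hostname string or a dict with a 'hostname' key. The called function still receives the full entry, which addresses the side-effect concern raised earlier in the thread.

```python
def subdir_name_for(machine):
    """Return a short directory name for one machine entry.

    Hypothetical helper: `machine` is either a hostname string or a dict
    carrying a 'hostname' key, as seen in the failing log.
    """
    if isinstance(machine, dict):
        return machine['hostname']
    return str(machine)


def run_per_machine(function, machines):
    """Call function(machine) per entry, keyed by its subdirectory name.

    The function is passed the original entry (string or dict) unchanged;
    only the directory name is derived from the hostname.
    """
    results = {}
    for machine in machines:
        results[subdir_name_for(machine)] = function(machine)
    return results


machines = [
    'chromeos6-row2-rack4-host20',  # made-up hostname for illustration
    {'hostname': 'chromeos6-row2-rack4-host21', 'afe_host': object()},
]
seen = run_per_machine(lambda m: type(m).__name__, machines)
print(seen)
```

With this shape, directory names stay as short as the hostname regardless of how many keys are later added to the machine dict.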
,
Jul 25 2017
I guess. lgoodby@ can verify in his lab.
,
Jul 25 2017
Verified fix in R62-9777.0.0.
,
Jul 26 2017
Please apply the appropriate OS label. Thank you.
,
Jul 26 2017
Your change meets the bar and is auto-approved for M61. Please go ahead and merge the CL to branch 3163 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), ketakid@(ChromeOS), govind@(Desktop)
For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jul 28 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/16fd2155abe7c16729efa6891ff217f1b0b28cd0
commit 16fd2155abe7c16729efa6891ff217f1b0b28cd0
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri Jul 28 23:20:20 2017
[autotest] Respect machine dict in parallel_simple
A long time ago, the machines list passed in to server_job was converted from a list of strings to possibly a list of dicts. Some places didn't get this update. Fix another one.
BUG= chromium:746751
TEST=(1) (new) unittests (2) Ran a local autotest_SyncCount job and verified that subdirs names are correctly inferred.
Change-Id: I7479241d97155e639c85a6b5469d134c1f48eba8
Reviewed-on: https://chromium-review.googlesource.com/582234
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>
Reviewed-by: Laurence Goodby <lgoodby@chromium.org>
(cherry picked from commit d08c86bd1b0371b6c7b77aae3b6c92366a18e887)
Reviewed-on: https://chromium-review.googlesource.com/588003
Commit-Queue: Laurence Goodby <lgoodby@chromium.org>
Tested-by: Laurence Goodby <lgoodby@chromium.org>
[modify] https://crrev.com/16fd2155abe7c16729efa6891ff217f1b0b28cd0/server/server_job.py
[modify] https://crrev.com/16fd2155abe7c16729efa6891ff217f1b0b28cd0/server/subcommand_unittest.py
[modify] https://crrev.com/16fd2155abe7c16729efa6891ff217f1b0b28cd0/server/subcommand.py
,
Jul 31 2017
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Comment 1 by akes...@chromium.org, Jul 20 2017
Labels: -Pri-3 Pri-1