Starred by 2 users
Status: Fixed
Owner:
Closed: Dec 11
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug



Chrome PFQ failing in cheets_CTS_N tests ("No such file or directory: ... .#copy_images.sh")
Project Member Reported by kinaba@chromium.org, Dec 11
+ARC constables (risan, levarum)
+Those who landed CTS autotest changes over the weekend (rohitbm, ihf)

https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-chrome-pfq/builds/2134

Unhandled OSError: [Errno 2] No such file or directory: '/usr/local/autotest/results/shared/cache/cache/XXXXXXXXXX/android-cts-media-1.3/android-cts-media-1.3/.#copy_images.sh'
 
12/10 13:18:20.494 WARNI|              test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 598, in _exec
    _cherry_pick_call(self.initialize, *args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 746, in _cherry_pick_call
    return func(*p_args, **p_dargs)
  File "/usr/local/autotest/server/cros/tradefed_test.py", line 472, in initialize
    self._clean_download_cache_if_needed()
  File "/usr/local/autotest/server/cros/tradefed_test.py", line 708, in _clean_download_cache_if_needed
    size = self._dir_size(self._tradefed_cache)
  File "/usr/local/autotest/server/cros/tradefed_test.py", line 681, in _dir_size
    os.path.getsize(os.path.join(root, name)) for name in files)
  File "/usr/local/autotest/server/cros/tradefed_test.py", line 681, in <genexpr>
    os.path.getsize(os.path.join(root, name)) for name in files)
  File "/usr/lib/python2.7/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '/usr/local/autotest/results/shared/cache/cache/3a05d334ad72728dd6b03104185219b8/android-cts-media-1.3/android-cts-media-1.3/.#copy_images.sh'



Not yet sure what's causing this failure, but the failing code path is from
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/523762
so I think we should revert it for now as a quick recovery.
Hmmm, but the same failure is seen on versions before CL 523762 (the _clea**r**_download_cache_if_needed() stack trace below is from the old rev.)

https://stainless.corp.google.com/search?exclude_retried=true&first_date=20171112&master_builder_name=&builder_name_number=&shard=&exclude_acts=true&builder_name=&master_builder_name_number=&owner=&retry=&exclude_cts=false&exclude_non_production=true&hostname=&board=%5Easuka%24&test=cheets_GTS.5&exclude_not_run=false&build=%5ER65%5C-10193%5C.0%5C.0%24&status=FAIL&status=ERROR&status=ABORT&reason=&waterfall=&suite=&last_date=20171211&exclude_non_release=true&exclude_au=true&view=list
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/160889759-chromeos-test/chromeos6-row2-rack22-host6/debug/

12/10 00:09:14.900 ERROR|   logging_manager:0626| tko parser: update RUNNING reason: Unhandled OSError: [Errno 2] No such file or directory: '/usr/local/autotest/results/shared/cache/cache/3a05d334ad72728dd6b03104185219b8/android-cts-media-1.3/android-cts-media-1.3/.#copy_images.sh'
12/10 00:09:14.901 ERROR|   logging_manager:0626| tko parser: The following lines were ignored:
12/10 00:09:14.901 ERROR|   logging_manager:0626| tko parser:   Traceback (most recent call last):
12/10 00:09:14.901 ERROR|   logging_manager:0626| 
12/10 00:09:14.902 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/client/common_lib/test.py", line 598, in _exec
12/10 00:09:14.902 ERROR|   logging_manager:0626| 
12/10 00:09:14.903 ERROR|   logging_manager:0626| tko parser:       _cherry_pick_call(self.initialize, *args, **dargs)
12/10 00:09:14.903 ERROR|   logging_manager:0626| 
12/10 00:09:14.904 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/client/common_lib/test.py", line 746, in _cherry_pick_call
12/10 00:09:14.904 ERROR|   logging_manager:0626| 
12/10 00:09:14.905 ERROR|   logging_manager:0626| tko parser:       return func(*p_args, **p_dargs)
12/10 00:09:14.905 ERROR|   logging_manager:0626| 
12/10 00:09:14.905 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/cros/tradefed_test.py", line 455, in initialize
12/10 00:09:14.906 ERROR|   logging_manager:0626| 
12/10 00:09:14.906 ERROR|   logging_manager:0626| tko parser:       self._clear_download_cache_if_needed()
12/10 00:09:14.907 ERROR|   logging_manager:0626| 
12/10 00:09:14.907 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/cros/tradefed_test.py", line 653, in _clear_download_cache_if_needed
12/10 00:09:14.908 ERROR|   logging_manager:0626| 
12/10 00:09:14.908 ERROR|   logging_manager:0626| tko parser:       size = self._dir_size(self._tradefed_cache)
12/10 00:09:14.908 ERROR|   logging_manager:0626| 
12/10 00:09:14.909 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/cros/tradefed_test.py", line 644, in _dir_size
12/10 00:09:14.909 ERROR|   logging_manager:0626| 
12/10 00:09:14.910 ERROR|   logging_manager:0626| tko parser:       os.path.getsize(os.path.join(root, name)) for name in files)
12/10 00:09:14.911 ERROR|   logging_manager:0626| 
12/10 00:09:14.911 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/cros/tradefed_test.py", line 644, in <genexpr>
12/10 00:09:14.912 ERROR|   logging_manager:0626| 
12/10 00:09:14.913 ERROR|   logging_manager:0626| tko parser:       os.path.getsize(os.path.join(root, name)) for name in files)
12/10 00:09:14.913 ERROR|   logging_manager:0626| 
12/10 00:09:14.914 ERROR|   logging_manager:0626| tko parser:     File "/usr/lib/python2.7/genericpath.py", line 49, in getsize
12/10 00:09:14.915 ERROR|   logging_manager:0626| 
12/10 00:09:14.916 ERROR|   logging_manager:0626| tko parser:       return os.stat(filename).st_size
12/10 00:09:14.916 ERROR|   logging_manager:0626| 
Cc: deanliao@chromium.org dtor@chromium.org
+ChromeOS sheriffs
Downloaded and extracted the file locally:

https://source.android.com/compatibility/cts/downloads#cts-media-files

kinaba: ~/Desktop> unzip android-cts-media-1.3.zip
Archive:  android-cts-media-1.3.zip
   creating: android-cts-media-1.3/
  inflating: android-cts-media-1.3/copy_media.sh
  inflating: android-cts-media-1.3/make_zip.sh
    linking: android-cts-media-1.3/.#copy_images.sh  -> jinpark@jinpark.seo.corp.google.com.153778:1471402662
  inflating: android-cts-media-1.3/README.txt
...


So .#copy_images.sh is a presumably unintended symlink, which may be causing the weird outcome.
Though I'm not sure why it wasn't causing the problem until now.
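
For checking an archive without extracting it, here is a minimal sketch (not part of the autotest code) that lists symlink entries in a zip by looking at the Unix mode bits stored in each entry. The helper names and the demo archive contents are made up for illustration; the sketch assumes the zip was created on a Unix-like system, where the file mode sits in the high 16 bits of external_attr.

```python
import io
import stat
import zipfile


def symlink_entries(zf):
    """Return (name, target) pairs for entries stored as Unix symlinks."""
    links = []
    for info in zf.infolist():
        # Unix zip tools keep the file mode in the high 16 bits of external_attr.
        mode = info.external_attr >> 16
        if stat.S_ISLNK(mode):
            # For a symlink entry, the stored data is the link target.
            target = zf.read(info).decode("utf-8", "replace")
            links.append((info.filename, target))
    return links


def build_demo_zip():
    """Build a small in-memory archive: one regular file, one symlink entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo/README.txt", "hello")
        link = zipfile.ZipInfo("demo/.#copy_images.sh")
        link.create_system = 3  # 3 = Unix
        link.external_attr = (stat.S_IFLNK | 0o777) << 16
        zf.writestr(link, "user@host.1234:5678")  # hypothetical link target
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with zipfile.ZipFile(build_demo_zip()) as zf:
        for name, target in symlink_entries(zf):
            print("%s -> %s" % (name, target))
```

Running symlink_entries() against the real android-cts-media zip should, under the same assumption, report the .# entry that unzip shows as "linking:".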
OK I think I got the whole picture now.

(1) The broken symlink has always been included in the CTS-media zip.
(2) We didn't hit the issue until https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/523762, because we only cached and shared the zip file; each job decompressed the zip file and simply deleted the decompressed tree each time.
(3) Now, after 523762, the decompressed zip tree is also put in the shared cache dir, which is subject to the recursive getsize() that runs occasionally for cache cleanup so that the server disk does not fill up.

(2) is the reason why we recently started seeing the failure, and (3) is the reason why we see the failure even with older versions. (A job scheduled after (2) will fail regardless of whether 523762 is used or not, because the cleanup code for (3) has always been there, even in the older versions.)
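
The failure mode in (3) can be reproduced in isolation. Below is a minimal sketch, written for Python 3 (the traceback above is from Python 2.7), with a hypothetical naive_dir_size() that mirrors the os.walk() + os.path.getsize() pattern visible in the traceback. It assumes a platform where os.symlink() is permitted. os.path.getsize() follows symlinks, so a dangling link raises OSError.

```python
import os
import tempfile


def naive_dir_size(path):
    """Sum file sizes recursively; getsize() follows symlinks."""
    total = 0
    for root, _, files in os.walk(path):
        total += sum(os.path.getsize(os.path.join(root, name)) for name in files)
    return total


def make_tree_with_dangling_link():
    """Create a temp dir holding one 5-byte file and one dangling symlink."""
    top = tempfile.mkdtemp()
    with open(os.path.join(top, "copy_media.sh"), "w") as f:
        f.write("12345")
    # The target does not exist, so stat() on the link fails with ENOENT.
    os.symlink("no-such-target", os.path.join(top, ".#copy_images.sh"))
    return top


if __name__ == "__main__":
    try:
        naive_dir_size(make_tree_with_dangling_link())
    except OSError as e:
        print("failed as expected: %s" % e)
```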

The server-side directory is already tainted by the broken symlink,
so the fix has to deal with it at least. Just reverting the mentioned change does not fix the issue.

As the quickest fix I'll ignore the error at (3). Here's the CL:

https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/818670/2/server/cros/tradefed_test.py
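
For readers without access to the CL, the idea of "ignore the error" can be sketched roughly as follows. This is an illustration only, not the code in the CL; tolerant_dir_size() is a hypothetical name, and the sketch targets Python 3. Any file whose size cannot be read is counted as 0 bytes.

```python
import os
import tempfile


def tolerant_dir_size(path):
    """Sum file sizes recursively, treating unreadable entries as size 0."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # E.g. a dangling symlink: skip it, i.e. assume size 0.
                pass
    return total


if __name__ == "__main__":
    top = tempfile.mkdtemp()
    with open(os.path.join(top, "copy_media.sh"), "w") as f:
        f.write("12345")
    os.symlink("no-such-target", os.path.join(top, ".#copy_images.sh"))
    print(tolerant_dir_size(top))
```

An alternative, if the link itself should be counted, would be os.lstat(), which reports on the symlink rather than its target and so does not fail on a dangling link.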



CTS Media 1.4 (https://source.android.com/compatibility/cts/downloads#cts-media-files) does not contain the problematic .# link.
So a longer term fix may be to switch to it.
Owner: kinaba@chromium.org
Status: Started
Cc: cros-cts-te@google.com
Locally confirmed that
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/818670/2/server/cros/tradefed_test.py
will make the tests survive even in the presence of bad symlinks.


It still breaks the "directly affected" CTS tests (i.e., CtsMediaStressTestCases and cheets_CTS_N.all, which use the media files) because shutil.copytree() fails due to the file, but since they are not part of the CQ, we should be able to deal with them separately.
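
To illustrate the copytree() failure, here is a small Python 3 sketch (the production code at the time ran on Python 2.7, and this is not a proposal for the actual fix). With the default symlinks=False, copytree() tries to copy the link target's contents and reports a dangling link as shutil.Error; with symlinks=True it recreates the link itself. The helper names are hypothetical.

```python
import os
import shutil
import tempfile


def make_tree_with_dangling_link():
    """Create a temp dir holding one regular file and one dangling symlink."""
    top = tempfile.mkdtemp()
    with open(os.path.join(top, "copy_media.sh"), "w") as f:
        f.write("12345")
    os.symlink("no-such-target", os.path.join(top, ".#copy_images.sh"))
    return top


def copy_default(src, dst):
    """Return True if a default copytree() succeeds, False on shutil.Error."""
    try:
        shutil.copytree(src, dst)
        return True
    except shutil.Error:
        return False


def copy_keep_links(src, dst):
    """Copy with symlinks=True; return True if the link was recreated."""
    shutil.copytree(src, dst, symlinks=True)
    return os.path.islink(os.path.join(dst, ".#copy_images.sh"))


if __name__ == "__main__":
    src = make_tree_with_dangling_link()
    base = tempfile.mkdtemp()
    print(copy_default(src, os.path.join(base, "a")))
    print(copy_keep_links(src, os.path.join(base, "b")))
```

Python 3 also accepts ignore_dangling_symlinks=True on copytree(), which silences this error when symlinks is false; that parameter does not exist in Python 2.7.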
Labels: ReleaseBlock-Stable
Project Member Comment 12 by bugdroid1@chromium.org, Dec 11
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/6fd2f1e240b9e4064a595d1be8535b752abe7b2d

commit 6fd2f1e240b9e4064a595d1be8535b752abe7b2d
Author: Kazuhiro Inaba <kinaba@chromium.org>
Date: Mon Dec 11 03:53:28 2017

cheets_CTS: workaround for broken symlink file in CTS media zip.

For some reason the archive contains a broken symlink which causes
our recursive stat to fail. As a very quick workaround, catch the
exception and assume the file to be size 0.

BUG= chromium:793696 
TEST=trybot

Change-Id: I25e259ca9de77bf4adb36cf1960a75ecbd5e861c
Reviewed-on: https://chromium-review.googlesource.com/818670
Tested-by: Kazuhiro Inaba <kinaba@chromium.org>
Trybot-Ready: Kazuhiro Inaba <kinaba@chromium.org>
Reviewed-by: Yuichiro Hanada <yhanada@chromium.org>
Reviewed-by: Shuo-Peng Liao <deanliao@chromium.org>

[modify] https://crrev.com/6fd2f1e240b9e4064a595d1be8535b752abe7b2d/server/cros/tradefed_test.py

Landed to M65 (ToT). This should at least unblock the Chrome PFQ.


CTS failures will still remain on M63 and M64 until we cherry-pick the CL above,
but let's wait a moment until we confirm the M65 status.

I'm not familiar with how each bot picks up the version to run,
but 10204.0.0-rc2 paladin runs include my autotest fix:
https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-paladin/builds/4729

so CQs/PFQs using 10204.0.0-rc2 or above should pass. (In other words, 10204.0.0-rc1 will fail.)
Status: Fixed
Let's see if ongoing master-paladin passes:
https://chromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/17165
Project Member Comment 16 by bugdroid1@chromium.org, Dec 11
Labels: merge-merged-release-R63-10032.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/a146c0e9aa187a9920cf81d44aa9a3f1015817a6

commit a146c0e9aa187a9920cf81d44aa9a3f1015817a6
Author: Kazuhiro Inaba <kinaba@chromium.org>
Date: Mon Dec 11 11:51:16 2017

cheets_CTS: workaround for broken symlink file in CTS media zip.

For some reason the archive contains a broken symlink which causes
our recursive stat to fail. As a very quick workaround, catch the
exception and assume the file to be size 0.

BUG= chromium:793696 
TEST=trybot

Change-Id: I25e259ca9de77bf4adb36cf1960a75ecbd5e861c
Reviewed-on: https://chromium-review.googlesource.com/818670
Tested-by: Kazuhiro Inaba <kinaba@chromium.org>
Trybot-Ready: Kazuhiro Inaba <kinaba@chromium.org>
Reviewed-by: Yuichiro Hanada <yhanada@chromium.org>
Reviewed-by: Shuo-Peng Liao <deanliao@chromium.org>
(cherry picked from commit 6fd2f1e240b9e4064a595d1be8535b752abe7b2d)
Reviewed-on: https://chromium-review.googlesource.com/819071
Reviewed-by: Kazuhiro Inaba <kinaba@chromium.org>

[modify] https://crrev.com/a146c0e9aa187a9920cf81d44aa9a3f1015817a6/server/cros/tradefed_test.py

Project Member Comment 17 by bugdroid1@chromium.org, Dec 11
Labels: merge-merged-release-R64-10176.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/e2b5536702b07f8ed365e18702c96dc1e92868f7

commit e2b5536702b07f8ed365e18702c96dc1e92868f7
Author: Kazuhiro Inaba <kinaba@chromium.org>
Date: Mon Dec 11 11:51:34 2017

cheets_CTS: workaround for broken symlink file in CTS media zip.

For some reason the archive contains a broken symlink which causes
our recursive stat to fail. As a very quick workaround, catch the
exception and assume the file to be size 0.

BUG= chromium:793696 
TEST=trybot

Change-Id: I25e259ca9de77bf4adb36cf1960a75ecbd5e861c
Reviewed-on: https://chromium-review.googlesource.com/818670
Tested-by: Kazuhiro Inaba <kinaba@chromium.org>
Trybot-Ready: Kazuhiro Inaba <kinaba@chromium.org>
Reviewed-by: Yuichiro Hanada <yhanada@chromium.org>
Reviewed-by: Shuo-Peng Liao <deanliao@chromium.org>
(cherry picked from commit 6fd2f1e240b9e4064a595d1be8535b752abe7b2d)
Reviewed-on: https://chromium-review.googlesource.com/819070
Reviewed-by: Kazuhiro Inaba <kinaba@chromium.org>

[modify] https://crrev.com/e2b5536702b07f8ed365e18702c96dc1e92868f7/server/cros/tradefed_test.py

master-paladin went green, as did all the other PFQ bots that were dead due to this failure.

Things should be all set now.
I still don't fully understand how this slowly fell over, but thank you for fixing!
Project Member Comment 20 by bugdroid1@chromium.org, Dec 11
Labels: merge-merged-stabilize-10032.71.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/63d0c669f5f9af01f69318fb0649e028ff013c53

commit 63d0c669f5f9af01f69318fb0649e028ff013c53
Author: Kazuhiro Inaba <kinaba@chromium.org>
Date: Mon Dec 11 23:44:13 2017

cheets_CTS: workaround for broken symlink file in CTS media zip.

For some reason the archive contains a broken symlink which causes
our recursive stat to fail. As a very quick workaround, catch the
exception and assume the file to be size 0.

BUG= chromium:793696 
TEST=trybot

Change-Id: I25e259ca9de77bf4adb36cf1960a75ecbd5e861c
Reviewed-on: https://chromium-review.googlesource.com/818670
Tested-by: Kazuhiro Inaba <kinaba@chromium.org>
Trybot-Ready: Kazuhiro Inaba <kinaba@chromium.org>
Reviewed-by: Yuichiro Hanada <yhanada@chromium.org>
Reviewed-by: Shuo-Peng Liao <deanliao@chromium.org>
(cherry picked from commit 6fd2f1e240b9e4064a595d1be8535b752abe7b2d)
Reviewed-on: https://chromium-review.googlesource.com/819071
Reviewed-by: Kazuhiro Inaba <kinaba@chromium.org>
(cherry picked from commit a146c0e9aa187a9920cf81d44aa9a3f1015817a6)
Reviewed-on: https://chromium-review.googlesource.com/820685
Reviewed-by: Grace Kihumba <gkihumba@chromium.org>
Commit-Queue: Grace Kihumba <gkihumba@chromium.org>
Tested-by: Grace Kihumba <gkihumba@chromium.org>

[modify] https://crrev.com/63d0c669f5f9af01f69318fb0649e028ff013c53/server/cros/tradefed_test.py

Re #19: the media file archive is expanded only when cheets_CTS_N.CtsMediaStressTestCases is run.
Each time a MediaStress job is scheduled to a different instance, it breaks the environment on that instance. So it gradually killed the tests.