Cheets container mount is broken in caroline-tot-chrome-pfq-informational suite
Issue description
ChromeOS Version: R68-10595.0.0
OS: Chrome
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8948885117255187712
http://cautotest-prod/new_tko/#tab_id=test_detail_view&object_id=718910943
Test cheets_ContainerMount (job 192958510-chromeos-test/chromeos6-row2-rack23-host13)
Test: cheets_ContainerMount
Job tag: 192958510-chromeos-test/chromeos6-row2-rack23-host13
Job name: caroline-tot-chrome-pfq-informational/R68-10595.0.0-b2490531/bvt-arc/cheets_ContainerMount
Status: FAIL
Reason: Mount points are mismatched with the expected list: expected: set(['root', 'android-data']), actual: set(['root', 'android-data', 'android-data/data/dalvik-cache/x86']), extra: set(['android-data/data/dalvik-cache/x86']), missing: set([])
Test started: 2018-04-18 14:01:24
Test finished: 2018-04-18 14:02:11
Host: chromeos6-row2-rack23-host13
Platform: caroline
Kernel: 3.18.0-17549-gc766a263ddb9
Test labels: none
I know you touched this test recently. Can you take a look? It has been failing for quite a while.
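For readers unfamiliar with the failure message, the "extra" and "missing" fields are plain set differences between the observed and expected mount points. A minimal sketch, assuming a hypothetical helper named compare_mounts (this is not the test's actual code):

```python
def compare_mounts(expected, actual):
    """Return (extra, missing) mount points as sets.

    Hypothetical helper for illustration only; the real test's
    implementation is not shown in this report.
    """
    expected, actual = set(expected), set(actual)
    extra = actual - expected      # mounted, but not expected
    missing = expected - actual    # expected, but not mounted
    return extra, missing


# Values copied from the failure reason above.
expected = {'root', 'android-data'}
actual = {'root', 'android-data', 'android-data/data/dalvik-cache/x86'}
extra, missing = compare_mounts(expected, actual)
```

Any non-empty "extra" or "missing" set is what turns the comparison into a test failure.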
,
Apr 19 2018
,
Apr 19 2018
cheets_ContainerMount is passing on bvt-arc. Is this actually caused by a Chrome change? https://stainless.corp.google.com/search?exclude_retried=true&first_date=2018-04-17&master_builder_name=&builder_name_number=&shard=&exclude_acts=true&builder_name=&master_builder_name_number=&owner=&retry=&exclude_cts=true&exclude_non_production=false&hostname=&board=&test=%5Echeets_ContainerMount%24&exclude_not_run=false&build=%5ER68%5C-10595%5C.0%5C.0%24&status=GOOD&reason=&waterfall=&suite=&last_date=2018-04-19&exclude_non_release=true&exclude_au=true&model=&view=list
,
Apr 19 2018
hmm.. shouldn't it be ignored via client/site_tests/cheets_ContainerMount/cheets_ContainerMount.py#103 IGNORED_MOUNTS ?
,
Apr 19 2018
Investigated. There are several things in the background.
- IGNORED_MOUNTS should be updated. The path used to be root/data/dalvik-cache/... before the run_oci migration. After the migration, it should be android-data/data/dalvik-cache.
- We switched to a two-phase ARC container boot. On mini container start, the dalvik-cache dir is mounted in the init mount namespace. On upgrading to the full container, it is unmounted. https://chromium.googlesource.com/chromiumos/platform2/+/master/arc/setup/arc_setup.cc#1782
So the solution should be:
- Remove those entries from IGNORED_MOUNTS.
Will send a fix.
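To illustrate why entries written for the pre-migration layout no longer match, here is a minimal sketch of prefix-based filtering. The function name filter_ignored and the prefix lists are assumptions made for this example; the actual contents of IGNORED_MOUNTS are not shown in this thread.

```python
def filter_ignored(mount_paths, ignored_prefixes):
    """Drop mount paths that equal, or live under, an ignored prefix.

    Hypothetical helper; not the test's real filtering logic.
    """
    def is_ignored(path):
        return any(path == prefix or path.startswith(prefix + '/')
                   for prefix in ignored_prefixes)
    return {path for path in mount_paths if not is_ignored(path)}


observed = {'root', 'android-data', 'android-data/data/dalvik-cache/x86'}
stale_prefixes = ['root/data/dalvik-cache']            # pre-run_oci layout
current_prefixes = ['android-data/data/dalvik-cache']  # post-run_oci layout

with_stale = filter_ignored(observed, stale_prefixes)
with_current = filter_ignored(observed, current_prefixes)
```

With the stale prefix nothing is filtered, so the dalvik-cache mount still shows up as "extra"; only a prefix in the new layout would hide it.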
,
Apr 19 2018
The CL https://chrome-internal-review.googlesource.com/c/chromeos/autotest-cheets/+/611547 itself is fine because we should remove the obsolete paths from the set, but does that really fix the test failure? 'extra: set(['android-data/data/dalvik-cache/x86'])' will still be there even with your CL. Is this a race between lazy umount vs the check???
,
Apr 20 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/autotest-cheets/+/3ead6218315d5001f16d35eeccc7e74922be6363 commit 3ead6218315d5001f16d35eeccc7e74922be6363 Author: Hidehiko Abe <hidehiko@chromium.org> Date: Fri Apr 20 07:33:05 2018
,
Apr 23 2018
I saw an instance of this failure on veyron_minnie-tot-chrome-pfq-informational: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8948434308122422560 As already mentioned by yusukes, the cl in comment #7 does not really fix the test failure, it just removes reference to an obsolete mount point path.
,
Apr 27 2018
Any updates on this? I've seen another instance on informational Chrome pfq: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8948089791067296816
,
Apr 30 2018
+this week's gardener. This still happens on (caroline, veyron_minnie)-tot-chrome-pfq-informational.
,
May 1 2018
Re #6, #8; I was confused and indeed #7 didn't fix the issue.
Though, the cause is not yet clear to me.
IIUC, this is not a race between the lazy umount and the check. The check reads /proc/${PID}/mountinfo, and the entry there is removed as soon as the lazy umount is called, even if the mount is still busy.
# mkdir -p /tmp/test
# sudo mount -t tmpfs tmpfs /tmp/test
# touch /tmp/test/foo
(on different shell)# python
f = open('/tmp/test/foo')
# cat /proc/self/mountinfo | grep /tmp/test
... (an entry is found) ...
# sudo umount /tmp/test
umount: /tmp/test: target is busy.
# sudo umount --lazy /tmp/test
# cat /proc/self/mountinfo | grep /tmp/test
... (no entry is found) ...
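The same lookup can be scripted. A minimal sketch, assuming the standard /proc/<pid>/mountinfo layout in which the fifth whitespace-separated field is the mount point; the function names and the sample text are made up for illustration:

```python
def mount_points(mountinfo_text):
    """Return the mount point (5th field) of each mountinfo line."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(fields[4])
    return points


def is_mounted(path, mountinfo_text):
    """True if `path` appears as a mount point in the given text."""
    return path in mount_points(mountinfo_text)


# Hand-written sample in mountinfo format, not captured from a device.
SAMPLE = (
    "22 1 8:1 / / rw,relatime - ext4 /dev/sda1 rw\n"
    "57 22 0:45 / /tmp/test rw,relatime - tmpfs tmpfs rw\n"
)
```

On a live system the text would come from open('/proc/self/mountinfo').read(), taken before and after the lazy umount.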
Interestingly, in the log attached to #8,
in android-run_oci.20180423-130619:
[0423/130622:INFO:arc_setup.cc(571)] Setting up /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/arm
[0423/130622:INFO:arc_setup.cc(572)] Running !base::PathExists(dest_directory)...
[0423/130622:INFO:arc_setup.cc(575)] Running !arc_mounter_->BindMount(src_directory, dest_directory)...
so the mount point is created at this point, and in arc-boot-continue.log
[0423/130625:INFO:arc_setup.cc(599)] Running arc_mounter_->UmountLazily(dalvik_cache_directory.Append("arm"))...
[0423/130625:INFO:arc_setup.cc(601)] Running arc_mounter_->UmountLazily(dalvik_cache_directory.Append("x86"))...
[0423/130625:ERROR:arc_setup_util.cc(380)] Failed to lazy-umount /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86: No such file or directory
[0423/130625:INFO:arc_setup.cc(601)] Ignoring failures: arc_mounter_->UmountLazily(dalvik_cache_directory.Append("x86"))
[0423/130625:INFO:arc_setup.cc(603)] Running arc_mounter_->UmountLazily(dalvik_cache_directory.Append("x86_64"))...
so ArcSetUp::CleanUpDalvikCache() appears to be working as intended.
At this point, the mount point should have been removed from /proc/.../mountinfo. (Yusuke, do you think it's reasonable to add logging to confirm whether the mount point is actually gone, for further investigation?)
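When scanning many such logs, the lazy-umount failures can be pulled out mechanically. A minimal sketch, assuming every line follows the "[MMDD/HHMMSS:SEVERITY:file(line)] message" shape seen in the excerpts above; the regular expression and function names are written for this example and are not part of arc-setup:

```python
import re

# Matches e.g. "[0423/130625:ERROR:arc_setup_util.cc(380)] message".
LOG_RE = re.compile(r'^\[(\d{4})/(\d{6}):([A-Z]+):([\w.]+)\((\d+)\)\] (.*)$')


def parse_log_line(line):
    """Split one log line into its parts, or return None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    date, time, severity, source, lineno, message = match.groups()
    return {'date': date, 'time': time, 'severity': severity,
            'source': source, 'line': int(lineno), 'message': message}


def failed_lazy_umounts(lines):
    """Return (path, reason) for each 'Failed to lazy-umount' error line."""
    prefix = 'Failed to lazy-umount '
    failures = []
    for line in lines:
        record = parse_log_line(line)
        if (record and record['severity'] == 'ERROR'
                and record['message'].startswith(prefix)):
            path, _, reason = record['message'][len(prefix):].partition(': ')
            failures.append((path, reason))
    return failures


# Two lines copied from the arc-boot-continue.log excerpt above.
SAMPLE_LINES = [
    '[0423/130625:INFO:arc_setup.cc(599)] Running arc_mounter_->UmountLazily(dalvik_cache_directory.Append("arm"))...',
    '[0423/130625:ERROR:arc_setup_util.cc(380)] Failed to lazy-umount /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86: No such file or directory',
]
```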
,
May 1 2018
> (Yusuke, do you think it's reasonable to add logging to make sure if the mount point is actually gone for further investigation?) Yes, as long as the logging does not slow down the boot.
,
May 7 2018
Issue 840436 has been merged into this issue.
,
May 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/89bbc0ef616bdb2d3b56a1b36ac2676b01df387a
commit 89bbc0ef616bdb2d3b56a1b36ac2676b01df387a
Author: Hidehiko Abe <hidehiko@chromium.org>
Date: Wed May 09 21:40:10 2018
arc-setup: Output mount_info log for dalvik-cache.
For further investigation, this outputs more logs for dalvik-cache mount points.
BUG= chromium:834479
TEST=mount -t tmpfs tmpfs /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86, \
then run cheets_ContainerMount. Made sure fail and log is remained. \
Made sure added log in arc-continue-boot.log and arc-lifetime.log.
CQ-DEPEND=CL:1049985
Change-Id: Iad9630be4636dcfa3fc265dde7d29f74e4f35499
Reviewed-on: https://chromium-review.googlesource.com/1039224
Commit-Ready: Hidehiko Abe <hidehiko@chromium.org>
Tested-by: Hidehiko Abe <hidehiko@chromium.org>
Reviewed-by: Yusuke Sato <yusukes@chromium.org>
[modify] https://crrev.com/89bbc0ef616bdb2d3b56a1b36ac2676b01df387a/arc/setup/arc_setup.cc
,
May 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/5b7604c2f3525e929d01100a6974d05e9e29f824
commit 5b7604c2f3525e929d01100a6974d05e9e29f824
Author: Hidehiko Abe <hidehiko@chromium.org>
Date: Wed May 09 21:40:10 2018
arc-setup: Expose FindLine from arc_setup_util.
To use in arc_setup for logging purpose.
BUG= chromium:834479
TEST=Trybot. Ran "cros_run_unittest --package arc-setup" locally.
Change-Id: I637ef4c8249ca335540027d3c75b70ab49cb9860
Reviewed-on: https://chromium-review.googlesource.com/1049985
Commit-Ready: Hidehiko Abe <hidehiko@chromium.org>
Tested-by: Hidehiko Abe <hidehiko@chromium.org>
Reviewed-by: Hidehiko Abe <hidehiko@chromium.org>
[modify] https://crrev.com/5b7604c2f3525e929d01100a6974d05e9e29f824/arc/setup/arc_setup_util.h
[modify] https://crrev.com/5b7604c2f3525e929d01100a6974d05e9e29f824/arc/setup/arc_setup_util.cc
[modify] https://crrev.com/5b7604c2f3525e929d01100a6974d05e9e29f824/arc/setup/arc_setup_util_unittest.cc
,
May 14 2018
,
May 18 2018
What's the status of fixing this? I'm still seeing slightly different failures on multiple builders:
caroline-tot-chrome-pfq-informational: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946184811704548112
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/cheets_ContainerMount/cheets_ContainerMount.py", line 72, in run_once
    self._assert_arc_not_leak_mounts(global_mountinfo_list)
  File "/usr/local/autotest/tests/cheets_ContainerMount/cheets_ContainerMount.py", line 120, in _assert_arc_not_leak_mounts
    WHITELISTED_MOUNTS - mount_paths))
TestFail: Mount points are mismatched with the expected list: expected: set(['root', 'android-data']), actual: set(['root', 'android-data', 'android-data/data/dalvik-cache/x86']), extra: set(['android-data/data/dalvik-cache/x86']), missing: set([])
eve-tot-chrome-pfq-informational: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946184347926179664
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/cheets_ContainerMount/cheets_ContainerMount.py", line 72, in run_once
    self._assert_arc_not_leak_mounts(global_mountinfo_list)
  File "/usr/local/autotest/tests/cheets_ContainerMount/cheets_ContainerMount.py", line 120, in _assert_arc_not_leak_mounts
    WHITELISTED_MOUNTS - mount_paths))
TestFail: Mount points are mismatched with the expected list: expected: set(['root', 'android-data']), actual: set(['android-data/data/dalvik-cache/x86_64', 'root', 'android-data', 'android-data/data/dalvik-cache/x86']), extra: set(['android-data/data/dalvik-cache/x86_64', 'android-data/data/dalvik-cache/x86']), missing: set([])
,
May 18 2018
Should I add android-data/data/dalvik-cache/x86 and android-data/data/dalvik-cache/x86_64 to IGNORED_MOUNTS in the test?
,
May 18 2018
I talked to Luis and there might be a way to fix this (in arc-setup). Let me check that first.
,
May 18 2018
hidehiko@ I don't see your stale mount point LOG in arc-boot-continue.log but the test still failed :/
,
May 18 2018
The approach in #19 seems to work. I'm going to remove the mount point from the init namespace.
,
May 19 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/af1c6868417bbfc3599299b29ee07d04e1f3c767
commit af1c6868417bbfc3599299b29ee07d04e1f3c767
Author: yusukes <yusukes@google.com>
Date: Sat May 19 10:37:39 2018
arc-base: Unconditionally create all isa directories
The latest config.json for ARC needs all of these directories.
BUG= chromium:834479
TEST=ARC still starts
Change-Id: I955d051f744e1a96a495b2a5a38405be796569e2
Reviewed-on: https://chromium-review.googlesource.com/1065582
Commit-Ready: Yusuke Sato <yusukes@chromium.org>
Tested-by: Yusuke Sato <yusukes@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>
[modify] https://crrev.com/af1c6868417bbfc3599299b29ee07d04e1f3c767/chromeos-base/arc-base/arc-base-9999.ebuild
,
May 21 2018
Another failure example. https://stainless.corp.google.com/search?view=list&first_date=2018-05-15&last_date=2018-05-21&build=%5ER68%5C-10698%5C.0%5C.0%24&hostname=chromeos4-row10-rack5-host7&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false According to the log, 201680083 does not have the unexpected mount message, and 201680129 does. 201680105 does not contain the ARC-related logs.
,
May 21 2018
Note: Hmm... This does not seem easy to reproduce in my local env...
,
May 21 2018
#25 Do you know the reason of the failure now?
,
Jun 8 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/3031d6000a517cb20d4d61d6e80704fa828817fb
commit 3031d6000a517cb20d4d61d6e80704fa828817fb
Author: yusukes <yusukes@google.com>
Date: Fri Jun 08 02:56:16 2018
arc: Mount /data/dalvik-cache/<isa> in the container namespace
With run_oci, there's no reason to do that in the init namespace. This will fix the occasional mount point leak reported at crbug.com/834479 too.
BUG= chromium:834479
BUG= chromium:842927
TEST=ARC++ still starts, cheets_ContainerMount
CQ-DEPEND=CL:1065582
Change-Id: I4b442d4702c3a09f020a5cc7a40463b5bd5dff59
Reviewed-on: https://chromium-review.googlesource.com/1065587
Commit-Ready: Yusuke Sato <yusukes@chromium.org>
Tested-by: Yusuke Sato <yusukes@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/container-bundle/pi/config.json
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/container-bundle/nyc/config.json
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/container-bundle/master/config.json
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/setup/arc_setup.cc
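The expected effect of this change on what is visible from the init namespace can be sketched as follows. This is an illustration under assumptions, not the test's code: the rootfs prefix is taken from the log excerpts in this thread, the location of the 'root' mount under that prefix is assumed, and the two mount lists are invented to represent the state before and after the change.

```python
import posixpath

# Prefix seen in the arc-setup log excerpts in this thread.
ROOTFS = '/opt/google/containers/android/rootfs'


def mounts_under_rootfs(mount_points, rootfs=ROOTFS):
    """Return mount points below `rootfs`, as paths relative to it."""
    prefix = rootfs.rstrip('/') + '/'
    return {posixpath.relpath(point, rootfs)
            for point in mount_points if point.startswith(prefix)}


EXPECTED = {'root', 'android-data'}

# Invented init-namespace views: dalvik-cache mounted in init (before)
# versus mounted only inside the container namespace (after).
before = [ROOTFS + '/root', ROOTFS + '/android-data',
          ROOTFS + '/android-data/data/dalvik-cache/x86', '/tmp']
after = [ROOTFS + '/root', ROOTFS + '/android-data', '/tmp']

leaked_before = mounts_under_rootfs(before) - EXPECTED
leaked_after = mounts_under_rootfs(after) - EXPECTED
```

Because the dalvik-cache bind mount never enters the init namespace after the change, there is nothing left there for a lazy umount to miss.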
,
Jun 8 2018
Comment 1 by victorhsieh@chromium.org
, Apr 18 2018