
Issue 834479

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Cheets container mount is broken in caroline-tot-chrome-pfq-informational suite

Project Member Reported by phshah@chromium.org, Apr 18 2018

Issue description

ChromeOS Version: R68-10595.0.0
OS: Chrome

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8948885117255187712

http://cautotest-prod/new_tko/#tab_id=test_detail_view&object_id=718910943

Test cheets_ContainerMount (job 192958510-chromeos-test/chromeos6-row2-rack23-host13)

Test: cheets_ContainerMount
Job tag: 192958510-chromeos-test/chromeos6-row2-rack23-host13
Job name: caroline-tot-chrome-pfq-informational/R68-10595.0.0-b2490531/bvt-arc/cheets_ContainerMount
Status: FAIL
Reason: Mount points are mismatched with the expected list: expected: set(['root', 'android-data']), actual: set(['root', 'android-data', 'android-data/data/dalvik-cache/x86']), extra: set(['android-data/data/dalvik-cache/x86']), missing: set([]) 
Test started: 2018-04-18 14:01:24
Test finished: 2018-04-18 14:02:11
Host: chromeos6-row2-rack23-host13
Platform: caroline
Kernel: 3.18.0-17549-gc766a263ddb9
Test labels: none

I know you touched this test recently. Can you take a look? It has been failing for quite a while.
 
Has it only been failing on the informational builder? I won't have time to look into this very soon, though.

Comment 2 by uekawa@google.com, Apr 19 2018

Cc: hidehiko@chromium.org

Comment 4 by uekawa@google.com, Apr 19 2018

Labels: ArcConstable
Hmm, shouldn't it be ignored via IGNORED_MOUNTS (client/site_tests/cheets_ContainerMount/cheets_ContainerMount.py#103)?
Cc: cmtm@chromium.org victorhsieh@chromium.org
Owner: hidehiko@chromium.org
Status: Started (was: Untriaged)
Investigated.

There are several things in the background:
- IGNORED_MOUNTS should be updated. The path used to be root/data/dalvik-cache/... before the run_oci migration. After the migration, it should be android-data/data/dalvik-cache.

- We switched to the two-phase ARC container boot.
  On mini container start, the dalvik-cache dir is mounted in the init mount namespace.
  On upgrade to the full container, it is unmounted.
  https://chromium.googlesource.com/chromiumos/platform2/+/master/arc/setup/arc_setup.cc#1782

So the solution should be:
- Remove those entries from IGNORED_MOUNTS.

Will send a fix.
The CL https://chrome-internal-review.googlesource.com/c/chromeos/autotest-cheets/+/611547 itself is fine because we should remove the obsolete paths from the set, but does that really fix the test failure? 

'extra: set(['android-data/data/dalvik-cache/x86'])' will still be there even with your CL. Is this a race between the lazy umount and the check?
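
To illustrate the concern above, here is a minimal sketch of the set arithmetic behind the failure message. It is not the actual test code: the real logic in cheets_ContainerMount.py is not shown in this thread, so check_mounts and EXPECTED_MOUNTS are hypothetical names, and the obsolete ignore entry is a placeholder string rather than a real path.

```python
# Hypothetical reconstruction of the expected/actual/extra/missing comparison.
# Names and structure are assumptions, not the real autotest implementation.
EXPECTED_MOUNTS = {'root', 'android-data'}


def check_mounts(actual, ignored):
    """Return (extra, missing) after dropping ignored paths from actual."""
    considered = set(actual) - set(ignored)
    extra = considered - EXPECTED_MOUNTS
    missing = EXPECTED_MOUNTS - considered
    return extra, missing


# Mount points taken from the failure message in this issue.
actual = {'root', 'android-data', 'android-data/data/dalvik-cache/x86'}

# Placeholder standing in for the obsolete pre-run_oci entries; the exact
# entries are not listed in this thread.
obsolete_ignored = {'root/data/dalvik-cache/<isa>'}

# The obsolete entry never matches a current path, so keeping it or
# removing it gives the same result: the extra mount is still reported.
extra_with_obsolete, _ = check_mounts(actual, obsolete_ignored)
extra_without_ignores, _ = check_mounts(actual, set())

# Only ignoring the current path would hide the extra mount point.
extra_if_ignored, _ = check_mounts(
    actual, {'android-data/data/dalvik-cache/x86'})
```

Under these assumptions, editing the ignore list to drop stale entries cannot change the outcome, which matches the observation that the CL alone does not fix the failure.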

Comment 7 by bugdroid1@chromium.org, Apr 20 2018

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/autotest-cheets/+/3ead6218315d5001f16d35eeccc7e74922be6363

commit 3ead6218315d5001f16d35eeccc7e74922be6363
Author: Hidehiko Abe <hidehiko@chromium.org>
Date: Fri Apr 20 07:33:05 2018

Cc: wutao@chromium.org tbarzic@chromium.org
I saw an instance of this failure on veyron_minnie-tot-chrome-pfq-informational:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8948434308122422560

As already mentioned by yusukes, the CL in comment #7 does not really fix the test failure; it just removes references to an obsolete mount point path.
Any updates on this? I've seen another instance on informational Chrome pfq:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8948089791067296816

Comment 10 by warx@chromium.org, Apr 30 2018

Cc: warx@chromium.org minch@chromium.org
+this week's gardener. This still happens in (caroline, veyron_minnie)-tot-chrome-pfq-informational.
Re #6, #8: I was confused, and indeed #7 didn't fix the issue.

The cause is not yet clear to me, though.
IIUC, this is not a race between the lazy umount and the check. The check reads /proc/${PID}/mountinfo, and the entry is removed from it on the umount call, even if the mount is busy.

# mkdir -p /tmp/test
# sudo mount -t tmpfs tmpfs /tmp/test
# touch /tmp/test/foo
(in a different shell) # python
>>> f = open('/tmp/test/foo')
# cat /proc/self/mountinfo | grep /tmp/test
... (an entry is found) ...
# sudo umount /tmp/test
umount: /tmp/test: target is busy.
# sudo umount --lazy /tmp/test
# cat /proc/self/mountinfo | grep /tmp/test
... (no entry is found) ...
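
The grep step in the transcript above could be scripted roughly as follows. This is only a sketch: mount_points, is_mounted, and read_mountinfo are hypothetical helper names, not functions from the test or from arc-setup. It relies on the documented mountinfo layout, where the fifth whitespace-separated field of each line is the mount point.

```python
def mount_points(mountinfo_text):
    """Return the set of mount point paths found in mountinfo-formatted text."""
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Fields: mount ID, parent ID, major:minor, root, mount point, ...
        if len(fields) < 5:
            continue  # Skip blank or malformed lines.
        points.add(fields[4])
    return points


def is_mounted(path, mountinfo_text):
    """Return True if path is listed as a mount point in mountinfo_text."""
    return path in mount_points(mountinfo_text)


def read_mountinfo(pid='self'):
    """Read /proc/<pid>/mountinfo (Linux only; may need privileges)."""
    with open('/proc/%s/mountinfo' % pid) as f:
        return f.read()


# Illustrative input: a made-up mountinfo line for the tmpfs in the transcript.
SAMPLE = '36 25 0:31 / /tmp/test rw,relatime - tmpfs tmpfs rw\n'
```

On a device, is_mounted('/tmp/test', read_mountinfo()) would be called before and after the lazy umount. Note that mountinfo escapes spaces in paths as \040, which this sketch does not decode.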

Interestingly, in the log attached to #8,

in android-run_oci.20180423-130619:

[0423/130622:INFO:arc_setup.cc(571)] Setting up /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/arm
[0423/130622:INFO:arc_setup.cc(572)] Running !base::PathExists(dest_directory)...
[0423/130622:INFO:arc_setup.cc(575)] Running !arc_mounter_->BindMount(src_directory, dest_directory)...

so the mount point is created at this point, and in arc-boot-continue.log

[0423/130625:INFO:arc_setup.cc(599)] Running arc_mounter_->UmountLazily(dalvik_cache_directory.Append("arm"))...
[0423/130625:INFO:arc_setup.cc(601)] Running arc_mounter_->UmountLazily(dalvik_cache_directory.Append("x86"))...
[0423/130625:ERROR:arc_setup_util.cc(380)] Failed to lazy-umount /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86: No such file or directory
[0423/130625:INFO:arc_setup.cc(601)] Ignoring failures: arc_mounter_->UmountLazily(dalvik_cache_directory.Append("x86"))
[0423/130625:INFO:arc_setup.cc(603)] Running arc_mounter_->UmountLazily(dalvik_cache_directory.Append("x86_64"))...

so ArcSetUp::CleanUpDalvikCache() appears to be working as intended.

At this point, the mount point should have been removed from /proc/.../mountinfo. (Yusuke, do you think it's reasonable to add logging to verify that the mount point is actually gone, for further investigation?)

> (Yusuke, do you think it's reasonable to add logging to verify that the mount point is actually gone, for further investigation?)

Yes, as long as the logging does not slow down the boot.

Cc: khmel@chromium.org sammiequon@chromium.org malaykeshav@chromium.org
 Issue 840436  has been merged into this issue.

Comment 14 by bugdroid1@chromium.org, May 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/89bbc0ef616bdb2d3b56a1b36ac2676b01df387a

commit 89bbc0ef616bdb2d3b56a1b36ac2676b01df387a
Author: Hidehiko Abe <hidehiko@chromium.org>
Date: Wed May 09 21:40:10 2018

arc-setup: Output mount_info log for dalvik-cache.

For further investigation, this outputs more logs for dalvik-cache
mount points.

BUG= chromium:834479 
TEST=mount -t tmpfs tmpfs /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86, \
     then run cheets_ContainerMount. Made sure fail and log is remained. \
     Made sure added log in arc-continue-boot.log and arc-lifetime.log.
CQ-DEPEND=CL:1049985

Change-Id: Iad9630be4636dcfa3fc265dde7d29f74e4f35499
Reviewed-on: https://chromium-review.googlesource.com/1039224
Commit-Ready: Hidehiko Abe <hidehiko@chromium.org>
Tested-by: Hidehiko Abe <hidehiko@chromium.org>
Reviewed-by: Yusuke Sato <yusukes@chromium.org>

[modify] https://crrev.com/89bbc0ef616bdb2d3b56a1b36ac2676b01df387a/arc/setup/arc_setup.cc


Comment 15 by bugdroid1@chromium.org, May 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/5b7604c2f3525e929d01100a6974d05e9e29f824

commit 5b7604c2f3525e929d01100a6974d05e9e29f824
Author: Hidehiko Abe <hidehiko@chromium.org>
Date: Wed May 09 21:40:10 2018

arc-setup: Expose FindLine from arc_setup_util.

To use in arc_setup for logging purpose.

BUG= chromium:834479 
TEST=Trybot. Ran "cros_run_unittest --package arc-setup" locally.

Change-Id: I637ef4c8249ca335540027d3c75b70ab49cb9860
Reviewed-on: https://chromium-review.googlesource.com/1049985
Commit-Ready: Hidehiko Abe <hidehiko@chromium.org>
Tested-by: Hidehiko Abe <hidehiko@chromium.org>
Reviewed-by: Hidehiko Abe <hidehiko@chromium.org>

[modify] https://crrev.com/5b7604c2f3525e929d01100a6974d05e9e29f824/arc/setup/arc_setup_util.h
[modify] https://crrev.com/5b7604c2f3525e929d01100a6974d05e9e29f824/arc/setup/arc_setup_util.cc
[modify] https://crrev.com/5b7604c2f3525e929d01100a6974d05e9e29f824/arc/setup/arc_setup_util_unittest.cc

Cc: -sammiequon@chromium.org derat@chromium.org

Comment 17 by derat@chromium.org, May 18 2018

Cc: levarum@chromium.org nya@chromium.org kinaba@chromium.org
Labels: -Pri-3 Pri-1
Owner: yusukes@chromium.org
What's the status of fixing this? I'm still seeing slightly-different failures on multiple builders:

caroline-tot-chrome-pfq-informational: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946184811704548112

Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/cheets_ContainerMount/cheets_ContainerMount.py", line 72, in run_once
    self._assert_arc_not_leak_mounts(global_mountinfo_list)
  File "/usr/local/autotest/tests/cheets_ContainerMount/cheets_ContainerMount.py", line 120, in _assert_arc_not_leak_mounts
    WHITELISTED_MOUNTS - mount_paths))
TestFail: Mount points are mismatched with the expected list: expected: set(['root', 'android-data']), actual: set(['root', 'android-data', 'android-data/data/dalvik-cache/x86']), extra: set(['android-data/data/dalvik-cache/x86']), missing: set([]) 

eve-tot-chrome-pfq-informational: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946184347926179664

Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/cheets_ContainerMount/cheets_ContainerMount.py", line 72, in run_once
    self._assert_arc_not_leak_mounts(global_mountinfo_list)
  File "/usr/local/autotest/tests/cheets_ContainerMount/cheets_ContainerMount.py", line 120, in _assert_arc_not_leak_mounts
    WHITELISTED_MOUNTS - mount_paths))
TestFail: Mount points are mismatched with the expected list: expected: set(['root', 'android-data']), actual: set(['android-data/data/dalvik-cache/x86_64', 'root', 'android-data', 'android-data/data/dalvik-cache/x86']), extra: set(['android-data/data/dalvik-cache/x86_64', 'android-data/data/dalvik-cache/x86']), missing: set([])

Comment 18 by derat@chromium.org, May 18 2018

Should I add android-data/data/dalvik-cache/x86 and android-data/data/dalvik-cache/x86_64 to IGNORED_MOUNTS in the test?
I talked to Luis and there might be a way to fix this (in arc-setup). Let me check that first.
hidehiko@
I don't see your stale mount point log in arc-boot-continue.log, but the test still failed :/
#19 seems to work. I'm going to remove the mount point from the init namespace.

Comment 23 by bugdroid1@chromium.org, May 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/af1c6868417bbfc3599299b29ee07d04e1f3c767

commit af1c6868417bbfc3599299b29ee07d04e1f3c767
Author: yusukes <yusukes@google.com>
Date: Sat May 19 10:37:39 2018

arc-base: Unconditionally create all isa directories

The latest config.json for ARC needs all of these directories.

BUG= chromium:834479 
TEST=ARC still starts

Change-Id: I955d051f744e1a96a495b2a5a38405be796569e2
Reviewed-on: https://chromium-review.googlesource.com/1065582
Commit-Ready: Yusuke Sato <yusukes@chromium.org>
Tested-by: Yusuke Sato <yusukes@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>

[modify] https://crrev.com/af1c6868417bbfc3599299b29ee07d04e1f3c767/chromeos-base/arc-base/arc-base-9999.ebuild

Note: Hmm... This does not look easily reproducible in my local env...
#25 Do you know the reason for the failure now?

Comment 27 by bugdroid1@chromium.org, Jun 8 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/3031d6000a517cb20d4d61d6e80704fa828817fb

commit 3031d6000a517cb20d4d61d6e80704fa828817fb
Author: yusukes <yusukes@google.com>
Date: Fri Jun 08 02:56:16 2018

arc: Mount /data/dalvik-cache/<isa> in the container namespace

With run_oci, there's no reason to do that in the init namespace.
This will fix the occasional mount point leak reported at
 crbug.com/834479  too.

BUG= chromium:834479 
BUG= chromium:842927 
TEST=ARC++ still starts, cheets_ContainerMount
CQ-DEPEND=CL:1065582

Change-Id: I4b442d4702c3a09f020a5cc7a40463b5bd5dff59
Reviewed-on: https://chromium-review.googlesource.com/1065587
Commit-Ready: Yusuke Sato <yusukes@chromium.org>
Tested-by: Yusuke Sato <yusukes@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>

[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/container-bundle/pi/config.json
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/container-bundle/nyc/config.json
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/container-bundle/master/config.json
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/setup/arc_setup.cc

Status: Fixed (was: Started)
