rlz autotest is failing a lot with "DUT is missing brand_code" since R70 |
||||
Issue descriptionResults: https://stainless.corp.google.com/search?test=%5Erlz%5C_&exclude_non_release=true&exclude_cts=true&col=build&row=test&view=matrix&first_date=2018-08-10&last_date=2018-08-16 Failure stack: Running (ssh) 'mosys -k platform brand | grep brand' from '_call_run_once_with_retry|_call_run_once|run_once|_check_rlz_brand_code|run|run_very_slowly' 08/15 19:21:16.885 ERROR| utils:0286| [stderr] mosys invocation was: ["-k", "platform", "brand"] 08/15 19:21:16.886 ERROR| utils:0286| [stderr] Application error: Subcommand execution finished with error -1 Did something change in the way we are supposed to get the brand code?
,
Aug 16
Appears to have been broken starting in 10924.0.0. Separate note, at least in M67 the same commands that are failing now (except for brand) showed the following message error message: "Command not supported on this platform". That's a much better error message than "Subcommand execution finished with error -1". Should file a separate but for this and get it fixed.
,
Aug 16
Jason, Samantha, Can you take a look at this? Looking a the diff between 10923.0.0 and 10924.0.0 I see a couple of mosys changes though they don't look like they should affect brand. https://chromium-review.googlesource.com/c/chromiumos/platform/mosys/+/1110337 https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1121211
,
Aug 16
At first glance, the 'mosys -k platform brand' issue is hopefully dealt with by crrev.com/c/1176112, which was submitted last night.
,
Aug 16
That CL landed in 10794.0.0, but it is showing the same error: https://stainless.corp.google.com/search?view=list&first_date=2018-08-10&last_date=2018-08-16&test=%5Erlz_CheckPing%24&build=%5ER70%5C-10974%5C.0%5C.0%24&board=veyron_minnie&status=FAIL&status=ERROR&status=ABORT&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Aug 16
There's now crrev.com/c/1178966 for another bug. That might fix the issue. Otherwise, we can look into things more later.
,
Aug 22
Jason, crrev.com/c/1178966 landed in 10988.0.0. I tested this on minnie and got the same error.
,
Aug 23
It seems that SELinux is interfering with Minijail. Mosys is spawning some subprocesses to do work and then those subprocesses fail to start because SELinux is trying to tighten the screws on those subprocesses before they get to work. But, Mosys has already put everything in to a Minijail. In this case, the subprocess is "/bin/head" (pid 6261 in the output below):
[pid 6261] open("/lib/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
[pid 6261] read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\\\t\0\0004\0\0\0"..., 512) = 512
[pid 6261] fstat64(3, {st_mode=S_IFREG|0755, st_size=9608, ...}) = 0
[pid 6261] mmap2(NULL, 73780, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xaa698000
[pid 6261] mprotect(0xaa69a000, 61440, PROT_NONE) = 0
[pid 6261] mmap2(0xaa6a9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0xaa6a9000
[pid 6261] close(3) = 0
[pid 6261] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xaa697000
[pid 6261] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xaa696000
[pid 6261] set_tls(0xaa6964c0) = 0
[pid 6261] mprotect(0xaa7f8000, 8192, PROT_READ) = 0
[pid 6261] mprotect(0xaa6a9000, 4096, PROT_READ) = 0
[pid 6261] mprotect(0xaa81d000, 4096, PROT_READ) = 0
[pid 6261] mprotect(0xaa70f000, 4096, PROT_READ) = 0
[pid 6261] mprotect(0xaa836000, 4096, PROT_READ) = 0
[pid 6261] mprotect(0xaa83b000, 4096, PROT_READ) = 0
[pid 6261] mprotect(0xaa857000, 4096, PROT_READ) = 0
[pid 6261] mprotect(0x301d000, 28672, PROT_READ) = 0
[pid 6261] mprotect(0xaa882000, 4096, PROT_READ) = 0
[pid 6261] munmap(0xaa872000, 53644) = 0
[pid 6261] set_tid_address(0xaa696068) = 6261
[pid 6261] set_robust_list(0xaa696070, 12) = 0
[pid 6261] rt_sigaction(SIGRTMIN, {sa_handler=0xaa8021bd, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0xaa7372a1}, NULL, 8) = 0
[pid 6261] rt_sigaction(SIGRT_1, {sa_handler=0xaa80225d, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0xaa7372a1}, NULL, 8) = 0
[pid 6261] rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
[pid 6261] ugetrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
[pid 6261] statfs("/sys/fs/selinux", {f_type=SELINUX_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NOEXEC|ST_RELATIME}) = 0
[pid 6261] statfs("/sys/fs/selinux", {f_type=SELINUX_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NOEXEC|ST_RELATIME}) = 0
[pid 6261] brk(NULL) = 0x4b8d000
[pid 6261] brk(0x4bae000) = 0x4bae000
[pid 6261] access("/etc/selinux/config", F_OK) = 0
[pid 6261] prctl(PR_SET_NAME, "head") = 0
[pid 6261] prctl(PR_SET_KEEPCAPS, 3196737360) = -1 EINVAL (Invalid argument)
,
Aug 23
Actually, the plot thickens: that failure isn't enough to cause the Mosys command failure and it will gracefully recover. The root cause isn't related to Mosys at all. On recent builds, "vpd_get_value rlz_brand_code" (which is called by Mosys) fails on a Minnie device:
localhost ~ # VPD_IGNORE_CACHE=1 vpd_get_value rlz_brand_code
[WARN] VPD partition not formatted. It's fine.
findString(): Vpd data 'rlz_brand_code' was not found.
localhost ~ # vpd_get_value rlz_brand_code
localhost ~ #
This was on chromeos1-row3-rack10-host3.cros in the lab.
OS version built from head today:
# cat /etc/lsb-release
CHROMEOS_RELEASE_APPID={432FF9F1-4D2E-7E74-6F98-32E56E904BFB}
CHROMEOS_BOARD_APPID={432FF9F1-4D2E-7E74-6F98-32E56E904BFB}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
DEVICETYPE=CHROMEBOOK
CHROMEOS_ARC_VERSION=4972417
CHROMEOS_ARC_ANDROID_SDK_VERSION=25
GOOGLE_RELEASE=10994.0.2018_08_23_1349
CHROMEOS_DEVSERVER=http://jclinton-desktop.bld.corp.google.com:8080
CHROMEOS_RELEASE_BOARD=veyron_minnie
CHROMEOS_RELEASE_BUILD_NUMBER=10994
CHROMEOS_RELEASE_BRANCH_NUMBER=0
CHROMEOS_RELEASE_CHROME_MILESTONE=70
CHROMEOS_RELEASE_PATCH_NUMBER=2018_08_23_1349
CHROMEOS_RELEASE_TRACK=testimage-channel
CHROMEOS_RELEASE_DESCRIPTION=10994.0.2018_08_23_1349 (Test Build - jclinton) developer-build veyron_minnie
CHROMEOS_RELEASE_BUILD_TYPE=Test Build - jclinton
CHROMEOS_RELEASE_NAME=Chromium OS
CHROMEOS_RELEASE_VERSION=10994.0.2018_08_23_1349
CHROMEOS_AUSERVER=http://jclinton-desktop.bld.corp.google.com:8080/update
,
Aug 23
Re c#9, is your VPD actually valid though? That warning message "[WARN] VPD partition not formatted. It's fine." usually shows up when the VPD isn't present.
,
Aug 23
If it's in the lab, shouldn't it be? stephenlin@, can you confirm that your device has the same output for vpd_get_value?
,
Aug 23
FWIW, on my nocturne running R70-10984.0.0 test image with an intact VPD, I can read VPD keys with no error messages.
,
Aug 23
Also, Samantha found another system call that might be needed in some cases and that's being added in https://chromium-review.googlesource.com/c/chromiumos/platform/mosys/+/1187518 . So, we have two issues here.
,
Aug 23
Running "vpd_get_value rlz_brand_code" on minnie with 10988.0.0 I get "ASUF" which is the RO_VPD value of rlz_brand_code. Verified "mosys platform brand" continues to give an application error. Didn't notice it before, but it also says "Bad system call (core dumped)".
,
Aug 26
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/mosys/+/6d9f54709809fc0020425f7e89ed758d526f5841 commit 6d9f54709809fc0020425f7e89ed758d526f5841 Author: Samantha Miller <samanthamiller@google.com> Date: Sun Aug 26 23:23:42 2018 mosys: Add _llseek to allow mosys platform brand on minnie On minnie, 'mosys platform' commands need to use _llseek, so add it to the seccomp filter. BUG= chromium:874824 TEST=none Change-Id: I1f49ef5215b86fb554afcdd8ce105555d484dcf8 Reviewed-on: https://chromium-review.googlesource.com/1187518 Reviewed-by: Jason Clinton <jclinton@chromium.org> Tested-by: Samantha Miller <samanthamiller@google.com> [modify] https://crrev.com/6d9f54709809fc0020425f7e89ed758d526f5841/seccomp/mosys-seccomp-arm.policy
,
Aug 30
Results look much better |
||||
►
Sign in to add a comment |
||||
Comment 1 by stephenlin@chromium.org
, Aug 16Status: Assigned (was: Untriaged)