New issue
Advanced search Search tips

Issue 892390 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Oct 23
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

Kernel-4.19: ARC++ does not start

Project Member Reported by groeck@chromium.org, Oct 4

Issue description

Possibly relevant error messages:

NOTICE arc-kmsg-logger[3127]: <5>init: write_file: Unable to write to '/proc/1/oom_score_adj': Permission denied
NOTICE arc-kmsg-logger[3127]: <5>init: write_file: Unable to open '/proc/sys/kernel/sysrq': Permission denied
NOTICE arc-kmsg-logger[3127]: <5>init: write_file: Unable to open '/proc/sys/kernel/selinux_enforcing': No such file or directory


NOTICE arc-kmsg-logger[4305]: <3>init: Failed to set pid 28482 ioprio = 1,4: Operation not permitted

ArtContainerLog:ERROR cutils-trace Error opening container trace socket: No such file or directory (2)


2018-10-04T17:19:16.138137+00:00 ERR arc-container[3650]: [ERROR:run_oci.cc(437)] Timeout exceeded running precreate hook Hook{path="/usr/sbin/arc-setup", args=[arc-setup, --log_tag=arc-setup-precreate, --mode=setup]}
2018-10-04T17:19:16.152404+00:00 WARNING arc-container[3650]: [WARNING:run_oci.cc(469)] Error running precreate hooks
2018-10-04T17:19:16.153017+00:00 ERR arc-container[3650]: [ERROR:run_oci.cc(558)] Error running precreate hooks

 
Attached arc.log from login session. Reason for failure is unknown.

arc.log
64.4 KB View Download

Comment 2 Deleted

Possibly relevant: /var/log/ui/ui.LATEST 

[3638:3638:1016/093331.854341:ERROR:child_process_security_policy_impl.cc(1239)] Invalid isolated origin: https://appspot.com
[3638:3638:1016/093331.854773:ERROR:child_process_security_policy_impl.cc(1239)] Invalid isolated origin: https://googleapis.com
[3638:3638:1016/093331.855132:ERROR:child_process_security_policy_impl.cc(1239)] Invalid isolated origin: https://withgoogle.com
[3638:3638:1016/093331.855457:ERROR:child_process_security_policy_impl.cc(1239)] Invalid isolated origin: https://withgoogle.com
[3638:3638:1016/093332.329168:ERROR:configuration_policy_handler_list.cc(91)] Unknown policy: NTPContentSuggestionsEnabled
[3638:3638:1016/093332.984542:ERROR:input_method_manager_impl.cc(1113)] IMEEngine for "jkghodnilhceideoidjikpgommlajknk" is not registered
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
[3638:3638:1016/093334.127230:ERROR:configuration_policy_handler_list.cc(91)] Unknown policy: NTPContentSuggestionsEnabled
[3674:3681:1016/093334.490843:ERROR:drm_gpu_display_manager.cc(209)] There is no display with ID 21691195950291200
  device-enumerator: scanning /sys/class
[3638:3638:1016/093335.232333:ERROR:gpu_interface_provider.cc(87)] Not implemented reached in virtual void content::GpuInterfaceProvider::RegisterOzoneGpuInterfaces(service_manager::Bin
derRegistry *)
[3674:3681:1016/093335.403982:ERROR:drm_gpu_display_manager.cc(209)] There is no display with ID 21691195950291200
[3674:3681:1016/093335.404450:ERROR:drm_gpu_display_manager.cc(209)] There is no display with ID 21691195950291200
[3638:3695:1016/093338.256836:ERROR:external_policy_data_updater.cc(263)] The fetched data doesn't match the expected hash.
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
[3638:3638:1016/093356.870837:ERROR:object_proxy.cc(621)] Failed to call method: org.chromium.SessionManagerInterface.StartArcMiniContainer: object_path= /org/chromium/SessionManager: o
rg.chromium.SessionManagerInterface.ContainerStartupFail: Starting Android container failed.
[3638:3638:1016/093818.719274:ERROR:textfield.cc(1762)] Not implemented reached in virtual bool views::Textfield::ShouldDoLearning()
[3638:3638:1016/093819.206680:ERROR:device_event_log_impl.cc(159)] [09:38:19.206] USB: usb_service_linux.cc:220 Got duplicate add event for path: /dev/bus/usb/001/005
[3638:3638:1016/093819.338049:ERROR:device_event_log_impl.cc(159)] [09:38:19.338] USB: usb_service_linux.cc:220 Got duplicate add event for path: /dev/bus/usb/001/003
[3638:3638:1016/093820.588937:ERROR:device_event_log_impl.cc(159)] [09:38:20.588] USB: usb_service_linux.cc:220 Got duplicate add event for path: /dev/bus/usb/001/005
[3638:3638:1016/093820.616801:ERROR:device_event_log_impl.cc(159)] [09:38:20.616] USB: usb_service_linux.cc:220 Got duplicate add event for path: /dev/bus/usb/001/003
[3638:3638:1016/093820.733816:ERROR:device_event_log_impl.cc(159)] [09:38:20.733] USB: usb_service_linux.cc:220 Got duplicate add event for path: /dev/bus/usb/002/003
[3638:3638:1016/094346.257950:ERROR:model_association_manager.cc(411)] Arc Package datatype error was encountered: Association timed out.
[3638:3653:1016/104801.649551:ERROR:mcs_client.cc(755)] Received close command, resetting connection.
[3638:3653:1016/114124.758364:ERROR:mcs_client.cc(755)] Received close command, resetting connection.

ui.LATEST
10.4 KB Download
The problem is seen with all trybots, so it is definitely a persistent and release related issue.

Some more log data:

[ 1466.959431] audit: type=1401 audit(1539719160.523:1526): op=security_bounded_transition seresult=denied oldcontext=u:r:cros_session_manager:s0 newcontext=u:r:chromeos:s0
[ 1467.298748] audit: type=1401 audit(1539719160.862:1527): op=security_bounded_transition seresult=denied oldcontext=u:r:cros_session_manager:s0 newcontext=u:r:chromeos:s0
[ 1467.321338] audit: type=1401 audit(1539719160.885:1528): op=security_bounded_transition seresult=denied oldcontext=u:r:cros_session_manager:s0 newcontext=u:r:chromeos:s0


How can I compile the said 4.19 kernel?
Where is the branch?
https://docs.google.com/document/d/1k4D8O4R0r9RBo8ADLtE-WswHFqRrkAGa6F5JL4XScB0/edit?usp=sharing
should help. Look for "Working in chroot".
The source will then be in src/third_party/kernel/next, and it can be built using emerge as chromeos-kernel-next.

The following messages are only seen when booting 4.19 kernels.

NOTICE kernel: [   23.441372] audit: type=1401 audit(1539814158.805:3): op=security_bounded_transition seresult=denied oldcontext=u:r:cros_session_manager:s0 newcontext=u:r:chromeos:s0
NOTICE kernel: [   23.595650] audit: type=1401 audit(1539814158.960:4): op=security_bounded_transition seresult=denied oldcontext=u:r:cros_session_manager:s0 newcontext=u:r:chromeos:s0

This is seen even if selinux is in permissive mode.

... and it turns out that starting ARC++ does make progress after applying the following source code change/hack.

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index f3def298a90e9..7919e950f0c62 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -870,7 +870,7 @@ int security_bounded_transition(struct selinux_state *state,
 	int index;
 	int rc;
 
-	if (!state->initialized)
+	if (true || !state->initialized)
 		return 0;
 
 	read_lock(&state->ss->policy_rwlock);

There are still other problems; after the above hack, log message sequence looks as follows.
2018-10-19T17:30:05.289886+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Starting service 'zygote'...
2018-10-19T17:30:05.300530+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'cameraserver' (pid 154) killed by signal 9
2018-10-19T17:30:05.301096+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'cameraserver' (pid 154) killing any children in process group
2018-10-19T17:30:05.302431+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'media' (pid 156) killed by signal 9
2018-10-19T17:30:05.304515+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'media' (pid 156) killing any children in process group
2018-10-19T17:30:05.305590+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'netd' (pid 157) killed by signal 9
2018-10-19T17:30:05.306085+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'netd' (pid 157) killing any children in process group
2018-10-19T17:30:05.308833+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'audioserver' (pid 153) killed by signal 9
2018-10-19T17:30:05.308996+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'audioserver' (pid 153) killing any children in process group
2018-10-19T17:30:05.309135+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Untracked pid 162 killed by signal 9
2018-10-19T17:30:05.944772+00:00 NOTICE arc-sdcard[4033]: missing "/data/system/packages.list"; retrying
2018-10-19T17:30:06.310925+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Starting service 'audioserver'...
2018-10-19T17:30:06.313425+00:00 NOTICE arc-kmsg-logger[3940]: <3>init: Failed to set pid 230 ioprio = 1,4: Operation not permitted
2018-10-19T17:30:06.313596+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Starting service 'cameraserver'...
2018-10-19T17:30:06.316228+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Starting service 'mediaextractor'...
2018-10-19T17:30:06.318976+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Starting service 'media'...
2018-10-19T17:30:06.322966+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Starting service 'netd'...
2018-10-19T17:30:06.326085+00:00 NOTICE arc-kmsg-logger[3940]: <3>init: couldn't write 231 to /dev/cpuset/camera-daemon/tasks: No such file or directory
2018-10-19T17:30:06.326786+00:00 NOTICE arc-kmsg-logger[3940]: <3>init: Failed to set pid 231 ioprio = 1,4: Operation not permitted
2018-10-19T17:30:06.328522+00:00 NOTICE arc-kmsg-logger[3940]: <3>init: Failed to set pid 232 ioprio = 1,4: Operation not permitted
2018-10-19T17:30:06.330803+00:00 NOTICE arc-kmsg-logger[3940]: <3>init: Failed to set pid 233 ioprio = 1,4: Operation not permitted
2018-10-19T17:30:07.326980+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Starting service 'logd'...
2018-10-19T17:30:07.414379+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'logd' (pid 275) exited with status 255
2018-10-19T17:30:07.414602+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'logd' (pid 275) killing any children in process group
2018-10-19T17:30:07.498374+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'mediaextractor' (pid 232) killed by signal 6
2018-10-19T17:30:07.512021+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'mediaextractor' (pid 232) killing any children in process group
2018-10-19T17:30:08.945038+00:00 NOTICE arc-sdcard[4033]: missing "/data/system/packages.list"; retrying
2018-10-19T17:30:09.901464+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'zygote' (pid 224) killed by signal 9
2018-10-19T17:30:09.901624+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'zygote' (pid 224) killing any children in process group
2018-10-19T17:30:09.901813+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: write_file: Unable to open '/sys/android_power/request_state': No such file or directory
2018-10-19T17:30:09.901941+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: write_file: Unable to open '/sys/power/state': Permission denied
2018-10-19T17:30:09.902066+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'audioserver' is being killed...
2018-10-19T17:30:09.903328+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'cameraserver' is being killed...
2018-10-19T17:30:09.907727+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'media' is being killed...
2018-10-19T17:30:09.908234+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Service 'netd' is being killed...
2018-10-19T17:30:09.937619+00:00 NOTICE arc-kmsg-logger[3940]: <5>init: Starting service 'zygote'...

Additional data points:

- "Failed to set pid 231 ioprio = 1,4: Operation not permitted" is also seen with chromeos-4.14 and is thus irrelevant for the problem.
- "arc-sdcard[xxxx]: missing "/data/system/packages.list"; retrying" is seen as well and also seems to be irrelevant.
- "init: Service 'logd' (pid 275) exited with status 255" is a problem.
- "init: Service 'mediaextractor' (pid 232) killed by signal 6" may also be a problem.

Kernel built with gcc:

2018-10-19T20:00:58.382365+00:00 INFO arc-setup-precreate[3714]: [INFO:main.cc(69)] Starting /usr/sbin/arc-setup --log_tag=arc-setup-precreate --mode=setup
...
2018-10-19T20:01:18.179407+00:00 ERR arc-container[3672]: [ERROR:run_oci.cc(437)] Timeout exceeded running precreate hook Hook{path="/usr/sbin/arc-setup", args=[arc-setup, --log_tag=arc-setup-precreate, --mode=setup]}
2018-10-19T20:01:18.200986+00:00 WARNING arc-container[3672]: [WARNING:run_oci.cc(469)] Error running precreate hooks
2018-10-19T20:01:18.201102+00:00 ERR arc-container[3672]: [ERROR:run_oci.cc(558)] Error running precreate hooks

---------------
Kernel built with clang (otherwise identical):

2018-10-19T18:36:33.200488+00:00 INFO arc-setup-precreate[3739]: [INFO:main.cc(69)] Starting /usr/sbin/arc-setup --log_tag=arc-setup-precreate --mode=setup
...
2018-10-19T18:36:49.432867+00:00 INFO arc-setup-precreate[3739]: [INFO:main.cc(74)] /usr/sbin/arc-setup --log_tag=arc-setup-precreate --mode=setup took 16234ms

This was observed several times in a row. Conclusion is that ARC++ won't start at all if the kernel is built with gcc (or, rather, our old version of gcc).

Here is my initial observation, if attach such patch for debugging, for 4.19 kernel,
it showed a weird option, (which should be PR_SET_VMA), and that weird option will change from process to process, it looks like something screw up the system call stack sometimes...


diff --git a/kernel/sys.c b/kernel/sys.c
index 9be4beaabead..78859f33d0a3 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2425,6 +2425,8 @@ int ksys_prctl(int option, unsigned long arg2, unsigned long arg3,
 	unsigned char comm[sizeof(me->comm)];
 	long error;
 
+        printk(KERN_ERR "try %s %x\n", current->comm, option);
+
 	error = security_task_prctl(option, arg2, arg3, arg4, arg5);
 	if (error != -ENOSYS)
 		return error;
@@ -2646,6 +2648,7 @@ int ksys_prctl(int option, unsigned long arg2, unsigned long arg3,
 		error = prctl_set_vma(arg2, arg3, arg4, arg5);
 		break;
 	default:
+		printk(KERN_ERR "unknown %s %x\n", current->comm, option);
 		error = -EINVAL;
 		break;


Cc: lepton@chromium.org
lepton@: Do you see than when building the image with gcc, clang, or both ?

Re #13, it's gcc, let me try clang
same with clang.
Excellent find. Looks like something may be wrong with alt-syscall handing of 32-bit applications.

Re #16, actually I was thinking about that and I tried novato-arc64 but it also doesn't work.
#17: Weird, but thanks for checking!
The values as seen by alt_sys_prctl() are just as bad bad:

[   25.254142] pid 2192: arc-setup: sys_prctl: option=dc37f58 arg2=4 arg3=dffffc0000000000 arg4=1ffff10021b86fd4 arg5=ffff88014d50a288
[   25.267639] pid 2192: arc-setup: unknown dc37f58

I guess something must have changed in how parameters are passed from Assembler to C code.

Confirmed. Parameter passing from do_syscall_64() and do_syscall_32_irqs_on() to the actual syscalls has changed. Parameters are correct in registers in do_syscall_32_irqs_on() but messed up in alt_sys_prctl().

[   23.147943] prctl(32): bx=53564d41 cx=0
[   23.184181] pid 2114: patchoat: sys_prctl: option=10c9ff58 arg2=8 arg3=0 arg4=5af9619cf9b25d00 arg5=1
[   23.268870] pid 2114: patchoat: unknown 10c9ff58

Commit ebeb8c82ffaf ("syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32") and fa697140f9a2 ("syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls") are likely relevant.

As of chromeos-4.19-merge-181021, ARC++ now starts (or at least I think so, from trybot results). However, adb can still not connect to it. Thanks a lot to lepton@ for finding the prctl problem!

Re #21,

For that adb issue, it seems there is something wrong with arc++ network setup so it has a broken network setup then adb fail to connect.  Inside a "good" chromebook android session, after run "ip rule", I got:

0:	from all lookup local 
10000:	from all fwmark 0xc0000/0xd0000 lookup legacy_system 
10500:	from all oif arc0 uidrange 0-0 lookup arc0 
13000:	from all fwmark 0x10063/0x1ffff lookup local_network 
13000:	from all fwmark 0x10064/0x1ffff lookup arc0 
14000:	from all oif arc0 lookup arc0 
15000:	from all fwmark 0x0/0x10000 lookup legacy_system 
16000:	from all fwmark 0x0/0x10000 lookup legacy_network 
17000:	from all fwmark 0x0/0x10000 lookup local_network 
19000:	from all fwmark 0x64/0x1ffff lookup arc0 
22000:	from all fwmark 0x0/0xffff lookup arc0 
23000:	from all fwmark 0x0/0xffff uidrange 0-0 lookup main 
32000:	from all unreachable

While on 4.19 kernel, all rules related with arc0 are missing. I can manually
enable network by running "ip rule add from all lookup main", then adb works
for me.  But we still need to figure out the root cause of broken arc++ networking setup in 4.19
Re #23,

It seems netd has some trouble to set iptables rule.

Update:

A difference between 4.14 and 4.19 kernel, inside android container (by running android-sh),  on 4.19:

ip6tables -L                                                                           
ip6tables v1.6.1: can't initialize ip6tables table `filter': Permission denied (you must be root)
Perhaps ip6tables or your kernel needs to be upgraded.

While on 4.14 it works.
Also, the works out side of container for both 4.14/4.19

On 4.19 inside container:
socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = -1 EPERM (Operation not permitted)
It seems we did a wrong merge for https://github.com/powerc20/raw-android-patches/blob/master/0064-net-Replace-AID_NET_RAW-checks-with-capable-CAP_NET_.patch

This is the patch on top of merge-181021 will fix it by restoring that line
to previous version same with chromeos-4.14 and chromeos-4.4 branch. But actually I think perhaps we should change to use android_ns_capable for that
line?  Since IPV4 version of this functions is using that.

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 5a9ad00677f5..3065ac125c37 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -172,7 +172,8 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
        }
 
        err = -EPERM;
-       if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))
+       if (sock->type == SOCK_RAW && !kern &&
+           !ns_capable(net->user_ns, CAP_NET_RAW))
                goto out_rcu_unlock;
 
        sock->ops = answer->ops;



It seems in 4.19, we missed 

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/319223

for net/ipv6/af_inet6.c, that's the reason of adb brokeness.
FYI,  the whole story of that ns_capable check:

We change capable to ns_capable in https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/319223,

Then we change ns_capable to android_ns_capable for ipv4 in
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/802777

In 4.19 merge, we missed review.googlesource.com/c/chromiumos/third_party/kernel/+/319223,

patch in #26 will fix it.



lepton@: Again thanks a lot for tracking this down! I'll try using android_ns_capable(); hopefully that should work as well since it is more inclusive than ns_capable().

Status: Fixed (was: Assigned)
trybot shows that arc++ now starts. trybot summary (from banon-paladin-trybot):

HWTest [bvt-arc]
TestFailure
stdout
Link to suite
[Test-Logs]: cheets_CTS_N.7.1_r22.x86.CtsAccountManagerTestCases: WARN: Passed: after 1 retries passing 87 tests, waived=0. run(p=84, f=3, w=0) retry(p=87, f=0, w=0)
[Test-History]: cheets_CTS_N.7.1_r22.x86.CtsAccountManagerTestCases
[Test-Logs]: cheets_CameraOrientation: retry_count: 1, GOOD: completed successfully
[Test-History]: cheets_CameraOrientation
[Test-Logs]: cheets_SELinuxTest: retry_count: 2, FAIL: Unexpected value: Expected: Enforcing, Actual: Permissive
[Test-History]: cheets_SELinuxTest
[Test-Logs]: cheets_CTS_N.7.1_r22.arm.CtsAccountManagerTestCases: retry_count: 1, WARN: Passed: after 1 retries passing 87 tests, waived=0. run(p=86, f=1, w=0) retry(p=87, f=0, w=0)
[Test-History]: cheets_CTS_N.7.1_r22.arm.CtsAccountManagerTestCases

Sign in to add a comment