kip provision failed | device didn't respond to initial ssh | fixed by servoreset |
|||||||||||||||||||||||||
Issue descriptionBuild https://luci-milo.appspot.com/buildbot/chromeos/kip-paladin/832 Provision job http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack12-host1/519946-provision/ In particular, the autoupdate log is very short: 2017/02/24 21:18:43.890 DEBUG| cros_update:0193| Remote device chromeos4-row1-rack12-host1 is connected 2017/02/24 21:18:43.890 DEBUG| cros_build_lib:0564| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmp6Ee7ph/testing_rsa root@chromeos4-row1-rack12-host1 -- mkdir -p /mnt/stateful_partition/unencrypted/preserve/cros-update '&&' mktemp -d '--tmpdir=/mnt/stateful_partition/unencrypted/preserve/cros-update' 2017/02/24 21:20:47.047 ERROR| remote_access:0856| Error connecting to device chromeos4-row1-rack12-host1 2017/02/24 21:20:47.048 DEBUG| cros_update:0253| Error happens in CrOS auto-update: SSHConnectionError('ssh: connect to host chromeos4-row1-rack12-host1 port 22: Connection timed out\r\n',) Repair job: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row1-rack12-host1/519955-repair/20172402212328/ status log indicates that required power cycle START ---- repair timestamp=1488000213 localtime=Feb 24 21:23:33 GOOD ---- verify.ssh timestamp=1488000214 localtime=Feb 24 21:23:34 GOOD ---- verify.brd_config timestamp=1488000214 localtime=Feb 24 21:23:34 GOOD ---- verify.ser_config timestamp=1488000214 localtime=Feb 24 21:23:34 GOOD ---- verify.job timestamp=1488000215 localtime=Feb 24 21:23:35 GOOD ---- verify.servod timestamp=1488000219 localtime=Feb 24 21:23:39 GOOD ---- verify.pwr_button timestamp=1488000219 localtime=Feb 24 21:23:39 GOOD ---- verify.lid_open timestamp=1488000219 localtime=Feb 24 21:23:39 GOOD ---- verify.update timestamp=1488000222 localtime=Feb 24 21:23:42 GOOD ---- verify.PASS timestamp=1488000222 localtime=Feb 24 21:23:42 FAIL ---- verify.ssh timestamp=1488000924 localtime=Feb 24 21:35:24 No answer to ping from chromeos4-row1-rack12-host1 START ---- repair.rpm timestamp=1488000924 localtime=Feb 24 21:35:24 FAIL ---- repair.rpm timestamp=1488001169 localtime=Feb 24 21:39:29 chromeos4-row1-rack12-host1 is still offline after powercycling END FAIL ---- repair.rpm timestamp=1488001169 localtime=Feb 24 21:39:29 START ---- repair.sysrq timestamp=1488001169 localtime=Feb 24 21:39:29 FAIL ---- repair.sysrq timestamp=1488001456 localtime=Feb 24 21:44:16 chromeos4-row1-rack12-host1 is still offline after reset. END FAIL ---- repair.sysrq timestamp=1488001456 localtime=Feb 24 21:44:16 START ---- repair.servoreset timestamp=1488001456 localtime=Feb 24 21:44:16 GOOD ---- verify.ssh timestamp=1488001485 localtime=Feb 24 21:44:45 END GOOD ---- repair.servoreset timestamp=1488001485 localtime=Feb 24 21:44:45 GOOD ---- verify.fwstatus timestamp=1488001485 localtime=Feb 24 21:44:45 GOOD ---- verify.good_au timestamp=1488001485 localtime=Feb 24 21:44:45 GOOD ---- verify.writable timestamp=1488001486 localtime=Feb 24 21:44:46 GOOD ---- verify.tpm timestamp=1488001486 localtime=Feb 24 21:44:46 GOOD ---- verify.ext4 timestamp=1488001486 localtime=Feb 24 21:44:46 GOOD ---- verify.power timestamp=1488001486 localtime=Feb 24 21:44:46 GOOD ---- verify.rwfw timestamp=1488001487 localtime=Feb 24 21:44:47 GOOD ---- verify.python timestamp=1488001487 localtime=Feb 24 21:44:47 GOOD ---- verify.cros timestamp=1488001489 localtime=Feb 24 21:44:49 GOOD ---- verify.PASS timestamp=1488001489 localtime=Feb 24 21:44:49 INFO ---- repair timestamp=1488001489 localtime=Feb 24 21:44:49 Can't repair label 'cros-version:kip-paladin/R58-9315.0.0-rc1'. INFO ---- repair timestamp=1488001489 localtime=Feb 24 21:44:49 Can't repair label 'pool:cq'. INFO ---- repair timestamp=1488001489 localtime=Feb 24 21:44:49 Can't repair label 'board:kip'. END GOOD ---- repair timestamp=1488001489 localtime=Feb 24 21:44:49 chromeos4-row1-rack12-host1 repaired successfully Didn't dig further yet as to the cause of the device being hung.
Showing comments 25 - 124
of 124
Older ›
,
Mar 1 2017
#1: All the systems that this problem is showing, does have smsc95xx controller (some also have an additional smsc75xx). But I could not ascertain for sure if it is the same ssh issue. #2: On a separate note, I left the kip I had overnight for a continuous loop of reinstalling between 9310.0.0 -> 9315.0.0 -> 9310.0.0 and somewhere after 90th iteration, the EC went unresponsive (the system does not respond, Alt = x + x + x does not work, Atl + r does not work). (Kip being an old product, so board does not have a servo header to see if EC console works). But when I reboot, as expected, there is no console-ramoops :-( and the last part of var/log/messages (attached) before reboot is not very helpful: ... 2017-03-01T03:34:07.258346-08:00 INFO sshd[4784]: Accepted publickey for root from 172.22.64.13 port 59743 ssh2: RSA SHA256:Fp1qWjFLyK1cTpiI5rdk7iEJwoK9lcnYAgbQtGC3jzU 2017-03-01T03:34:07.309018-08:00 INFO sshd[4784]: Received disconnect from 172.22.64.13 port 59743:11: disconnected by user 2017-03-01T03:34:07.309088-08:00 INFO sshd[4784]: Disconnected from 172.22.64.13 port 59743 2017-03-01T03:34:07.622225-08:00 INFO sshd[4788]: Accepted publickey for root from 172.22.64.13 port 59744 ssh2: RSA SHA256:Fp1qWjFLyK1cTpiI5rdk7iEJwoK9lcnYAgbQtGC3jzU 2017-03-01T03:34:07.683773-08:00 INFO sshd[4788]: Received disconnect from 172.22.64.13 port 59744:11: disconnected by user 2017-03-01T03:34:07.683904-08:00 INFO sshd[4788]: Disconnected from 172.22.64.13 port 59744 2017-03-01T03:34:07.954961-08:00 INFO sshd[4791]: Accepted publickey for root from 172.22.64.13 port 59745 ssh2: RSA SHA256:Fp1qWjFLyK1cTpiI5rdk7iEJwoK9lcnYAgbQtGC3jzU 2017-03-01T03:34:08.044863-08:00 INFO sshd[4791]: Received disconnect from 172.22.64.13 port 59745:11: disconnected by user 2017-03-01T03:34:08.044942-08:00 INFO sshd[4791]: Disconnected from 172.22.64.13 port 59745 2017-03-01T03:34:08.066729-08:00 WARNING kernel: [ 391.666654] init: debugd main process (940) killed by TERM signal 2017-03-01T03:34:08.268632-08:00 ERR shill[1172]: [ERROR:object_proxy.cc(582)] Failed to call method: org.chromium.dhcpcd.Release: object_path= /org/chromium/dhcpcd: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.46 was not provided by any .service files 2017-03-01T03:34:08.268680-08:00 ERR shill[1172]: [ERROR:dbus_method_invoker.h(111)] CallMethodAndBlockWithTimeout(...): Domain=dbus, Code=org.freedesktop.DBus.Error.ServiceUnknown, Message=The name :1.46 was not provided by any .service files 2017-03-01T03:34:08.269626-08:00 INFO avahi-daemon[1511]: Withdrawing address record for 100.96.49.86 on eth0. 2017-03-01T03:34:08.269735-08:00 INFO avahi-daemon[1511]: Leaving mDNS multicast group on interface eth0.IPv4 with address 100.96.49.86. 2017-03-01T03:34:08.269827-08:00 INFO avahi-daemon[1511]: Interface eth0.IPv4 no longer relevant for mDNS. 2017-03-01T03:34:08.270712-08:00 INFO kernel: [ 391.871421] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup 2017-03-01T03:34:08.328211-08:00 ERR wpa_supplicant[570]: rmdir[ctrl_interface=/var/run/wpa_supplicant]: Permission denied 2017-03-01T03:34:08.510850-08:00 NOTICE shill_logout_user[4810]: Failed to pop shill user profiles 2017-03-01T03:34:08.517193-08:00 INFO session_manager[895]: [INFO:browser_job.cc(157)] Terminating process: 2017-03-01T03:34:08.517225-08:00 INFO session_manager[895]: [INFO:system_utils_impl.cc(110)] Sending 15 to 942 as 1000 2017-03-01T03:34:09.162755-08:00 INFO session_manager[895]: [INFO:session_manager_service.cc(488)] SessionManagerService quitting run loop 2017-03-01T03:34:09.163005-08:00 INFO session_manager[895]: [INFO:session_manager_service.cc(214)] SessionManagerService exiting 2017-03-01T03:34:09.186591-08:00 INFO session_manager[895]: [INFO:policy_service.cc(202)] Persisted policy to disk. (I rebooted here) 2017-03-01T09:53:30.366267-08:00 INFO kernel: [ 0.000000] Initializing cgroup subsys cpuset 2017-03-01T09:53:30.367751-08:00 INFO kernel: [ 0.000000] Initializing cgroup subsys cpu 2017-03-01T09:53:30.367757-08:00 INFO kernel: [ 0.000000] Initializing cgroup subsys cpuacct 2017-03-01T09:53:30.367760-08:00 NOTICE kernel: [ 0.000000] Linux version 4.4.44-07219-g1abd782 (chrome-bot@cros-beefy4-c2) (gcc version 4.9.x 20150123 (prerelease) (4.9.2_cos_gg_4.9.2-r150-41f3e25635616c067b9ee272304e6f86ac8ee9db_4.9.2-r150) ) #1 SMP PREEMPT Fri Feb 24 20:36:00 PST 2017 2017-03-01T09:53:30.367763-08:00 INFO kernel: [ 0.000000] Command line: cros_secure console= loglevel=7 init=/sbin/init cros_secure oops=panic panic=-1 root=/dev/dm-0 rootwait ro dm_verity.error_behavior=3 dm_verity.max_bios=-1 dm_verity.dev_wait=1 dm="1 vroot none ro 1,0 3584000 verity payload=PARTUUID=050379b1-58a1-ed46-9981-a77ee66b71aa/PARTNROFF=1 hashtree=PARTUUID=050379b1-58a1-ed46-9981-a77ee66b71aa/PARTNROFF=1 hashstart=3584000 alg=sha1 root_hexdigest=d59a1e1f4a88ebca130fc55e70bfc53e3b91d87e salt=bdb67d8ea702870cc2fc6a10ef1e772ab16389d2f78c991d77393c25fbe11a73" noinitrd cros_debug vt.global_cursor_default=0 kern_guid=050379b1-58a1-ed46-9981-a77ee66b71aa add_efi_memmap boot=local noresume noswap i915.modeset=1 tpm_tis.force=1 tpm_tis.interrupts=0 nmi_watchdog=panic,lapic ... This behaviour might match this bug, so I'm going to try to reproduce this again. However, is there a way to confirm that this issue was present (or not present) say, a month ago? Buildbot seems to maintain info for only last 200 builds, and looking at that seems to indicate that this has been always flaky.
,
Mar 2 2017
,
Mar 2 2017
OK, I've been looking at this (looking at the CQ failures) and I think that we're dealing with 2 issues here: 1) For some reason the Kip reaches a wedged state. The kernel is defnitely not responding (autotest tries to reset using sysrq aka Alt+VolUp+x+x+x, but it does not come back to life). This is the issue that this bug intends to track. 2) Autotest does a servoreset, which I looked at the code and it seems to be rebooting both the EC and the AP, which should definitely bring it back to life. But in some cases, it doesn't, and then it goes on to do a recovery using USB which works. I believe the reason for this is that we may be hitting the issue 677572 , because after a servo reset, we give up in < 5 min if the DUT does not respond, but in case of USB reset, we wait until ~9-10 mins? As below. FAIL ---- verify.ssh timestamp=1488180035 localtime=Feb 26 23:20:35 No answer to ping from chromeos4-row1-rack12-host9 START ---- repair.rpm timestamp=1488180035 localtime=Feb 26 23:20:35 FAIL ---- repair.rpm timestamp=1488180282 localtime=Feb 26 23:24:42 chromeos4-row1-rack12-host9 is still offline after powercycling END FAIL ---- repair.rpm timestamp=1488180282 localtime=Feb 26 23:24:42 START ---- repair.sysrq timestamp=1488180282 localtime=Feb 26 23:24:42 FAIL ---- repair.sysrq timestamp=1488180571 localtime=Feb 26 23:29:31 chromeos4-row1-rack12-host9 is still offline after reset. END FAIL ---- repair.sysrq timestamp=1488180571 localtime=Feb 26 23:29:31 START ---- repair.servoreset timestamp=1488180571 localtime=Feb 26 23:29:31 FAIL ---- repair.servoreset timestamp=1488180797 localtime=Feb 26 23:33:17 chromeos4-row1-rack12-host9 is still offline after reset. END FAIL ---- repair.servoreset timestamp=1488180797 localtime=Feb 26 23:33:17 START ---- repair.firmware timestamp=1488180797 localtime=Feb 26 23:33:17 FAIL ---- repair.firmware timestamp=1488180797 localtime=Feb 26 23:33:17 Firmware repair is not applicable to host chromeos4-row1-rack12-host9. END FAIL ---- repair.firmware timestamp=1488180797 localtime=Feb 26 23:33:17 START ---- repair.usb timestamp=1488180797 localtime=Feb 26 23:33:17 GOOD ---- verify.ssh timestamp=1488181345 localtime=Feb 26 23:42:25 GOOD ---- verify.writable timestamp=1488181346 localtime=Feb 26 23:42:26 GOOD ---- verify.tpm timestamp=1488181346 localtime=Feb 26 23:42:26 GOOD ---- verify.good_au timestamp=1488181346 localtime=Feb 26 23:42:26 GOOD ---- verify.ext4 timestamp=1488181346 localtime=Feb 26 23:42:26 GOOD ---- verify.power timestamp=1488181347 localtime=Feb 26 23:42:27 GOOD ---- verify.rwfw timestamp=1488181347 localtime=Feb 26 23:42:27 GOOD ---- verify.python timestamp=1488181347 localtime=Feb 26 23:42:27 GOOD ---- verify.cros timestamp=1488181349 localtime=Feb 26 23:42:29 END GOOD ---- repair.usb timestamp=1488181349 localtime=Feb 26 23:42:29 Coming back to the problem this bug is meant to track, unfortunately I've been looking and I have not either not found any logs (looking at /var/log/messages, dmesgs, console-ramoops) from the time the DUT was healthy to the time DUT stopped responding, or none of the logs seem to indicate anything wrong, or give any clues about what could have gone wrong. The closest I've found is this, where the DUT seemed to be working at the begining of the test, but went unresponsive sometime while the test was running: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/103440384-chromeos-test/ Unfortunately there are no logs :-(. In summary: A) I have not found any useful logs thus far. I'll still keep on looking for CQ failures to scrape out any logs that may indicate a problem. B) On my end, I did run into an issue that looked like an EC unresponsive when I put a Kip DUT under a continuous new image installation loop. On reboot, no interesting logs were there. This behaviour would match the problem symptoms, but given this product has been shipping for ages, I'm hesitant to blame the EC. Also, this issue should be showing up on canaries also in that case. C) I will attempt to take a look at the failed CQ builds, and see if I can identify a bad CL that is common across the failed CQ cases. (Question: is there a way to reverse engineer the uberchrome paladin build link, from dut-status provisioning/repair/test job?) I'll be happy to pick up any pointers that anyone may have.
,
Mar 2 2017
Update: I was again able to reproduce the EC freeze. It was exactly like #25 above. Logs attached again. Again, no signs of any issues in the logs, and the system/EC suddenly goes unresponsive. Trying Alt+VolUp+r (warm reboot via EC) does not work. And need to do a EC reset (refresh + power) in order to revive. I'm going to try and reproduce it one more time, before requesting some one from EC team to take a look.
,
Mar 2 2017
OK, I reproduced the EC freeze again, this time in only 12 iterations of reinstalling the image. I think some one from EC team needs to take a look. Sameer, do you know who can help?
,
Mar 2 2017
Unfortunately the board I have does not have a servo header, so I couldn't look at the EC console.My observations are: * the Alt+VolUp+r combo does not work when it is wedged (Ir works when the system is responding). * The system logs after I recover (using EC reboot), show no indication of anything that might have gone wrong in AP software. Attaching the latest logs again
,
Mar 3 2017
Reproduced the issue yet again - after 34 iterations this time, log attached. Same behavior - Logs have nothing - and Alt+VolUp+r does not work. One another thing that I tried on Sameer's suggestion this time is Alt+VolUp+h (hibernate), when I did this the system did go to hibernate apparently because when I followed it up with power button the system rebooted. Sameer mentioned that this could mean that the EC is responding, but there were some known observations/issues around Intel CPU being in bad state such that PCH does not responding appropriately to the reset request line from the EC. The next steps that I'm planning are: * Running an older image (that has 3.10 kernel) ~ I'll try an image from January. * Getting a servo header installed on the board, to take a look at what is happening on the AP console and the EC console.
,
Mar 6 2017
Some more updates: 1) The issue doe not seem to be reproduced with 3.10 kernel. I loaded a January image and it passed 500 iterations without hitting the issue. It seems something that the kernel 4.4 induces (see next): 2) I got servo header installed on a system, and reproduced the issue with a 4.4 kernel (re-install in a loop). My observations: A. the console is absolutely normal with last messages showing a clean shutdown: [ 365.495608] init: debugd main process (1036) killed by TERM signal [ 373.635849] smsc95xx 1-2.2.1:1.0 eth0: hardware isn't capable of remote wakeup [ 375.332799] init: chapsd main process (1084) killed by TERM signal [ 375.340611] init: powerd main process (943) killed by TERM signal [ 375.349729] init: crash-sender main process (1508) killed by TERM signal [ 375.357743] init: cros-machine-id-regen-periodic main process (1519) killed by TERM signal [ 375.367524] init: mtpd main process (1524) killed by TERM signal [ 375.374674] init: log-rotate main process (1536) killed by TERM signal [ 375.383368] init: cras main process (1579) killed by TERM signal [ 375.390465] init: bluetoothd main process (1583) killed by TERM signal [ 375.404700] init: warn-collector main process (2261) killed by TERM signal [ 375.412939] init: tlsdated main process (2292) killed by TERM signal [ 375.441473] init: cryptohomed main process (1365) killed by TERM signal [ 375.450180] EXT4-fs (mmcblk0p5): re-mounted. Opts: block_validity,barrier,user_xattr,acl [ 375.534309] init: recover_duts main process (2259) killed by TERM signal [ 375.588273] Unsafe core_pattern used with suid_dumpable=2. Pipe handler or fully qualified core dump path required. [ 383.107333] gsmi: Log Shutdown Reason 0x00 [ 383.252172] reboot: Restarting system [ 383.256301] reboot: machine restart B. The EC console is alive but again - no signs of any thing wrong there. Here are the last messages: +[2694.580370 HC 0x01] [2694.581005 HC 0x05] +[2694.584210 HC 0x02] [2694.588470 HC 0x01] [2989.906791 DPTF sensor 1, threshold -73 C, index 0, disabled] [2989.907729 DPTF sensor 1, threshold -73 C, index 1, disabled] [2989.914462 DPTF sensor 2, threshold -73 C, index 0, disabled] [2989.915354 DPTF sensor 2, threshold -73 C, index 1, disabled] [2989.917598 DPTF sensor 3, threshold -73 C, index 0, disabled] [2989.918529 DPTF sensor 3, threshold -73 C, index 1, disabled] [2997.221829 KB disable] [2997.222101 KS disable] [2997.222257 KB IRQ disable] [2997.222488 KB IRQ enable] [2997.250347 chipset_reset(0)] [2997.251127 Port 80: 0x1002][2997.251303 LPC RESET# asserted] [2997.255291 LPC RESET# deasserted] [2997.255464 GPIO_PP3300_LTE_EN 1->0] [2997.266121 Port 80: 0x10][2997.282792 chipset_reset(0)] [2997.285769 GPIO_PP3300_LTE_EN 0->1] [2997.307395 Port 80: 0x2a][2997.308202 HC 0x87] [2997.350240 Port 80: 0x1002][2997.350502 LPC RESET# asserted] [2997.354488 LPC RESET# deasserted] [2997.354689 GPIO_PP3300_LTE_EN 1->0] [2997.365316 Port 80: 0x10][2997.384935 GPIO_PP3300_LTE_EN 0->1] [2997.406576 Port 80: 0x2a][2997.407382 HC 0x87] [2997.409450 Port 80: 0x1002][2997.409626 LPC RESET# asserted] C. "powerinfo" believes system is in S0. Power pins looks OK to me: > > powerindebug power in: 0x00ff debug mask: 0x0000 bit meanings: 0x0001 1 PGOOD_PP1050 0x0002 1 PGOOD_PP3300_PCH 0x0004 1 PGOOD_PP5000 0x0008 1 PGOOD_S5 0x0010 1 PGOOD_VCORE 0x0020 1 PGOOD_PP1000_S0IX 0x0040 1 SLP_S3#_DEASSERTED 0x0080 1 SLP_S4#_DEASSERTED > > gpioget 1* POWER_BUTTON_L 1* LID_OPEN 1* AC_PRESENT 1* PCH_SLP_S3_L 1* PCH_SLP_S4_L 1* PP1050_PGOOD 1* PP3300_PCH_PGOOD 1* PP5000_PGOOD 1* S5_PGOOD 1* VCORE_PGOOD 0 WP_L 1* JTAG_TCK 1* UART0_RX 0 BOARD_VERSION1 1* BOARD_VERSION2 1* BOARD_VERSION3 1* PP1000_S0IX_PGOOD 1* USB1_OC_L 1* USB2_OC_L 0 BAT_PRESENT_L 0 CPU_PROCHOT 1* ENABLE_BACKLIGHT 1* ENABLE_TOUCHPAD 0 ENTERING_RW 1* LPC_CLKRUN_L 1* PCH_CORE_PWROK 1* PCH_PWRBTN_L 1* PCH_RCIN_L 1* PCH_RSMRST_L 1* PCH_SMI_L 0 PCH_SOC_OVERRIDE 1* PCH_SYS_PWROK 1* PCH_WAKE_L 1* PP1350_EN 1* PP3300_DX_EN 1* PP3300_LTE_EN 1* PP3300_WLAN_EN 1* PP5000_EN 0 PPSX_EN 1* SUSP_VR_EN 1* TOUCHSCREEN_RESET_L 1* USB_CTL1 0 USB_ILIM_SEL 1* USB1_ENABLE 1* USB2_ENABLE 1* VCORE_EN 1* WLAN_OFF_L 1* PCH_SCI_L 1* KBD_IRQ_L 1* BAT_LED0 0 BAT_LED1 0 PWR_LED_L > D. As earlier, the Alt+VolUp+r does not reboot the system, although I can see that the EC did receive the event and trying to warm reset the system: [3274.775776 KB wait] [3285.393695 KB poll] [3286.005607 KB wait] [3291.708919 KB poll] [3295.067457 KB warm reboot] [3295.067731 chipset_reset(0)] [3295.852992 KB wait] (But the system actually does not reboot).
,
Mar 6 2017
I wonder if increasing the length of the warm reboot pulse would help with the alt-volup-r. +Aaron & Duncan since the hang may be in ACPI transition.
,
Mar 6 2017
Does doing the PCH_RCIN_L assertion manually make it recover? If not, then extending things won't help anything. What dose 'port80' spit out of EC console when you see this?
,
Mar 6 2017
I'll try that out when I reproduce it next. There are 2 things here I think: 1) How the system got into this state. I think this is the more important issue that we need to address. It looks something that the kernel 4.4 induces, because the issue is not seen with 3.10 kernel. 2) Why Alt+VOlUp+r does not recover. I think this is more secondary. I'm trying to reproduce it again, and will try out your suggestions.
,
Mar 7 2017
,
Mar 8 2017
OK, nothing concrete coming out yet. Its getting more and more difficult to produce the problem that I'm trying to chase: I was not able to reproduce the original issue for 430 iterations even on 4.4 in my last 2 runs (which may indicate that my earlier conclusion that the problem does not exist on 3.10 may not be true as that was based on 500+ iterations). But in the last 2 runs, I did run into another (but issue with similar symptoms - may be some eMMC issue since I may be exhausting eMMC's life / no of write cycles?). After about 430 iterations, cty says: [ 12.364470] mmc1: Controller never released inhibit bit(s). [ 12.364476] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.364485] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.364491] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.364497] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.364503] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.364509] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.364514] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.364520] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.364526] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.364532] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.364538] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.364544] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.364548] sdhci: Host ctl2: 0x0000ffff [ 12.364554] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.364557] sdhci: =========================================== [ 12.464702] mmc1: Reset 0x2 never completed. [ 12.464707] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.464713] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.464718] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.464725] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.464730] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.464736] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.464741] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.464747] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.464753] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.464759] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.464765] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.464771] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.464774] sdhci: Host ctl2: 0x0000ffff [ 12.464781] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.464783] sdhci: =========================================== [ 12.564885] mmc1: Reset 0x4 never completed. [ 12.564889] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.564895] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.564900] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.564906] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.564912] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.564917] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.564923] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.564929] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.564934] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.564940] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.564946] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.564952] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.564956] sdhci: Host ctl2: 0x0000ffff [ 12.564962] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.564965] sdhci: =========================================== [ 12.575184] mmc1: Controller never released inhibit bit(s). [ 12.575190] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.575199] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.575205] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.575211] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.575217] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.575223] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.575228] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.575234] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.575240] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.575246] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.575252] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.575258] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.575262] sdhci: Host ctl2: 0x0000ffff [ 12.575268] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.575271] sdhci: =========================================== [ 12.675435] mmc1: Reset 0x2 never completed. [ 12.675440] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.675446] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.675452] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.675458] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.675464] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.675469] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.675475] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.675481] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.675487] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.675493] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.675499] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.675505] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.675508] sdhci: Host ctl2: 0x0000ffff [ 12.675514] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.675517] sdhci: =========================================== [ 12.775618] mmc1: Reset 0x4 never completed. [ 12.775622] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.775628] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.775633] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.775639] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.775645] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.775650] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.775656] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.775662] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.775668] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.775673] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.775679] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.775685] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.775689] sdhci: Host ctl2: 0x0000ffff [ 12.775695] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.775698] sdhci: =========================================== [ 12.785887] mmc1: Controller never released inhibit bit(s). [ 12.785893] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.785902] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.785907] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.785913] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.785919] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.785925] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.785931] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.785936] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.785942] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.785948] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.785954] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.785960] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.785964] sdhci: Host ctl2: 0x0000ffff [ 12.785971] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.785973] sdhci: =========================================== [ 12.886132] mmc1: Reset 0x2 never completed. [ 12.886136] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.886143] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.886149] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.886154] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.886160] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.886166] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.886172] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.886177] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.886183] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.886189] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.886195] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.886201] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.886204] sdhci: Host ctl2: 0x0000ffff [ 12.886211] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.886213] sdhci: =========================================== [ 12.986315] mmc1: Reset 0x4 never completed. [ 12.986318] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.986324] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.986330] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.986336] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.986341] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.986347] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.986352] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.986358] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.986364] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.986370] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.986376] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.986382] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.986385] sdhci: Host ctl2: 0x0000ffff [ 12.986391] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.986394] sdhci: =========================================== [ 12.996593] mmc1: Controller never released inhibit bit(s). [ 12.996599] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 12.996608] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 12.996614] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 12.996620] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 12.996626] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 12.996631] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 12.996637] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 12.996643] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 12.996649] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 12.996655] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 12.996661] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 12.996667] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 12.996671] sdhci: Host ctl2: 0x0000ffff [ 12.996677] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 12.996680] sdhci: =========================================== [ 13.096842] mmc1: Reset 0x2 never completed. [ 13.096847] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.096853] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.096859] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.096864] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.096870] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.096876] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.096882] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.096887] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.096893] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.096899] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.096905] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.096911] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.096915] sdhci: Host ctl2: 0x0000ffff [ 13.096921] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.096924] sdhci: =========================================== [ 13.197025] mmc1: Reset 0x4 never completed. [ 13.197028] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.197034] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.197040] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.197046] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.197052] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.197057] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.197063] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.197068] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.197074] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.197080] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.197086] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.197092] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.197096] sdhci: Host ctl2: 0x0000ffff [ 13.197102] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.197104] sdhci: =========================================== [ 13.207301] mmc1: Controller never released inhibit bit(s). [ 13.207307] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.207316] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.207321] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.207327] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.207333] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.207339] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.207345] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.207350] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.207356] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.207362] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.207368] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.207374] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.207378] sdhci: Host ctl2: 0x0000ffff [ 13.207384] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.207387] sdhci: =========================================== [ 13.307523] mmc1: Reset 0x2 never completed. [ 13.307526] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.307530] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.307533] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.307537] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.307540] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.307544] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.307547] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.307550] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.307554] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.307557] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.307561] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.307564] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.307567] sdhci: Host ctl2: 0x0000ffff [ 13.307571] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.307572] sdhci: =========================================== [ 13.407653] mmc1: Reset 0x4 never completed. [ 13.407655] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.407659] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.407663] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.407666] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.407669] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.407673] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.407676] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.407680] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.407683] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.407686] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.407690] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.407693] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.407696] sdhci: Host ctl2: 0x0000ffff [ 13.407699] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.407701] sdhci: =========================================== [ 13.417850] mmc1: Controller never released inhibit bit(s). [ 13.417856] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.417864] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.417870] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.417876] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.417882] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.417888] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.417894] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.417899] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.417906] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.417911] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.417917] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.417924] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.417927] sdhci: Host ctl2: 0x0000ffff [ 13.417934] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.417936] sdhci: =========================================== [ 13.518099] mmc1: Reset 0x2 never completed. [ 13.518104] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.518110] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.518115] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.518121] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.518127] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.518133] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.518138] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.518144] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.518150] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.518156] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.518162] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.518168] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.518171] sdhci: Host ctl2: 0x0000ffff [ 13.518178] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.518180] sdhci: =========================================== [ 13.618282] mmc1: Reset 0x4 never completed. [ 13.618286] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.618291] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.618297] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.618303] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.618309] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.618314] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.618320] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.618325] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.618332] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.618337] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.618343] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.618349] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.618353] sdhci: Host ctl2: 0x0000ffff [ 13.618359] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.618362] sdhci: =========================================== [ 13.628560] mmc1: Controller never released inhibit bit(s). [ 13.628566] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.628575] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.628581] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.628587] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.628593] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.628599] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.628604] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.628610] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.628616] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.628622] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.628628] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.628634] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.628638] sdhci: Host ctl2: 0x0000ffff [ 13.628644] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.628647] sdhci: =========================================== [ 13.728806] mmc1: Reset 0x2 never completed. [ 13.728811] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.728817] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.728823] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.728829] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.728835] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.728840] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.728846] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.728851] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.728858] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.728863] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.728869] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.728875] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.728879] sdhci: Host ctl2: 0x0000ffff [ 13.728885] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.728888] sdhci: =========================================== [ 13.828988] mmc1: Reset 0x4 never completed. [ 13.828992] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.828998] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.829003] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.829009] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.829015] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.829020] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.829026] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.829032] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.829037] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.829043] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.829049] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.829055] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.829059] sdhci: Host ctl2: 0x0000ffff [ 13.829065] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.829068] sdhci: =========================================== [ 13.839265] mmc1: Controller never released inhibit bit(s). [ 13.839271] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.839280] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.839286] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.839292] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.839298] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.839304] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.839310] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.839315] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.839321] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.839327] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.839333] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.839339] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.839343] sdhci: Host ctl2: 0x0000ffff [ 13.839349] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.839352] sdhci: =========================================== [ 13.939530] mmc1: Reset 0x2 never completed. [ 13.939535] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 13.939541] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 13.939547] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 13.939553] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 13.939559] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 13.939564] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 13.939570] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 13.939576] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 13.939582] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 13.939588] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 13.939594] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 13.939600] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 13.939604] sdhci: Host ctl2: 0x0000ffff [ 13.939611] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 13.939613] sdhci: =========================================== [ 14.039714] mmc1: Reset 0x4 never completed. [ 14.039718] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 14.039724] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 14.039729] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 14.039735] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 14.039741] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 14.039747] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 14.039752] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 14.039758] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 14.039764] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 14.039770] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 14.039776] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 14.039782] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 14.039786] sdhci: Host ctl2: 0x0000ffff [ 14.039792] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 14.039795] sdhci: =========================================== [ 14.050001] mmc1: Controller never released inhibit bit(s). [ 14.050007] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 14.050016] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 14.050021] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 14.050028] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 14.050034] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 14.050040] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 14.050045] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 14.050051] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 14.050057] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 14.050063] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 14.050069] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 14.050075] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 14.050079] sdhci: Host ctl2: 0x0000ffff [ 14.050085] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 14.050088] sdhci: =========================================== [ 14.150245] mmc1: Reset 0x2 never completed. [ 14.150250] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 14.150256] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 14.150262] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 14.150268] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 14.150274] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 14.150280] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 14.150285] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 14.150291] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 14.150297] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 14.150303] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 14.150309] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 14.150315] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 14.150318] sdhci: Host ctl2: 0x0000ffff [ 14.150325] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 14.150327] sdhci: =========================================== [ 14.250428] mmc1: Reset 0x4 never completed. [ 14.250432] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 14.250438] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 14.250443] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 14.250449] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 14.250455] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 14.250461] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 14.250466] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 14.250472] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 14.250478] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 14.250484] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 14.250489] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 14.250495] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 14.250499] sdhci: Host ctl2: 0x0000ffff [ 14.250505] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 14.250508] sdhci: =========================================== [ 14.260753] mmc1: Controller never released inhibit bit(s). [ 14.260759] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 14.260767] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 14.260774] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 14.260780] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 14.260786] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 14.260791] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 14.260797] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 14.260803] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 14.260809] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 14.260815] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 14.260821] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 14.260827] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 14.260831] sdhci: Host ctl2: 0x0000ffff [ 14.260837] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 14.260840] sdhci: =========================================== [ 14.361003] mmc1: Reset 0x2 never completed. [ 14.361007] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 14.361013] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 14.361018] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 14.361024] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 14.361030] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 14.361036] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 14.361042] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 14.361047] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 14.361053] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 14.361059] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 14.361065] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 14.361071] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 14.361075] sdhci: Host ctl2: 0x0000ffff [ 14.361081] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 14.361084] sdhci: =========================================== [ 14.461184] mmc1: Reset 0x4 never completed. [ 14.461188] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 14.461194] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 14.461200] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 14.461205] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 14.461211] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 14.461217] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 14.461223] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 14.461228] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 14.461234] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 14.461240] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 14.461246] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 14.461252] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 14.461256] sdhci: Host ctl2: 0x0000ffff [ 14.461262] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 14.461264] sdhci: =========================================== [ 14.561555] mmc1: Reset 0x1 never completed. [ 14.561561] sdhci: =========== REGISTER DUMP (mmc1)=========== [ 14.561568] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff [ 14.561574] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff [ 14.561580] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff [ 14.561586] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff [ 14.561592] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff [ 14.561598] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff [ 14.561604] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff [ 14.561609] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff [ 14.561615] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff [ 14.561621] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff [ 14.561627] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff [ 14.561631] sdhci: Host ctl2: 0x0000ffff [ 14.561637] sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff [ 14.561640] sdhci: =========================================== [ 18.950171] device-mapper: init: attempting early device configuration. [ 18.961145] device-mapper: init: adding target '0 3584000 verity payload=ROOT_DEV hashtree=HASH_DEV hashstart=3584000 alg=sha1 root_hexdigest=a17a3bbbcad3d0463c75285389b23904e72a7aea salt=86a95059d39955c8e95' [ 18.986391] device-mapper: verity: Argument 0: 'payload=ROOT_DEV' [ 18.993233] device-mapper: verity: Argument 1: 'hashtree=HASH_DEV' [ 19.000171] device-mapper: verity: Argument 2: 'hashstart=3584000' [ 19.007103] device-mapper: verity: Argument 3: 'alg=sha1' [ 19.013155] device-mapper: verity: Argument 4: 'root_hexdigest=a17a3bbbcad3d0463c75285389b23904e72a7aea' [ 19.023776] device-mapper: verity: Argument 5: 'salt=86a95059d39955c8e953617171d1fc66385bfd3af9d0be174307db9c63889f26' [ 19.035767] device-mapper: table: 253:0: verity: Data device lookup failed [ 19.037834] usb 1-2.2.1: New USB device found, idVendor=0424, idProduct=ec00 [ 19.037840] usb 1-2.2.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 [ 19.059754] device-mapper: init: starting dm-0 (vroot) failed [ 19.103173] ACPI: Preparing to enter system sleep state S5 [ 19.109618] reboot: Power down [NOte that after I recovered, the syslog still showed the same symptoms - nothing in the syslogs, so if we do not have the cty or the EC logs, these issues (no logs v/s emmc issue) are probably not distinguishable from each other) The EC console: [50.200104 HC 0x8e] [50.200336 HC 0x8a] [50.200629 HC 0x8b] [50.200830 ACPI query = 0] [50.232333 Port 80: 0x1002][50.232650 LPC RESET# asserted] [50.232844 power state 3 = S0, in 0x003f] [50.233062 power state 7 = S0->S3, in 0x003f] [50.245264 power state 2 = S3, in 0x0008] [50.245590 power state 8 = S3->S5, in 0x0008] [50.245830 USB charge p0 m0] [50.245990 USB charge p1 m0] [50.246178 power state 1 = S5, in 0x0008] [50.246403 power state 1 = S5, in 0x0008] [50.264238 Disabling console in deep sleep] [60.246727 power state 9 = S5->G3, in 0x0008] [60.247065 power state 0 = G3, in 0x0008] [60.247527 power state 0 = G3, in 0x00c8] [60.248145 power state 0 = G3, in 0x00c0] When I press Alt+VolUp+r, it still does not work: > > > [497.958188 Disabling console in deep sleep] [516.139344 KB poll] [516.149489 event set 0x00001000] [517.026309 KB wait] [518.067673 KB poll] [520.959907 KB warm reboot] [520.960174 chipset_reset(0)] [521.916372 KB wait] 951.098224 KB poll] [951.401603 KB wait] [952.413060 KB poll] [952.670656 KB wait] [952.890723 KB poll] [953.120808 KB wait] [953.188556 KB poll] [953.403427 KB wait] (and goes off for a long time before I can type any EC commands again). And no, the system does not reboot. While it does not give me back prompt immediately, it is still responsive as I can see it still print "KB wait/poll" messages as I try to type on the keyboard. Now: > port80 Port 80 writes: (RESET) <--new > I toggled PCH_RCIN_L (first to 0 and then to 1) and the system did not come up. I think this is definitely a different issue because it is in G3 as opposite to the previous state when everything looked good. > powerindebug power in: 0x00c0 debug mask: 0x0000 bit meanings: 0x0001 0 PGOOD_PP1050 0x0002 0 PGOOD_PP3300_PCH 0x0004 0 PGOOD_PP5000 0x0008 0 PGOOD_S5 0x0010 0 PGOOD_VCORE 0x0020 0 PGOOD_PP1000_S0IX 0x0040 1 SLP_S3#_DEASSERTED 0x0080 1 SLP_S4#_DEASSERTED > I'll keep trying to produce the original issue. current logs attached.
,
Mar 8 2017
Do you want to setup a small test farm with 4-6 Kips to improve repro chances?
,
Mar 13 2017
The Kip CQ has been running pretty green since last 2 weeks. I have not seen this issue on kip since last ~135 builds (last seen on 27 feb). I think we should remove the experimental status from kip, while continuing to work on this.
,
Mar 14 2017
Ok
,
Mar 17 2017
The kip CQ has been green because when Autotest performs as
designed, it papers over the problem. I just checked the
last 24 hours of history on the kip CQ, and I found 8 DUT
failures matching the basic symptom of "DUT went offline":
2017-03-16 14:09:43 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack12-host19/577792-repair/
2017-03-17 06:34:11 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack12-host17/579779-repair/
2017-03-16 19:48:07 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack12-host15/578437-repair/
2017-03-17 02:02:59 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack12-host11/579320-repair/
2017-03-16 20:13:36 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack12-host11/578495-repair/
2017-03-16 14:09:47 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack12-host11/577793-repair/
2017-03-17 02:00:38 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack12-host1/579318-repair/
2017-03-17 08:40:59 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack12-host5/580206-repair/
So, we can/should make kip not experimental, but we need to keep moving
the ball forward on finding the cause.
,
Mar 22 2017
OK, finally there is some progress made on this today. The observations make me believe that it is not a kernel issue, but may be something in Intel boot code (or coreboot). 1) I simplified the reproduction steps today to simply continue to do a reboot in a loop, and the issue could be reproduced in a few hundred to < 1000 iterations of reboots (from the AP side - which I believe is a warm reboot). 2) The issue could be reproduced, and once in this messy state, "apreset warm" and "Alt+VolUp+r" do not help at all, neither does toggling the PCH_RCIN_L. Note that in a good state when system is not stuck, the "apreset warm" and "Alt_VolUp+r" work as intended (produce me a console-ramoops). From here on based on the symptoms; I think there are 2 issues that we may hit - but 1 is a major issue (happens >95% of the time), so I'll differentiate between the two issues, and use this bug for the first one, and file a separate bug for the other one (b/36493878 - kip powers down instead of restart). 3) In all the times I reproduced this issue - the AP cty shows that the kernel did shutdown cleanly. [ 10.611827] init: console-ttyS0 main process (505) killed by TERM signal [ 10.620422] init: powerd main process (956) killed by TERM signal [ 10.627822] init: chapsd main process (1117) killed by TERM signal [ 10.635628] init: crash-sender main process (1498) killed by TERM signal [ 10.643577] init: cros-machine-id-regen-periodic main process (1509) killed by TERM signal [ 10.653278] init: mtpd main process (1514) killed by TERM signal [ 10.660398] init: log-rotate main process (1525) killed by TERM signal [ 10.668040] init: cras main process (1556) killed by TERM signal [ 10.674014] EXT4-fs (mmcblk0p5): re-mounted. Opts: block_validity,barrier,user_xattr,acl [ 10.685610] init: bluetoothd main process (1589) killed by TERM signal [ 10.698429] init: warn-collector main process (2258) killed by TERM signal [ 10.706670] init: tlsdated main process (2288) killed by TERM signal [ 10.735417] init: cryptohomed main process (1331) killed by TERM signal [ 10.768458] init: mosys-info main process (2273) terminated with status 143 [ 10.778797] init: recover_duts main process (2256) killed by TERM signal [ 10.827273] Unsafe core_pattern used with suid_dumpable=2. Pipe handler or fully qualified core dump path required. [ 11.737298] gsmi: Log Shutdown Reason 0x00 [ 11.812694] reboot: Restarting system [ 11.816817] reboot: machine restart So although there is a limitation that I'm not able to generate the console-ramoops, I sincerely doubt if there'd be anything really useful for me there (more evidence below). Also, the fact that the kernel shut down cleanly before the system fell off the radar is also visible in the cautotest links in above comments. 3) When the issue happens, here is what is seen last on the EC console: [816.229080 HC 0x0b] [816.230691 HC 0x05] [817.098221 DPTF sensor 1, threshold -73 C, index 0, disabled] [817.099141 DPTF sensor 1, threshold -73 C, index 1, disabled] [817.101107 DPTF sensor 2, threshold -73 C, index 0, disabled] [817.101987 DPTF sensor 2, threshold -73 C, index 1, disabled] [817.103930 DPTF sensor 3, threshold -73 C, index 0, disabled] [817.104714 DPTF sensor 3, threshold -73 C, index 1, disabled] [818.257115 KB disable] [818.257381 KS disable] [818.257529 KB IRQ disable] [818.257753 KB IRQ enable] [818.267211 chipset_reset(0)] [818.268014 Port 80: 0x1002][818.268182 LPC RESET# asserted] [818.272697 LPC RESET# deasserted] [818.272862 GPIO_PP3300_LTE_EN 1->0] [818.283503 Port 80: 0x10][818.299648 chipset_reset(0)] [818.303103 GPIO_PP3300_LTE_EN 0->1] [818.324695 Port 80: 0x2a][818.325502 HC 0x87] [818.367517 Port 80: 0x1002][818.367771 LPC RESET# asserted] [818.371799 LPC RESET# deasserted] [818.372000 GPIO_PP3300_LTE_EN 1->0] [818.382608 Port 80: 0x10][818.402239 GPIO_PP3300_LTE_EN 0->1] [818.423799 Port 80: 0x2a][818.424606 HC 0x87] [818.426667 Port 80: 0x1002][818.426836 LPC RESET# asserted] [821.751205 Battery 96% / 0h:24] [1015.245002 Charge state charge -> charge_near_full after 1012.222794 sec] [1015.246769 Battery 97% / 0h:22] > > powerinfo [1117.577165 power state 3 = S0, in 0x00ff] > port80 Port 80 writes: 27 2a 2a 2a 2b 2c 2d 2d 2d 2e 02 2b 2f 11 79 71 72 24 25 55 73 74 75 f0 f1 75 93 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 76 77 79 9c 9e 9d 7a 7b f8 aa (RESET) 01 10 20 23 27 2a 01 21 23 24 25 27 2a 2a 2a 2b 2c 2d 2d 2d 2e 02 2b 2f 11 79 71 72 24 25 55 73 74 75 f0 f1 75 93 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 76 77 79 9c 9e 9d 7a f8 aa (RESET) 01 10 20 23 27 2a 01 21 23 24 25 27 2a 2a 2a 2b 2c 2d 2d 2d 2e 02 2b 2f 11 79 71 72 24 25 55 73 74 75 f0 f1 75 93 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 76 77 79 9c 9e 9d 7a f8 aa (RESET) 01 10 20 23 27 2a 01 21 23 24 25 27 2a 2a 2a 2b 2c 2d 2d 2d 2e 02 2b 2f 11 79 71 72 24 25 55 73 74 75 f0 f1 75 93 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 76 77 79 9c 9e 9d 7a f8 aa (RESET) 01 10 20 23 27 2a (RESET) 01 10 20 23 27 2a (RESET) <--new > 4) I took help from Furquan today, and he mentioned that this means that the BIOS is reaching the memory init / configuration stage but hanging somewhere there. He also mentioned that this issue (of large number of warm resets making the CPU getting stuck in BIOS init sounds similar to one of the existing issues on reef: b/35587236) 5) Also, the "mosys eventlog list" says the following after each such reproduction (This is also seen in the logs from the cautotest so I think this is the same issue that we're reproducing): 74 | 2017-03-21 17:32:55 | Chrome OS Developer Mode 75 | 2017-03-21 17:36:19 | Kernel Event | Clean Shutdown 76 | 2017-03-21 17:36:20 | System boot | 1246 77 | 2017-03-21 17:36:20 | System Reset 78 | 2017-03-21 17:36:21 | Chrome OS Developer Mode 79 | 2017-03-21 17:36:37 | Kernel Event | Clean Shutdown 80 | 2017-03-21 17:36:38 | System boot | 1247 81 | 2017-03-21 17:36:38 | System Reset 82 | 2017-03-21 17:36:39 | Chrome OS Developer Mode 83 | 2017-03-21 17:36:56 | Kernel Event | Clean Shutdown 84 | 2017-03-21 17:36:57 | System boot | 1248 85 | 2017-03-21 17:36:57 | System Reset 86 | 2017-03-21 17:36:58 | Chrome OS Developer Mode 87 | 2017-03-21 17:37:14 | Kernel Event | Clean Shutdown 88 | 2017-03-21 17:38:26 | System boot | 1249 89 | 2017-03-21 17:38:26 | SUS Power Fail <================= Problem? 90 | 2017-03-21 17:38:26 | Wake Source | Power Button | 0 91 | 2017-03-21 17:38:27 | Chrome OS Developer Mode I'll be taking Furquan's help in debugging this tomorrow, as I'd need some help in debugging the coreboot side of things. Note: I updated the EC & BIOS firmware version to the latest (just mentioning here for completeness, not that I think it has an impact): localhost ~ # chromeos-firmwareupdate Starting Google_Kip firmware updater v4 ()... - Updater package: [Google_Kip.5216.227.78 / EC:kip_v1.6.193-5f42355] - Current system: [RO:Google_Kip.5216.227.5 , ACT:Google_Kip.5216.227.25 / EC:kip_v1.6.193-5f42355]
,
Mar 22 2017
Kip is a baytrail machine. 2a is a super early post code. it seems like we're not even getting that far into booting when things go badly. Also, the final hanging '(RESET) <--new' line is even worse. We couldn't even get one post code out. As for the 'SUS Power Fail' line that is likely what you did to recover the system. Did you ec reboot? or apshutdown?
,
Mar 22 2017
Yes, EC reboot is the only option to recover the system - and that is what I did in order to boot and get mosys evenlog list.
,
Mar 24 2017
OK, the issue reproduced again, this time the serial logs from coreboot enabled, (hopefully the logs below give good pointers to where coreboot became unresponsive). It looks like this needs either Intel's or firmware team's attention. [ 926.013926] init: powerd main process (977) killed by TERM signal [ 926.021387] init: chapsd main process (1160) killed by TERM signal [ 926.029129] init: crash-sender main process (1491) killed by TERM signal [ 926.037134] init: cros-machine-id-regen-periodic main process (1506) killed by TERM signal [ 926.046831] init: mtpd main process (1512) killed by TERM signal [ 926.053986] init: log-rotate main process (1520) killed by TERM signal [ 926.062136] init: cras main process (1561) killed by TERM signal [ 926.069407] init: bluetoothd main process (1568) killed by TERM signal [ 926.082654] init: warn-collector main process (2240) killed by TERM signal [ 926.091037] init: tlsdated main process (2272) killed by TERM signal [ 926.124834] init: cryptohomed main process (1322) killed by TERM signal [ 926.125962] EXT4-fs (mmcblk0p5): re-mounted. Opts: block_validity,barrier,user_xattr,acl [ 926.215396] init: recover_duts main process (2237) killed by TERM signal [ 926.272875] Unsafe core_pattern used with suid_dumpable=2. Pipe handler or fully qualified core dump path required. [ 927.518115] gsmi: Log Shutdown Reason 0x00 [ 927.588170] reboot: Restarting system [ 927.592282] reboot: machine restart coreboot-db3c5d9 Thu Mar 23 13:46:41 PDT 2017 starting... Enabling VR PS2 mode: VNN VCC ram_id=0, total_spds: 8 pm1_sts: 2000 pm1_en: 0000 pm1_cnt: 00001c00 gpe0_sts: 00000000 gpe0_en: 00000000 tco_sts: 00000000 prsts: 04410910 gen_pmcon1: 00201238 gen_pmcon2: 00000000 prev_sleep_state = S0 Boot Count incremented to 1801 FMAP: Found "FMAP" version 1.0 at ffe10000. FMAP: base = 0 size = 800000 #areas = 31 FMAP: area RW_MRC_CACHE found FMAP: offset: 3e0000 FMAP: size: 65536 bytes FMAP: No valid base address, using 0xff800000 FMAP: RW_MRC_CACHE at ffbe0000 (offset 3e0000) BayTrail-MD MRC wrapper v5 Training for Memory Down designs. Applying weaker ODT settings. DRAM ODT is 120. MRC v0.97 2 channels of DDR3 @ 1333MHz CBMEM: root @ 7afff000 254 entries. MRC Wrapper returned 0 MRC data at fe00965f 5719 <hangs here> Please let me know what should be the next steps here.
,
Mar 24 2017
,
Mar 24 2017
Yet again, and this time core boot hung even earlier.. [ 14.110019] init: bluetoothd main process (1602) killed by TERM signal [ 14.122883] init: warn-collector main process (2248) killed by TERM signal [ 14.130975] init: tlsdated main process (2276) killed by TERM signal [ 14.141814] EXT4-fs (mmcblk0p5): re-mounted. Opts: block_validity,barrier,user_xattr,acl [ 14.158117] init: cryptohomed main process (1367) killed by TERM signal [ 14.237213] init: recover_duts main process (2246) killed by TERM signal [ 14.297256] Unsafe core_pattern used with suid_dumpable=2. Pipe handler or fully qualified core dump path required. [ 15.226131] gsmi: Log Shutdown Reason 0x00 [ 15.295294] reboot: Restarting system [ 15.299405] reboot: machine restart coreboot-db3c5d9 Thu Mar 23 13:46:41 PDT 2017 starting... Enabling VR PS2 mode: VNN VCC <restarts on its own> coreboot-db3c5d9 Thu Mar 23 13:46:41 PDT 2017 starting... Enabling VR PS2 mode: VNN VCC <hangs here> EC console: [10163.310001 HC 0x08] [10163.310325 HC 0x10] [10163.310899 HC 0x18] [10163.311107 HC err 1] +[10163.311613 HC err 1] +[10163.312136 HC err 1] [10163.312629 HC 0x16] +[10163.313409 HC 0x15] [10163.314105 HC 0x16] [10164.146913 KB disable] [10164.147249 KS disable] [10164.147466 KB IRQ disable] [10164.147759 KB IRQ enable] [10164.156681 chipset_reset(0)] [10164.157463 Port 80: 0x1002][10164.157700 LPC RESET# asserted] [10164.162076 LPC RESET# deasserted] [10164.162311 GPIO_PP3300_LTE_EN 1->0] [10164.172955 Port 80: 0x10][10164.189242 chipset_reset(0)] [10164.192621 GPIO_PP3300_LTE_EN 0->1] [10164.214072 Port 80: 0x2a][10164.223408 HC 0x87] [10164.225480 Port 80: 0x1002][10164.225803 LPC RESET# asserted] [10164.229714 LPC RESET# deasserted] [10164.229949 GPIO_PP3300_LTE_EN 1->0] [10164.240563 Port 80: 0x10][10164.260258 GPIO_PP3300_LTE_EN 0->1] [10164.281688 Port 80: 0x2a][10164.291024 HC 0x87] [10164.292139 Port 80: 0x1002][10164.292431 LPC RESET# asserted] > > port80 Port 80 writes: 75 75 75 75 75 76 77 79 9c 9e 9d 7a 7b f8 aa (RESET) 01 10 20 27 2a 01 21 23 25 27 2a 2a 2b 2c 2d 2d 2e 02 2b 11 79 39 40 70 71 72 24 25 24 25 55 55 73 74 75 f0 f1 75 93 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 76 77 79 9c 9e 9d 7a 7b f8 aa (RESET) 01 10 20 27 2a 01 21 23 25 27 2a 2a 2b 2c 2d 2d 2e 02 2c 11 79 39 40 70 71 72 24 25 24 25 55 55 73 74 75 f0 f1 75 93 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 76 77 79 9c 9e 9d 7a 7b f8 aa (RESET) 01 10 20 27 2a 01 21 23 25 27 2a 2a 2b 2c 2d 2d 2e 02 2b 11 79 39 40 70 71 72 24 25 24 25 55 55 73 74 75 f0 f1 75 93 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 76 77 79 9c 9e 9d 7a 7b f8 aa (RESET) 01 10 20 27 2a (RESET) 01 10 20 27 2a (RESET) <--new >
,
Apr 5 2017
,
Apr 5 2017
I'm not sure I can do anything about this device. Intel will need to debug if we're getting random hangs and not able to even get out of the reset vector to post a single post code. I'm not sure why LPC RESET# is not deasserted in those last logs. That's entirely a function of the chipset.
,
Apr 11 2017
Intel seems to have reproduced this problem (in the context of the other thermal shutdown bug): https://buganizer.corp.google.com/issues/36679033#comment16 Can we open up this bug to Intel, and assign it to them?
,
Apr 12 2017
Shyam: can you please help route to the right owner within Intel for this? Note that although we saw this issue on a Kip (baytrail device), this issue was reproduced by Brian (Intel) on another baytrail device (Rambi) recently. https://buganizer.corp.google.com/issues/36679033#comment16
,
Apr 12 2017
FYI&A for Intel: the simple steps to reproduce the problem are: Run the attached reboot_test.sh script (from within chroot > /usr/src/scripts).
,
Apr 12 2017
Acking your note Rajat. Kane, CL, Can you take a stab at it?
,
Apr 12 2017
Hi Rajatja, Greetings! Sorry I just got involved into this bug. May I know if this issue is reproduced on every kip system you have? Is it PVT system? I noticed that kernel 3.10->4.4 change, but this is unlikely causing BIOS hang up in this early stage. Since Kip is shipped for a long time, it shouldn't be unstable like this, anything is updated or done specifically to make issue happen? thank you
,
Apr 12 2017
If the LPC RESET# is not deasserted, the BIOS code is not even run. Is it possible to cut off all power AC/battery(this is to do RTC reset) and run again? I just noticed that Brian@intel saw similar issue, I will also sync up with him. Thanks.
,
Apr 12 2017
Hello Kane, Thanks for taking a look, yes, the issue has been produced with multiple units in lab. I believe our labs are mostly MP systems. I can't of any substantial change that could have caused this. I think to debug this, it would be better if you had a servo enabled system. If you don't have it and are local in bay area, I can provide you a Kip with servo header installed (Although since the issue is produced in Brian@Intel's system - may be that works out for you). Thanks, Rajat
,
Apr 13 2017
Hi Rajatja, Greetings! Sorry, I'm in Taiwan. Since BYT is quiet old project, so I am only able to find Candy project to repro issue in my site. So far it's 500 cycles and still alive, I will keep running the test. Meanwhile, we will try to get more information on the system in the U.S. Thank you INFO : kernel: 3.10.18 #1 SMP Wed Apr 5 13:05:32 PDT 2017 INFO : loop 500 INFO : Thu Apr 13 18:36:27 CST 2017 INFO : Rebooting 10.5.253.20...
,
Apr 13 2017
Hello Kane : No, you want to upgrade the candy to the latest image (atleast something built in february) - so that you get the 4.4 kernel. We suspect that the issue may be discovered when we use the new kernel. I'd say you let the reboot script run overnight, by morning you should have passed 3000-5000 cycles. If you pass > 7000 cycles and not run into either b/36493878 or https://bugs.chromium.org/p/chromium/issues/detail?id=696679, then I think it may be safe to conclude that neither issue is seen on candy...
,
Apr 14 2017
Rajat, Sameer, Is there a Kip available in Taiwan you can loan to Kane? Thanks, Shyam
,
Apr 14 2017
Hi Rajatja,
I put the system to run overnight, it stop at 1000th cycles
It looks like the script will only run 1000 cycles. Let modify it and run again.
I will also update the image to kernel v4.4 and run again.
If it's something to do with the kernel updates, then it's very strange.
I couldn't think of SW updates that may cause BIOS hang up like this.
Thank you:)
INFO : kernel: 3.10.18 #1 SMP Wed Apr 5 13:05:32 PDT 2017
INFO : loop 1000
INFO : Thu Apr 13 20:54:08 CST 2017
INFO : Rebooting 10.5.253.20...
15s: can not connect to device
16s: reboot complete
INFO : kernel: 3.10.18 #1 SMP Wed Apr 5 13:05:32 PDT 2017
,
Apr 14 2017
Hi,
Latest update in TW.
With kernel 4.4, it's still running.(over 1060 cycles)
I will put the dut to run over weekend.
Thanks.
20s: reboot complete
INFO : kernel: 4.4.44-07220-gb7d53f9 #1 SMP PREEMPT Sat Feb 25 04:34:42 PST 2017
INFO : loop 1060
INFO : Fri Apr 14 16:51:48 CST 2017
INFO : Rebooting 10.5.253.20...
,
Apr 14 2017
Kane, do you need additional Kip units or are you good here?
,
Apr 14 2017
Yung, Intel does need Kip units. They want one here in the US and one in TPE. Can you please help allocate?
,
Apr 17 2017
Update the test in TW over the weekend.
it stopped at 1626 cycles, the dut is still alive
This is due to the IP address on DUT is somehow changed to 10.5.253.3 from 10.5.253.20.
I will use another warm reset script(without remote_sh) to repro issue.
thanks.
15s: can not connect to device
20s: reboot complete
INFO : kernel: 4.4.44-07220-gb7d53f9 #1 SMP PREEMPT Sat Feb 25 04:34:42 PST 2017
INFO : loop 1626
INFO : Fri Apr 14 20:02:14 CST 2017
INFO : Rebooting 10.5.253.20...
,
Apr 18 2017
Hi all,
I reviewed the log in #47.
It seems the issue follows "chipset_reset(0)".
in normal reboot, I didn't see chipset_reset(0).
So, I did some experiments by using iotools
iotools io_write32 0xcf9 0x6 ==> no chipset_reset(0)
iotools io_write8 0x64 0xfe ==> chipset_reset(0) exists.
so, it looks like the warm reset behavior is kind of different when issue happens.
And I looked into the kernel, *reboot: Restarting system* this message is printed as emergency in kernel_restart function.
Is this expected in BYT with kernel v4.4? I felt something is odd here.
if you have any thought about this, plz kindly let me know.
thank you very much.
void kernel_restart(char *cmd)
{
kernel_restart_prepare(cmd);
migrate_to_reboot_cpu();
syscore_shutdown();
if (!cmd)
pr_emerg("Restarting system\n");
else
pr_emerg("Restarting system with command '%s'\n", cmd);
kmsg_dump(KMSG_DUMP_RESTART);
machine_restart(cmd);
}
,
Apr 18 2017
> And I looked into the kernel, *reboot: Restarting system* this message is > printed as emergency in kernel_restart function. > Is this expected in BYT with kernel v4.4? I felt something is odd here. That is expected. pr_emerg is used because it is a message important enough to not be lost because of insufficient loglevel, and also should be flashed on the console as well as all ttys in use. Also, this behavior is for all platforms, not just BYT.
,
Apr 18 2017
I'm not sure why we're seeing different behavior with the resets. In both 3.18 and 4.4 acpi path should be taken first. If that fails to reboot then the kernel will try the keyboard controller method. We've seen similar behavior on apollolake boards where the cf9 isn't responsive and the keyboard controller is tried. In this case, can we confirm reboot type is ACPI and the ACPI tables are indicating the correct values?
,
Apr 19 2017
Hi Aaron, I just checked FACP table. Looks like it's correct. Once we get the failing system, I will add more log in kernel to confirm if the issue is related to cf9 unresponsive. Thanks 95 [078h 0120 8] Address : 0000000000000CF9 96 97 [080h 0128 1] Value to cause reset : 06
,
Apr 19 2017
> I'm not sure why we're seeing different behavior with the resets. > In both 3.18 and 4.4 acpi path should be taken first. It probably doesn't make a difference, but to be factually correct, we moved from 3.10 (as opposite to 3.18) to 4.4.
,
Apr 19 2017
Hi Aaron I just double checked the log in #47 again. When issue happens, OS used kb controller method to reset the system(0xcf9 unresponsive?) Then BIOS issued another reset by kb controller method right away. From the bios log in #47, the system was stuck btwn "Enabling VR PS2 mode: VNN VCC" and "ram_id=2, total_spds: 12" I read through the code btwn the above 2 places, but I couldn't find any code would do warm reset by kb controller. Not sure if I miss anything. Alos the interval of 2 chipset(0) is just around 30ms. It's kind of similar to APL, but the actual cpu reset in APL is just once. However, in this BYT case, the CPU is reset twice. Or is there any other case that EC issue warm reset itself? Thank you:) [10164.156681 chipset_reset(0)] ==> reset by kb controller method in OS. [10164.157463 Port 80: 0x1002][10164.157700 LPC RESET# asserted] [10164.162076 LPC RESET# deasserted] [10164.162311 GPIO_PP3300_LTE_EN 1->0] [10164.172955 Port 80: 0x10][10164.189242 chipset_reset(0)] ==> reset by kb controller method in BIOS? [10164.192621 GPIO_PP3300_LTE_EN 0->1] [10164.214072 Port 80: 0x2a][10164.223408 HC 0x87] [10164.225480 Port 80: 0x1002][10164.225803 LPC RESET# asserted] [10164.229714 LPC RESET# deasserted] [10164.229949 GPIO_PP3300_LTE_EN 1->0] [10164.240563 Port 80: 0x10][10164.260258 GPIO_PP3300_LTE_EN 0->1]
,
Apr 19 2017
I don't think we'd ever do a keyboard reset like that. On baytrail we don't have an RCIN signal -- it's a true reset button signal. Even if the host wanted to reset we wouldn't issue one using the keyboard controller. But now that you have identified this issue I recall very peculiar behavior in baytrail. Essentially you can't reset the system early like it is because it hangs (just like this bug). I don't know why, but something is really not right in baytrail in this area. I haven't dug into the EC code, but if the code in the EC queues up a reset or if some transaction is flushed on the LPC bus then the EC could trigger the reset at the wrong time.
,
Apr 19 2017
Hi Aaron, I just went back to check my log. Looks like I ran into strange situation but I didn't realize. It didn't hang up. it just shutdown and up then boot to OS, but the behavior is wrong. sorry, I was not involved in BYT too much before. Do u recall anything like this with kernel v3.10(during rambi validation)? I'm not sure if this is connected to this bug. thank you:) In coreboot============================================================= Clearing the recovery request. Shutting down all USB controllers. Finalizing Coreboot Starting kernel ... coreboot-519ff11 Wed Apr 19 18:49:10 CST 2017 starting... Enabling VR PS2 mode: VNN VCC (it seems the dut shutdown here) coreboot-519ff11 Wed Apr 19 18:49:10 CST 2017 starting... (and it's up automatically) Enabling VR PS2 mode: VNN VCC ram_id=2, total_spds: 12 pm1_sts: 2000 pm1_en: 0000 pm1_cnt: 00001c00 gpe0_sts: 00060000 gpe0_en: 00000000 tco_sts: 00000000 prsts: 04410900 gen_pmcon1: 00001038 gen_pmcon2: 00000000 prev_sleep_state = S0 Boot Count incremented to 5908 FMAP: Found "FMAP" version 1.0 at ffe10000. FMAP: base = 0 size = 800000 #areas = 31 FMAP: area RW_MRC_CACHE found FMAP: offset: 3e0000 FMAP: size: 65536 bytes FMAP: No valid base address, using 0xff800000 FMAP: RW_MRC_CACHE at ffbe0000 (offset 3e0000) BayTrail-MD MRC wrapper v5 Training for Memory Down designs. Applying weaker ODT settings. DRAM ODT is 120. MRC v0.97 .... boot to kernel in ec console============================================================= [3456.536809 KB IRQ disable] [3456.537040 KB IRQ enable] [3456.553860 chipset_reset(0)] [3456.554541 Port 80: 0x1002][3456.554717 LPC RESET# asserted] [3456.559432 LPC RESET# deasserted] [3456.570131 Port 80: 0x01][3456.570377 Port 80: 0x10][3456.607765 Port 80: 0x20][3456.608014 Port 80: 0x23][3456.608177 Port 80: 0x27][3456.608340 Port 80: 0x2a][3456.617795 HC 0x87] [3460.889282 Port 80: 0x1002][3460.889544 LPC RESET# asserted] [3460.889723 power state 3 = S0, in 0x003f] [3460.889957 power state 7 = S0->S3, in 0x003f] [3460.898122 power state 2 = S3, in 0x000c] [3460.898421 power state 8 = S3->S5, in 0x000c] [3460.898677 USB charge p0 m0] [3460.898852 USB charge p1 m0] [3460.899102 power state 1 = S5, in 0x0008] [3460.899406 power state 1 = S5, in 0x0008] [3465.531449 power state 1 = S5, in 0x00c8] [3465.531790 power state 5 = S5->S3, in 0x00c8] [3465.532041 power state 2 = S3, in 0x00c8] [3465.532278 power state 6 = S3->S0, in 0x00c8] [3465.537805 Port 80: 0x1001][3465.537997 USB charge p0 m2] [3465.538179 USB charge p1 m2] [3465.654988 LPC RESET# deasserted] [3465.655967 power PLTRST# deasserted] [3465.656207 power state 3 = S0, in 0x00ff] [3465.656444 power state 3 = S0, in 0x00ff]
,
Apr 19 2017
I don't recall anything specifically, but that looks like a cold reset to me. The 4-5s delay gives it away.
,
Apr 20 2017
Hi all, I found that the reset behavior is very different btwn kernel v3.10 and v4.4. I'm not very sure if this is cause of this issue. Plz see the attached EC log from kernel v4.4 and v3.10. With kernel 4.4, I can see chipset_reset(0) and unexpected cold reset happen frequently. With kernel 3.10, I *can't* find any chipset_reset(0) or unexpected cold reset during the warm reset test. If the cf9 is unresponsive, it should happen in v3.10 as well? However, this weird behavior doesn't seem to happen in kernel v3.10. Dose Google see this hang issue before updating kernel v4.4? Thank you:)
,
Apr 20 2017
No, this issue came into light only after the 4.4 upgrade. So yes, your analysis may be correct that this induced by 4.4
,
Apr 24 2017
Update. With kernel v3.10 on the Kip in Folsom, the issue is not seen over 15 hours. thanks.
,
Apr 26 2017
Aaron, Can you please make Kane the owner for this bug? Thanks, Shyam
,
Apr 26 2017
Update the result.
With enabling kernel serial log, I finally was able to repro issue on DELL Candy yesterday. I saw the issue around 2 hours.
Then, I made the below patch, the issue and strange reset behavior is not seen over 18 hours.
Could u plz try this in your site?
Thank you
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f660d63..165c955 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -540,6 +540,7 @@ static void native_machine_emergency_restart(void)
switch (reboot_type) {
case BOOT_ACPI:
acpi_reboot();
+ udelay(1000);
reboot_type = BOOT_KBD;
break;
,
Apr 26 2017
I can't seem to make Kane the owner as it's flagging an error.
,
Apr 27 2017
---------- Forwarded message ---------- From: Chen, Kane <kane.chen@intel.com> Date: Wed, Apr 26, 2017 at 5:32 PM Subject: RE: Intel US and TW needs a Kip (total 2) To: Sameer Nanda <snanda@google.com>, "Sundararaman, Shyam" <shyam.sundararaman@intel.com> Cc: Yung Leem <yungleem@google.com>, Rajat Jain <rajatja@google.com>, Spencer West <spencerwest@google.com>, "Bhat, Kiran U" <kiran.u.bhat@intel.com>, Aaron Durbin <adurbin@google.com> Hi Sameer, The reason I added this is delay is due to I find linux kernel will do the following sequence. 1. Set 0xcf9 = 0x6 to trigger warm reset 2. If the system is not reset immediately by 0xcf9 = 0x6, then kernel issues multiple kbc reset (0x64=0xfe) See native_machine_emergency_restart function While kernel is using kbc reset, kernel would issue multiple commands to ec. I observed multiple reset commands are queue up in the buffer in EC. In this case, we will see consecutive warm reset by EC. Sometimes the interval of 2 warm reset is just 30 ms In EC console, you can find chipset_reset(0) happens in a very short time See https://bugs.chromium.org/p/chromium/issues/detail?id=696679#c47 https://bugs.chromium.org/p/chromium/issues/detail?id=696679#c70 Plz note that the above behavior only happened on v4.4. I never observed this behavior in kernel v3.10 Thank you From: Sameer Nanda [mailto:snanda@google.com] Sent: Thursday, April 27, 2017 7:53 AM To: Chen, Kane <kane.chen@intel.com>; Sundararaman, Shyam <shyam.sundararaman@intel.com> Cc: Yung Leem <yungleem@google.com>; Rajat Jain <rajatja@google.com>; Spencer West <spencerwest@google.com>; Bhat, Kiran U <kiran.u.bhat@intel.com>; Aaron Durbin <adurbin@google.com> Subject: Re: Intel US and TW needs a Kip (total 2) Hi Kane, We can try it but such udelay changes tend to be problematic since timings can change over time/releases etc. Do you have a theory for why adding this delay helps? On Tue, Apr 25, 2017 at 10:32 PM Chen, Kane <kane.chen@intel.com> wrote: Update the result. With enabling kernel log, I finally was able to repro issue on DELL Candy yesterday. I saw the issue around 2 hours. Then, I made the below patch, then issue is not seen over 18 hours. Could u plz try this in your site? Thank you diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index f660d63..165c955 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -540,6 +540,7 @@ static void native_machine_emergency_restart(void) switch (reboot_type) { case BOOT_ACPI: acpi_reboot(); + udelay(1000); reboot_type = BOOT_KBD; break;
,
Apr 27 2017
---------- Forwarded message ---------- From: Aaron Durbin <adurbin@google.com> Date: Wed, Apr 26, 2017 at 8:00 PM Subject: Re: Intel US and TW needs a Kip (total 2) To: "Chen, Kane" <kane.chen@intel.com> Cc: Sameer Nanda <snanda@google.com>, "Sundararaman, Shyam" <shyam.sundararaman@intel.com>, Yung Leem <yungleem@google.com>, Rajat Jain <rajatja@google.com>, Spencer West <spencerwest@google.com>, "Bhat, Kiran U" <kiran.u.bhat@intel.com> On Wed, Apr 26, 2017 at 7:32 PM, Chen, Kane <kane.chen@intel.com> wrote: > Hi Sameer, > > The reason I added this is delay is due to I find linux kernel will do the > following sequence. > > 1. Set 0xcf9 = 0x6 to trigger warm reset > > 2. If the system is not reset immediately by 0xcf9 = 0x6, then kernel > issues multiple kbc reset (0x64=0xfe) > > See native_machine_emergency_restart function > > > > While kernel is using kbc reset, kernel would issue multiple commands to ec. > > I observed multiple reset commands are queue up in the buffer in EC. > > In this case, we will see consecutive warm reset by EC. Sometimes the > interval of 2 warm reset is just 30 ms > > In EC console, you can find chipset_reset(0) happens in a very short time > > > > See > > https://bugs.chromium.org/p/chromium/issues/detail?id=696679#c47 > > https://bugs.chromium.org/p/chromium/issues/detail?id=696679#c70 > > > > Plz note that the above behavior only happened on v4.4. > > I never observed this behavior in kernel v3.10 We've seen similar behavior on apollolake as well in the course of debugging other bugs. It doesn't seem to fail like baytrail though. However the odd thing is that the reboot path is similar from 3.10->4.4. Adding a printk in the kernel path was showing enough delay on my end that the keyboard reset path wasn't taken when conducting tests. For whatever reason these small core designs appear to terminate the cf9 i/o transaction and doing an async reset after that which takes a variable amount of time.
,
Apr 27 2017
---------- Forwarded message ---------- From: Chen, Kane <kane.chen@intel.com> Date: Wed, Apr 26, 2017 at 8:52 PM Subject: RE: Intel US and TW needs a Kip (total 2) To: Aaron Durbin <adurbin@google.com> Cc: Sameer Nanda <snanda@google.com>, "Sundararaman, Shyam" <shyam.sundararaman@intel.com>, Yung Leem <yungleem@google.com>, Rajat Jain <rajatja@google.com>, Spencer West <spencerwest@google.com>, "Bhat, Kiran U" <kiran.u.bhat@intel.com> The printk finding same with my initial experiment. I initially just added some printk, then I didn't observe the odd reset behavior within 2 hrs(usually it happens within 30 mins) That's why I choose to add udelay(1000) in the kernel, so far I don't observe the odd reset behavior and issue for over 37 hours. Thank you.
,
Apr 27 2017
Kane: Ok, let me try out the same thing at my end. > For whatever reason these small core designs appear to terminate the > cf9 i/o transaction and doing an async reset after that which takes a > variable amount of time. How do we know what is the upper bound to this time (i.e. is udelay(1000) enough for all cases)?
,
Apr 27 2017
Also, I'm not sure if I understood what part of this scenario is the problem yet. Again, I'm not trying to incline towards any solution, but just trying to understand:
> The reason I added this is delay is due to I find linux kernel will do the
> following sequence.
>
> 1. Set 0xcf9 = 0x6 to trigger warm reset
>
> 2. If the system is not reset immediately by 0xcf9 = 0x6, then kernel
> issues multiple kbc reset (0x64=0xfe)
And it is OK / expected to do it, right? I mean, we do expect that the acpi_reboot() will fail sometimes and hence we want to try kb reset?
>
> See native_machine_emergency_restart function
>
>
>
> While kernel is using kbc reset, kernel would issue multiple commands to ec.
From the kernel code:
case BOOT_KBD:
mach_reboot_fixups(); /* for board specific fixups */
for (i = 0; i < 10; i++) {
kb_wait();
udelay(50);
outb(0xfe, 0x64); /* pulse reset low */
udelay(50);
}
So the kernel will send 10 of these kbc reset commands to the EC, each spaced 100 us apart. Are we saying we should give EC more time for processing each kb reset?
>
> I observed multiple reset commands are queue up in the buffer in EC.
>
> In this case, we will see consecutive warm reset by EC. Sometimes the
> interval of 2 warm reset is just 30 ms
So are we saying that they get buffered in the EC, and the EC actually issues warm resets consecutively, spaced closer than it should have? TSo the actual issue is cause because the Intel expects that consequetive warm resets should be spaced > (some threshold) time) apart?
,
Apr 27 2017
The issue that the BOOT_ACPI is what we actually want to work. It will write 6 to 0xcf9. But then the io transaction is retired and the cpu moves on and starts whacking the keyboard controller since the system hasn't reset yet. Apparently the reset is async and we haven't given enough time for the reset to occur. The system does eventually reset, but the EC has queued up reset requests and starts acting on them *after* the system has reset. That's the point of the patch in #81.
,
Apr 27 2017
> The system does eventually reset, but the EC has queued up reset requests > and starts acting on them *after* the system has reset. Yes, understood: so after the acpi reset, it will (unexpectedly) reset it 10 more times. But I'd think after the last such reset has been issued, the system should still boot up eventually, no? Why does coreboot not boot up after the last (buffered) reset has been issued? > That's the point of > the patch in #81. Kane: I'm trying to validate this on my board. Meanwhile, is this something that can be upstreamed?
,
Apr 27 2017
I don't think it's 10. It's probably a few. As I noted in #71 baytrail has very odd behavior in that it doesn't reset properly when a reset comes into too early after the previous reset.
,
Apr 27 2017
> I don't think it's 10. It's probably a few. As I noted in #71 baytrail has > very odd behavior in that it doesn't reset properly when a reset comes into > too early after the previous reset. Ah got it, thanks. But then would it mean that the right fix should actually be in EC (not to send resets spaced closer than x microseconds)? I understand that it does not apply to this issue / baytrails because we don't want to release another EC image. But may be worth a considerations for newer designs (like apollolake you mentioned above).
,
Apr 27 2017
Well, which reset request was the one that stuck? The cf9 or the one from the EC? It's a trick question as you don't now, but what we do know is that the cf9 request happened first. Reaction from SoC takes time. That's why the patch in #81 makes a lot of sense and would be the thing I would suggest.
,
Apr 28 2017
Thanks Aaron for the clarification. Similar thing *multiple warm reset cmds queued in EC* also happen on APL. I would suggest putting some delay after the 0xcf9 = 0x6 in kernel to give soc time to respond. Thank you
,
Apr 28 2017
I have run test of my 2 systems overnight and have so far not run into problems. However, I do want to say that we should leave headroom with delay because we're still not sure about how much is good enough (and a few ms delay on reboot is OK). I'd suggest may be 4-5 ms. Kane: is this something that you can upstream?
,
Apr 28 2017
As for the udelay value I'd say 10-20ms is even sufficient.
,
May 1 2017
Thanks for the test, Rajat. I will need to check with our linux kernel team about this upstream. 5/1 is public holiday in TW, so please give us some time to work on this. Thank you.
,
May 1 2017
Update on my test. I did hit the issue with this fix, but it was after 50+ hours of testing. So I think it might be that we want to increase the timeout a little. But overall, I think I'm satisfied with this solution. So if you can please send it upstream and raise a CL (with FROMLIST or UPSTREAM notification as appropriate), that should serve the purpose.
,
May 2 2017
Hi Rajat, Today, I tried to use kernel 4.11 to repro the abnormal reset behavior. I didn't see the abnormal reset behavior on kernel 4.11.0-rc8-00107-g0060e79 over 3 hours(originally I saw the abnormal behavior within 30 mins) I think if the issue can't be repro on latest kernel, the upstream might be rejected. I'm trying to run the warm reset test on my system over night with latest kernel. Could u also try to repro this with latest kernel 4.11.0-rc8? If this issue only happens on kernel v4.4, is it possible to upload a CL with CHROMIUM? Thank you
,
May 2 2017
Yes, if the issue is not reproduced with latest kernel, please send in CL with CHROMIUM.
,
May 3 2017
Hi Rajat, I've uploaded a change as CHROMIUM tot, please take a look. https://chromium-review.googlesource.com/c/494446/ I only changed the udelay(1000) to mdelay(15), it should still fix the issue. Thank you:)
,
May 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/0185e2615ffffecf77020b8acc1864b3739a4b24 commit 0185e2615ffffecf77020b8acc1864b3739a4b24 Author: Kane Chen <kane.chen@intel.com> Date: Wed May 03 20:15:47 2017 CHROMIUM: x86: add delay after acpi_reboot On baytrail, we saw an issue that linux kernel v4.4 sends multiple reset commands to EC, when cpu is not reset by io 0xcf9=0x6 immediately. Those reset commands are queued in EC and cause EC continuously reset cpu in a very short time. In this case, cpu will be reset for multiple times. This delay is to give cpu time to respond the warm reset by acpi_reboot. This issue can't be reproduced kernel v3.10 and 4.11.0-rc8-00107-g0060e79. I can only see issue on kernel kernel v4.4 BUG= chromium:696679 TEST=run warm reset test on BYT chrome books, and issue is not seen over 48 hours. Change-Id: I29fa0254747467aba80150d802569c1349392fd4 Signed-off-by: Kane Chen <kane.chen@intel.com> Reviewed-on: https://chromium-review.googlesource.com/494446 Commit-Ready: Rajat Jain <rajatja@chromium.org> Tested-by: Rajat Jain <rajatja@chromium.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org> Reviewed-by: Rajat Jain <rajatja@chromium.org> [modify] https://crrev.com/0185e2615ffffecf77020b8acc1864b3739a4b24/arch/x86/kernel/reboot.c
,
May 3 2017
What release branches should we be targeting the fix at? I presume that although the issue surfaced on Kip, it impacts all BYT devices.
,
May 3 2017
We should backport it to R59 and R58. Potentially R57, but I think the 3 baytrails on R57 (clapper, quawks, squawks) will anyway be upgraded by R58, so we may be OK not putting in R57.
,
May 3 2017
,
May 4 2017
,
May 4 2017
Kane, can you send a CL for R59?
,
May 4 2017
,
May 4 2017
Your change meets the bar and is auto-approved for M59. Please go ahead and merge the CL to branch 3071 manually. Please contact milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 5 2017
Hi Rajat, Sorry I am kind of confused I saw your comments in #107 and https://chromium-review.googlesource.com/c/494426 Do u you need me to run the warm reset test on R58, then cherry-pick to R59? The test in #79 is done by release-R58-9334.B Thank you.
,
May 5 2017
here is the CL for R59 https://chromium-review.googlesource.com/c/497226 thanks.
,
May 5 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/c6521f7e1e75e2cbfcd8641c716d978585e8e9ff commit c6521f7e1e75e2cbfcd8641c716d978585e8e9ff Author: Kane Chen <kane.chen@intel.com> Date: Fri May 05 16:08:48 2017 CHROMIUM: x86: add delay after acpi_reboot On baytrail, we saw an issue that linux kernel v4.4 sends multiple reset commands to EC, when cpu is not reset by io 0xcf9=0x6 immediately. Those reset commands are queued in EC and cause EC continuously reset cpu in a very short time. In this case, cpu will be reset for multiple times. This delay is to give cpu time to respond the warm reset by acpi_reboot. This issue can't be reproduced kernel v3.10 and 4.11.0-rc8-00107-g0060e79. I can only see issue on kernel kernel v4.4 BUG= chromium:696679 TEST=run warm reset test on BYT chrome books, and issue is not seen over 48 hours. Change-Id: I29fa0254747467aba80150d802569c1349392fd4 Signed-off-by: Kane Chen <kane.chen@intel.com> Reviewed-on: https://chromium-review.googlesource.com/494446 Commit-Ready: Rajat Jain <rajatja@chromium.org> Tested-by: Rajat Jain <rajatja@chromium.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org> Reviewed-by: Rajat Jain <rajatja@chromium.org> (cherry picked from commit 0185e2615ffffecf77020b8acc1864b3739a4b24) Reviewed-on: https://chromium-review.googlesource.com/497226 Commit-Queue: Rajat Jain <rajatja@chromium.org> Trybot-Ready: Rajat Jain <rajatja@chromium.org> [modify] https://crrev.com/c6521f7e1e75e2cbfcd8641c716d978585e8e9ff/arch/x86/kernel/reboot.c
,
May 5 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/c6521f7e1e75e2cbfcd8641c716d978585e8e9ff commit c6521f7e1e75e2cbfcd8641c716d978585e8e9ff Author: Kane Chen <kane.chen@intel.com> Date: Fri May 05 16:08:48 2017 CHROMIUM: x86: add delay after acpi_reboot On baytrail, we saw an issue that linux kernel v4.4 sends multiple reset commands to EC, when cpu is not reset by io 0xcf9=0x6 immediately. Those reset commands are queued in EC and cause EC continuously reset cpu in a very short time. In this case, cpu will be reset for multiple times. This delay is to give cpu time to respond the warm reset by acpi_reboot. This issue can't be reproduced kernel v3.10 and 4.11.0-rc8-00107-g0060e79. I can only see issue on kernel kernel v4.4 BUG= chromium:696679 TEST=run warm reset test on BYT chrome books, and issue is not seen over 48 hours. Change-Id: I29fa0254747467aba80150d802569c1349392fd4 Signed-off-by: Kane Chen <kane.chen@intel.com> Reviewed-on: https://chromium-review.googlesource.com/494446 Commit-Ready: Rajat Jain <rajatja@chromium.org> Tested-by: Rajat Jain <rajatja@chromium.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org> Reviewed-by: Rajat Jain <rajatja@chromium.org> (cherry picked from commit 0185e2615ffffecf77020b8acc1864b3739a4b24) Reviewed-on: https://chromium-review.googlesource.com/497226 Commit-Queue: Rajat Jain <rajatja@chromium.org> Trybot-Ready: Rajat Jain <rajatja@chromium.org> [modify] https://crrev.com/c6521f7e1e75e2cbfcd8641c716d978585e8e9ff/arch/x86/kernel/reboot.c
,
May 5 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/31d7be5289e63aabaf9a43cc3c1e24b57e40cdc3 commit 31d7be5289e63aabaf9a43cc3c1e24b57e40cdc3 Author: Kane Chen <kane.chen@intel.com> Date: Fri May 05 16:11:09 2017 CHROMIUM: x86: add delay after acpi_reboot On baytrail, we saw an issue that linux kernel v4.4 sends multiple reset commands to EC, when cpu is not reset by io 0xcf9=0x6 immediately. Those reset commands are queued in EC and cause EC continuously reset cpu in a very short time. In this case, cpu will be reset for multiple times. This delay is to give cpu time to respond the warm reset by acpi_reboot. This issue can't be reproduced kernel v3.10 and 4.11.0-rc8-00107-g0060e79. I can only see issue on kernel kernel v4.4 BUG= chromium:696679 TEST=run warm reset test on BYT chrome books, and issue is not seen over 48 hours. Change-Id: I29fa0254747467aba80150d802569c1349392fd4 Signed-off-by: Kane Chen <kane.chen@intel.com> Reviewed-on: https://chromium-review.googlesource.com/494426 Commit-Queue: Rajat Jain <rajatja@chromium.org> Trybot-Ready: Rajat Jain <rajatja@chromium.org> Reviewed-by: Rajat Jain <rajatja@chromium.org> [modify] https://crrev.com/31d7be5289e63aabaf9a43cc3c1e24b57e40cdc3/arch/x86/kernel/reboot.c
,
May 5 2017
OK, fix has landed, need to mark kip-paladin as non-experimental again: https://chromium-review.googlesource.com/c/455356/ Aviv, can you please review once again as the revert was not clean (were conflicts).
,
May 8 2017
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 8 2017
,
May 12 2017
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jun 23 2017
This issue hasn't been updated in the last 6 weeks, so removing its merge approval label. Please re-request a merge if needed. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jul 17 2017
ChromeOS Infra P1 Bugscrub. P1 Bugs in this component should be important enough to get weekly status updates. Is this already fixed? -> Fixed Is this no longer relevant? -> Archived or WontFix Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority. Is this a Feature Request rather than a bug? Type -> Feature Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign. Does this bug have the wrong owner? -> reassign. Bugs that remain in this state next week will be downgraded to P2.
,
Jul 24 2017
ChromeOS Infra P1 Bugscrub. Issue untouched in a week after previous message. Downgrading to P2.
,
Sep 18 2017
Fixed?
,
Jan 22 2018
,
Jan 23 2018
Showing comments 25 - 124
of 124
Older ›
|
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||