scarlet unit self-rebooting
Issue description
Scarlet unit keeps rebooting itself. Logs from the cr50 console:
[1025.410514 deferred_tpm_rst_isr]
[1025.412525 AP on]
[1025.413990 tpm_reset_request(0, 0)]
[1025.415832 tpm_reset_now(0)]
[1025.421218 tpm_init]
tpm_manufactured: manufactured
[1025.423984 tpm_reset_now: done]
[1025.701118 ccd_tpm_reset_callback: TPM Startup processed]
[1025.702976 Skipping commit]
[1025.834636 Skipping commit]
system_process_retry_counter:retry counter 0
[1027.804523 Skipping commit]
[1028.425014 Committing NVMEM changes.]
system_process_retry_counter:retry counter 0
system_process_retry_counter:retry counter 0
system_process_retry_counter:retry counter 0
system_process_retry_counter:retry counter 0
extension_route_command: handler 37 not found
May 24 2018
Also, the servo v4 console would probably be useful.
May 24 2018
Power info change during Servo V4 passthrough?
[14012.176666 event set 0x0000000000400000]
[14017.678620 event set 0x0000000000400000]
[14051.690661 event set 0x0000000000400000]
[14057.192505 event set 0x0000000000400000]
[14062.944583 event set 0x0000000000400000]
[14068.696678 event set 0x0000000000400000]
[14096.706655 event set 0x0000000000400000]
[14102.458741 event set 0x0000000000400000]
[14113.712821 event set 0x0000000000400000]
[14119.464818 event set 0x0000000000400000]
[14136.220800 event set 0x0000000000400000]
[14141.972737 event set 0x0000000000400000]
[14147.724880 event set 0x0000000000400000]
[14158.978817 event set 0x0000000000400000]
[14164.480813 event set 0x0000000000400000]
[14170.232904 event set 0x0000000000400000]
[14198.492930 event set 0x0000000000400000]
[14203.994862 event set 0x0000000000400000]
[14215.498932 event set 0x0000000000400000]
[14221.000891 event set 0x0000000000400000]
[14226.753006 event set 0x0000000000400000]
[14232.254995 event set 0x0000000000400000]
[14255.012974 event set 0x0000000000400000]
[14260.515123 event set 0x0000000000400000]
[14277.521219 event set 0x0000000000400000]
[14283.023252 event set 0x0000000000400000]
[14300.029422 event set 0x0000000000400000]
[14317.035608 event set 0x0000000000400000]
[14328.289748 event set 0x0000000000400000]
[14334.041750 event set 0x0000000000400000]
[14350.797988 event set 0x0000000000400000]
[14356.549945 event set 0x0000000000400000]
[14367.804103 event set 0x0000000000400000]
[14373.556034 event set 0x0000000000400000]
[14379.057957 event set 0x0000000000400000]
[14396.064118 event set 0x0000000000400000]
[14401.566125 event set 0x0000000000400000]
[14413.070158 event set 0x0000000000400000]
[14418.572128 event set 0x0000000000400000]
[14429.826389 event set 0x0000000000400000]
[14435.578178 event set 0x0000000000400000]
[14441.330430 event set 0x0000000000400000]
[14446.832390 event set 0x0000000000400000]
[14452.584425 event set 0x0000000000400000]
[14469.340433 event set 0x0000000000400000]
[14475.092478 event set 0x0000000000400000]
May 24 2018
May 25 2018
Unit is self-rebooting due to a warm reset request.
[85161.513235 event set 0x0000000000400000]
[85161.738975 AP wants warm rese
[85161.763269 event set 0x0000000000400000]
[85162.316079 Executing host reboot command 5]
[85163.217357 HC 0x400b err 1]
[85163.218976 HC 0x67 err 6]
[85163.423923 HC 0x28 err 1]
[85163.425635 HC 0x28 err 1]
[85163.427310 HC 0x28 err 1]
[85163.428962 HC 0x28 err 1]
[85163.431456 HC 0x28 err 1]
[85163.433122 HC 0x28 err 1]
[85163.434803 HC 0x28 err 1]
[85165.318089 HC 0x2c err 1]
[85173.679242 HC 0x18 err 1]
[8T--- unknown menu character '\x7f' --
[85328.561674 event set 0x0000000000400000]
[85328.811813 event set 0x0000000000400000]
[85330.062204 event set 0x0000000000400000]
[85330.181648 AP wants warm rese
[85330.758550 Executing host reboot command 5]
[85330.812567 event set 0x0000000000400000]
[85331.633491 HC 0x400b err 1]
[85331.635081 HC 0x67 err 6]
[85331.833323 HC 0x28 err 1]
[85331.835036 HC 0x28 err 1]
[85331.836710 HC 0x28 err 1]
[85331.838360 HC 0x28 err 1]
[85331.840829 HC 0x28 err 1]
[85331.842493 HC 0x28 err 1]
[85331.844175 HC 0x28 err 1]
[85333.838640 HC 0x2c err 1]
[85342.349698 HC 0x18 err 1]
[8
[85497.116899 event set 0x0000000000400000]
[85497.366933 event set 0x0000000000400000]
[85498.835894 AP wants warm rese
[85499.117163 event set 0x000000
[85499.414984 Executing host reboot command 5]
[85500.316370 HC 0x400b err 1]
[85500.317989 HC 0x67 err 6]
[85500.524516 HC 0x28 err 1]
[85500.526230 HC 0x28 err 1]
[85500.527903 HC 0x28 err 1]
[85500.529554 HC 0x28 err 1]
[85500.532052 HC 0x28 err 1]
[85500.533714 HC 0x28 err 1]
[85500.535394 HC 0x28 err 1]
[85502.455030 HC 0x2c err 1]
[85510.921589 HC 0x18 err 1]
[8
[85666.889871 event set 0x0000000000400000]
[85667.139906 event set 0x0000000000400000]
[85667.567370 AP wants warm rese
[85667.639972 event set 0x0000000000400000]
[85668.144321 Executing host reboot command 5]
[85669.046271 HC 0x400b err 1]
[85669.047891 HC 0x67 err 6]
[85669.252581 HC 0x28 err 1]
[85669.254297 HC 0x28 err 1]
[85669.255973 HC 0x28 err 1]
[85669.257627 HC 0x28 err 1]
[85669.260103 HC 0x28 err 1]
[85669.261767 HC 0x28 err 1]
[85669.263448 HC 0x28 err 1]
[85671.182120 HC 0x2c err 1]
[85679.171148 HC 0x18 err 1]
[85679.171970 HC 0x18 err 1]67Ce
[85834.944164 event set 0x0000000000400000]
[85835.444332 event set 0x0000000000400000]
[85836.013615 AP wants warm rese
[85836.194650 event set 0x0000000000400000]
[85836.590712 Executing host reboot command 5]
[85837.494629 HC 0x400b err 1]
[85837.496248 HC 0x67 err 6]
[85837.702004 HC 0x28 err 1]
[85837.703718 HC 0x28 err 1]
[85837.705390 HC 0x28 err 1]
[85837.707041 HC 0x28 err 1]
[85837.709565 HC 0x28 err 1]
[85837.711227 HC 0x28 err 1]
[85837.712908 HC 0x28 err 1]
[85839.748912 HC 0x2c err 1]
[85839.945396 Battery 98% / 27h:0 to empty]
[85847.772894 HC 0x18 err 1]
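To make the reboot cadence in a long EC console capture easier to see, the timestamps of the "Executing host reboot command" entries can be pulled out and differenced. This is a minimal Python sketch, assuming the capture is saved as plain text in the "[seconds message]" form shown above; the helper names reboot_timestamps and intervals are hypothetical, and the entry pattern is inferred from this capture rather than taken from any EC documentation. Truncated entries such as "[85161.738975 AP wants warm rese" do not match and are skipped.

```python
import re
from typing import List

# One EC console entry, e.g. "[85162.316079 Executing host reboot command 5]".
# Assumption: this shape is inferred from the capture above; a truncated entry
# (an opening "[" with no "]" before the next "[") does not match and is skipped.
ENTRY_RE = re.compile(r"\[(\d+\.\d+) ([^\[\]]*)\]")


def reboot_timestamps(console_text: str,
                      marker: str = "Executing host reboot command") -> List[float]:
    """Return EC timestamps (seconds) of entries whose message contains `marker`."""
    return [float(ts) for ts, message in ENTRY_RE.findall(console_text)
            if marker in message]


def intervals(timestamps: List[float]) -> List[float]:
    """Return the gap in seconds between each pair of consecutive timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


if __name__ == "__main__":
    # Excerpt of the capture above (hypothetical input file contents).
    capture = """\
[85162.316079 Executing host reboot command 5]
[85163.217357 HC 0x400b err 1]
[85330.758550 Executing host reboot command 5]
[85499.414984 Executing host reboot command 5]
[85668.144321 Executing host reboot command 5]
[85836.590712 Executing host reboot command 5]
"""
    for gap in intervals(reboot_timestamps(capture)):
        print(f"{gap:.1f} s")
```

Applied to the five "Executing host reboot command 5" entries above, the gaps come out at roughly 168 s each, which may help when correlating the EC log with DUT-side logs.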
May 29 2018
I see many of these in /var/log/messages:
2018-05-20T07:13:16.365468+00:00 NOTICE pre-shutdown[3266]: Shutting down for reboot: not-via-powerd
It doesn't look like something related to power.
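One way to put a number on "many" is to tally the pre-shutdown reasons in the log. The sketch below is a minimal Python example, assuming every relevant line follows the single example format quoted above (the regular expression is inferred from that one line and may need adjusting for other severities or reasons); shutdown_reasons is a hypothetical helper, not part of any ChromeOS tool.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Inferred from the single example line quoted above (an assumption):
# <timestamp> <severity> pre-shutdown[<pid>]: Shutting down for <kind>: <reason>
SHUTDOWN_RE = re.compile(
    r"^\S+ \S+ pre-shutdown\[\d+\]: Shutting down for (\w+): (\S+)"
)


def shutdown_reasons(lines: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count (kind, reason) pairs, e.g. ("reboot", "not-via-powerd")."""
    counts: "Counter[Tuple[str, str]]" = Counter()
    for line in lines:
        match = SHUTDOWN_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # The example line from the comment above; other lines are ignored.
    sample = [
        "2018-05-20T07:13:16.365468+00:00 NOTICE pre-shutdown[3266]: "
        "Shutting down for reboot: not-via-powerd\n",
        "2018-05-20T07:13:17.000000+00:00 INFO kernel: unrelated line\n",
    ]
    for (kind, reason), n in shutdown_reasons(sample).most_common():
        print(f"{n:5d}  {kind}: {reason}")
```

On a DUT, the same function could be given an open file handle for /var/log/messages to see whether not-via-powerd dominates the recorded reboots.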
May 31 2018
warm_reset during charging:
[526456.414364 Battery 24% / 1h:39 to full]
[526484.665214 AP wants warm res
[526485.243056 Executing host reboot command 5]
[526486.140828 HC 0x400b err 1]
[526486.347115 HC 0x28 err 1]
[526486.348902 HC 0x28 err 1]
[526486.350657 HC 0x28 err 1]
[526486.352391 HC 0x28 err 1]
[526486.356575 HC 0x28 err 1]
[526486.358319 HC 0x28 err 1]
[526486.360078 HC 0x28 err 1]
[526488.451457 HC 0x2c err 1]
[526496.344257 HC 0x18 err 1]
May 31 2018
Is it only reproducible with servo_v4?
May 31 2018
I don't think we want to test scarlet using previous versions of servo. So yes, it happens with Servo V4 Type-C.
May 31 2018
Well, digging deeper, it looks like the device reboots itself even when the charger is plugged in directly.
May 31 2018
Interesting. Can you try another unit? Is this issue reproducible on every unit?
Jul 18
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/1ba0dec60bbcd4b4a47883fe9747c074ae733cc7
commit 1ba0dec60bbcd4b4a47883fe9747c074ae733cc7
Author: Aviv Keshet <akeshet@chromium.org>
Date: Wed Jul 18 00:30:37 2018
autotest: temporarily disable lab inventory on some unhealthy boards
BUG=chromium:861806, chromium:846012, chromium:854404
TEST=None
Change-Id: Ibf0efeb0881056bbcb40a5bc5183a764bf5afd90
Reviewed-on: https://chromium-review.googlesource.com/1135997
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>
[modify] https://crrev.com/1ba0dec60bbcd4b4a47883fe9747c074ae733cc7/site_utils/lab_inventory.py
Jul 24
Jul 24
This isn't the sort of problem that englab-sys-cros@ deals with, and anyway, haoweiw@ is out on leave. nsanders@, can you find someone to figure out why scarlet is having so much trouble with servo repair?
Jul 25
As per #6, it appears that userspace (autotest?) is requesting a reboot. Can you check the system logs when this occurs and see what autotest is doing?
https://cs/chromeos_public/src/platform2/power_manager/docs/shutdown.md?l=83
> In the above case, `not-via-powerd` indicates that this clean reboot was
> initiated by the `reboot` command being run directly
From the product side, we'd need DUT EC and system logs to diagnose whether there's anything unexpected going on here.
Jul 25
Jul 25
I thought that this was referring to a DUT that is just rebooting by itself and not because Autotest is running against it?
Jul 25
It reboots unexpectedly in the lab, but the logs above suggest someone is unexpectedly calling "reboot" on the command line. That indicates either:
1) autotest is calling "reboot" unexpectedly due to some bug, or
2) the logs above don't represent the failure and are from expected reboots.
So we'd need logs from an unexpected reboot to diagnose this.
Jul 25
Okay. I'm going to need at least a hostname so I know which DUT to look at (or are all scarlet DUTs affected?)
Jul 25
You can use chromeos2-row1-rack11-host19
Jul 27
Jul 30
Aug 7
Passing Hotlist-Deputy to this week's deputy.
Aug 13
Passing on Hotlist-Deputy
Aug 17
Hmm, this is too old to keep passing. Will mark as WontFix to end the deputy passing.
Comment 1 by mruthven@chromium.org, May 24 2018