
Issue 808764 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Extremely high CPU temps on Pixel 2013 with NO fan activity.

Reported by scottt...@gmail.com, Feb 3 2018

Issue description

Chrome Version       : 63.0.3239.140
OS Version: 10032.86.0
URLs (if applicable) :
Other browsers tested:
  Add OK or FAIL after other browsers where you have tested this issue:
     Safari:
    Firefox:
    IE/Edge:

What steps will reproduce the problem?
1. Start the Pixel 2013 on battery power.
2. Run a webGL demo or other CPU stressing activity
3. Watch the temps using a tool like COG
4. The temps will rise into the 90-100C range with NO fan activity!

What is the expected result?
Fans should spool up to high speed attempting to regulate CPU temp

What happens instead of that?
The fans do not spool up and system temps rise to critical levels.

Please provide any additional information below. Attach a screenshot if
possible.

Running stable build
Version 63.0.3239.140 (Official Build) (64-bit)

This Pixel 2013 is like new in every way and has always run the stable build.  Everything works perfectly and it thermally regulated well until sometime in the last month or two of OS updates.  I haven't been using it as much recently and only this last weekend noticed the heat issue.

The fans work fine on this unit.  The fans will spool up quickly and spin down smoothly often on boot if the system is already warm.  Sometimes connecting the charger seems to wake the fans up and then they thermally regulate as normal with CPU temps.  Other times, there is no fan activity.  When running, they smoothly operate and vary rpm according to temperature.

I'm seeing things like this in the system log snip below.  It seems really odd that temp_metrics would be setting the fan to 0 at the same time the CPU is going critical!  Also, I can have a log full of temp metrics trying to set the rpm to 3000 all the while there is no fan activity.

2018-02-02T22:35:41.061835-05:00 NOTICE temp_metrics[3432]: Setting fan RPM (temps: 1:28:7:27:9:66:): 10 -> 0
2018-02-02T22:35:41.070891-05:00 NOTICE temp_metrics[3443]: Throttling (temps: 1:28:7:27:9:66:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T22:35:41.446680-05:00 CRIT kernel: [   16.177727] CPU0: Package power limit notification (total events = 1)
2018-02-02T22:35:41.446703-05:00 CRIT kernel: [   16.177730] CPU3: Package power limit notification (total events = 1)
2018-02-02T22:35:41.446706-05:00 CRIT kernel: [   16.177732] CPU2: Package power limit notification (total events = 1)
2018-02-02T22:35:41.446721-05:00 CRIT kernel: [   16.177737] CPU1: Package power limit notification (total events = 1)
2018-02-02T22:35:41.457666-05:00 INFO kernel: [   16.188649] CPU1: Package power limit normal
2018-02-02T22:35:41.457681-05:00 INFO kernel: [   16.188651] CPU0: Package power limit normal
2018-02-02T22:35:41.457683-05:00 INFO kernel: [   16.188691] CPU3: Package power limit normal
2018-02-02T22:35:41.457691-05:00 INFO kernel: [   16.188692] CPU2: Package power limit normal
2018-02-02T22:35:47.302744-05:00 INFO kernel: [   21.601447] ca0132 DOWNLOAD OK :-) DSP IS RUNNING.
2018-02-02T22:35:53.897784-05:00 INFO kernel: [   28.194199] tpm_tis tpm_tis: command 0x65 (size 22) returned code 0x0
2018-02-02T22:37:11.949797-05:00 NOTICE temp_metrics[4989]: Setting fan RPM (temps: 1:33:7:34:9:67:): 0 -> 3000
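
For anyone triaging similar logs, the "Setting fan RPM" lines above can be pulled out mechanically. The following is a minimal sketch, not part of temp_metrics itself. It assumes the `temps:` field is a sequence of `id:value` pairs (for example sensor 9 reading 66), which is inferred from the log and not confirmed anywhere in this report; the names `parse_temps` and `parse_fan_line` are hypothetical.

```python
import re

# Matches temp_metrics "Setting fan RPM" lines in the format seen in the log above.
FAN_RE = re.compile(
    r"^(?P<ts>\S+) NOTICE temp_metrics\[\d+\]: Setting fan RPM "
    r"\(temps: (?P<temps>[\d:]+)\): (?P<old>\d+) -> (?P<new>\d+)$"
)


def parse_temps(field):
    """Split '1:28:7:27:9:66:' into {1: 28, 7: 27, 9: 66}.

    Assumption: the field is a list of sensor-id:temperature pairs.
    """
    nums = [int(n) for n in field.split(":") if n]
    return dict(zip(nums[0::2], nums[1::2]))


def parse_fan_line(line):
    """Return (timestamp, temps, old_rpm, new_rpm), or None for other lines."""
    m = FAN_RE.match(line.strip())
    if m is None:
        return None
    return (
        m.group("ts"),
        parse_temps(m.group("temps")),
        int(m.group("old")),
        int(m.group("new")),
    )
```

Feeding /var/log/messages through `parse_fan_line` gives the sequence of requested RPM changes, which can then be compared against what the fan is audibly doing.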






UserAgentString: Mozilla/5.0 (X11; CrOS x86_64 10032.86.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.140 Safari/537.36



 
In further testing this evening it just seems like the system is failing to react properly to CPU temps.  For instance, with the charger plugged in and a full battery, the fans were running at a fairly normal leisurely rate a few minutes ago.  I ran this demo.

https://experiments.withgoogle.com/chrome/the-polygon-shredder 

The temps quickly started climbing into the high 80's and the fans did not change speed.

I also noted these errors in the log while this was taking place.  

2018-02-02T23:09:40.228454-05:00 NOTICE temp_metrics[4260]: Setting fan RPM (temps: 1:41:7:40:9:70:): 4000 -> 5500
2018-02-02T23:09:40.234879-05:00 NOTICE temp_metrics[4267]: Throttling (temps: 1:41:7:40:9:70:): 1801000 800000 1150 0 0x180aa00dd8078 # cap pkg to 15W
2018-02-02T23:09:40.931782-05:00 CRIT kernel: [   26.541732] CPU0: Package power limit notification (total events = 1)
2018-02-02T23:09:40.931830-05:00 CRIT kernel: [   26.541734] CPU2: Package power limit notification (total events = 1)
2018-02-02T23:09:40.931834-05:00 CRIT kernel: [   26.541735] CPU3: Package power limit notification (total events = 1)
2018-02-02T23:09:40.931836-05:00 CRIT kernel: [   26.541737] CPU1: Package power limit notification (total events = 1)
2018-02-02T23:09:40.942752-05:00 INFO kernel: [   26.552777] CPU3: Package power limit normal
2018-02-02T23:09:40.942766-05:00 INFO kernel: [   26.552779] CPU0: Package power limit normal
2018-02-02T23:09:40.942768-05:00 INFO kernel: [   26.552780] CPU1: Package power limit normal
2018-02-02T23:09:40.942769-05:00 INFO kernel: [   26.552781] CPU2: Package power limit normal
2018-02-02T23:09:50.284245-05:00 NOTICE temp_metrics[4515]: Throttling (temps: 1:41:7:39:9:65:): 1801000 800000 1150 0 0x180aa00dd8070 # cap pkg to 14W
2018-02-02T23:10:30.504244-05:00 NOTICE temp_metrics[4934]: Throttling (temps: 1:41:7:39:9:60:): 1801000 800000 1150 0 0x180aa00dd8068 # cap pkg to 13W
2018-02-02T23:11:00.677103-05:00 NOTICE temp_metrics[5081]: Throttling (temps: 1:41:7:39:9:60:): 1800000 800000 900 0 0x180aa00dd8068 # disable turbo
2018-02-02T23:13:21.370552-05:00 NOTICE temp_metrics[5738]: Setting fan RPM (temps: 1:39:7:38:9:53:): 5500 -> 4000
2018-02-02T23:13:21.377830-05:00 NOTICE temp_metrics[5745]: Throttling (temps: 1:39:7:38:9:53:): 1801000 800000 1150 0 0x180aa00dd8068 # cap pkg to 13W
2018-02-02T23:13:51.506254-05:00 NOTICE temp_metrics[5894]: Throttling (temps: 1:39:7:38:9:52:): 1801000 800000 1150 0 0x180aa00dd8070 # cap pkg to 14W
2018-02-02T23:14:41.733255-05:00 NOTICE temp_metrics[6141]: Throttling (temps: 1:39:7:37:9:56:): 1801000 800000 1150 0 0x180aa00dd8078 # cap pkg to 15W
2018-02-02T23:14:51.784040-05:00 NOTICE temp_metrics[6232]: Throttling (temps: 1:39:7:38:9:59:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:15:11.963385-05:00 NOTICE temp_metrics[6344]: Throttling (temps: 1:38:7:37:9:55:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:15:32.069130-05:00 NOTICE temp_metrics[6455]: Throttling (temps: 1:39:7:37:9:53:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:15:52.118169-05:00 NOTICE temp_metrics[6550]: Throttling (temps: 1:38:7:37:9:55:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:16:02.204537-05:00 NOTICE temp_metrics[6631]: Throttling (temps: 1:40:7:37:9:54:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:16:03.089805-05:00 CRIT kernel: [  408.578471] CPU3: Package power limit notification (total events = 7)
2018-02-02T23:16:03.089867-05:00 CRIT kernel: [  408.578474] CPU0: Package power limit notification (total events = 7)
2018-02-02T23:16:03.089872-05:00 CRIT kernel: [  408.578476] CPU1: Package power limit notification (total events = 7)
2018-02-02T23:16:03.089904-05:00 CRIT kernel: [  408.578484] CPU2: Package power limit notification (total events = 7)
2018-02-02T23:16:03.100748-05:00 INFO kernel: [  408.589509] CPU1: Package power limit normal
2018-02-02T23:16:03.100762-05:00 INFO kernel: [  408.589515] CPU0: Package power limit normal
2018-02-02T23:16:03.100764-05:00 INFO kernel: [  408.589519] CPU3: Package power limit normal
2018-02-02T23:16:03.100765-05:00 INFO kernel: [  408.589521] CPU2: Package power limit normal
2018-02-02T23:16:22.249136-05:00 NOTICE temp_metrics[6722]: Throttling (temps: 1:38:7:37:9:54:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:16:32.352089-05:00 NOTICE temp_metrics[6804]: Throttling (temps: 1:39:7:37:9:54:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:17:12.570781-05:00 NOTICE temp_metrics[7017]: Throttling (temps: 1:38:7:37:9:52:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:17:22.621340-05:00 NOTICE temp_metrics[7078]: Throttling (temps: 1:39:7:37:9:51:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:18:12.940630-05:00 NOTICE temp_metrics[7366]: Throttling (temps: 1:38:7:37:9:55:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:18:22.991309-05:00 NOTICE temp_metrics[7426]: Throttling (temps: 1:39:7:37:9:56:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:18:43.113249-05:00 NOTICE temp_metrics[7549]: Throttling (temps: 1:38:7:38:9:57:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:18:53.165193-05:00 NOTICE temp_metrics[7610]: Throttling (temps: 1:39:7:37:9:54:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:19:13.307703-05:00 NOTICE temp_metrics[7722]: Throttling (temps: 1:38:7:37:9:53:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:19:23.343424-05:00 NOTICE temp_metrics[7782]: Throttling (temps: 1:39:7:37:9:53:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:19:33.448437-05:00 NOTICE temp_metrics[7867]: Throttling (temps: 1:38:7:37:9:54:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:19:43.507327-05:00 NOTICE temp_metrics[7964]: Throttling (temps: 1:39:7:37:9:54:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:19:53.553963-05:00 NOTICE temp_metrics[8030]: Throttling (temps: 1:38:7:37:9:73:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:20:13.691747-05:00 NOTICE temp_metrics[8160]: Throttling (temps: 1:39:7:37:9:87:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:20:26.474102-05:00 INFO laptop-mode[8268]: Warning: Configuration file /etc/laptop-mode/conf.d/board-specific/*.conf is not readable, skipping.
2018-02-02T23:20:26.494046-05:00 INFO laptop-mode[8302]: Laptop mode 
2018-02-02T23:20:26.494983-05:00 INFO laptop-mode[8303]: enabled, active
2018-02-02T23:20:26.496620-05:00 INFO laptop-mode[8309]: Warning: Configuration file /etc/laptop-mode/conf.d/board-specific/*.conf is not readable, skipping.
2018-02-02T23:20:26.502094-05:00 ERR laptop-mode[8327]: Couldn't acquire lock. Retrying.... PID is 8298\n
2018-02-02T23:20:26.570882-05:00 ERR laptop-mode[8478]: failed - udev not active?
2018-02-02T23:20:26.575450-05:00 ERR laptop-mode[8485]: failed - udev not active?
2018-02-02T23:20:26.661847-05:00 INFO laptop-mode[8663]: Laptop mode 
2018-02-02T23:20:26.662752-05:00 INFO laptop-mode[8664]: enabled, 
2018-02-02T23:20:26.663625-05:00 INFO laptop-mode[8665]: active [unchanged]
2018-02-02T23:20:29.895749-05:00 INFO laptop-mode[8683]: Warning: Configuration file /etc/laptop-mode/conf.d/board-specific/*.conf is not readable, skipping.
2018-02-02T23:20:29.908079-05:00 INFO laptop-mode[8705]: Laptop mode 
2018-02-02T23:20:29.909010-05:00 INFO laptop-mode[8706]: enabled, not active
2018-02-02T23:20:29.980957-05:00 ERR laptop-mode[8867]: failed - udev not active?
2018-02-02T23:20:29.985337-05:00 ERR laptop-mode[8874]: failed - udev not active?
2018-02-02T23:20:30.021956-05:00 INFO laptop-mode[8963]: Warning: Configuration file /etc/laptop-mode/conf.d/board-specific/*.conf is not readable, skipping.
2018-02-02T23:20:30.027526-05:00 ERR laptop-mode[8983]: Couldn't acquire lock. Retrying.... PID is 8948\n
2018-02-02T23:20:30.067468-05:00 INFO laptop-mode[9064]: Laptop mode 
2018-02-02T23:20:30.068682-05:00 INFO laptop-mode[9065]: enabled, 
2018-02-02T23:20:30.070038-05:00 INFO laptop-mode[9066]: not active [unchanged]
2018-02-02T23:20:35.115689-05:00 INFO laptop-mode[9108]: Warning: Configuration file /etc/laptop-mode/conf.d/board-specific/*.conf is not readable, skipping.
2018-02-02T23:20:35.124753-05:00 INFO laptop-mode[9130]: Laptop mode 
2018-02-02T23:20:35.125423-05:00 INFO laptop-mode[9131]: enabled, 
2018-02-02T23:20:35.125992-05:00 INFO laptop-mode[9132]: not active [unchanged]
2018-02-02T23:20:43.811463-05:00 NOTICE temp_metrics[9166]: Throttling (temps: 1:38:7:38:9:76:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:21:03.784850-05:00 CRIT kernel: [  709.178425] CPU0: Package power limit notification (total events = 957)
2018-02-02T23:21:03.784888-05:00 CRIT kernel: [  709.178428] CPU1: Package power limit notification (total events = 957)
2018-02-02T23:21:03.784893-05:00 CRIT kernel: [  709.178442] CPU3: Package power limit notification (total events = 957)
2018-02-02T23:21:03.784895-05:00 CRIT kernel: [  709.178443] CPU2: Package power limit notification (total events = 957)
2018-02-02T23:21:03.796775-05:00 INFO kernel: [  709.189471] CPU3: Package power limit normal
2018-02-02T23:21:03.796801-05:00 INFO kernel: [  709.189474] CPU0: Package power limit normal
2018-02-02T23:21:03.796803-05:00 INFO kernel: [  709.189476] CPU1: Package power limit normal
2018-02-02T23:21:03.796803-05:00 INFO kernel: [  709.189482] CPU2: Package power limit normal
2018-02-02T23:21:03.911142-05:00 NOTICE temp_metrics[9285]: Throttling (temps: 1:39:7:38:9:75:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:21:23.966643-05:00 NOTICE temp_metrics[9377]: Throttling (temps: 1:38:7:38:9:75:): 1801000 800000 1150 0 0x180aa00dd8088 # no throttling
2018-02-02T23:21:34.053448-05:00 NOTICE temp_metrics[9460]: Throttling (temps: 1:39:7:39:9:76:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-02-02T23:24:34.792154-05:00 NOTICE temp_metrics[10224]: Setting fan RPM (temps: 1:41:7:39:9:66:): 4000 -> 5500
2018-02-02T23:24:34.803971-05:00 NOTICE temp_metrics[10231]: Throttling (temps: 1:41:7:39:9:66:): 1801000 800000 1150 0 0x180aa00dd8078 # cap pkg to 15W
2018-02-02T23:24:50.195715-05:00 INFO periodic_scheduler[10310]: trim: running chromeos-trim
2018-02-02T23:24:50.232301-05:00 INFO periodic_scheduler[10362]: trim: job completed
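
The "Throttling" lines in the log above can be filtered the same way to find the moments when a sensor was hot. This is only a sketch: it assumes the `temps:` field is a list of `id:value` pairs and that the text after `#` describes the action temp_metrics took, both inferred from the log. The function names and the threshold are hypothetical.

```python
import re

# Matches temp_metrics "Throttling" lines; the trailing "# ..." comment is
# treated as the action description (an assumption based on the log above).
THROTTLE_RE = re.compile(
    r"^(?P<ts>\S+) NOTICE temp_metrics\[\d+\]: Throttling "
    r"\(temps: (?P<temps>[\d:]+)\): .*# (?P<action>.+)$"
)


def parse_throttle_line(line):
    """Return (timestamp, temps, action), or None for non-matching lines."""
    m = THROTTLE_RE.match(line.strip())
    if m is None:
        return None
    nums = [int(n) for n in m.group("temps").split(":") if n]
    temps = dict(zip(nums[0::2], nums[1::2]))  # assumed id:value pairs
    return m.group("ts"), temps, m.group("action")


def hot_samples(lines, threshold_c):
    """Keep throttling samples whose hottest sensor is at or above threshold_c."""
    out = []
    for line in lines:
        parsed = parse_throttle_line(line)
        if parsed and parsed[1] and max(parsed[1].values()) >= threshold_c:
            out.append(parsed)
    return out
```

Run with a threshold of 80 over the excerpt above, this would single out the 23:20:13 sample (sensor 9 at 87), which falls between the two fan RPM changes at 23:13:21 and 23:24:34.
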
To rule out anything else I did a full USB recovery of the Pixel this morning.  Again the same issue.  

- After the recovery and the system had time to download and update any extensions I listened and noted no fan activity.  

- I ran a WebGL demo again and according to COG the heat was rising into the 90's.  

- The system notified me that it needed to restart for a Flash update.  I clicked the restart and again listened for fans.  There was no fan activity.

- At that point I shut down the Pixel, gave it about 5 seconds and powered it back up.  The fans spun into life and continued regulating from there.  They are running normally in the background as I type this.  I am running the Polygon Shredder demo (link in OP) and the fans are audibly varying speed. The system climbed into the mid 80's (C) for a while (which is still rather hot) at which point they spun up and slowly cooled the CPU back down into the mid 60's.  The fans spun back down at that point but continued to run quietly.

My Pixel (in Florida) is usually running the fans at a low RPM in normal conditions (ambient 75 F).  The Pixel often starts up from a full power down with the quick spool up/down of the fans.  When the fans decrease to inaudible after that it appears they often do not come back up even when they should.  If however the system is warmer and the fans remain running then thermal regulation appears to work and the fans vary speed according to CPU temps.

My fans are clearly not faulty.  It appears something may be killing the thermal monitoring process or that process has a bug in it and simply thinks it is setting the fan speed.  If that is happening it's serious business not just because it will damage Pixel Chromebooks but because a CPU at 100 C should not be anywhere near a Lithium Ion battery.

Again, up until just recently this Pixel always ran the fans normally.

Please look into this.  I will provide any logs or run any tests you wish.

Thanks,

Scott
I would assume that, as with most modern architectures, there is a feedback system allowing ChromeOS to monitor the actual fan RPM vs the requested speed.  Without this most modern computer architectures would be at risk of thermal damage.  If so, this can't be working properly or the system should throw errors indicating a cooling system failure.


Components: OS>Hardware
Labels: mp-triage

Comment 5 by ihf@chromium.org, Feb 5 2018

Components: -OS>Hardware OS>Kernel>Power

Comment 6 by derat@chromium.org, Feb 5 2018

Cc: coconutruben@chromium.org tbroch@chromium.org snanda@chromium.org
Owner: coconutruben@chromium.org
Ruben, can you please take a look?
Labels: -Pri-3 Pri-1
Cc: mqg@chromium.org
Status: Started (was: Unconfirmed)
From taking a first look at this:
- temp_metrics doesn't seem to crash, or run into errors
- after "fixing" the fan control (as outlined below) the WebGL aquarium demo triggers the expected up and down regulation of the fan speed.

At first when I just flashed the image I also saw a suspiciously low fan activity even though the cpu was being throttled by temp_metrics. This went away in the course of testing.

However, I can reproduce the bad-fan behavior by cold resetting the EC.
Namely, after a cold-reset, any request to set the fan rpm doesn't go through, and the rpm stays at 0. If I call ectool fanduty [0-100] once, then things start to work again. Maybe there are other ways to get the fan-control running again (as the report outlines, maybe some charger action).

Unfortunately, I haven't been able to retrieve a servo-v1 connector here to get ec console access, maybe you guys can retrieve one?

Mengqi, if you still have the link that you got, or that I left, could you take a look?
Flash R63-10032.86.0, and check whether there are issues inside the EC with fan control, or with setting and retrieving the fan speed. Namely, after an EC cold_reset, type "ectool pwmsetfanrpm 4000" in the AP console and note whether there are any errors. What does the EC side say?

I don't think the issue is on the temp_metrics or ec image side of things since both of those haven't been touched in a while.

I'll look more on my end if I can pinpoint some changes in ectool that might have introduced this issue.

Cc: vpalatin@chromium.org
Labels: OS-iOS
Labels: -OS-iOS
other data-point: there have been some changes around number of fans in ectool over the last ~year, but even when the fan is unresponsive, I still get
$ectool pwmgetnumfans
Number of fans = 1
as output
Thank you all for looking into this and nice job coconutruben on reproducing it.  I knew there was something abnormal going on.  This is the kind of problem only us engineers and testers ever notice so I didn't expect normal users to have picked up on it.  Accordingly I found no references to it on Google+ or other forums.

If I can be of any help with tests of my system just let me know.  I also have an earlier HP 14" Chromebook with fans that I can test on.

Scott


On testing w/in google you could consider an idle lab resource as well if local unit unavailable.  Obviously there's nothing like being there though to feel the fan ;)

h=$(atest host list -b link | grep False | grep pool:suites | cut -d' ' -f1 | head -1)
atest host mod -l -r 'debug  crbug.com/808764 ' $h

#chroot
dut-control ${h}-servo.cros cold_reset:on sleep:1 cold_reset:off


Tried briefly and saw this,

# on dut
date ; ectool console | tail -2 ; date; ectool pwmsetfanrpm 4000 ; date ; ectool console | tail -40
Tue Feb  6 08:27:08 PST 2018
ioctl -1, errno 74 (Bad message), EC result 1 (INVALID_COMMAND)
[967.597563 HC 0x97]

Tue Feb  6 08:27:08 PST 2018
ioctl -1, errno 74 (Bad message), EC result 1 (INVALID_COMMAND)
ioctl -1, errno 74 (Bad message), EC result 6 (INVALID_VERSION)
Fan target RPM set for all fans.
Tue Feb  6 08:27:08 PST 2018
ioctl -1, errno 74 (Bad message), EC result 1 (INVALID_COMMAND)
[967.610476 HC 0x98]
[967.612118 HC 0x98]
[967.613715 HC 0x98]
[967.615156 HC 0x98]
[967.616697 HC 0x98]
[967.618230 HC 0x98]
[967.619753 HC 0x98]
[967.621303 HC 0x98]
[967.622810 HC 0x98]
[967.624332 HC 0x98]
[967.625837 HC 0x98]
[967.627450 HC 0x98]
[967.628949 HC 0x98]
[967.630485 HC 0x98]
[967.631916 HC 0x98]
[967.633442 HC 0x98]
[967.634974 HC 0x98]
[967.636509 HC 0x98]
[967.638053 HC 0x98]
[967.639581 HC 0x98]
[967.641189 HC 0x98]
[967.642624 HC 0x98]
[967.644188 HC 0x98]
[967.645781 HC 0x98]
[967.647296 HC 0x98]
[967.648527 HC 0x98]
[967.653822 HC 0x02]
[967.654532 HC 0x01]
[967.654972 HC 0x0b]
[967.655125 HC err 1]
[967.655571 HC 0x08]
[967.655724 HC err 6]
[967.656125 HC 0x08]
*[967.656477 HC 0x21]
[967.661807 HC 0x02]
[967.662457 HC 0x01]
[967.662940 HC 0x0b]
[967.663094 HC err 1]
[967.663526 HC 0x97]

It does seem like 'set' completes successfully ('*' above)

#define EC_CMD_PWM_SET_FAN_TARGET_RPM 0x21


but a subsequent 'get' always says '0'

ectool pwmgetfanrpm
Current fan RPM: 0

Above was for 
CHROMEOS_RELEASE_DESCRIPTION=10323.12.0 (Official Build) dev-channel link test

ectool version
ioctl -1, errno 74 (Bad message), EC result 1 (INVALID_COMMAND)
RO version:    link_v1.2.145-352afa8
RW version:    link_v1.2.145-352afa8
Firmware copy: RW
Build info:    link_v1.2.145-352afa8 2015-11-18 13:00:18 @build169-m2

crossystem fwid
Google_Link.2695.1.169
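
The EC console trace in this comment can also be checked mechanically for failed host commands. The sketch below assumes that an `HC err N` line refers to the `HC 0x..` line immediately before it; that pairing is inferred from the trace (err 1 and err 6 line up with the INVALID_COMMAND and INVALID_VERSION results printed by ectool) and is not documented in this thread. `failed_commands` is a hypothetical helper.

```python
import re

# Matches "[967.656477 HC 0x21]" and "[967.655125 HC err 1]" console lines.
HC_RE = re.compile(r"\[\s*[\d.]+ HC (?P<err>err )?(?P<val>0x[0-9a-fA-F]+|\d+)\]")


def failed_commands(trace_lines):
    """Return (command_code, error_code) pairs for commands followed by 'HC err'.

    Assumption: an error line belongs to the host command logged just before it.
    """
    failed = []
    last_cmd = None
    for line in trace_lines:
        m = HC_RE.search(line)
        if m is None:
            continue
        if m.group("err"):
            if last_cmd is not None:
                failed.append((last_cmd, int(m.group("val"))))
            last_cmd = None
        else:
            last_cmd = int(m.group("val"), 16)
    return failed
```

On the trace above this would report 0x0b and 0x08 as failing while 0x21 (EC_CMD_PWM_SET_FAN_TARGET_RPM) does not appear, consistent with the observation that the 'set' itself is accepted.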

Cc: rspangler@chromium.org
I think the fan is not enabled, and triggering fanduty explicitly enables the fan.
Alternatively, suspending and then resuming the device makes the controls work again; since chipset_resume enables the fan, that makes me believe this theory more.
What makes me believe it less is that this should've shown up much earlier, unless something else was masking the behavior.

I made an EC image that inserted a pwm_enable_fan(1) call inside set_target_rpm, and I stopped being able to reproduce the issue. More on my thoughts on why this might/might not be the issue below at the end.

https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-link-2695.B/chip/lm4/pwm.c#113

Given that an ec image update isn't really feasible, I'd propose a pre-script to temp_metrics that does one call to "ectool fanduty 0" to make sure the fan is enabled. If that's acceptable/good I'll push a CL for this after testing it out.
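
As a rough illustration of the proposed workaround, the single enabling call could look like the sketch below. This is not the actual CL; Python is an arbitrary choice for the sketch, `enable_fan` is a hypothetical name, and the only command used is the `ectool fanduty 0` invocation quoted above.

```python
import subprocess


def enable_fan(run=subprocess.run):
    """Issue one 'ectool fanduty 0' call so that later RPM requests take effect.

    The runner is injectable so the function can be exercised without an EC.
    Returns True when the command exits with status 0.
    """
    result = run(["ectool", "fanduty", "0"], check=False)
    return result.returncode == 0
```

The idea is to call this once before the thermal loop starts issuing `ectool pwmsetfanrpm` requests; whether a single call is enough after every EC state change is exactly what the rest of this thread investigates.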

- - - - - - More thoughts.

So from what I can tell, the code-path to set the fan rpm never enables the fan.
https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-link-2695.B/common/pwm_commands.c#33
https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-link-2695.B/chip/lm4/pwm.c#104

namely, set_rpm_mode() doesn't have a code-path that leaves the fan enabled

https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-link-2695.B/chip/lm4/pwm.c#75

Both command_fanset (the equivalent, but for EC console) and fanduty explicitly enable the fan.
https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-link-2695.B/chip/lm4/pwm.c#150
https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-link-2695.B/chip/lm4/pwm.c#236

resuming the chipset does enable the fan

https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-link-2695.B/chip/lm4/pwm.c#402

Comment 16 Deleted

> Given that an ec image update isn't really feasible, 

Well, once you have the EC patch, it's definitely feasible unless there is a strong dependency in the RO firmware.

That said I'm not sure I understand your explanation of the cause of the bug.
I might have forgotten some details of this platform, but on such an x86,
HOOK_CHIPSET_RESUME is normally *also* called at power-up,
ie the CPU state does G3->S5->S3->S0 when you boot the machine
pwm_resume() should run pwm_enable_fan(1) at that time.
So what is not happening exactly ?

I also see a mechanism to preserve it across the sysjump of the software-sync, but tricky stuff might happen there.
is your issue happening with soft-sync enabled ? disabled ? both ?
What are your RO and RW EC version ?
I think you're right, at least the sysjump might have something to do with this.
At least, I'm seeing the following behavior:
<boot>
$ ectool fanduty 0
$ ectool pwmsetfanrpm 8000
<hear fans>
$ ectool reboot_ec RO
<fans stop>
$ ectool pwmsetfanrpm 8000
<nothing happens>
$ ectool fanduty 0
$ ectool pwmsetfanrpm 8000
<hear fans again>

I should note that if I do a reboot_ec RO if I'm already in RO, then the fans don't stop. Do we ignore a jump if it's to the same code?

My comment about the ec patch is that I thought it has to go through some qual and validation before we decide to push out a new EC image.

I'm seeing this with soft-sync enabled, and
RO version:    link_v1.2.145-352afa8
RW version:    link_v1.2.145-352afa8
same as Todd mentioned above. But when I got the device, and it had the normal RO that's shipped, I'm also seeing this behavior. //didn't note down the version
Thanks for the useful tests.

> $ ectool reboot_ec RO
> <fans stop>

It's not terribly good, but on a real system, we never do such a thing (the only sequence I know triggering a sysjump to RO is the full EC re-flashing for dogfood machines).
And actually it's impossible to do it on a write-protected machine; you would get EC_ERROR_ACCESS_DENIED.


> I should note that if I do a reboot_ec RO if I'm already in RO, 
> then the fans don't stop. 
> Do we ignore a jump if it's to the same code?

Yes, we do; this is the code in system_run_image_copy():

        /* If system is already running the requested image, done */
        if (system_get_image_copy() == copy)
                return EC_SUCCESS;


> I'm seeing this with soft-sync enabled, and
> RO version:    link_v1.2.145-352afa8
> RW version:    link_v1.2.145-352afa8

Interesting,

At this point, you probably want to do the low-tech debugging :
just put a big fat trace in the 3 spots of chip/lm4/pwm.c accessing the LM4_FAN_FANCTL
and boot with soft-sync enabled (which requires a RW firmware with the proper RW EC image inside) and the servo connected.
I was doing some more work on the Pixel this weekend. It certainly seems there is a strong correlation between having it plugged-in when booting and the fan working properly.  
Still experiencing this regularly.  Just tonight the Pixel is very hard to use safely.  I had to shut down any web pages with active content just to get the thing to cool down to the high 60's.  I connected the charger and went through several power cycles hoping the fan would stay on.  

Even with core temps in the 90's the fan starts up at boot running a rather high rpm and only runs until shortly after booting.  It then shuts down to 0 rpm with the temps climbing into the 80's and above.  No fan activity after that initial shutdown.

Finally, sometime after the third power cycle, the fan is now on and regulating temperature.
Sorry I haven't looked into this for a bit - I was OOO.

Anyhow, I got the right cables and from what I can tell by using fanset and faninfo commands the following happens:
- when the device is in RO, the fan is enabled, and the EC jumps to RW, the fan gets disabled
- when the device is in RW, the fan is enabled, and the EC jumps to RO, the fan gets disabled.
I'll try to trace this a little further, but seems to be what you were suggesting Vincent that something undesired happens during the RO->RW jump.
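
The behavior described so far can be summarized as a toy state model. This is only a sketch of the hypothesis in this thread, not the EC code: it assumes that setting a target RPM never enables the fan, that `fanduty` and chipset resume do enable it, and that a sysjump between RO and RW disables it. The class and method names are hypothetical, and starting in the disabled state is an assumption.

```python
class FanModel:
    """Toy model of the fan-enable behavior hypothesized in this thread."""

    def __init__(self):
        self.enabled = False  # assumed state after a jump at boot
        self.demand = 0       # last requested duty or RPM

    def set_target_rpm(self, rpm):
        # Like 'ectool pwmsetfanrpm': records the request, does not enable.
        self.demand = rpm

    def fanduty(self, percent):
        # Like 'ectool fanduty': explicitly enables the fan.
        self.enabled = True
        self.demand = percent

    def chipset_resume(self):
        # Resume from suspend enables the fan.
        self.enabled = True

    def sysjump(self):
        # Jump between RO and RW: the enable bit is lost.
        self.enabled = False

    def spinning(self):
        return self.enabled and self.demand > 0
```

Replaying the `reboot_ec RO` sequence from the earlier comment against this model gives the same outcome: fans run after `fanduty 0` plus `pwmsetfanrpm 8000`, stop after the jump, ignore a further `pwmsetfanrpm 8000`, and run again once `fanduty 0` is repeated.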

@Scott: could you try out something on your device. Namely, when you notice that the fan isn't kicking in, suspend and resume the device again (closing lid for a few seconds should do it), and then notice if the fan kicks in.

I figured it was something like vacation so I tried to avoid being obnoxious. Hope it was a good one.

I will run that test a few times this evening and post the results.
Ran several tests.  No power cycles, just suspended by folding the screen closed.  All test performed on battery power.

- Started Pixel
- Loaded up heavy WebGL demo allowing temps to climb into the high 80's
- No fan activity
- Shut down WebGL demo
- Suspend, wait several seconds
- On wake, fan spun up to high rpm, immediately settling down but not shutting off
- Shortly after login, the fan seemed to spin back down to zero
- Started WebGL demo again - temps climbing to low 90's
- no fan activity
- Left WebGL demo running
- Suspend, wait several seconds
- On wake, light fan activity, quickly spun down to zero
- Stopped WebGL demo



- Started typing this.
- Suspend, wait several seconds
- On wake, high rpm fan for a few seconds then back to zero even though temps remain in the mid 70's.
- continued typing
- after a couple minutes it appeared the fan restarted and began running at a low rpm.
- Started WebGL demo again and watched temps climb into the 90's.  The fan remained at the low rpm as though it got no further speed updates.
- Shut down WebGL demo to let system cool
- A couple minutes after shutting down demo, the fan spun up a bit again.


It's hard to find any consistent patterns in this.  It's as though the fan only gets sporadic updates remaining at whatever the last requested rpm was.  If it was at zero, then it stays there for quite some time even if system temps climb rapidly into dangerous territory.  If it's at a low rpm, it does the same.  Then at some random time later, it may change rpm again.

Right now it's running at a bit higher rpm and finally cooled the cpu down to the 60's.

If you want me to run more specific tests with or without the supply, just let me know.

I almost convinced myself that some of what I was observing was just the hysteresis built into the fan control logic but there is no way the fan should ever remain off or at a low rpm when temps are in the 90's. At that point the fans should be cranked up to high speed.
A few more use experiments today.

I started it up and worked on a few things for about 10 minutes.  Nothing with heavy CPU loading.  No fan activity, but that wasn't really expected.

I cranked up a WebGL demo and let it go.  The temps continued to rise till it hit 101!  Ouch... that's out of spec for the part.  I think I actually saw the chip throttle itself at that point as the animation glitched.

Needless to say I didn't let it continue to run at that temp.  I shut down the demo and suspended the machine.

After waiting about 10 seconds I woke it back up.  The fan immediately started, but at a rather low rpm.  I logged in and started up the WebGL demo.  Again, the temps rose into the upper 90's, only this time the fan continued running at that low speed, having little effect on the overall core temps.

It seems like shortly after boot or wake, the fan gets to a point where no further rpm updates are received.  Yet on other occasions, something seems to start the fan monitoring back up again out of the blue.


I should note those last tests were battery only as well
scottt492@gmail.com, thanks for doing these experiments.

Ruben, would it be useful to get /var/log/messages from scottt492@gmail.com?  That may help shed light on whether the issue is in reacting to changes in temperature (if the script has crashed, for example) vs trying to set the fan speed.
No problem. Just give me a scenario you want tested and I'll capture the logs.
Indeed, thanks scottt492@gmail.com :)

Providing the logs would be great. In general any of the tests that you mentioned with logs would be useful, especially if you can reproduce the scenario where the fans don't kick in at first, and after suspending they do kick in, but at low rpm.

I think that there is an issue with fan activation, which shows itself by the fans never kicking in in some cases. I think there's two ways we can solve that issue, so I'll see if I can provide CLs for that today/tomorrow.

However, that would not explain the fans not speeding up to higher RPM, especially if we wait 2-3 minutes at a higher load. So I'm still trying to see what's going on there

I will see what I can provide. It's a hard problem to characterize, much less reproduce.

2-3 minutes at high loads, like those WebGL demos, is not something I want to entertain. I'm getting worried about the thermal stresses I am subjecting the system to.  Most gamers would freak if their systems hit 90 C.  I think we agree that, regardless of the hysteresis employed to avoid constantly changing fan speeds, there is a critical threshold at which the fan should be aggressively trying to reduce system temperatures. I think 80-plus Centigrade should easily meet that requirement.

As it turns out the absolute maximum temperature rating for the mobile i5 is 105. I haven't got that yet and I don't really want to.  These Pixel Chromebooks are sort of expensive.:-)

I emailed a somewhat long log to coconutruben.  It seemed a little large to post here but I can do that if you like.

I captured the problem this morning.  I was working on the Pixel again just doing low-load stuff that doesn't usually bring up the fan.  I cranked up a web demo and let the temps rise to the low 90's - no fan activity.

https://experiments.withgoogle.com/chrome/the-polygon-shredder

I suspended the Pixel with the demo still running.

I opened the lid and logged in with the demo still running and the fan never started up... Yikes!

I suspended the Pixel again with the demo running.

I opened the lid and the fan started.  I logged in and by that time the fan had gone back to a very low, almost inaudible speed.  With the demo still running the temps were climbing back into the 90's again with no speed changes for the fan.  It just kept running at the low rpm having little effect on the high temperatures.

I emailed the log covering that time period.

Now... as I sit here typing this, the fan has come back of its own accord, of course.  I will email that log as well.


I emailed the logs for the low-load period where the fan started again.  It remained at a low rpm so I decided to hit it with the demo again.  The temps climbed into the 90's and the fan eventually changed its speed from a very low speed to a slightly higher speed, nowhere near what is necessary to deal with the high CPU temps.  I finally shut the demo down to let things cool.  Here is a short log of that activity.

2018-03-07T07:05:51.071582-05:00 NOTICE temp_metrics[15803]: Throttling (temps: 1:41:7:39:9:69:): 1800000 800000 900 0 0x180aa00dd8068 # disable turbo
2018-03-07T07:09:01.894171-05:00 NOTICE temp_metrics[16586]: Setting fan RPM (temps: 1:39:7:38:9:62:): 5500 -> 4000
2018-03-07T07:09:01.901649-05:00 NOTICE temp_metrics[16593]: Throttling (temps: 1:39:7:38:9:62:): 1801000 800000 1150 0 0x180aa00dd8068 # cap pkg to 13W
2018-03-07T07:09:01.922338-05:00 CRIT kernel: [ 1966.154975] CPU2: Package power limit notification (total events = 5566)
2018-03-07T07:09:01.922352-05:00 CRIT kernel: [ 1966.154976] CPU3: Package power limit notification (total events = 5566)
2018-03-07T07:09:01.922353-05:00 CRIT kernel: [ 1966.155003] CPU1: Package power limit notification (total events = 5566)
2018-03-07T07:09:01.922355-05:00 CRIT kernel: [ 1966.155005] CPU0: Package power limit notification (total events = 5566)
2018-03-07T07:09:01.933338-05:00 INFO kernel: [ 1966.166012] CPU3: Package power limit normal
2018-03-07T07:09:01.933353-05:00 INFO kernel: [ 1966.166014] CPU2: Package power limit normal
2018-03-07T07:09:01.933356-05:00 INFO kernel: [ 1966.166025] CPU0: Package power limit normal
2018-03-07T07:09:01.933365-05:00 INFO kernel: [ 1966.166026] CPU1: Package power limit normal
2018-03-07T07:09:11.994988-05:00 NOTICE temp_metrics[16676]: Throttling (temps: 1:39:7:38:9:66:): 1801000 800000 1150 0 0x180aa00dd8070 # cap pkg to 14W
2018-03-07T07:10:22.296433-05:00 NOTICE temp_metrics[16977]: Throttling (temps: 1:39:7:39:9:79:): 1801000 800000 1150 0 0x180aa00dd8078 # cap pkg to 15W
2018-03-07T07:10:32.328875-05:00 NOTICE temp_metrics[17042]: Throttling (temps: 1:39:7:38:9:80:): 1801000 800000 1150 0 0x180aa00dd8080 # cap pkg to 16W
2018-03-07T07:11:02.487233-05:00 NOTICE temp_metrics[17201]: Setting fan RPM (temps: 1:41:7:39:9:86:): 4000 -> 5500
2018-03-07T07:11:02.493551-05:00 NOTICE temp_metrics[17208]: Throttling (temps: 1:41:7:39:9:86:): 1801000 800000 1150 0 0x180aa00dd8078 # cap pkg to 15W
In that last little segment I can tell you the fan was nowhere near the 4000 to 5500 rpm.
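
To compare what temp_metrics requested against what the fan was audibly doing, the "Setting fan RPM" lines can be pulled out of the log with a short script. This is only a sketch: the function name is made up for illustration, and it assumes the `temps:` field is a list of sensor-id:reading pairs, which is a guess from the log format above and is not confirmed anywhere in this thread.

```python
import re

# Matches lines such as:
# 2018-03-07T07:11:02.487233-05:00 NOTICE temp_metrics[17201]: Setting fan RPM (temps: 1:41:7:39:9:86:): 4000 -> 5500
FAN_LINE = re.compile(
    r"^(?P<ts>\S+) NOTICE temp_metrics\[\d+\]: "
    r"Setting fan RPM \(temps: (?P<temps>[\d:]+)\): "
    r"(?P<old>\d+) -> (?P<new>\d+)$"
)


def parse_fan_line(line):
    """Return a dict for a 'Setting fan RPM' line, or None for any other line."""
    match = FAN_LINE.match(line.strip())
    if match is None:
        return None
    fields = [int(x) for x in match.group("temps").split(":") if x]
    # ASSUMPTION: fields alternate between a sensor id and its reading.
    temps = dict(zip(fields[0::2], fields[1::2]))
    return {
        "timestamp": match.group("ts"),
        "temps": temps,
        "old_rpm": int(match.group("old")),
        "new_rpm": int(match.group("new")),
    }
```

Feeding each line of /var/log/messages through `parse_fan_line` and keeping the non-None results gives a timeline of requested speeds. Note that it shows what temp_metrics asked for, not what the fan actually did.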

Again, all this morning's tests were battery only.
Did you get those email logs? Let me know if you need more or more specific tests.
Thank you scottt492@gmail.com for the detailed logs, and for the pointer :)

I have uploaded two CLs crrev.com/c/964069 and crrev.com/c/964037 that I think can address part of this issue.

When testing this I found that temp_metrics seems stable, and doesn't crash. Your logs also have no crashes in temp_metrics. I do believe that there's an issue with the fan being disabled in some cases.
I also found that 3000-4000 range to be almost impossible to hear, but that might just be the office environment here.

Now, the speed at which the device cools down, or the fan speed itself, might be another issue, mainly because
a) the fans react to skin temperature, not core temperature, and only every 10s, so there is an expected delay there.
b) calibration might not have been done with that intensive a workload in mind
Hopefully the simple change to temp_metrics.conf will be sufficient. Still though, that would not seem to cover the scenario I have observed where the fan seems to remain at a set rpm with no updates for long periods of time even with cpu loads that would warrant speed increases. In that case, the fan is enabled but fails to get updates.

I assume by "skin" you mean the fan control system is reacting to the CPU package temp instead of core temps. There shouldn't be much thermal lag between the two so I wouldn't think that would be part of the problem.

I would like to suggest something though.  I understand the, probably arbitrary, 10 second period is really there to avoid having the fan constantly change speed which would annoy the user. However using that same time constant for both speed increases and decreases isn't really optimal if that is currently the case. Fan speed increases are the critical action whereas speed decreases are non-critical and done only to reduce noise and power usage when the thermal load doesn't require the extra cooling. Having the fan react more quickly to temperature increases and applying the fixed 10 second delay to temperature decreases would result in similar fan behavior from a user perspective. Rapid up-down fluctuations in fan speed would still be avoided, but the fan would cool the CPU more rapidly, avoiding the heat build up that would then take longer to dissipate. Even non-technical users recognize when their lap gets warm. :-)

I'm sure there is some optimal algorithm that would scale the reaction time for fan speed increases with cpu package temperature but I doubt that's necessary. Probably just applying something simple like a 3 second delay on increases and the standard 10 second delay on decreases would accomplish much the same.  This could improve overall fan power usage to some small degree.
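
A minimal sketch of the asymmetric delay suggested above, just to make the idea concrete. This is not how temp_metrics works: the class and method names are hypothetical, and the only values taken from this thread are the 3 second and 10 second delays and the 4000/5500 rpm steps seen in the logs.

```python
class AsymmetricFanGovernor:
    """Apply fan speed increases quickly and decreases slowly (hypothetical sketch)."""

    def __init__(self, up_delay_s=3.0, down_delay_s=10.0, initial_rpm=0):
        self.up_delay_s = up_delay_s      # how long a higher target must persist
        self.down_delay_s = down_delay_s  # how long a lower target must persist
        self.current_rpm = initial_rpm
        self._pending_rpm = None
        self._pending_since = None

    def update(self, now_s, target_rpm):
        """Feed the latest desired RPM; return the RPM that should be applied."""
        if target_rpm == self.current_rpm:
            # Nothing to change, so forget any pending request.
            self._pending_rpm = None
            self._pending_since = None
            return self.current_rpm

        going_up = target_rpm > self.current_rpm
        # Restart the timer when a request first appears or reverses direction.
        if self._pending_rpm is None or going_up != (self._pending_rpm > self.current_rpm):
            self._pending_since = now_s
        self._pending_rpm = target_rpm

        delay = self.up_delay_s if going_up else self.down_delay_s
        if now_s - self._pending_since >= delay:
            self.current_rpm = target_rpm
            self._pending_rpm = None
            self._pending_since = None
        return self.current_rpm
```

With `initial_rpm=4000`, a request for 5500 takes effect once it has persisted for 3 seconds, while a later request to drop back to 4000 has to persist for 10 seconds. Brief temperature dips therefore never slow the fan, but sustained heating speeds it up promptly.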


Comment 38 Deleted

> I assume by "skin" you mean the fan control system is reacting to the CPU package temp instead of core temps. 

No, the skin temp is an estimation of the case temperature using the measurements from temperature IR sensors in various points.

> but the fan would cool the CPU more rapidly

The final goal of the thermal loop is not really to 'cool' the CPU (which is far from Tjmax in those cases) but to maintain the casing temperature within limits.
The cooling system is designed with no direct response to CPU temps, but simply to avoid the user getting a warm lap?

With CPU core and package temps available to the OS, should the cooling system base its actions on the temperatures of structures in the system that are bound to have significant thermal lag and unpredictable thermal gradients relative to the core temps? If the cooling system is not reacting to the CPU temps, how is it to properly regulate the highest dissipation element in the system? With HTML 5 and WebGL capable of delivering high-load content to any browser, this seems a recipe for trouble going forward.

I have seen multiple cores get right to the edge of i5's 105C Tjmax on multiple occasions. That is certainly due to the bug behind this issue but also illustrates that a single point of failure allows the system to hit unsafe core temperatures with a lithium ion battery in close proximity. As an electrical engineer I certainly wouldn't be happy having that possibility in the wild.


I'm still a little stunned that the temp management system isn't really monitoring the heat source!  Thermal gradients between the CPU and adjacent structures will vary significantly with ambient conditions. There is no way to guarantee safe core temps with such a strategy. Automotive cooling systems don't measure the temperature of the hood as the internal engine temperatures are the important factor and the hood is not a reliable indicator of those temperatures.

Intel has a strict "no overclocking" policy for a reason. Running CPUs near their thermal limits reduces the lifespan of the part on top of increasing the risk of errors and crashes.

It really sounds like there is room for improvement here beyond the bug fix.


Project Member

Comment 41 by bugdroid1@chromium.org, Mar 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/43d0769918a0c674423227bb9e81226a0dba6274

commit 43d0769918a0c674423227bb9e81226a0dba6274
Author: Ruben Rodriguez Buchillon <coconutruben@chromium.org>
Date: Fri Mar 16 22:56:23 2018

temp_metrics: use fanduty 0 to enable fan

If the fan is never enabled, temp_metrics itself has no code-path to
enable the fan. This fixes this by calling fanduty 0 in the beginning
of temp_metrics, since fanduty does explicitly enable the fan.

Note: This is a hack to avoid having to flash a new EC image. See
crrev.com/c/964037 for a more fundamental fix to the same issue.

BRANCH=link
BUG= chromium:808764 
TEST=couldn't reproduce issue with this version of temp_metrics.

Change-Id: I8a9b258ba7b50cf5180497d318f8d94454dab434
Signed-off-by: Ruben Rodriguez Buchillon <coconutruben@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/964069
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>

[modify] https://crrev.com/43d0769918a0c674423227bb9e81226a0dba6274/util/temp_metrics.conf

Nice job Ruben!  Looking forward to getting your fix rolled out to my Pixel. That will hopefully keep the i5 out of the danger zone.


Project Member

Comment 43 by bugdroid1@chromium.org, Mar 19 2018

Labels: merge-merged-firmware-poppy-10431.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/2cf6a6ae8c15590f7cdf0cda153d45e5b49a632f

commit 2cf6a6ae8c15590f7cdf0cda153d45e5b49a632f
Author: Ruben Rodriguez Buchillon <coconutruben@chromium.org>
Date: Mon Mar 19 05:11:23 2018

temp_metrics: use fanduty 0 to enable fan

If the fan is never enabled, temp_metrics itself has no code-path to
enable the fan. This fixes this by calling fanduty 0 in the beginning
of temp_metrics, since fanduty does explicitly enable the fan.

Note: This is a hack to avoid having to flash a new EC image. See
crrev.com/c/964037 for a more fundamental fix to the same issue.

BRANCH=link
BUG= chromium:808764 
TEST=couldn't reproduce issue with this version of temp_metrics.

Change-Id: I8a9b258ba7b50cf5180497d318f8d94454dab434
Signed-off-by: Ruben Rodriguez Buchillon <coconutruben@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/964069
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
(cherry picked from commit 43d0769918a0c674423227bb9e81226a0dba6274)
Reviewed-on: https://chromium-review.googlesource.com/967327
Commit-Queue: Furquan Shaikh <furquan@chromium.org>
Tested-by: Furquan Shaikh <furquan@chromium.org>
Trybot-Ready: Furquan Shaikh <furquan@chromium.org>
Reviewed-by: Furquan Shaikh <furquan@chromium.org>

[modify] https://crrev.com/2cf6a6ae8c15590f7cdf0cda153d45e5b49a632f/util/temp_metrics.conf

Ruben,

How will I know when this fix has been rolled out to the device?

thanks,

Scott

Comment 45 by derat@chromium.org, Mar 20 2018

Barring any merges to earlier releases, I think this will go out with Chrome 67 (probably early June for stable channel, late April for beta channel, sooner for dev channel).
Thanks
Certainly looking forward to the fix on this.  I did a simple transfer of photos from an SD card to a USB stick the other day and as usual, the fan sat there mute while the temps hit 92C!  So, it doesn't require some beastly WebGL demo to create these high temp conditions.
Is there any way to determine if this has rolled into the release builds yet?  

I was just writing some email this evening and the Pixel sat there in the low 80's (C) with no fan running for about an hour.  When I rebooted it was the same behavior observed in the examples above.  When the Pixel restarted, the fan immediately spun up to cool the system and stayed on until the temps got back down into the 60's.  It's still running normally as I type this.
can you tell me what the exact version you're running is?
I was running this last night.
Version 66.0.3359.203 (Official Build) (64-bit)

Just updated to...
Version 67.0.3396.87 (Official Build) (64-bit)

I will test it again.
Just ran this demo with Version 67.0.3396.87 (Official Build) (64-bit)

https://experiments.withgoogle.com/biomes

Temps hit 85 C before I shut it down.

Upon reboot fan activated at high rpm for a moment and then shut down again.  Seems the same as before.
Ran it again up till all the cores were hovering at 100!  I hate doing that. :-)

The base of the unit at this point was about 110 F.  

The fan finally came on and started regulating temps down into the 80's.  Maybe it's working now.  Not sure.  In any case it still seems like poor thermal regulation but at least there was fan activity.  I'll run some more tests this evening to see if the fan activity is consistent.

> The base of the unit at this point was about 110 F.

It's fine / expected for this machine if you are running an artificial intensive workload. The loud fan threshold starts at 42 C (107 F)
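
For anyone comparing the Fahrenheit reading with the Celsius threshold, here is a quick conversion sketch. The helper names are just for illustration.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


print(round(c_to_f(42), 1))   # the 42 C threshold is about 107.6 F
print(round(f_to_c(110), 1))  # the 110 F base reading is about 43.3 C
```

So the 110 F measured on the base is roughly 43.3 C, a little over the 42 C threshold quoted above.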
By "loud" do you mean the high rpm threshold?  There appears to be more fan activity, which is certainly good. I'll run more tests.  However, even at those temps the fan never maintained anything close to its high rpm settings. The only time I actually heard the fan hit its max rpm was as the system booted.  At other times it was audible but probably half max rpm or less.

As noted earlier in the thread, I was able previously to get the system temps very high doing a simple copy operation of multiple files to or from a flash drive.  That certainly isn't an "artificial" work load and neither is a WebGL demo. A raw floating point benchmark could be considered an artificial workload. A snazzy WebGL visual demo however, is representative of what the web is becoming be it cool data visualization or web gaming.  These machines are designed with the primary function of consuming web content. They need to be able to do that and remain effectively thermally regulated right down to the core level.
It does appear that the fan isn't completely dormant now.  So that's progress!

However, just now I started it up from a full shutdown and went straight to the "biomes" demo linked above (you have to select the arrows to get past the initial demo as that doesn't work).  The Pixel sat there for almost ten minutes with the cores hovering in the high 90's and frequently hitting 101 and 102!  The fan finally went from barely audible (below 4000 rpm) to audible but still nowhere close to its max rpm.  It would spin up for a short period to 5500 rpm and then go back down to a low rpm, allowing the cores to again reach the 90's all across the board.

Again, these same temperatures can be attained by simply copying a large number of files to or from a USB stick.  The demo is just the easiest way to reproduce this scenario.

As I type this, the fan is running a consistent 4000 to 5500 rpm according to the log and keeping temps in the low 70's, but of course there is no longer any significant load.  Though its max rpm appears to be 9000 rpm (the highest value I've seen in the log), that speed is rarely ever employed even when temps are almost max across all the cores!  No matter how you approach it, letting a modern CPU like this spend long periods at or near Tjmax is poor thermal management when the fan isn't even near max rpm.

Somewhere along the way, the fan/chipset driver changed as the Pixel once had a considerably more active cooling strategy from a user point of view.  This thing would routinely spin the fan up and down to cool itself and that never bothered me.  The absence of that activity raised the red flag that started this issue. I believe there's a strong argument for further code changes in order to get the Pixel back to behaving well thermally.

Status: Fixed (was: Started)
scottt492@gmail.com, thanks for being super helpful in debugging this. Given that the original issue being tracked in this bug (fans not spinning up) has been fixed, I am closing this bug as fixed.
Agreed, the dead fan issue appears to be fixed.  

Thanks Ruben for your work on this!

Comment 58 Deleted
