New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 842873 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 843037



Sign in to add a comment

[kevin/bob] graphics_PerfControl regression [irq/66-chromeos]

Project Member Reported by ihf@chromium.org, May 14 2018

Issue description

Looks like a kernel change/test that ran way before us causing a CPU to spin?

https://stainless.corp.google.com/search?exclude_retried=true&first_date=2018-05-08&master_builder_name=&builder_name_number=&shard=&exclude_acts=true&builder_name=&master_builder_name_number=&owner=&retry=&exclude_cts=true&exclude_non_production=false&hostname=&board=&test=%5Egraphics%5C_PerfControl%24&exclude_not_run=false&build=%5ER68%5C-10674%5C.0%5C.0%24&status=FAIL&status=ERROR&status=ABORT&reason=&waterfall=&suite=&last_date=2018-05-14&exclude_non_release=true&exclude_au=true&model=&view=list&board_model=%5Ebob%5C%2Fbob%24


https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/199581220-chromeos-test/chromeos2-row6-rack3-host1/graphics_PerfControl/debug/

05/12 23:46:14.819 INFO |             utils:1612| Starting to wait up to 60.0s for idle CPU...
05/12 23:46:15.822 INFO |             utils:1624| After waiting 1.0s CPU utilization is 0.173.
05/12 23:46:17.825 INFO |             utils:1624| After waiting 3.0s CPU utilization is 0.172.
05/12 23:46:21.830 INFO |             utils:1624| After waiting 7.0s CPU utilization is 0.179.
05/12 23:46:29.840 INFO |             utils:1624| After waiting 15.0s CPU utilization is 0.179.
05/12 23:46:45.857 INFO |             utils:1624| After waiting 31.0s CPU utilization is 0.175.
05/12 23:47:01.874 INFO |             utils:1624| After waiting 47.0s CPU utilization is 0.176.
05/12 23:47:17.892 INFO |             utils:1624| After waiting 63.0s CPU utilization is 0.175.
05/12 23:47:17.892 WARNI|             utils:1626| CPU did not become idle.
05/12 23:47:17.901 INFO |             utils:1644| Logging current process activity using top and ps.
05/12 23:47:17.901 DEBUG|             utils:0214| Running 'top -b -n1 -c'
05/12 23:47:18.121 INFO |             utils:1647| * Command: 
    top -b -n1 -c
Exit status: 0
Duration: 0.210731983185

stdout:
top - 23:47:18 up 11 min,  0 users,  load average: 2.26, 3.69, 2.23
Tasks: 241 total,   2 running, 238 sleeping,   0 stopped,   1 zombie
%Cpu(s): 11.6 us, 24.3 sy,  0.6 ni, 63.2 id,  0.3 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  3904868 total,  1991572 free,   234884 used,  1678412 buff/cache
KiB Swap:  3813344 total,  3813344 free,        0 used.  3470228 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  120 root     -51   0       0      0      0 R 100.0  0.0  11:40.84 [irq/66-ch+
  119 root      20   0       0      0      0 S  41.2  0.0   0:59.54 [spi5]
17254 root      20   0    2616   1380   1204 R  17.6  0.0   0:00.05 top -b -n1+
    1 root      20   0    3404   2548   1580 S   0.0  0.1   0:03.13 /sbin/init

[...]
1     0   120     2 -51   -      0     0 -      R    ?         11:40 [irq/66-chromeos]


--
Also need to import error at the right place?

  File "/usr/local/autotest/tests/graphics_PerfControl/graphics_PerfControl.py", line 40, in run_once
    raise error.TestFailure('Failed: %s' % pc.get_error_reason())
AttributeError: 'module' object has no attribute 'TestFailure'
 
can you look at /proc/interrupts to see what is irq 66?
Owner: puneetster@chromium.org
 66:   15747595          0          0          0          0          0  rockchip_gpio_irq   1 Level     chromeos-ec

-> to Puneet for ec

Comment 3 by pwang@chromium.org, May 14 2018

Cc: pwang@chromium.org
note that this is still happening on chromeos2-row6-rack3-host1.cros
Cc: ihf@chromium.org
+Ilja, maybe we want to also remove that device from our pools until the irq storm is fixed

Comment 6 by pwang@chromium.org, May 14 2018

Same behavior is observed in other machine as well.
chromeos6-row2-rack24-host*.cros etc.
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                
  124 root     -51   0       0      0      0 R  98.4  0.0   7:34.27 irq/66-chromeos

Comment 7 by pwang@chromium.org, May 14 2018

Cc: sadolfsson@chromium.org
+sadolfsson

Looking at the CL landed in 10674.0.0
This looks fishing to me as it touches cros_ec.c
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1051085
Project Member

Comment 8 by bugdroid1@chromium.org, May 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d335682a57150067457d848a0f5e9b344720929b

commit d335682a57150067457d848a0f5e9b344720929b
Author: Po-Hsien Wang <pwang@chromium.org>
Date: Tue May 15 04:29:52 2018

graphics_PerfControl: fix TestFailure

TestFailure is not used in autotest anymore. Use TestFail instead.

BUG= chromium:842873 
TEST=Check error.py

Change-Id: Ic888833e0721a65a464b0e69adda2ece0255f402
Reviewed-on: https://chromium-review.googlesource.com/1058142
Commit-Ready: Pohsien Wang <pwang@chromium.org>
Tested-by: Pohsien Wang <pwang@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/d335682a57150067457d848a0f5e9b344720929b/client/site_tests/graphics_PerfControl/graphics_PerfControl.py

Comment 9 by deanliao@google.com, May 15 2018

Blocking: 843037
Project Member

Comment 11 by bugdroid1@chromium.org, May 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9

commit f139d3a0ca9215b5b5bb2abc1f120ff6171036c9
Author: Ilja H. Friedel <ihf@chromium.org>
Date: Tue May 15 17:00:53 2018

Revert "npcx: CEC: Send CEC message in mkbp event"

This reverts commit 74b5a2ccb58739d4e21fdeb36e40fe01c0ca7ede.

Suspected to have broken perf tests by keeping a CPU busy on kevin/bob.

BUG= chromium:842873 , b:76467407

Change-Id: Iebbbb4623116840b851656e3ec28e75dc99cff79
Reviewed-on: https://chromium-review.googlesource.com/1060073
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9/include/ec_commands.h
[modify] https://crrev.com/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9/common/mkbp_event.c
[modify] https://crrev.com/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9/util/ectool.c
[modify] https://crrev.com/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9/chip/npcx/cec.c

Project Member

Comment 12 by bugdroid1@chromium.org, May 15 2018

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7f22e51e592d4cac1ebc832688148910cc1965c9

commit 7f22e51e592d4cac1ebc832688148910cc1965c9
Author: Ilja H. Friedel <ihf@chromium.org>
Date: Tue May 15 17:01:22 2018

Revert "CHROMIUM: mkbp: Increase maximum mkbp event size"

This reverts commit 8e3c039f25b874f5c22dbf1109f9ee97cd9ec9f0.

Suspected to have broken perf tests by keeping a CPU busy on kevin/bob.

BUG= chromium:842873 , b:76467407

Change-Id: I4a452f1ab472cbdcb4b70cc24526bc8ed051b335
Reviewed-on: https://chromium-review.googlesource.com/1059947
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/7f22e51e592d4cac1ebc832688148910cc1965c9/include/linux/mfd/cros_ec.h
[modify] https://crrev.com/7f22e51e592d4cac1ebc832688148910cc1965c9/drivers/mfd/cros_ec.c
[modify] https://crrev.com/7f22e51e592d4cac1ebc832688148910cc1965c9/include/linux/mfd/cros_ec_commands.h

Comment 13 by ihf@chromium.org, May 15 2018

Just to clarify, the condition is not caused by graphics_PerfControl, but detected by it. It is caused by a test that runs earlier.

As an example here
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/200307442-chromeos-test/chromeos6-row1-rack24-host9/graphics_PerfControl/debug/

one CPU got burned for 7:27.51 minutes at 09:19:55.851. There were three tests run on the DUT since boot. I guess the first login into Chrome caused the problem (or maybe that is even not needed).

2018-05-15T16:12:21.688965+00:00 INFO kernel: [    0.000000] Booting Linux on physical CPU 0x0

2018-05-15T16:12:43.475008+00:00 NOTICE root[3508]: autotest runtest desktopui_SimpleLogin
2018-05-15T16:14:31.313580+00:00 NOTICE root[5945]: autotest runtest desktopui_ChromeSanity
2018-05-15T16:16:26.383237+00:00 NOTICE root[8666]: autotest runtest logging_FeedbackReport

2018-05-15T16:18:50.805711+00:00 NOTICE root[12756]: autotest starting iteration /usr/local/autotest/results/default/graphics_PerfControl/sysinfo/iteration.1 on kevin-arcnext_2.0GHz_4GB

05/15 09:19:55.842 WARNI|             utils:1626| CPU did not become idle.
05/15 09:19:55.851 INFO |             utils:1644| Logging current process activity using top and ps.
05/15 09:19:55.851 DEBUG|             utils:0214| Running 'top -b -n1 -c'
05/15 09:19:56.073 INFO |             utils:1647| * Command: 
    top -b -n1 -c
Exit status: 0
Duration: 0.212687969208

stdout:
top - 09:19:56 up 7 min,  0 users,  load average: 2.03, 2.58, 1.37
Tasks: 233 total,   2 running, 230 sleeping,   0 stopped,   1 zombie
%Cpu(s): 11.3 us, 24.3 sy,  0.0 ni, 64.0 id,  0.4 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  3904868 total,  2773236 free,   218716 used,   912916 buff/cache
KiB Swap:  3813344 total,  3813344 free,        0 used.  3488828 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  121 root     -51   0       0      0      0 R  94.4  0.0   7:27.51 [irq/66-ch+
ihf, is the reverted patch still the suspect? 
(the one I authored)

Comment 16 Deleted



Add a comment and make changes below
⚐ Flag issue as spam	[kevin/bob] graphics_PerfControl regression [irq/66-chromeos]	
Project Member Reported by ihf@chromium.org, Today (23 hours ago) Edit description	Back to list
Issue description
Looks like a kernel change/test that ran way before us causing a CPU to spin?

https://stainless.corp.google.com/search?exclude_retried=true&first_date=2018-05-08&master_builder_name=&builder_name_number=&shard=&exclude_acts=true&builder_name=&master_builder_name_number=&owner=&retry=&exclude_cts=true&exclude_non_production=false&hostname=&board=&test=%5Egraphics%5C_PerfControl%24&exclude_not_run=false&build=%5ER68%5C-10674%5C.0%5C.0%24&status=FAIL&status=ERROR&status=ABORT&reason=&waterfall=&suite=&last_date=2018-05-14&exclude_non_release=true&exclude_au=true&model=&view=list&board_model=%5Ebob%5C%2Fbob%24


https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/199581220-chromeos-test/chromeos2-row6-rack3-host1/graphics_PerfControl/debug/

05/12 23:46:14.819 INFO |             utils:1612| Starting to wait up to 60.0s for idle CPU...
05/12 23:46:15.822 INFO |             utils:1624| After waiting 1.0s CPU utilization is 0.173.
05/12 23:46:17.825 INFO |             utils:1624| After waiting 3.0s CPU utilization is 0.172.
05/12 23:46:21.830 INFO |             utils:1624| After waiting 7.0s CPU utilization is 0.179.
05/12 23:46:29.840 INFO |             utils:1624| After waiting 15.0s CPU utilization is 0.179.
05/12 23:46:45.857 INFO |             utils:1624| After waiting 31.0s CPU utilization is 0.175.
05/12 23:47:01.874 INFO |             utils:1624| After waiting 47.0s CPU utilization is 0.176.
05/12 23:47:17.892 INFO |             utils:1624| After waiting 63.0s CPU utilization is 0.175.
05/12 23:47:17.892 WARNI|             utils:1626| CPU did not become idle.
05/12 23:47:17.901 INFO |             utils:1644| Logging current process activity using top and ps.
05/12 23:47:17.901 DEBUG|             utils:0214| Running 'top -b -n1 -c'
05/12 23:47:18.121 INFO |             utils:1647| * Command: 
    top -b -n1 -c
Exit status: 0
Duration: 0.210731983185

stdout:
top - 23:47:18 up 11 min,  0 users,  load average: 2.26, 3.69, 2.23
Tasks: 241 total,   2 running, 238 sleeping,   0 stopped,   1 zombie
%Cpu(s): 11.6 us, 24.3 sy,  0.6 ni, 63.2 id,  0.3 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  3904868 total,  1991572 free,   234884 used,  1678412 buff/cache
KiB Swap:  3813344 total,  3813344 free,        0 used.  3470228 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  120 root     -51   0       0      0      0 R 100.0  0.0  11:40.84 [irq/66-ch+
  119 root      20   0       0      0      0 S  41.2  0.0   0:59.54 [spi5]
17254 root      20   0    2616   1380   1204 R  17.6  0.0   0:00.05 top -b -n1+
    1 root      20   0    3404   2548   1580 S   0.0  0.1   0:03.13 /sbin/init

[...]
1     0   120     2 -51   -      0     0 -      R    ?         11:40 [irq/66-chromeos]


--
Also need to import error at the right place?

  File "/usr/local/autotest/tests/graphics_PerfControl/graphics_PerfControl.py", line 40, in run_once
    raise error.TestFailure('Failed: %s' % pc.get_error_reason())
AttributeError: 'module' object has no attribute 'TestFailure'
 
Comment 1 by marcheu@chromium.org, Today (23 hours ago)
⚐
can you look at /proc/interrupts to see what is irq 66?
Comment 2 by marcheu@chromium.org, Today (23 hours ago)
⚐
Owner: puneetster@chromium.org
 66:   15747595          0          0          0          0          0  rockchip_gpio_irq   1 Level     chromeos-ec

-> to Puneet for ec
Comment 3 by pwang@chromium.org, Today (23 hours ago)
⚐
Cc: pwang@chromium.org
Comment 4 by marcheu@chromium.org, Today (22 hours ago)
⚐
note that this is still happening on chromeos2-row6-rack3-host1.cros
Comment 5 by marcheu@chromium.org, Today (22 hours ago)
⚐
Cc: ihf@chromium.org
+Ilja, maybe we want to also remove that device from our pools until the irq storm is fixed
Comment 6 by pwang@chromium.org, Today (22 hours ago)
⚐
Same behavior is observed in other machine as well.
chromeos6-row2-rack24-host*.cros etc.
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                
  124 root     -51   0       0      0      0 R  98.4  0.0   7:34.27 irq/66-chromeos
Comment 7 by pwang@chromium.org, Today (22 hours ago)
⚐
Cc: sadolfsson@chromium.org
+sadolfsson

Looking at the CL landed in 10674.0.0
This looks fishing to me as it touches cros_ec.c
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1051085
Project Member Comment 8 by bugdroid1@chromium.org, Today (15 hours ago)
⚐
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d335682a57150067457d848a0f5e9b344720929b

commit d335682a57150067457d848a0f5e9b344720929b
Author: Po-Hsien Wang <pwang@chromium.org>
Date: Tue May 15 04:29:52 2018

graphics_PerfControl: fix TestFailure

TestFailure is not used in autotest anymore. Use TestFail instead.

BUG= chromium:842873 
TEST=Check error.py

Change-Id: Ic888833e0721a65a464b0e69adda2ece0255f402
Reviewed-on: https://chromium-review.googlesource.com/1058142
Commit-Ready: Pohsien Wang <pwang@chromium.org>
Tested-by: Pohsien Wang <pwang@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/d335682a57150067457d848a0f5e9b344720929b/client/site_tests/graphics_PerfControl/graphics_PerfControl.py

Comment 9 by deanliao@google.com, Today (12 hours ago)
⚐
Blocking: 843037
Comment 10 by ihf@chromium.org, Today (2 hours ago)
⚐
Looking at stainless history
https://stainless.corp.google.com/search?view=matrix&row=board_model&col=build&first_date=2018-05-09&last_date=2018-05-15&test=%5Egraphics%5C_PerfControl%24&board=kevin%7Cbob%7Cscarlet&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false

it is clear that the regression started
https://crosland.corp.google.com/log/10673.0.0..10674.0.0

There are only two suspicious changes. I will revert them.
Project Member Comment 11 by bugdroid1@chromium.org, Today (2 hours ago)
⚐
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9

commit f139d3a0ca9215b5b5bb2abc1f120ff6171036c9
Author: Ilja H. Friedel <ihf@chromium.org>
Date: Tue May 15 17:00:53 2018

Revert "npcx: CEC: Send CEC message in mkbp event"

This reverts commit 74b5a2ccb58739d4e21fdeb36e40fe01c0ca7ede.

Suspected to have broken perf tests by keeping a CPU busy on kevin/bob.

BUG= chromium:842873 , b:76467407

Change-Id: Iebbbb4623116840b851656e3ec28e75dc99cff79
Reviewed-on: https://chromium-review.googlesource.com/1060073
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9/include/ec_commands.h
[modify] https://crrev.com/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9/common/mkbp_event.c
[modify] https://crrev.com/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9/util/ectool.c
[modify] https://crrev.com/f139d3a0ca9215b5b5bb2abc1f120ff6171036c9/chip/npcx/cec.c

Project Member Comment 12 by bugdroid1@chromium.org, Today (2 hours ago)
⚐
Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7f22e51e592d4cac1ebc832688148910cc1965c9

commit 7f22e51e592d4cac1ebc832688148910cc1965c9
Author: Ilja H. Friedel <ihf@chromium.org>
Date: Tue May 15 17:01:22 2018

Revert "CHROMIUM: mkbp: Increase maximum mkbp event size"

This reverts commit 8e3c039f25b874f5c22dbf1109f9ee97cd9ec9f0.

Suspected to have broken perf tests by keeping a CPU busy on kevin/bob.

BUG= chromium:842873 , b:76467407

Change-Id: I4a452f1ab472cbdcb4b70cc24526bc8ed051b335
Reviewed-on: https://chromium-review.googlesource.com/1059947
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/7f22e51e592d4cac1ebc832688148910cc1965c9/include/linux/mfd/cros_ec.h
[modify] https://crrev.com/7f22e51e592d4cac1ebc832688148910cc1965c9/drivers/mfd/cros_ec.c
[modify] https://crrev.com/7f22e51e592d4cac1ebc832688148910cc1965c9/include/linux/mfd/cros_ec_commands.h

Comment 13 by ihf@chromium.org, Today (2 hours ago)
⚐
Just to clarify, the condition is not caused by graphics_PerfControl, but detected by it. It is caused by a test that runs earlier.

As an example here
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/200307442-chromeos-test/chromeos6-row1-rack24-host9/graphics_PerfControl/debug/

one CPU got burned for 7:27.51 minutes at 09:19:55.851. There were three tests run on the DUT since boot. I guess the first login into Chrome caused the problem (or maybe that is even not needed).

2018-05-15T16:12:21.688965+00:00 INFO kernel: [    0.000000] Booting Linux on physical CPU 0x0

2018-05-15T16:12:43.475008+00:00 NOTICE root[3508]: autotest runtest desktopui_SimpleLogin
2018-05-15T16:14:31.313580+00:00 NOTICE root[5945]: autotest runtest desktopui_ChromeSanity
2018-05-15T16:16:26.383237+00:00 NOTICE root[8666]: autotest runtest logging_FeedbackReport

2018-05-15T16:18:50.805711+00:00 NOTICE root[12756]: autotest starting iteration /usr/local/autotest/results/default/graphics_PerfControl/sysinfo/iteration.1 on kevin-arcnext_2.0GHz_4GB

05/15 09:19:55.842 WARNI|             utils:1626| CPU did not become idle.
05/15 09:19:55.851 INFO |             utils:1644| Logging current process activity using top and ps.
05/15 09:19:55.851 DEBUG|             utils:0214| Running 'top -b -n1 -c'
05/15 09:19:56.073 INFO |             utils:1647| * Command: 
    top -b -n1 -c
Exit status: 0
Duration: 0.212687969208

stdout:
top - 09:19:56 up 7 min,  0 users,  load average: 2.03, 2.58, 1.37
Tasks: 233 total,   2 running, 230 sleeping,   0 stopped,   1 zombie
%Cpu(s): 11.3 us, 24.3 sy,  0.0 ni, 64.0 id,  0.4 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  3904868 total,  2773236 free,   218716 used,   912916 buff/cache
KiB Swap:  3813344 total,  3813344 free,        0 used.  3488828 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  121 root     -51   0       0      0      0 R  94.4  0.0   7:27.51 [irq/66-ch+
Comment 14 by sadolfsson@chromium.org, Today (38 minutes ago)
⚐
ihf, is the reverted patch still the suspect? 
Comment 15 by sadolfsson@chromium.org, Today (33 minutes ago)
⚐
(the one I authored)
Comment 16 by vpalatin@google.com, Today (moments ago)
Delete comment ⚐
as commented in the revert patch, the EC changes is very unlikely to cause this kind of troubles as we are never rolling out ToT firmware changes on test DUTs.
So the firmware changes are effectivily no-op for the perf tests.
There is a small ectool userspace diff but looking at it, it seems quite innocuous (as in, there is a high probability these code paths are not even executed). 


the kernel one might be the issue though (particularly as in this case it was running without the matching EC change, the fallback in that case might have been badly tested) 

Comment 18 by ihf@chromium.org, May 15 2018

I am having a hard time reading Vincent's comment. 

Stefan, we will know tomorrow. If the tests still fail with the revert I will reland the patches.
He is saying  "npcx: CEC: Send CEC message in mkbp event" which only touches EC-firmware should not even be in this test, but ""CHROMIUM: mkbp: Increase maximum mkbp event size" which is in the kernel does. 

Also there is some dependency between them, so having the second one without the first one has some special handling in the code.

Comment 20 by pwang@chromium.org, May 16 2018

Owner: pwang@chromium.org
Status: Verified (was: Assigned)
R68-10682.0.0 passed on kevin/bob.
https://stainless.corp.google.com/search?view=matrix&row=model&col=build&first_date=2018-05-10&last_date=2018-05-16&test=graphics_PerfControl%7Cvideo_HangoutHardwarePerf&board=bob%7Ckevin&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false

Checking some of the DUT in the lab. It looks like the situation is sane.
$ ssh chromeos2-row6-rack3-host1.cros
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
27050 root      20   0    2448   1536   1396 R  61.9  0.0   0:06.13 tar

$ ssh chromeos6-row2-rack24-host11.cros
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
17258 chronos   12  -8  574536 177852  83436 S  15.0  4.6   7:20.24 chrome

I'll close this for now.
FYI, my kernel commit was the problem. I have a fixed version out for review now with some comments in the commit message on what went wrong:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1061879
Project Member

Comment 22 by bugdroid1@chromium.org, May 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/86734119fcba3909f02ab294ae3b2b31540c17a7

commit 86734119fcba3909f02ab294ae3b2b31540c17a7
Author: Stefan Adolfsson <sadolfsson@chromium.org>
Date: Wed May 16 23:43:03 2018

Reland "npcx: CEC: Send CEC message in mkbp event"

This reverts commit f139d3a0ca9215b5b5bb2abc1f120ff6171036c9.

Reason for revert: Verified that the problem is in the kernel, not EC.

Original change's description:
> Revert "npcx: CEC: Send CEC message in mkbp event"
>
> This reverts commit 74b5a2ccb58739d4e21fdeb36e40fe01c0ca7ede.
>
> Suspected to have broken perf tests by keeping a CPU busy on kevin/bob.
>
> BUG= chromium:842873 , b:76467407
>
> Change-Id: Iebbbb4623116840b851656e3ec28e75dc99cff79
> Reviewed-on: https://chromium-review.googlesource.com/1060073
> Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
> Tested-by: Ilja H. Friedel <ihf@chromium.org>

Bug:  chromium:842873 , b:76467407
Change-Id: I7d8990b2b8901b7de08f190a993bec645bbdacd2
Reviewed-on: https://chromium-review.googlesource.com/1061854
Commit-Ready: Stefan Adolfsson <sadolfsson@chromium.org>
Tested-by: Stefan Adolfsson <sadolfsson@chromium.org>
Reviewed-by: Stefan Adolfsson <sadolfsson@chromium.org>
Reviewed-by: Vincent Palatin <vpalatin@chromium.org>

[modify] https://crrev.com/86734119fcba3909f02ab294ae3b2b31540c17a7/include/ec_commands.h
[modify] https://crrev.com/86734119fcba3909f02ab294ae3b2b31540c17a7/common/mkbp_event.c
[modify] https://crrev.com/86734119fcba3909f02ab294ae3b2b31540c17a7/util/ectool.c
[modify] https://crrev.com/86734119fcba3909f02ab294ae3b2b31540c17a7/chip/npcx/cec.c

Project Member

Comment 23 by bugdroid1@chromium.org, May 18 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ed5018b2daae9fabbf95d819b6fb14c968372dc8

commit ed5018b2daae9fabbf95d819b6fb14c968372dc8
Author: Stefan Adolfsson <sadolfsson@chromium.org>
Date: Fri May 18 02:35:02 2018

CHROMIUM: mkbp: Increase maximum mkbp event size

Having a 16 byte mkbp event size makes it possible to send CEC
messages from the EC to the AP directly inside the mkbp event
instead of first doing a notification and then a read.

This version of the change includes a fix for a failed
performance auto-test on bob/kevin. The problem was
that the previous version did not properly fall back
to v0 of the mkbp event since the return value that
it looked on was that of the transfer, not that of
the protocol error.

Signed-off-by: Stefan Adolfsson <sadolfsson@chromium.org>

BRANCH=none
TEST=Verified that bob falls back to v0 version of the mkbp
event and that Fizz works with both v0 and v1 (using some
extra debug prints). Also made sure there is nothing strange
when running "top" on bob, and that bobs keyboard works.
BUG= chromium:842873 , b:76467407
Change-Id: I39e96cd17213f6b702707b3448859fafa682f146
Reviewed-on: https://chromium-review.googlesource.com/1061879
Commit-Ready: Stefan Adolfsson <sadolfsson@chromium.org>
Tested-by: Stefan Adolfsson <sadolfsson@chromium.org>
Reviewed-by: Stefan Adolfsson <sadolfsson@chromium.org>
Reviewed-by: Vincent Palatin <vpalatin@chromium.org>

[modify] https://crrev.com/ed5018b2daae9fabbf95d819b6fb14c968372dc8/include/linux/mfd/cros_ec.h
[modify] https://crrev.com/ed5018b2daae9fabbf95d819b6fb14c968372dc8/drivers/mfd/cros_ec.c
[modify] https://crrev.com/ed5018b2daae9fabbf95d819b6fb14c968372dc8/include/linux/mfd/cros_ec_commands.h

Project Member

Comment 24 by bugdroid1@chromium.org, Jun 19 2018

Labels: merge-merged-factory-fizz-10167.B-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a9d2c1832967c9f1864391bef407c328281e44ff

commit a9d2c1832967c9f1864391bef407c328281e44ff
Author: Stefan Adolfsson <sadolfsson@chromium.org>
Date: Tue Jun 19 20:10:11 2018

CHROMIUM: mkbp: Increase maximum mkbp event size

Having a 16 byte mkbp event size makes it possible to send CEC
messages from the EC to the AP directly inside the mkbp event
instead of first doing a notification and then a read.

This version of the change includes a fix for a failed
performance auto-test on bob/kevin. The problem was
that the previous version did not properly fall back
to v0 of the mkbp event since the return value that
it looked on was that of the transfer, not that of
the protocol error.

Signed-off-by: Stefan Adolfsson <sadolfsson@chromium.org>

BRANCH=none
TEST=Verified that bob falls back to v0 version of the mkbp
event and that Fizz works with both v0 and v1 (using some
extra debug prints). Also made sure there is nothing strange
when running "top" on bob, and that bobs keyboard works.
BUG= chromium:842873 , b:76467407
Change-Id: I39e96cd17213f6b702707b3448859fafa682f146
Reviewed-on: https://chromium-review.googlesource.com/1061879
Commit-Ready: Stefan Adolfsson <sadolfsson@chromium.org>
Tested-by: Stefan Adolfsson <sadolfsson@chromium.org>
Reviewed-by: Stefan Adolfsson <sadolfsson@chromium.org>
Reviewed-by: Vincent Palatin <vpalatin@chromium.org>
(cherry picked from commit ed5018b2daae9fabbf95d819b6fb14c968372dc8)
Reviewed-on: https://chromium-review.googlesource.com/1071670
Reviewed-by: Daisuke Nojiri <dnojiri@chromium.org>
Commit-Queue: Stefan Adolfsson <sadolfsson@chromium.org>

[modify] https://crrev.com/a9d2c1832967c9f1864391bef407c328281e44ff/include/linux/mfd/cros_ec.h
[modify] https://crrev.com/a9d2c1832967c9f1864391bef407c328281e44ff/drivers/mfd/cros_ec.c
[modify] https://crrev.com/a9d2c1832967c9f1864391bef407c328281e44ff/include/linux/mfd/cros_ec_commands.h

Sign in to add a comment