New issue
Advanced search Search tips

Issue 668326 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 10
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 646626



Sign in to add a comment

platform_CryptohomeStress: suspends timing out and affected by fio parameters

Project Member Reported by asavery@chromium.org, Nov 24 2016

Issue description

What steps will reproduce the problem?
(1) Run the edited platform_CryptohomeStress.surfing
Changed files here: https://chromium-review.googlesource.com/#/c/399893/ to fix some of  issue 646626 .

What is the expected result?
The tests will pass.

What happens instead?
The power_SuspendStress part of the test either fails with ERROR: Sanity check failed: did not try to suspend, or passes but gets a warning that most of the iterations had non-fatal suspend failures due to timeouts.

This only seems to occur when the platform_CryptohomeFio part of the test is given 'crypto' for the disk_configs parameter. The actual mounting and unmounting of the cryptohome vault does not seem to be the problem, since changing the tmpdir from the user path to the tmpdir used with the other parameters seems to get around the problem. It seems like the platform_CryptohomeFio test is interfering with the power_SuspendStress test it uses the user path but the assumption is that it will just run between suspends regardless of the tmpdir.
 
Cc: gwendal@chromium.org snanda@chromium.org
I have attached logs that show this problem. I asked Daniel Erat about this issue and he said that there might be a delay that we could increase, or it might make sense to have powerd_suspend not run sync before suspending since that could take a long time, which is what appears to be happening here in the messages log: 

2017-02-01T10:17:22.520443-08:00 NOTICE powerd_suspend[3878]: Going to suspend-to-RAM state: args=--suspend_duration=-1 --nosuspend_to_idle --wakeup_count=5
2017-02-01T10:17:22.570124-08:00 NOTICE powerd_suspend[3932]: Explicit sync
2017-02-01T10:17:52.231514-08:00 WARNING crash_reporter[3965]: Could not load the device policy file.
2017-02-01T10:17:52.231959-08:00 WARNING crash_reporter[3965]: [user] Received crash notification for powerd_dbus_suspend[3848] sig 6, user 0 (developer build - not testing - always dumping)
2017-02-01T10:17:52.234247-08:00 INFO crash_reporter[3965]: State of crashed process [3848]: S (sleeping)
2017-02-01T10:17:52.239294-08:00 INFO metrics_daemon[1698]: [INFO:metrics_daemon.cc(396)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2017-02-01T10:17:52.292398-08:00 INFO crash_reporter[3965]: Stored minidump to /var/spool/crash/powerd_dbus_suspend.20170201.101752.3848.dmp
2017-02-01T10:17:52.292715-08:00 INFO crash_reporter[3965]: Leaving core file at /var/spool/crash/powerd_dbus_suspend.20170201.101752.3848.core due to developer image
2017-02-01T10:18:00.756300-08:00 INFO kernel: [   88.386016] last active wakeup source: 00:06
2017-02-01T10:18:00.758935-08:00 NOTICE powerd_suspend[3977]: Aborting suspend, wake event received
2017-02-01T10:18:00.762884-08:00 NOTICE powerd_suspend[3979]: wakeup_count is 6
2017-02-01T10:18:00.764888-08:00 NOTICE powerd_suspend[3980]: Cancel suspend at kernel
2017-02-01T10:18:00.774613-08:00 NOTICE powerd_suspend[3985]: Resume finished
2017-02-01T10:18:00.926468-08:00 INFO laptop-mode[4013]: Laptop mode 
2017-02-01T10:18:00.928493-08:00 INFO laptop-mode[4014]: enabled, 
2017-02-01T10:18:00.930683-08:00 INFO laptop-mode[4015]: not active [unchanged]
2017-02-01T10:18:02.549105-08:00 NOTICE autotest[4019]: 10:18:02.545 ERROR|     power_suspend:0511| SuspendTimeout(1): System took too long to suspend.
2017-02-01T10:18:56.266549-08:00 NOTICE powerd_suspend[4108]: Going to suspend-to-RAM state: args=--suspend_duration=-1 --nosuspend_to_idle --wakeup_count=6
powerd.20170201-101634
7.8 KB Download
messages
18.6 MB Download
client.0.DEBUG
56.1 KB Download
The wake even received is the alarm for the suspend, so it does appear that when using crypto it just takes too long to fully suspend, seeming to take a lot of time to sync. Increasing the timeout for the call to /usr/bin/powerd_dbus_suspend did not work, and it seems like this is because the increased timeout was not actually changing the alarm, so the wake event was still received before the suspend could complete and before the increased timeout time. I have tried increasing the actual suspend time, paired with an increased timeout and it seems promising. Even though changing the suspend time looks like it's working, it changes the timing of the overall stress test a bit since the fio test is only running when the device is not suspended. I'm trying to see how much the suspend actually needs to increase and trying to figure out the least intrusive way to make these adjustments.

Comment 3 by snanda@chromium.org, Feb 28 2017

Owner: asavery@chromium.org
Did changing the vm diry ratio and associated knobs not help?
It looked like it was helping at first, but as I tried to find the best place to set it, it stopped working. Even when I set it back down to 1000 the suspends all timed out. I will see if changing the associated values helps at all.

Comment 5 by snanda@chromium.org, Feb 28 2017

I would suggest continuing to go down the tweaking vm dirty thresholds & pdflush times since then the test itself needs to be changed and nothing outside of it (such as powerd) needs to.

My only worry would be whether in fact we do need to make changes to powerd's timeouts since something underlying has changed sufficiently.

Gwendal, your thoughts?
Note that for tpm 2.0, trunksd now performs actions on suspend (SuspendImminent from powerd), which are required for correct cryptohomed operation after resume.
So, I believe, this test should allow enough time for suspend.
Project Member

Comment 7 by bugdroid1@chromium.org, May 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/e5b5fa849448a63cf1b2ccd55df8b078037d65d8

commit e5b5fa849448a63cf1b2ccd55df8b078037d65d8
Author: Alexis Savery <asavery@chromium.org>
Date: Wed May 03 03:24:17 2017

platform_CryptohomeStress: fix fio and suspend stress interaction

Increase min_suspend to provide additional time to sync and suspend.
Make dirty_ratio 1 for the duration of the test to reduce the time
needed to sync.

BUG= chromium:668326 
TEST=Ran platform_CryptohomeStress and platform_CryptohomeStress.surfing

Change-Id: I5e199e1fca5c56fca6fc630482bb7e3f13e7e43d
Reviewed-on: https://chromium-review.googlesource.com/479677
Commit-Ready: Alexis Savery <asavery@chromium.org>
Tested-by: Alexis Savery <asavery@chromium.org>
Reviewed-by: Gwendal Grignou <gwendal@chromium.org>

[modify] https://crrev.com/e5b5fa849448a63cf1b2ccd55df8b078037d65d8/client/site_tests/platform_CryptohomeStress/control
[modify] https://crrev.com/e5b5fa849448a63cf1b2ccd55df8b078037d65d8/client/site_tests/platform_CryptohomeStress/platform_CryptohomeStress.py
[modify] https://crrev.com/e5b5fa849448a63cf1b2ccd55df8b078037d65d8/client/site_tests/platform_CryptohomeStress/control.surfing

Status: Assigned (was: Untriaged)
Triage nag: This Chrome OS bug has an owner but no component. Please add a component so that this can be tracked by the relevant team.
Components: OS>Systems>Security
Status: Fixed (was: Assigned)
The test is still a little flaky but appears to be for unrelated reasons - marking this fixed

Sign in to add a comment