New issue
Advanced search Search tips

Issue 620806 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Feature



Sign in to add a comment

Make platform_ServoPowerStateController stricter

Reported by jrbarnette@chromium.org, Jun 16 2016

Issue description

Recently a servod bug was uncovered in selected Braswell-based boards.
The problem behavior (apparently) is triggered by this sequence:
    power_state:rec power_state:off power_state:on

In some cases, at the end of the sequence, the DUT would power on
in recovery mode, not normal mode.

The platform_ServoPowerStateController test needs to be stricter,
to make some sort of attempt to find problems of this sort.
The basic idea would be to change certain test sequences:
  * For tests that include this sequence:
        power_state:off power_state:on
    We need an option that also tests this alternate sequence:
        power_state:off power_state:rec power_state:off power_state:on
  * For tests with this sequence:
        power_state:off power_state:rec
    We need an option that also tests this sequence:
        power_state:off power_state:on power_state:off power_state:rec

 
Cc: dlaurie@chromium.org
Additionally, we should consider enhancing any sequence
that includes 'power_state:on power_state:off' to also test
'power_state:reset power_state:off'.  Probably, there are other
substitutions to consider, as well.

Cc: -dlaurie@chromium.org
Components: -Infra>Client>ChromeOS
Owner: dlaurie@chromium.org
Status: Assigned (was: Available)

Comment 5 Deleted

Owner: reinauer@chromium.org
Owner: aaboagye@chromium.org
Status: Assigned (was: Available)
Status: Started (was: Assigned)
I took a look at the test yesterday and decided that I would pretty much re-write the whole thing. I'd like to gather some feedback though as I'm creating the CL.

I think a better approach is to look at the possible transitions. Therefore, my intention of the rewrite is to really hammer all the possible transitions. As it stands now, it seems to me that platform_ServoPowerStateController is only testing a subset of possible transitions. Also, it seems that more cases are just injected as failures are encountered. (Overall, that last part is a good thing. We want to have tests that catch real failures.)

                                       t1
              +-----------------------------------------------------+
              |                                                     |
              |             t2                      t3              |
              |  +--------------------+ +------------------------+  |
              |  |                    | |                        |  |
          +---+--++               +---v-+-+                   +--v--v-+
   +------>       |               |       |                   |       +-----+
t0 |      |  Off  |        +------>  On   |                   |  Rec  |     |t6
   +------+       |      t4|      |       |                   |       <-----+
          |       |        +------+       |                   |       |
          +--^---^+               +--+----+                   +--+----+
             |   |                   |                           |
             |   +-------------------+                           |
             |            t5                                     |
             |                                                   |
             +---------------------------------------------------+
                                      t7

Here's a state machine showing the possible transitions with the USB unplugged. The approach I'm taking is to individually and hermetically test transitions to each of the states.

For example, one suite is "test transitions to off". The only transitions to get the off state are t0, t5, and t7. This yields the following test sequences.

(t7): Reset -> Rec -> Off
(t0): Reset -> Off -> Off
(t5): Reset -> Off

For transitions to on, it's a little more interesting. If we look at the t2 transition, we can have the following test sequences.

  - Reset -> Off -> On -> Rec -> Off -> On (enter 'off' thru t7)
  - Reset -> Off -> Off-> On (enter 'off' thru t0)
  - Reset -> Off -> On (enter 'off' thru t5)

Anyways, hopefully you get the gist of what I'm thinking now. Any thoughts on this? One drawback I see is that it makes the test a lot longer, but I think it's more thorough. I'm currently running some tests and will upload a CL once it's all fleshed out.
We should talk offline:  Not all transitions are valid to
test, because the behavior is meant to be unspecified.
Most importantly, 'on' and 'rec' are only valid after
'off'.

Ah okay; that'd be great. I was not aware of that.
With those constraints, the only valid transitions are the following:

                                       t1
              +-----------------------------------------------------+
              |                                                     |
              |             t2                                      |
              |  +--------------------+                             |
              |  |                    |                             |
          +---+--++               +---v---+                   +-----v-+
   +------>       |               |       |                   |       |
t0 |      |  Off  |               |  On   |                   |  Rec  |
   +------+       |               |       |                   |       |
          |       |               |       |                   |       |
          +--^---^+               +--+----+                   +--+----+
             |   |                   |                           |
             |   +-------------------+                           |
             |            t3                                     |
             |                                                   |
             +---------------------------------------------------+
                                      t4

Today, the test runs through the following transitions:
Off -> Off -> On -> Off -> On -> Off -> Reset -> Reset

With USB stick:
Off -> remove USB -> Rec -> insert USB
Off -> remove USB -> Rec -> Off -> insert USB -> On

The 'Rec' and 'On' states only have a single entry point, which is the 'Off' state. However, there are three ways to enter the 'Off' state. I had come up with some transitions to test, accounting for the various ways to enter the 'Off' state, however even with the constraints, the test ends up taking almost an hour to run through. To be clear, that's almost an hour to test with the USB stick plugged in, which also tests the unplugged cases.

Each transition can take at least 30 seconds to verify and power_state:off and power_state:rec can at least ~13 and ~30s respectively.

I haven't taken a look at this in awhile because other higher priority things came up (and are ongoing). If I recall correctly, the issue was that the test was still taking too long to be completely thorough.

Checking my notes, here's the list of prospective sequences that I came up with:

Transitions to 'Off':
 - Off -> Off (t0 entry)
 - Off -> Rec -> Off (t4 entry)
 - Off -> On -> Off & Reset -> Off (t3 entry)


Transitions to 'On' (t2):
 - Off -> On -> Off -> On (t3 entry)
 - Off -> Rec -> Off -> On (t4 entry)
 - Off -> Off -> On (t0 entry)
 - Reset


Transitions to 'Rec' (t1):
 - Off ->  Off -> Rec (t0 entry)
 - Off -> Rec -> Off -> Rec (t4 entry)
 - Off -> On -> Off -> Rec  & Reset -> Off -> Rec (t3 entry)


For 'Rec' with a USB stick inserted:
 - Reset
 - Off -> On
 - Off -> Off -> On
 - Off -> remove USB -> Reset -> Off -> Rec -> insert USB

Status: Assigned (was: Started)
I took a look at wmatrix to try and see how often platform_ServoPowerStateController is run, but I didn't see any runs at all. If it's not regularly run, then maybe it's not _so_ bad if it runs a little longer. Plus, it can be sped up a bit by making the final "off" state of a previous case, the starting "off" state for the next case.

Changing status to Assigned, since I'm not actively working on this right now. If there aren't any big objections, I'll probably proceed with the cases in c#14.
Components: Infra>Client>ChromeOS
Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>Test

Sign in to add a comment