
Issue 726445

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Jun 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug-Regression




gale-paladin: jetstream_PrioritizedDevice failing with "Mismatched discipline list lengths"

Project Member Reported by grundler@chromium.org, May 25 2017

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/gale-paladin

Specifically:
https://uberchromegw.corp.google.com/i/chromeos/builders/gale-paladin/builds/2661/steps/HWTest%20%5Bjetstream_cq%5D/logs/stdio

...
 jetstream_GuestFirewall                     [ PASSED ]
  jetstream_PrioritizedDevice                 [ FAILED ]
  jetstream_PrioritizedDevice                   FAIL: Mismatched discipline list lengths. Expected: "['qdisc mq 0: dev wan0 root', 'qdisc pfifo_fast 0: dev wan0 parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc prio 1: dev wan0 parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo 10: dev wan0 parent 1:1 limit 1000p', 'qdisc htb 20: dev wan0 parent 1:2 r2q 10 default 1 direct_packets_stat [0-9]+ direct_qlen 1000']", actual: "['qdisc mq 0: dev wan0 root ', 'qdisc pfifo_fast 0: dev wan0 parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1']"
  jetstream_PrioritizedDevice                   retry_count: 2
  
  Suite timings:
  Downloads started at 2017-05-25 09:13:07
  Payload downloads ended at 2017-05-25 09:13:16
  Suite started at 2017-05-25 09:13:42
  Artifact downloads ended (at latest) at 2017-05-25 09:13:45
  Testing started at 2017-05-19 12:44:17
  Testing ended at 2017-05-19 12:46:35
  
  
  Links to test logs:
  Suite job http://cautotest/tko/retrieve_logs.cgi?job=/results/119539682-chromeos-test/
  jetstream_GuestInterfaces http://cautotest/tko/retrieve_logs.cgi?job=/results/119539684-chromeos-test/
  jetstream_ApiServerAttestation http://cautotest/tko/retrieve_logs.cgi?job=/results/119539685-chromeos-test/
  jetstream_DiagnosticReport http://cautotest/tko/retrieve_logs.cgi?job=/results/119539686-chromeos-test/
  jetstream_LocalApi http://cautotest/tko/retrieve_logs.cgi?job=/results/119539687-chromeos-test/
  jetstream_ApiServerDeveloperConfiguration http://cautotest/tko/retrieve_logs.cgi?job=/results/119539688-chromeos-test/
  jetstream_GcdCommands http://cautotest/tko/retrieve_logs.cgi?job=/results/119539689-chromeos-test/
  jetstream_NetworkInterfaces http://cautotest/tko/retrieve_logs.cgi?job=/results/119539691-chromeos-test/
  jetstream_WanCustomDns http://cautotest/tko/retrieve_logs.cgi?job=/results/119539692-chromeos-test/
  jetstream_GuestFirewall http://cautotest/tko/retrieve_logs.cgi?job=/results/119539693-chromeos-test/
  jetstream_PrioritizedDevice http://cautotest/tko/retrieve_logs.cgi?job=/results/119541037-chromeos-test/
 ...


In response to an email thread, I got two replies regarding this specific issue that are worth repeating here.

kkunduru wrote:
| The discipline list above indicates the number of queueing rules that are
| established for an interface. In this example the tool is expecting 7 rules
| but can only find 5, hence the failure. The error could be due to
| jetstream_PrioritizedDevice running before the firewall rules are completely set up.
|
| How can we reproduce this, and how is it different from running the general
| jetstream sanity suite?

lgoodby wrote:
| There was recently a change to the provisioning process in the Destiny lab,
| which could change the timing of when tests start running. That might explain
| why we see this in the CQ but not in the Jetstream lab.
|
| I am looking into making the boot-up process more stable in the CQ, but it
| may take some time to get this in.
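For context, the "discipline list" is presumably just the per-line output of "tc qdisc show dev wan0", one entry per queueing discipline on the interface. A minimal sketch of how such a list might be gathered and compared; the function names are illustrative, not the actual jetstream_PrioritizedDevice code:

# Minimal sketch, assuming the test derives its discipline list from
# "tc qdisc show dev wan0"; names here are illustrative only.
import re
import subprocess

def get_wan_disciplines(interface='wan0'):
    """Return one entry per queueing discipline configured on the interface."""
    output = subprocess.check_output(
        ['tc', 'qdisc', 'show', 'dev', interface]).decode()
    return [line.strip() for line in output.splitlines() if line.strip()]

def disciplines_match(expected_patterns, actual_lines):
    """Length check first (the failure above), then per-entry regex match."""
    if len(expected_patterns) != len(actual_lines):
        return False
    return all(re.match(pattern, line) for pattern, line
               in zip(sorted(expected_patterns), sorted(actual_lines)))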
 
Labels: -Type-Bug -Pri-3 OS-Chrome Pri-2 Type-Bug-Regression

Comment 2 by ra...@google.com, May 25 2017

Do you have any more info about that change, Laurence? We should inspect it to see whether we should ask the infra team for a fix or change our test to block until the firewall is configured properly.

Re #2, it looks to me like the new provisioning flow preceded the failure, so I don't think it is related.
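For reference, the "block until the firewall is configured" option from #2 could be as simple as polling the qdisc count with a timeout before running the verification. A rough sketch; the expected count of 7 and the timeout/interval values are assumptions, not values from the test:

# Rough sketch of blocking until the shaping qdiscs exist before verifying.
# The expected count of 7 and the timeout/interval values are assumptions.
import subprocess
import time

def count_wan_disciplines(interface='wan0'):
    output = subprocess.check_output(
        ['tc', 'qdisc', 'show', 'dev', interface]).decode()
    return len([line for line in output.splitlines() if line.strip()])

def wait_for_disciplines(expected_count=7, timeout_s=60, interval_s=2):
    """Poll until the expected number of qdiscs appears, or time out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if count_wan_disciplines() >= expected_count:
            return True
        time.sleep(interval_s)
    return False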

  
The failure details:

05/19 19:44:51.899 WARNI|              test:0621| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 615, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 823, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 476, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 353, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 386, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/jetstream_PrioritizedDevice/jetstream_PrioritizedDevice.py", line 43, in run_once
    self._Verify(self._normal_disciplines, 'Failed in normal mode.')
  File "/usr/local/autotest/tests/jetstream_PrioritizedDevice/jetstream_PrioritizedDevice.py", line 115, in _Verify
    (expected_disciplines, wan0_queuing_disciplines))
TestFail: Mismatched discipline list lengths. Expected: "['qdisc mq 0: dev wan0 root', 'qdisc pfifo_fast 0: dev wan0 parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc prio 1: dev wan0 parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo 10: dev wan0 parent 1:1 limit 1000p', 'qdisc htb 20: dev wan0 parent 1:2 r2q 10 default 1 direct_packets_stat [0-9]+ direct_qlen 1000']", actual: "['qdisc mq 0: dev wan0 root ', 'qdisc pfifo_fast 0: dev wan0 parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1', 'qdisc pfifo_fast 0: dev wan0 parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1']"

The failure occurred in the first part of the test, "normal mode".

Reformatted for clarity, the disciplines are:

Expected:

['qdisc mq 0: dev wan0 root',
 'qdisc pfifo_fast 0: dev wan0 parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1',
 'qdisc pfifo_fast 0: dev wan0 parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1',
 'qdisc pfifo_fast 0: dev wan0 parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1',
 'qdisc prio 1: dev wan0 parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1',
 'qdisc pfifo 10: dev wan0 parent 1:1 limit 1000p',
 'qdisc htb 20: dev wan0 parent 1:2 r2q 10 default 1 direct_packets_stat [0-9]+ direct_qlen 1000']

Actual:

['qdisc mq 0: dev wan0 root ',
 'qdisc pfifo_fast 0: dev wan0 parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1',
 'qdisc pfifo_fast 0: dev wan0 parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1',
 'qdisc pfifo_fast 0: dev wan0 parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1',
 'qdisc pfifo_fast 0: dev wan0 parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1']
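Side by side, the actual output is just the default mq/pfifo_fast tree: the prio 1:, pfifo 10:, and htb 20: qdiscs are missing, and parent :2 still carries a plain pfifo_fast instead of the prio qdisc, i.e. the prioritization setup was never applied. A throwaway comparison of the two lists, with each entry abbreviated by hand to its kind and handle:

# Throwaway check against the lists quoted above (entries shortened by hand).
expected = ['mq 0:', 'pfifo_fast 0:', 'pfifo_fast 0:', 'pfifo_fast 0:',
            'prio 1:', 'pfifo 10:', 'htb 20:']
actual = ['mq 0:', 'pfifo_fast 0:', 'pfifo_fast 0:', 'pfifo_fast 0:',
          'pfifo_fast 0:']
print(set(expected) - set(actual))  # {'htb 20:', 'pfifo 10:', 'prio 1:'} (order may vary)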

Last known passing: R60-9565.0.0, 2017-05-19 01:33:14
First known failing: R60-9566.0.0, 2017-05-19 03:35:17

Diffs: https://crosland.corp.google.com/log/9565.0.0..9566.0.0
No obvious suspects?

Passing run:
 https://ubercautotest.corp.google.com/new_tko/#tab_id=test_detail_view&object_id=484042732

Failing run:
 https://ubercautotest.corp.google.com/new_tko/#tab_id=test_detail_view&object_id=484113552

Actually, the gap between the last passing and first failing builds is larger than that, since I was including release candidate builds above.

Excluding rc builds:

Last passing: R60-9564.0.0
First failing: R60-9576.0.0

Diff: https://crosland.corp.google.com/log/9564.0.0..9576.0.0

Cc: -kkunduru@chromium.org lgoo...@chromium.org
Owner: kkunduru@chromium.org
Narrowed failure to R60-9567.0.0.

Diff: https://crosland.corp.google.com/log/9566.0.0..9567.0.0

The failure was likely introduced by https://chrome-internal-review.googlesource.com/c/378412/

The test possibly fails in the Destiny lab but not in the Talyn lab because the Destiny devices have no WAN connections, while the Talyn lab devices do.
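If the missing-WAN theory is right, the Destiny devices would simply report no carrier on wan0, which could be checked up front before asserting the shaping qdiscs. A small sketch; the sysfs path is standard Linux, but gating the test on it is only a suggestion:

# Sketch of a precondition check for the no-WAN-connection theory: bail out
# early if wan0 has no carrier instead of failing later on the qdisc mismatch.
def wan_has_carrier(interface='wan0'):
    try:
        with open('/sys/class/net/%s/carrier' % interface) as f:
            return f.read().strip() == '1'
    except IOError:
        return False  # interface missing or link state unreadable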

Handing over to Kishan.

Project Member Comment 9 by bugdroid1@chromium.org, Jun 10 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/ap-daemons/+/b9dd13a3e1ca9342a95437117429ae2222f03070

commit b9dd13a3e1ca9342a95437117429ae2222f03070
Author: Laurence Goodby <lgoodby@google.com>
Date: Sat Jun 10 02:35:55 2017

Owner: lgoo...@chromium.org
Status: Fixed (was: Untriaged)
This test has been removed from the commit queue.

Cc: sureshraj@chromium.org mojahsu@chromium.org dgarr...@chromium.org bleung@chromium.org
Issue 730104 has been merged into this issue.

Comment 12 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
