New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 670842 link

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Feb 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

gale: uninitialized spinlock accessed in pktgen_thread_worker

Project Member Reported by grundler@chromium.org, Dec 2 2016

Issue description

Google Wifi kernel crash reported by user (report_id: 48e356ef00000000)

<0>[292540.910693] BUG: spinlock bad magic on CPU#0, kpktgend_0/787
...
<4>[292540.918485] [<c0361020>] (spin_dump) from [<c0361094>] (spin_bug+0x34/0x38)
<4>[292540.918517] [<c0361094>] (spin_bug) from [<c0361134>] (do_raw_spin_lock+0x30/0x18c)
<4>[292540.918551] [<c0361134>] (do_raw_spin_lock) from [<c0920b1c>] (_raw_spin_lock_irqsave+0x38/0x40)
<4>[292540.918591] [<c0920b1c>] (_raw_spin_lock_irqsave) from [<c035a8fc>] (prepare_to_wait_event+0xa0/0x11c)
<4>[292540.918635] [<c035a8fc>] (prepare_to_wait_event) from [<bf4def48>] (pktgen_thread_worker+0x1a8/0x12d8 [pktgen])
<4>[292540.918689] [<bf4def48>] (pktgen_thread_worker [pktgen]) from [<c033860c>] (kthread+0xe8/0x100)
<4>[292540.918724] [<c033860c>] (kthread) from [<c03069b8>] (ret_from_fork+0x14/0x20)

This is likely already be fixed in UPSTREAM with the following three commits:

commit 7ba8bd75ddc6b041b5716dbb29e49df3e9cc2928
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Tue Aug 4 18:33:34 2015 +0200

    net: pktgen: don't abuse current->state in pktgen_thread_worker()
    
    Commit 1fbe4b46caca "net: pktgen: kill the Wait for kthread_stop
    code in pktgen_thread_worker()" removed (in particular) the final
    __set_current_state(TASK_RUNNING) and I didn't notice the previous
    set_current_state(TASK_INTERRUPTIBLE). This triggers the warning
    in __might_sleep() after return.
    
    Afaics, we can simply remove both set_current_state()'s, and we
    could do this a long ago right after ef87979c273a2 "pktgen: better
    scheduler friendliness" which changed pktgen_thread_worker() to
    use wait_event_interruptible_timeout().


commit 1fbe4b46caca5b01b070af93d513031ffbcc480c
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Wed Jul 8 21:42:13 2015 +0200

    net: pktgen: kill the "Wait for kthread_stop" code in pktgen_thread_worker()
    
    pktgen_thread_worker() doesn't need to wait for kthread_stop(), it
    can simply exit. Just pktgen_create_thread() and pg_net_exit() should
    do get_task_struct()/put_task_struct(). kthread_stop(dead_thread) is
    fine.
    

commit fecdf8be2d91e04b0a9a4f79ff06499a36f5d14f
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Wed Jul 8 21:42:11 2015 +0200

    net: pktgen: fix race between pktgen_thread_worker() and kthread_stop()
    
    pktgen_thread_worker() is obviously racy, kthread_stop() can come
    between the kthread_should_stop() check and set_current_state().

None of these is currently present in chromeos-3.18 (and I assume also not in chromeos-3.14) branch(s).
 
Project Member

Comment 1 by bugdroid1@chromium.org, Jan 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/93ed9bc89533b71e48ce7e6355ac26edff377cf6

commit 93ed9bc89533b71e48ce7e6355ac26edff377cf6
Author: Grant Grundler <grundler@google.com>
Date: Thu Jan 12 00:12:38 2017

autotest: add parameters to "Pktgen Test"

The simple test for pktgen hard codes eth='eth0' and I need to
invoke pktgen with eth=wan0 for jetstream devices.

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=10000

Change-Id: Iad232c50566e5f06845b1f77a45e3f248d3e8853
Reviewed-on: https://chromium-review.googlesource.com/427263
Commit-Ready: Grant Grundler <grundler@chromium.org>
Tested-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Zhihong Yu <zhihongyu@chromium.org>
Reviewed-by: John Carey <ranix@google.com>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Dane Pollock <danepollock@google.com>

[modify] https://crrev.com/93ed9bc89533b71e48ce7e6355ac26edff377cf6/client/tests/pktgen/control

Cc: -cbook@chromium.org kyan@chromium.org dtor@chromium.org kkunduru@chromium.org
Status: Started (was: Assigned)
I've updated the pktgen autotest in order to try to reproduce the original problem (failed to do so) and get more confidence UPSTREAM pktgen changes don't break anything in chromeos-3.14 and -3.18 branches (appears to work correctly).

I will add kkunduru and dtor as reviewers. Since Chrome OS isn't using pktgen in production devices at this time, I expect this can be treated as FYI for dtor.

I've uploaded 13 CLs to chromeos-3.14 branch:
remote: New Changes:        
remote:   https://chromium-review.googlesource.com/433343 UPSTREAM: pktgen: fix out-of-bounds access in pgctrl_write()        
remote:   https://chromium-review.googlesource.com/433344 UPSTREAM: pktgen: simplify error handling in pgctrl_write()        
remote:   https://chromium-review.googlesource.com/433345 UPSTREAM: pktgen: document all supported flags        
remote:   https://chromium-review.googlesource.com/433346 UPSTREAM: pktgen: be friendly to LLTX devices        
remote:   https://chromium-review.googlesource.com/433347 UPSTREAM: pktgen: Use seq_puts() where seq_printf() is not needed        
remote:   https://chromium-review.googlesource.com/433348 UPSTREAM: pktgen: avoid expensive set_current_state() call in loop        
remote:   https://chromium-review.googlesource.com/433349 UPSTREAM: pktgen: RCU-ify "if_list" to remove lock in next_to_run()        
remote:   https://chromium-review.googlesource.com/433350 UPSTREAM: pktgen: remove unnecessary break after goto        
remote:   https://chromium-review.googlesource.com/433351 UPSTREAM: pktgen: add flag NO_TIMESTAMP to disable timestamping        
remote:   https://chromium-review.googlesource.com/433352 UPSTREAM: pktgen: Convert pr_warning to pr_warn        
remote:   https://chromium-review.googlesource.com/433353 UPSTREAM: net: pktgen: fix race between pktgen_thread_worker() and kthread_st...        
remote:   https://chromium-review.googlesource.com/433354 UPSTREAM: net: pktgen: kill the "Wait for kthread_stop" code in pktgen_thread...        
remote:   https://chromium-review.googlesource.com/433355 UPSTREAM: net: pktgen: don't abuse current->state in pktgen_thread_worker()     

And three CLs for chromeos-3.18 branch to reach the same "patch level":
remote: New Changes:        
remote:   https://chromium-review.googlesource.com/433384 UPSTREAM: net: pktgen: fix race between pktgen_thread_worker() and kthread_st...        
remote:   https://chromium-review.googlesource.com/433385 UPSTREAM: net: pktgen: kill the "Wait for kthread_stop" code in pktgen_thread...        
remote:   https://chromium-review.googlesource.com/433386 UPSTREAM: net: pktgen: don't abuse current->state in pktgen_thread_worker()

Both patch sets have been tested on respective Onhub/Gale HW for _wired_ interfaces only.
Project Member

Comment 3 by bugdroid1@chromium.org, Jan 27 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/5850a19f5339ad9614606f374604b92a915f0b28

commit 5850a19f5339ad9614606f374604b92a915f0b28
Author: Grant Grundler <grundler@google.com>
Date: Thu Jan 19 23:57:22 2017

autotest: add num_iterations to pktgen test

num_iterations allows one to invoke many runs of pktgen.
Two reasons I want to do this:
1) verify start/stop of the pktgen threads doesn't fail (original bug)
2) generate better histograms about TX packet transmission
latency.

Code is modeled after the implementation in network_EthernetStressPlug
test.

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" \
	--args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I23c9d5b84a0ac1fe9f69617e408de0e5deb92e65
Reviewed-on: https://chromium-review.googlesource.com/430386
Commit-Ready: Grant Grundler <grundler@chromium.org>
Tested-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Kishan Kunduru <kkunduru@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>

[modify] https://crrev.com/5850a19f5339ad9614606f374604b92a915f0b28/client/tests/pktgen/control
[modify] https://crrev.com/5850a19f5339ad9614606f374604b92a915f0b28/client/tests/pktgen/pktgen.py

Project Member

Comment 4 by bugdroid1@chromium.org, Jan 31 2017

Labels: merge-merged-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/92cb5cfc4d80cc7291c1c586842712bbee953a68

commit 92cb5cfc4d80cc7291c1c586842712bbee953a68
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jul 08 19:42:11 2015

UPSTREAM: net: pktgen: fix race between pktgen_thread_worker() and kthread_stop()

pktgen_thread_worker() is obviously racy, kthread_stop() can come
between the kthread_should_stop() check and set_current_state().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fecdf8be2d91e04b0a9a4f79ff06499a36f5d14f)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I01b7b2070624e08a312b88b8a46a82185ae3df90
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433384
Reviewed-by: Kishan Kunduru <kkunduru@chromium.org>

[modify] https://crrev.com/92cb5cfc4d80cc7291c1c586842712bbee953a68/net/core/pktgen.c

Project Member

Comment 5 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/1ac394d9bd6818b87fa846253b7d1f198dfce0f0

commit 1ac394d9bd6818b87fa846253b7d1f198dfce0f0
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jul 08 19:42:13 2015

UPSTREAM: net: pktgen: kill the "Wait for kthread_stop" code in pktgen_thread_worker()

pktgen_thread_worker() doesn't need to wait for kthread_stop(), it
can simply exit. Just pktgen_create_thread() and pg_net_exit() should
do get_task_struct()/put_task_struct(). kthread_stop(dead_thread) is
fine.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1fbe4b46caca5b01b070af93d513031ffbcc480c)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: Iae52a02d6e751684634fcf4ffb5efae024b3e1cb
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433385
Reviewed-by: Kishan Kunduru <kkunduru@chromium.org>

[modify] https://crrev.com/1ac394d9bd6818b87fa846253b7d1f198dfce0f0/net/core/pktgen.c

Project Member

Comment 6 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/dd43d41254d558c89a38bdeafbfd16aa81211518

commit dd43d41254d558c89a38bdeafbfd16aa81211518
Author: Oleg Nesterov <oleg@redhat.com>
Date: Tue Aug 04 16:33:34 2015

UPSTREAM: net: pktgen: don't abuse current->state in pktgen_thread_worker()

Commit 1fbe4b46caca "net: pktgen: kill the Wait for kthread_stop
code in pktgen_thread_worker()" removed (in particular) the final
__set_current_state(TASK_RUNNING) and I didn't notice the previous
set_current_state(TASK_INTERRUPTIBLE). This triggers the warning
in __might_sleep() after return.

Afaics, we can simply remove both set_current_state()'s, and we
could do this a long ago right after ef87979c273a2 "pktgen: better
scheduler friendliness" which changed pktgen_thread_worker() to
use wait_event_interruptible_timeout().

Reported-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7ba8bd75ddc6b041b5716dbb29e49df3e9cc2928)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I7b295d9699b63cc84974c25d0da084134185684c
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433386
Reviewed-by: Kishan Kunduru <kkunduru@chromium.org>

[modify] https://crrev.com/dd43d41254d558c89a38bdeafbfd16aa81211518/net/core/pktgen.c

Project Member

Comment 7 by bugdroid1@chromium.org, Jan 31 2017

Labels: merge-merged-chromeos-3.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/45ee3d1f49b5493d7cabb9dcae00679309919989

commit 45ee3d1f49b5493d7cabb9dcae00679309919989
Author: Mathias Krause <minipli@googlemail.com>
Date: Fri Feb 21 20:38:34 2014

UPSTREAM: pktgen: fix out-of-bounds access in pgctrl_write()

If a privileged user writes an empty string to /proc/net/pktgen/pgctrl
the code for stripping the (then non-existent) '\n' actually writes the
zero byte at index -1 of data[]. The then still uninitialized array will
very likely fail the command matching tests and the pr_warning() at the
end will therefore leak stack bytes to the kernel log.

Fix those issues by simply ensuring we're passed a non-empty string as
the user API apparently expects a trailing '\n' for all commands.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 20b0c718c3bb122107bebadbb8ecf4bab76fb392)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I4c4f7906d68827855c8cc32ca70df196aeb19c21
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433343
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/45ee3d1f49b5493d7cabb9dcae00679309919989/net/core/pktgen.c

Project Member

Comment 8 by bugdroid1@chromium.org, Jan 31 2017

Labels: merge-merged-chromeos-3.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/63e3f74f0ce99500365686dc285e8cd78838f6dd

commit 63e3f74f0ce99500365686dc285e8cd78838f6dd
Author: Mathias Krause <minipli@googlemail.com>
Date: Fri Feb 21 20:38:35 2014

UPSTREAM: pktgen: simplify error handling in pgctrl_write()

The 'out' label is just a relict from previous times as pgctrl_write()
had multiple error paths. Get rid of it and simply return right away
on errors.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0945574750f3040a2309d960a569215598a64672)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I8f71b9b87a32180540a435120dc086e1c32005c1
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433344
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/63e3f74f0ce99500365686dc285e8cd78838f6dd/net/core/pktgen.c

Project Member

Comment 9 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d5c7f61bcfaed8251d8090c1fa2de6a2409e781e

commit d5c7f61bcfaed8251d8090c1fa2de6a2409e781e
Author: Mathias Krause <minipli@googlemail.com>
Date: Fri Feb 21 20:38:36 2014

UPSTREAM: pktgen: document all supported flags

The documentation misses a few of the supported flags. Fix this. Also
respect the dependency to CONFIG_XFRM for the IPSEC flag.

Cc: Fan Du <fan.du@windriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 72f8e06f3ea022d9002b825d57b9d897b9dbe6be)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: Idab3b885f84025f6f003d86fa45db93c0d1a9de8
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433345
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/d5c7f61bcfaed8251d8090c1fa2de6a2409e781e/net/core/pktgen.c
[modify] https://crrev.com/d5c7f61bcfaed8251d8090c1fa2de6a2409e781e/Documentation/networking/pktgen.txt

Project Member

Comment 10 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/16f7d05e2e317c8f15a975c3d4f661c03ac91828

commit 16f7d05e2e317c8f15a975c3d4f661c03ac91828
Author: Daniel Borkmann <dborkman@redhat.com>
Date: Fri Apr 11 11:22:00 2014

UPSTREAM: pktgen: be friendly to LLTX devices

Similarly to commit 43279500deca ("packet: respect devices with
LLTX flag in direct xmit"), we can basically apply the very same
to pktgen. This will help testing against LLTX devices such as
dummy driver (or others), which only have a single netdevice txq
and would otherwise require locking their txq from pktgen side
while e.g. in dummy case, we would not need any locking. Fix this
by making use of HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will
be respected.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0f2eea4b7e29ab693b07f6eedf8e87a0c11b8b42)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I192e48b8116b6393bed767a2584f871b5facdcd5
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433346
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/16f7d05e2e317c8f15a975c3d4f661c03ac91828/net/core/pktgen.c

Project Member

Comment 11 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/6aeb8c0983255862d65af3d4490b8f95af4cdd64

commit 6aeb8c0983255862d65af3d4490b8f95af4cdd64
Author: Thomas Graf <tgraf@suug.ch>
Date: Fri May 16 21:28:54 2014

UPSTREAM: pktgen: Use seq_puts() where seq_printf() is not needed

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 97dc48e220282742e4c4a8fde81bdd4dbe011f1e)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I470fd0a476584baa8085896e7d40d5be4b12542a
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433347
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/6aeb8c0983255862d65af3d4490b8f95af4cdd64/net/core/pktgen.c

Project Member

Comment 12 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/265e5e390e039f4c6dc39b82bc4da49c52ef45aa

commit 265e5e390e039f4c6dc39b82bc4da49c52ef45aa
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu Jun 26 11:16:49 2014

UPSTREAM: pktgen: avoid expensive set_current_state() call in loop

Avoid calling set_current_state() inside the busy-loop in
pktgen_thread_worker().  In case of pkt_dev->delay, then it is still
used/enabled in pktgen_xmit() via the spin() call.

The set_current_state(TASK_INTERRUPTIBLE) uses a xchg, which implicit
is LOCK prefixed.  I've measured the asm LOCK operation to take approx
8ns on this E5-2630 CPU.  Performance increase corrolate with this
measurement.

Performance data with CLONE_SKB==100000, rx-usecs=30:
 (single CPU performance, ixgbe 10Gbit/s, E5-2630)
 * Prev:  5454050 pps --> 183.35ns (1/5454050*10^9)
 * Now:   5684009 pps --> 175.93ns (1/5684009*10^9)
 * Diff:  +229959 pps -->  -7.42ns

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit baac167b706600ebe7158acaeb7c489ae9d0bb8b)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: Ibfd29ded636afa3915037580f6dbd67c54df49f2
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433348
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/265e5e390e039f4c6dc39b82bc4da49c52ef45aa/net/core/pktgen.c

Project Member

Comment 13 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/46472fdc6dd544c030951b9e9c8f17cd9d515315

commit 46472fdc6dd544c030951b9e9c8f17cd9d515315
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu Jun 26 11:16:59 2014

UPSTREAM: pktgen: RCU-ify "if_list" to remove lock in next_to_run()

The if_lock()/if_unlock() in next_to_run() adds a significant
overhead, because its called for every packet in busy loop of
pktgen_thread_worker().  (Thomas Graf originally pointed me
at this lock problem).

Removing these two "LOCK" operations should in theory save us approx
16ns (8ns x 2), as illustrated below we do save 16ns when removing
the locks and introducing RCU protection.

Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30:
 (single CPU performance, ixgbe 10Gbit/s, E5-2630)
 * Prev   : 5684009 pps --> 175.93ns (1/5684009*10^9)
 * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9)
 * Diff   : +588195 pps --> -16.50ns

To understand this RCU patch, I describe the pktgen thread model
below.

In pktgen there is several kernel threads, but there is only one CPU
running each kernel thread.  Communication with the kernel threads are
done through some thread control flags.  This allow the thread to
change data structures at a know synchronization point, see main
thread func pktgen_thread_worker().

Userspace changes are communicated through proc-file writes.  There
are three types of changes, general control changes "pgctrl"
(func:pgctrl_write), thread changes "kpktgend_X"
(func:pktgen_thread_write), and interface config changes "etcX@N"
(func:pktgen_if_write).

Userspace "pgctrl" and "thread" changes are synchronized via the mutex
pktgen_thread_lock, thus only a single userspace instance can run.
The mutex is taken while the packet generator is running, by pgctrl
"start".  Thus e.g. "add_device" cannot be invoked when pktgen is
running/started.

All "pgctrl" and all "thread" changes, except thread "add_device",
communicate via the thread control flags.  The main problem is the
exception "add_device", that modifies threads "if_list" directly.

Fortunately "add_device" cannot be invoked while pktgen is running.
But there exists a race between "rem_device_all" and "add_device"
(which normally don't occur, because "rem_device_all" waits 125ms
before returning). Background'ing "rem_device_all" and running
"add_device" immediately allow the race to occur.

The race affects the threads (list of devices) "if_list".  The if_lock
is used for protecting this "if_list".  Other readers are given
lock-free access to the list under RCU read sections.

Note, interface config changes (via proc) can occur while pktgen is
running, which worries me a bit.  I'm assuming proc_remove() takes
appropriate locks, to assure no writers exists after proc_remove()
finish.

I've been running a script exercising the race condition (leading me
to fix the proc_remove order), without any issues.  The script also
exercises concurrent proc writes, while the interface config is
getting removed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8788370a1d4b1d89efc1aea1b13ea5e5bfe10fde)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: Ied36d05b322ff5625502ba389abfc4539c262c35
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433349
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/46472fdc6dd544c030951b9e9c8f17cd9d515315/net/core/pktgen.c

Project Member

Comment 14 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/5ca9b8dcf499c748a494bdc3663610df2ac99b8b

commit 5ca9b8dcf499c748a494bdc3663610df2ac99b8b
Author: Fabian Frederick <fabf@skynet.be>
Date: Mon Jul 14 16:30:56 2014

UPSTREAM: pktgen: remove unnecessary break after goto

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit aee944ddf893e14a148af0f4e857c11887660053)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I2b2b77417908c46d898b66e9042055504359ff36
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433350
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/5ca9b8dcf499c748a494bdc3663610df2ac99b8b/net/core/pktgen.c

Project Member

Comment 15 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/556ac8d4b1889327767bdcef99a2192d73fe3b8f

commit 556ac8d4b1889327767bdcef99a2192d73fe3b8f
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu Aug 28 16:14:47 2014

UPSTREAM: pktgen: add flag NO_TIMESTAMP to disable timestamping

Then testing the TX limits of the stack, then it is useful to
be-able to disable the do_gettimeofday() timetamping on every packet.

This implements a pktgen flag NO_TIMESTAMP which will disable this
call to do_gettimeofday().

The performance change on (my system E5-2695) with skb_clone=0, goes
from TX 2,423,751 pps to 2,567,165 pps with flag NO_TIMESTAMP. Thus,
the cost of do_gettimeofday() or saving is approx 23 nanosec.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit afb84b6261841f8ab387e267e748236fa805bea0)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I5aac3f38972691a1e63e2c29142d0fe29a3c5142
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433351
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/556ac8d4b1889327767bdcef99a2192d73fe3b8f/net/core/pktgen.c

Project Member

Comment 16 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/6e00f8d3a5b7806e56212f89c7cca8637a354054

commit 6e00f8d3a5b7806e56212f89c7cca8637a354054
Author: Joe Perches <joe@perches.com>
Date: Wed Sep 10 04:17:30 2014

UPSTREAM: pktgen: Convert pr_warning to pr_warn

Use the more common pr_warn.
Realign arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 294a0b7f3148e2a4e916965a6d14838e08143ba8)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I0929890d1e82cdd6e3b086e0b71411e318e2b664
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433352
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/6e00f8d3a5b7806e56212f89c7cca8637a354054/net/core/pktgen.c

Project Member

Comment 17 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/47b96a4f4e831b4626af81455e5a798748bafbba

commit 47b96a4f4e831b4626af81455e5a798748bafbba
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jul 08 19:42:11 2015

UPSTREAM: net: pktgen: fix race between pktgen_thread_worker() and kthread_stop()

pktgen_thread_worker() is obviously racy, kthread_stop() can come
between the kthread_should_stop() check and set_current_state().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fecdf8be2d91e04b0a9a4f79ff06499a36f5d14f)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: Icf24379a6b479efc8a67d7b4a213d03bfa01b55d
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433353
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/47b96a4f4e831b4626af81455e5a798748bafbba/net/core/pktgen.c

Project Member

Comment 18 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/74d31a292b59750d3181569c5cc17639dca128a2

commit 74d31a292b59750d3181569c5cc17639dca128a2
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jul 08 19:42:13 2015

UPSTREAM: net: pktgen: kill the "Wait for kthread_stop" code in pktgen_thread_worker()

pktgen_thread_worker() doesn't need to wait for kthread_stop(), it
can simply exit. Just pktgen_create_thread() and pg_net_exit() should
do get_task_struct()/put_task_struct(). kthread_stop(dead_thread) is
fine.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1fbe4b46caca5b01b070af93d513031ffbcc480c)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: Ie664f5a2f51138d3e5c59697e8eccebdcaace188
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433354
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/74d31a292b59750d3181569c5cc17639dca128a2/net/core/pktgen.c

Project Member

Comment 19 by bugdroid1@chromium.org, Jan 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3bfacde57908a9d75ff167a79530e54b72625f12

commit 3bfacde57908a9d75ff167a79530e54b72625f12
Author: Oleg Nesterov <oleg@redhat.com>
Date: Tue Aug 04 16:33:34 2015

UPSTREAM: net: pktgen: don't abuse current->state in pktgen_thread_worker()

Commit 1fbe4b46caca "net: pktgen: kill the Wait for kthread_stop
code in pktgen_thread_worker()" removed (in particular) the final
__set_current_state(TASK_RUNNING) and I didn't notice the previous
set_current_state(TASK_INTERRUPTIBLE). This triggers the warning
in __might_sleep() after return.

Afaics, we can simply remove both set_current_state()'s, and we
could do this a long ago right after ef87979c273a2 "pktgen: better
scheduler friendliness" which changed pktgen_thread_worker() to
use wait_event_interruptible_timeout().

Reported-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7ba8bd75ddc6b041b5716dbb29e49df3e9cc2928)

BUG= chromium:670842 
TEST=test_that $H "Pktgen test" --args=interface=wan0,count=2000,num_iterations=1000

Change-Id: I30c0a5032b2a6a51cc511a7dafcea6d108984363
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/433355
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/3bfacde57908a9d75ff167a79530e54b72625f12/net/core/pktgen.c

Status: Fixed (was: Started)
All patches have landed.

Comment 21 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 22 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 24 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)

Sign in to add a comment