
Issue 785930

Starred by 1 user

Issue metadata

Status: Duplicate
Merged: issue 782187
Owner: ----
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




elm performance issue related to WiFi and big.LITTLE and Interactive governor

Reported by dave.rod...@arm.com, Nov 16 2017

Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.45 Safari/537.36
Platform: 10082.0.2017_10_31_1154 (Test Build - davrod01) developer-build elm

Example URL:

Steps to reproduce the problem:
1. connect to WiFi on elm
2. run top_25_smooth benchmark:
   run_benchmark --browser=cros-chrome smoothness.top_25_smooth --remote=...
3. results significantly lower (around 14% regression) compared to known-good USB network adaptor

What is the expected behavior?

What went wrong?
Benchmark results are lower than expected.

Did this work before? N/A 

Chrome version: 64.0.3253.0  Channel: dev
OS Version: 
Flash Version: 

As a reference for "good" performance, I'm using a LogiLink UA0025C USB 2 network adaptor - whilst this is USB 2, it performs best out of all adaptors that I have available. Some other USB network adapters also exhibit performance issues (see  issue 782187 ).

Motionmark appears to be unaffected.

So far, I've mainly tested on a single page (booking.com) of the smoothness benchmark, but most/all pages are affected. For booking.com, performance drops from around 91% to 79%, i.e., using WiFi causes a drop of around 7 FPS.

I plan to continue testing on page_cycler and Speedometer, and will also look at other platforms.

elm uses a mwifiex driver. dmesg reports:
[    2.293094] mwifiex: rx work enabled, cpus 4
[    3.635885] mwifiex_sdio mmc2:0001:1: info: FW download over, size 800344 bytes
[    4.361339] mwifiex_sdio mmc2:0001:1: WLAN FW is active
[    4.438210] mwifiex_sdio mmc2:0001:1: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p87) 
[    4.438221] mwifiex_sdio mmc2:0001:1: driver_version = mwifiex 1.0 (15.68.7.p87)
 

Comment 1 by mmenke@chromium.org, Nov 16 2017

Components: -Internals>Network OS>Systems>Network
Cc: briannorris@chromium.org karth...@marvell.com diand...@chromium.org
Adding Brian/Doug/Karthik (marvell) since they seem to have touched 
Cc: drinkcat@chromium.org djkurtz@chromium.org
If you fix the CPU frequency to a certain value with userspace governor and keep only small cores on (or only big cores on--whichever is easier) then do you still get a performance regression?

AKA: one guess is that your choice of network adapter affects the load on the system.  If you put a little extra or a little less load on the system it may decide to bump up the CPU frequency, which could make tests run faster or slower.
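
A minimal sketch of the "only small cores on" part of that experiment, using the CPU hotplug files in sysfs. The function name `set_cpus_online` and the `CPU_ROOT` override are hypothetical, and the choice of cpu2/cpu3 as the big cores is an assumption that should be checked on the board before use.

```shell
#!/bin/sh
# Sketch: toggle CPU hotplug state through sysfs.
# CPU_ROOT can be overridden so the function can be tried on a scratch directory.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# set_cpus_online STATE CPU...
# STATE is 1 (online) or 0 (offline); each CPU is a directory name such as cpu2.
set_cpus_online() {
  state="$1"
  shift
  for cpu in "$@"; do
    # Some CPUs (often cpu0) may not expose a writable "online" file.
    if [ -w "$CPU_ROOT/$cpu/online" ]; then
      echo "$state" > "$CPU_ROOT/$cpu/online"
    else
      echo "skipping $cpu: no writable online file" >&2
    fi
  done
}

# Assumption: cpu2 and cpu3 are the big cores on this board.
# set_cpus_online 0 cpu2 cpu3   # little cores only
# set_cpus_online 1 cpu2 cpu3   # bring the big cores back
```

Run as root on the device. Pinning the frequency with the userspace governor is a separate step.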

Comment 4 Deleted

Comment 5 by dave.rod...@arm.com, Nov 21 2017

Tested with a range of governors and cores enabled:

Adapter   governor      result (little only)   result (all cores)
WiFi      performance   91.97                  94.51
USB       performance   92.50                  94.48
WiFi      powersave     59.63                  55.46
USB       powersave     45.65                  47.96
WiFi      ondemand      93.94                  89.17
USB       ondemand      92.41                  89.10
WiFi      interactive   89.31                  70.44
USB       interactive   88.39                  89.38

In other words, the regression is only seen with all cores enabled, when the interactive governor is used. I've also tested with a 4.4 kernel and found that the regression is much smaller, but still present (about 6%).
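
To put a number on the gap in the table above, here is a small sketch that computes the WiFi result relative to the USB result for the interactive governor. The helper name `rel_delta_pct` is hypothetical; the input values are copied from the table.

```shell
#!/bin/sh
# rel_delta_pct A B: print how far A is from B, as a percentage of B.
rel_delta_pct() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

# WiFi vs. USB with the interactive governor, values from the table.
echo "all cores:   $(rel_delta_pct 70.44 89.38)%"
echo "little only: $(rel_delta_pct 89.31 88.39)%"
```

With all cores enabled the WiFi score is roughly 21% below USB, while with little cores only the two are within about 1% of each other.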

Comment 6 by dave.rod...@arm.com, Nov 21 2017

re #3: I've compared frequency data and this shows a big difference. When using ethernet/usb, mean frequency is 13% higher for the little cores (880 vs 782), and 67% higher for the big cores (1303 vs 782).
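
For anyone wanting to reproduce a comparison like this without a tracing tool, here is a rough sketch that polls `scaling_cur_freq` and averages it. This is not necessarily how the numbers above were collected; the function name `mean_freq_khz`, the `CPU_ROOT` override, and the sampling interval are all assumptions.

```shell
#!/bin/sh
# Sketch: average the reported CPU frequency (kHz) over a number of samples.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# mean_freq_khz CPU SAMPLES [INTERVAL_SECONDS]
# CPU is a directory name such as cpu0; prints the integer mean in kHz.
mean_freq_khz() {
  mf_cpu="$1"
  mf_samples="$2"
  mf_interval="${3:-1}"
  mf_total=0
  mf_i=0
  while [ "$mf_i" -lt "$mf_samples" ]; do
    mf_freq=$(cat "$CPU_ROOT/$mf_cpu/cpufreq/scaling_cur_freq") || return 1
    mf_total=$((mf_total + mf_freq))
    mf_i=$((mf_i + 1))
    sleep "$mf_interval"
  done
  echo $((mf_total / mf_samples))
}

# Example: ten one-second samples from one little and one big core
# (which CPU numbers are little or big is an assumption).
# mean_freq_khz cpu0 10
# mean_freq_khz cpu2 10
```

Polling once per second is much coarser than a trace, so treat the result as a rough indication only.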

The attached image compares CPU frequencies during a 1 second trace taken while scrolling the booking.com page.
cpu_frequencies - wifi_vs_ethernet.png
Cc: kirtika@chromium.org
I've heard Kirtika mention in the context of another Wifi driver that cpufreq was throttling max wifi performance at times. I don't recall what solution she came up with.

I think her problem was seen on 4.4, with a uniform (not big.LITTLE) Intel architecture, so maybe the concerns (and solutions) are significantly different anyway.

Comment 8 by kirtika@google.com, Nov 21 2017

Cc: snanda@chromium.org
In that case, the CPU was idling too much and preventing the wifi NIC from 'depositing' packets i.e. forcing the NIC to drop packets. 
You are already doing the right thing here by comparing performance across governors, assuming performance governor is your best-possible-case. 
In that case, the solutions were:
(a) enable LTR on the pci root port, something you may want to check for here.
(b) Have the driver use pm_qos_add_request / pm_qos_update_request to ask for low latency when it detected high load. 

It might be helpful to monitor the output of `iw mlan0 station dump` periodically and plot the avg tx/rx bitrates. In that case, the good case had a steady rx bitrate at the max MCS, the bad case had a lower rx rate. 
A low/unsteady rx rate would point to the device dropping packets, a low/unsteady tx rate would mean something is wrong with the rate-scaling. 
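
A sketch of the periodic monitoring suggested above, pulling the rx/tx bitrate out of the `station dump` text. The helper name `extract_bitrate` is hypothetical, and the parsing assumes lines of the form `rx bitrate: <number> MBit/s ...`; the `iw` output format is not a stable interface, so check it against the actual output on the device.

```shell
#!/bin/sh
# extract_bitrate KIND: read station dump text on stdin and print the numeric
# bitrate (MBit/s) from each "<KIND> bitrate:" line. KIND is rx or tx.
extract_bitrate() {
  awk -v kind="$1" '$1 == kind && $2 == "bitrate:" { print $3 }'
}

# Hypothetical polling loop; the one-second interval is an assumption.
# while sleep 1; do
#   dump=$(iw mlan0 station dump)
#   rx=$(echo "$dump" | extract_bitrate rx)
#   tx=$(echo "$dump" | extract_bitrate tx)
#   echo "$(date +%s) rx=$rx tx=$tx"
# done
```

Redirect the loop output to a file during a benchmark run and plot the two columns afterwards.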

In comment #6, what is the governor used?

@8: There's no PCI here, so (a) won't work ;)
Thanks for the suggestions, I'll look into these.

The data in #6 is using the default (interactive) governor.
Summary: elm WiFi performance issue related to big.LITTLE and Interactive governor (was: elm WiFi performance issue)
I could certainly be wrong here, but it seems like the results above are just not all that surprising.

From all the work we did on Kevin we know that the Interactive governor vs. big.LITTLE is a bit fragile and just not the best solution.  It's been working OK so far on elm but it seems totally sane that there are rough edges and ways to make it fall over.

If I had to guess the simple explanation here is that the WiFi driver is _too_ efficient compared to the USB Ethernet adapter driver.  My theory here is that the USB Ethernet adapter is doing enough inefficient things to trip the CPU Frequency up to the next notch.  That has the side effect of burning more power but also giving us better performance.  I would sorta bet that if you had the WiFi driver fork off a kthread and factor some prime numbers whenever it's doing WiFi transfers that you'd see a nice boost in performance (and power usage).  ...or you could tweak the Interactive governor a bit. 

IMHO our solution here is not to waste lots of time on tweaking the Interactive governor vs. big.LITTLE, though.  We should find a way to get the proper governor to elm, either by uprevving elm's kernel or backporting the scheduler patches.
Has anyone done similar tests on Kevin (4.4 kernel and similar Wifi driver -- though it's PCIe, not SDIO)? I haven't personally done side-by-side comparisons on a throttled network (Wifi should have a max throughput of more than 95 Mbps) to see if there are scheduling inefficiencies, though generally I believe performance was mostly up to expectations.

There's no guarantee that the scheduler in 4.4 will get us better Wifi performance.
@12: I haven't done testing with kevin, but see above (comment #5):

> I've also tested with a 4.4 kernel and found that the regression 
> is much smaller, but still present (about 6%).

...so it seems like Dave is pretty convinced that the 4.4 scheduler is at least a bit better here, even if it's still not perfect...
Ah, sorry. I *did* read that comment before, but I lost all memory over the holiday ;)

It still sounds reasonable that we might do something like Kirtika's suggestion in #8(b); there's no guarantee that high network traffic will translate "quickly enough" into a CPU load signal that the scheduler will notice. But on the other hand, I see very few drivers that make PM QoS requests, so we might proceed with caution there...
@14: I'm still a little confused here.  From the discussions on the other similar bug ( bug #782187 ), it seems like network performance isn't really the issue here.  My understanding of everything is that simply being connected via WiFi vs. a special USB Ethernet adapter affects the performance of a test that is not really affected in a major way by network performance.

Specifically, my theory is that the only reason the choice of network adapter (WiFi or various USB Ethernet adapters) affects the test is because the choice of network adapter affects what CPU Frequency the governor chooses.

The test I suggested in the other bug was to use userspace governor.  AKA:

===

Run the test like this (min freq) and compare WiFi vs. USB Ethernet:

cd /sys/devices/system/cpu/
for cpu in cpu*/cpufreq; do
  cd /sys/devices/system/cpu/"${cpu}"
  echo userspace > scaling_governor
  cat scaling_min_freq > scaling_setspeed
done

---

Run the test like this (max freq) and compare WiFi vs. USB Ethernet:

cd /sys/devices/system/cpu/
for cpu in cpu*/cpufreq; do
  cd /sys/devices/system/cpu/"${cpu}"
  echo userspace > scaling_governor
  cat scaling_max_freq > scaling_setspeed
done

===

If WiFi vs. USB Ethernet actually get the same in both cases then it's not that the WiFi driver somehow needs a higher QoS or anything.  It's just that the WiFi driver happens to not cause a bump in CPU Frequency.
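
The two loops above can also be folded into one parameterized helper, with a second helper to switch back to the normal governor afterwards. This is only a sketch: the function names `pin_all_freq` and `restore_governor` and the `CPU_ROOT` override are hypothetical.

```shell
#!/bin/sh
# Sketch: pin every cpufreq policy to its min or max frequency, then restore.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# pin_all_freq BOUND: BOUND is "min" or "max".
pin_all_freq() {
  bound="$1"
  case "$bound" in
    min|max) ;;
    *) echo "usage: pin_all_freq min|max" >&2; return 1 ;;
  esac
  for dir in "$CPU_ROOT"/cpu*/cpufreq; do
    # Skip glob matches that are not per-CPU cpufreq directories.
    [ -d "$dir" ] || continue
    echo userspace > "$dir/scaling_governor"
    cat "$dir/scaling_${bound}_freq" > "$dir/scaling_setspeed"
  done
}

# restore_governor NAME: write NAME (for example, interactive) to every policy.
restore_governor() {
  for dir in "$CPU_ROOT"/cpu*/cpufreq; do
    [ -d "$dir" ] || continue
    echo "$1" > "$dir/scaling_governor"
  done
}

# pin_all_freq min          # run the benchmark on WiFi, then on USB Ethernet
# pin_all_freq max          # repeat both runs
# restore_governor interactive
```

Run as root on the device, and restore the governor once both runs are done so later measurements are not skewed.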
One other note: if there is some actual evidence of dropped packets somewhere, then that could possibly be a different story.  I'm just not sure I saw any evidence of that.  In any case, my test case ought to be a simple one to run.
Wow, I think I really misinterpreted something along the way. For one, I was reading the "result" columns in #5 as network performance numbers. It looks like they are some unit-less measure from the top_25_smooth test instead. I have no idea what that is measuring.

Reading through  bug 782187 , I still don't see an agreement over whether this test should be affected by network performance itself (and not just the network driver's effect on CPU scheduling). If it's not network performance, then this bug's subject is misleading (both the old subject and the new one).

I'll shut up until I see more useful data.
Summary: elm performance issue related to WiFi and big.LITTLE and Interactive governor (was: elm WiFi performance issue related to big.LITTLE and Interactive governor)
I haven't gone and re-dug through all the comments / evidence, but perhaps Dave just knows the answer: now that we know the root cause of  bug #782187 , do we think that this is the same or different?
Yes, I'm confident this is the same issue. We opened this because there was some concern that WiFi was a different case (I think at the time, the belief was that it was a USB2/3 issue causing differences with the network adapters).

So IMO we should close this as a duplicate.
Mergedinto: 782187
Status: Duplicate (was: Unconfirmed)
