New issue
Advanced search Search tips

Issue 795323 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug-Regression



Sign in to add a comment

31.1%-68% regression in thread_times.key_silk_cases at 524044:524066

Project Member Reported by chiniforooshan@chromium.org, Dec 15 2017

Issue description

See the link to graphs below.
 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Dec 15 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=795323

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=38a34cf86b4b49b455eefa91dc48752da1c0b4a340bb85ca10eddc0bf66ef211


Bot(s) for this bug's original alert(s):

android-webview-nexus6
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Dec 16 2017

Cc: ofrobots@google.com
Owner: ofrobots@google.com
Status: Assigned (was: Untriaged)

=== Auto-CCing suspected CL author ofrobots@google.com ===

Hi ofrobots@google.com, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Ali Ijaz Sheikh
  Commit : 4623b8223b487fb18b3af12fcf4fdbb7a194eca3
  Date   : Tue Dec 12 19:12:03 2017
  Subject: [heap] replace DisableInlineAllocationSteps with PauseAllocationObservers

Bisect Details
  Configuration: android_webview_nexus6_aosp_perf_bisect
  Benchmark    : thread_times.key_silk_cases
  Metric       : thread_other_cpu_time_per_frame/http___mobile-news.sandbox.google.com_news_pt0?swipe
  Change       : 63.90% | 15.8933688998 -> 26.0484550505

Revision                           Result                  N
chromium@524043                    15.8934 +- 1.76457      6      good
chromium@524055                    15.4808 +- 1.60551      6      good
chromium@524056                    15.8429 +- 2.8379       6      good
chromium@524056,v8@10d9c31488      15.9452 +- 1.17452      6      good
chromium@524056,v8@4623b8223b      27.4099 +- 4.87266      6      bad       <--
chromium@524056,v8@1220809924      25.2278 +- 4.27933      6      bad
chromium@524056,v8@18d7c66131      25.885 +- 3.01651       6      bad
chromium@524056,v8@f514cc967d      26.0746 +- 4.85701      6      bad
chromium@524056,v8@d5fbf0edcf      24.7171 +- 3.35632      6      bad
chromium@524057                    25.4461 +- 3.35011      6      bad
chromium@524058                    25.439 +- 5.14821       6      bad
chromium@524061                    26.6489 +- 8.38122      6      bad
chromium@524066                    26.0485 +- 2.58653      6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-webview --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=http...mobile.news.sandbox.google.com.news.pt0.swipe thread_times.key_silk_cases

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8960120282111934624


For feedback, file a bug with component Speed>Bisection

Comment 4 by ofrobots@google.com, Dec 16 2017

Cc: u...@chromium.org
I will take a look at this next week. LMK if this more urgent than that.

Comment 5 by ofrobots@google.com, Dec 18 2017

Is there a way to reproduce the regression if I don't have an android device to test on? Is this regression seen on any other platform?
Project Member

Comment 6 by 42576172...@developer.gserviceaccount.com, Dec 18 2017

 Issue 795776  has been merged into this issue.
Labels: OS-Android
Looks like all regressions for this bug (https://chromeperf.appspot.com/group_report?bug_id=795323) are on Android.

I don't know if it is possible to reproduce the regression on Linux. But, maybe you can reproduce it by uploading a patch and trying it on android_webview_nexus6_aosp_perf_bisect?

Comment 8 by ofrobots@google.com, Dec 19 2017

I can reproduce a behaviour difference on a different benchmark locally on linux. I am investigating.

Comment 9 by ofrobots@google.com, Dec 19 2017

I understand the issue now. The following patch will 'fix' the regression:

void NewSpace::UpdateAllocationInfo() {
+ Address old_top = top();
  Address new_top = to_space_.page_low();
- InlineAllocationStep(top(), new_top, nullptr, 0);

  MemoryChunk::UpdateHighWaterMark(allocation_info_.top());
  allocation_info_.Reset(new_top, to_space_.page_high());
  original_top_.SetValue(top());
  original_limit_.SetValue(limit());
  UpdateInlineAllocationLimit(0);
+ InlineAllocationStep(old_top, new_top, nullptr, 0);
  DCHECK_SEMISPACE_ALLOCATION_INFO(allocation_info_, to_space_);
}

The patch that caused the regression effectively moved the InlineAllocationStep to be *before* the call to UpdateInlineAllocationLimit.

The problem is that, it is not correct to do the Allocation Step *after* picking the limit. We should do the Allocation Step *before* picking the limit for the next step.

The way the code was, our accounting of allocated memory was incorrect. Let's say the step size is 512K and we just allocated 500K before running out of space on the page. We would add a new page, advance the limit to new_page_start + 512K (as 'no bytes have been allocated yet'). We'd do a step afterwards, but it was too late, the limit was already set. The next step is going to occur after 512KB rather than 12KB.

The patch that causes the regression causes the idle scavenger to run more frequently, because we are now more accurate in our accounting. These Android benchmarks seem sensitive to the frequency of when idle scavenger is pinged.

One mitigation might be to scale back the step size for the idle scavenger (and possibly also the incremental marking observers).

Another problem that I would like to tackle is that the code has been doing inaccurate accounting for quite a long time. The bug was subtle and not quite obvious to figure out. It would be good to add a regression test / code assertion going forward. One idea here is that we should assert in NewSpace::UpdateInlineAllocationLimit when we observe that top has moved but no step has been performed.

Ulan: I am going to 'fix' the regression by moving the call to InlineAllocationStep. We can handle fixing this correctly in a follow-on. WDYT?
I was approximating 'increase in CPU usage on android devices' by counting the number of times 'InlineAllocationStep' is called. I saw a doubling of this number after the patch causing the regression when running the octane benchmark in the v8 benchmark/ directory.

If we are only changing the idle scavenger step size (and not the incremental marking step size), I needed to increase the idle scavenger step size by 4 times (512KB -> 2048KB) in order to get to the same call count for InlineAllocationStep.

I suspect we need to scale up idle scavenger and incremental marking step sizes by 2x each in order to come back to parity after we fix the accounting.

Note that this regression has not been observed anywhere else so far. This particular benchmark on android seems sensitive the to the frequency of the idle scavenger & incremental marking interruptions.
Project Member

Comment 11 by 42576172...@developer.gserviceaccount.com, Dec 20 2017

Cc: mlippautz@chromium.org
 Issue 796217  has been merged into this issue.
CL with the 'fix' is here: https://chromium-review.googlesource.com/c/v8/v8/+/835287

Comment 13 by u...@chromium.org, Dec 20 2017

> The way the code was, our accounting of allocated memory was incorrect. Let's say the step size is 512K and we just allocated 500K before running out of space on the page. We would add a new page, advance the limit to new_page_start + 512K (as 'no bytes have been allocated yet'). We'd do a step afterwards, but it was too late, the limit was already set. The next step is going to occur after 512KB rather than 12KB.

I am lost here. If we don't account the current bytes_allocated=(top - top_on_previsous_step) for the current limit, it should be accounted for the next limit computation, right? So this should only shift by one step?
> I am lost here. If we don't account the current bytes_allocated=(top - top_on_previsous_step) for the current limit, it should be accounted for the next limit computation, right? So this should only shift by one step?

We count all the bytes we have allocated; but we do this count operation less frequently than we should. We want interrupt inline allocation at specific points; these points are delayed because of the math error.

(oversharing in the interest of reducing round trips)

Let me illustrate. With quick fix from #9 applied, let's start with the following printfs added to NewSpace::UpdateInlineAllocationLimit and SpaceWithLinearArea::InlineAllocation Step. (I am restricting the printing to new space because the error in the accounting are easier to see.)

@@ -2096,7 +2096,9 @@ void NewSpace::UpdateInlineAllocationLimit(int size_in_bytes) {
     // Lower limit during incremental marking.
     Address high = to_space_.page_high();
     Address new_top = allocation_info_.top() + size_in_bytes;
-    Address new_limit = new_top + GetNextInlineAllocationStepSize() - 1;
+    size_t step = GetNextInlineAllocationStepSize();
+    Address new_limit = new_top + step - 1;
+    printf("--limit-- set next limit to %p + %ld - 1 = %p\n", new_top, step, new_limit);
     allocation_info_.set_limit(Min(new_limit, high));
   }
   DCHECK_SEMISPACE_ALLOCATION_INFO(allocation_info_, to_space_);
@@ -2221,6 +2223,9 @@ void SpaceWithLinearArea::InlineAllocationStep(Address top,
       top_on_previous_step_ = top;
     }
     int bytes_allocated = static_cast<int>(top - top_on_previous_step_);
+    if (identity() == 0) { // new space
+      printf("-- step-- bytes allocated %d top %p\n", bytes_allocated, top);
+    }
     AllocationStep(bytes_allocated, soon_object, static_cast<int>(size));
     top_on_previous_step_ = top_for_next_step;
   }

Let's run this on benchmark/run.js. Here's the output:

--limit-- set next limit to 0x18a90fe82200 + 524288 - 1 = 0x18a90ff021ff
--limit-- set next limit to 0x3dd948882200 + 524288 - 1 = 0x3dd9489021ff
-- step-- bytes allocated 344504 top 0x18a90fed63b8
-- step-- bytes allocated 515568 top 0x3dd9488ffff0
--limit-- set next limit to 0x172192f38fa0 + 524288 - 1 = 0x172192fb8f9f
-- step-- bytes allocated 290824 top 0x172192f7ffa8
--limit-- set next limit to 0x18a90fe82508 + 233464 - 1 = 0x18a90febb4ff
-- step-- bytes allocated 233480 top 0x18a90febb510
--limit-- set next limit to 0x18a90febb510 + 524288 - 1 = 0x18a90ff3b50f
--limit-- set next limit to 0x3dd948882200 + 524288 - 1 = 0x3dd9489021ff
-- step-- bytes allocated 281296 top 0x18a90fefffe0
-- step-- bytes allocated 515576 top 0x3dd9488ffff8
--limit-- set next limit to 0x1547a0b02840 + 524288 - 1 = 0x1547a0b8283f
--limit-- set next limit to 0x172192f02200 + 524288 - 1 = 0x172192f821ff
-- step-- bytes allocated 513984 top 0x1547a0b80000
--limit-- set next limit to 0x29a156e02200 + 10304 - 1 = 0x29a156e04a3f
-- step-- bytes allocated 515576 top 0x172192f7fff8
-- step-- bytes allocated 16 top 0x29a156e02210
--limit-- set next limit to 0x29a156e02210 + 524272 - 1 = 0x29a156e821ff
--limit-- set next limit to 0x20d4a702200 + 524272 - 1 = 0x20d4a7821ef
-- step-- bytes allocated 515504 top 0x29a156e7ffc0
-- step-- bytes allocated 515520 top 0x20d4a77ffc0
--limit-- set next limit to 0x18a90fe822e8 + 524288 - 1 = 0x18a90ff022e7
--limit-- set next limit to 0x3dd948882200 + 524288 - 1 = 0x3dd9489021ff
-- step-- bytes allocated 515288 top 0x18a90fefffc0
--limit-- set next limit to 0x2ee5db582200 + 9000 - 1 = 0x2ee5db584527
-- step-- bytes allocated 515552 top 0x3dd9488fffe0
...
...

The problem almost immediately jumps out; one would expect update-limit and steps to happen in matched pairs.

Let's look at the first few lines:

--limit-- set next limit to 0x18a90fe82200 + 524288 - 1 = 0x18a90ff021ff
--limit-- set next limit to 0x3dd948882200 + 524288 - 1 = 0x3dd9489021ff
-- step-- bytes allocated 344504 top 0x18a90fed63b8
-- step-- bytes allocated 515568 top 0x3dd9488ffff0

So we stared with page at 0x18...fe8220, allocated 336k before reaching the non-inline (AllocateRaw) path. Apparently added fresh page at 0x2d..8882200. On the fresh page the limit should be 512kb-336kb = 176k in the new page, or in other words the new limit should have been 3d..88ae200. Instead it was computed to be 3d..89021ff.

Eventually we *will* count the memory; it is just that we have delayed the interrupt.

If we reverse the quick fix in #9, and compare, we see that
1. the output has --limit-- and --step-- lines perfectly interleaved.
2. The string '--limit--' shows up 130k times instead of 67k times.

Project Member

Comment 15 by bugdroid1@chromium.org, Dec 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/aefc8a315343b124f16a2cfdaab3b086629e145d

commit aefc8a315343b124f16a2cfdaab3b086629e145d
Author: Ali Ijaz Sheikh <ofrobots@google.com>
Date: Wed Dec 20 14:15:08 2017

[heap] fix regression on thread_times.key_silk_cases

Doing an allocation step before the limit is picked is more correct, but
it causes the idle scavenger and incremental marking to run more
frequently, which causes a regression on certain thread_time benchmarks.
This CL fixes the regression by reverting the behaviour. A proper fix
can be implemented in a follow-on.

Bug:  chromium:795323 
Change-Id: Ic62eab1ddfd1da4c5904e11eba7d5d71cee38f12
Reviewed-on: https://chromium-review.googlesource.com/835287
Commit-Queue: Ali Ijaz Sheikh <ofrobots@google.com>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#50239}
[modify] https://crrev.com/aefc8a315343b124f16a2cfdaab3b086629e145d/src/heap/spaces.cc

Comment 16 by u...@chromium.org, Dec 20 2017

Thanks, Ali. Now I understand it. What I was missing is that AllocationObserver::AllocationStep 'forgets' overallocated bytes. So the bytes after step_size_ do not affect the next step_size_, which makes sense.

void AllocationObserver::AllocationStep(int bytes_allocated,
                                        Address soon_object, size_t size) {
  DCHECK_GE(bytes_allocated, 0);
  bytes_to_next_step_ -= bytes_allocated;
  if (bytes_to_next_step_ <= 0) {
    Step(static_cast<int>(step_size_ - bytes_to_next_step_), soon_object, size);
    step_size_ = GetNextStepSize();
    bytes_to_next_step_ = step_size_;
  }
}

> The string '--limit--' shows up 130k times instead of 67k times.
If doubling the scavenger step size fixes the regression, then let's do it.

> If we are only changing the idle scavenger step size (and not the incremental marking step size), I needed to increase the idle scavenger step size by 4 times (512KB -> 2048KB) in order to get to the same call count for InlineAllocationStep.
Did you try if 1024KB fixes the regression? Note that scavenger also keeps track of bytes allocated (see ScavengeJob::ScheduleIdleTaskIfNeeded), so it should be robust against AllocationStep being called due to incremental marking or other observers.
> If doubling the scavenger step size fixes the regression, then let's do it.

Note that I have not been able to reproduce the android-specific regression. I have been using 'number of allocation steps' as a proxy for 'thread time'.

Assuming that 67k steps is the correct number for benchmarks/run.js, doubling the scavenger step size reduces the steps from 130k -> ~89k. That alone might not be enough. At the same time it is quite possible that it is enough to remove the 'thread times' metric on these specific android benchmarks.

I agree that we should 'tweak' the numbers as little as possible, but I don't really have a good way to reproduce the actual regression.

How do you suggest we experiment with this?

Comment 18 by u...@chromium.org, Dec 20 2017

> How do you suggest we experiment with this?
If it doesn't reproduce locally, then let's land it and observe the perf bots.
The change might affect other GC sensitive benchmarks. We will have to try and see.
Project Member

Comment 19 by 42576172...@developer.gserviceaccount.com, Dec 21 2017

 Issue 795775  has been merged into this issue.
Status: Fixed (was: Assigned)

Sign in to add a comment