Failed to save changes in Windows + Chrome (version 51) since May 27 till now, but works well on Chrome Canary v53 and Firefox/Safari/IE
Reported by
helenlil...@gmail.com,
Jul 1 2016
|
||||||||||||||||||
Issue descriptionChrome Version : 51.0.2704.103 OS Version: 6.3 URLs (if applicable) : www.tinkercad.com and https://projectignite.autodesk.com Other browsers tested: Add OK or FAIL after other browsers where you have tested this issue: Chrome: Failed Opera: Failed Chrome Canary: OK (53.0.2784.1) Safari: OK Firefox: OK IE 11: OK What steps will reproduce the problem? 1. Sign in https://projectignite.autodesk.com 2. Start to work on a 3D project 3. Insert a shape generator shape from the side panel 4. Change the parameters of the shape generator What is the expected result? Saving all works well after changing the parameters or do other actions. What happens instead of that? Failed to save the changes, it just showing saving message in our canvas. Please provide any additional information below. Attach a screenshot if possible. 1. The first user reported us is on May 27, 2016, all of them are having the issues on Windows + latest Chrome and Opera 2. Chrome Canary version 53 has no such issue 3. Firefox, Safari and IE 11 works well 4. For few our users, they disable the Google Docs and Google Docs Offline extensions could help, but rest are not. UserAgentString: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
,
Jul 1 2016
About when the Chrome will be upgraded to version 53 which has same code base of Chrome Canary 53? We are getting more and more users report this issue on Chrome to us this month. Please help to have a look about this issue. thanks a lot!
,
Jul 1 2016
We find this issue might be similar to https://bugs.chromium.org/p/chromium/issues/detail?id=624077
,
Jul 1 2016
Attach the trace file which could reproduce this issue to you. Hope it helps. Thanks a lot!
,
Jul 1 2016
I take a look at your site. It might be the same issue. I saw that you have a setInterval set on 100ms where you check if there is a change to send it to backend. I open the app and make some changes but the "Saving..." label didn't disappear. Then I refresh the app and my changes disappear. After the refresh it works as I expected and everything was saved to backend and now I cannot reproduce it.
,
Jul 1 2016
Thanks for the trace and the animated gif - very helpful. I tried to reproduce this issue on 51.0.2704.106 (linux) but I wasn't able to, The changes got saved as expected.
A few things to note:
* The problem (what ever it is) is not related to the sheculer's ThrottlingHelper, the upcoming expensive task blocking intervention (that's not enabled in M51 and the tasks are not deemed expensive anyway), or the hidden iframe task throttling intervention (that's not in M51 and besides I don't think it would apply here).
* There might be something odd about the delayed tasks. At the 14s mark the trace contains this information about the frame_timer_tq (the queue on which things like set timeout and set interval go):
{delay_to_next_task_ms: 2033.925,
delayed_incoming_queue_size: 80,
delayed_work_queue_size: 1,
enabled: true,
immediate_incoming_queue_size: 0,
immediate_work_queue_size: 0,
name: "frame_timer_tq",
priority: "normal",
pump_policy: "auto",
time_domain_name: "RealTimeDomain",
wakeup_policy: "can_wake_other_queues"},
That shows there are 80 timers pending, and the next one is due to fire in about 2 seconds. That is surprising considering there is supposed to be setInterval of 100ms. I'm not sure what is going on here.
,
Jul 4 2016
Tentatively marking this a scheduling bug. Could you check whether this happens in an incognito profile (i.e., without extensions)?
,
Jul 4 2016
To answer #2 the _estimated_ stable release of M53 is Sep 6th, 2016 https://www.chromium.org/developers/calendar
,
Jul 4 2016
+sigbjornf would you mind double checking the if (!canFire()) return; code in TimerBase::runInternal? As far I can tell it's doing the right thing (I instrumented TimerBase and so far I always see the destructor run shortly after that early out) but if there was some corner case that might explain these setInterval reports. IIRC oilpan was turned on in M51 too.
,
Jul 4 2016
yes, it was enabled in M50. If the canFire() predicate incorrectly returns false, that means the object 'owning' the timer has been incorrectly deemed to be garbage. i.e., the check is there to prevent unsafe access to objects about to be destructed by the garbage collector sweeper. An incorrect result would mean that we're not handling page/object marking in the heap correctly. I'd expect more failures (and crashing ones) if that were the case, so it seems an unlikely source. I can't think of any recent Blink GC fixes which could help explain M51 vs M53.
,
Jul 4 2016
Could be a pure coincidence, but May 27th and Windows brings the unexplained M51-only issue 615306 to mind.
,
Jul 5 2016
+sigbjornf sounds it could be a possible reason because we observe that only clients with M50 or M51 experienced the problem described in the issue 624077 .
,
Jul 5 2016
Hmm, it looks hard to diagnose the problem without bisecting. Given that M53 doesn't have the issue, can we find a revision that fixed the issue?
,
Jul 6 2016
There's a tool here that helps bisecting Chrome builds: https://www.chromium.org/developers/bisect-builds-py If anyone who can reproduce this could try it that would be super helpful!
,
Jul 11 2016
Thanks all for your comments! Much appreciate! Sorry for my late response. Response to skyos...@chromium.org Could you check whether this happens in an incognito profile (i.e., without extensions)? [Helen]: Yes, I am still able to reproduce this issue with incognito profile today.
,
Jul 11 2016
@skyos...@chromium.org: Hi, I'm from Tinkercad team, we tried the bisect-build tools you provided to us, and we found that the latest Chromium version we downloaded is v40, but the Chrome version in our computer is v51 now. (We use the revision 300698 to download the Chromium). Could you help to check this? Thank you:) We got more and more user's feedback about this issue recently, and the issue was first reported on 5/27, so could you provide us the build revision during the whole May this year? And one more thing, seems that the bisect-builds.py file was removed in your server if the user try to download the file directly.
,
Jul 12 2016
I tried again today and was able to reproduce this once in M51 (linux), but despite significant effort I was not able to reproduce it again subsequently. Looking at the trace I notice a few things: * It went into UseCase::SYNCHRONIZED_GESTURE and timer tasks where deemed to be expensive and where throttled briefly. (note the symptom of the saving... message not going away happened before I started tracing). Judging from the trace the throttling code seemed to be working correctly. Subsequent traces never seemed to go into UseCase::SYNCHRONIZED_GESTURE. * Some of the DOM timers have very long delays (up to 58 mins). Currently I don't have a theory regarding what might be wrong. There haven't been that many scheduler / dom timer patches between M51 and M53 and I'd be surprised if any of them would fix the bug.
,
Jul 12 2016
I'm grasping at straws a bit here, but Sigbjornf is it possible https://codereview.chromium.org/2015173003 might be related? That patch didn't exist in M51 but does in M53 (which according to #1 doesn't have the bug).
,
Jul 12 2016
That change was made to ensure absolute safety of some later change (concerning SVG image-related finalization) using willObjectBeLazilySwept(), and not addressing some existing problem with willObjectBeLazilySwept() usage. So I'd be quite surprised if it intersects with the issue here; if there was a problem with probing such near-dead objects, it would have shown itself earlier than suddenly in M51, one would think. iow, "unlikely, but not impossible."
,
Jul 12 2016
How to get the Chrome build newer than V40? We don't know how to get the old Chrome builds to bisect it after build v40. Anybody could point a place where we could get new builds than V40? Thanks a lot!
,
Jul 13 2016
Could you give us a bit more details about the bisect command line you where running?
,
Jul 13 2016
Make sure you don't use the -l (or --blink) option for the bisect script
,
Jul 14 2016
We use the bisect script: python tools/bisect-builds.py -a win -g good-revision -b bad-revision. The revision is selected from: https://src.chromium.org/viewvc/chrome/branches/?sortby=date#dirlist
,
Jul 14 2016
Hello all, Above bisect script and revision list is what we used to debug the issue. While we only able to get version 40, but no newer Chrome version to debug. We use the bisect script: python tools/bisect-builds.py -a win -g good-revision -b bad-revision. The revision is selected from: https://src.chromium.org/viewvc/chrome/branches/?sortby=date#dirlist Are they correct? Thanks for your help! Regards, Helen
,
Jul 14 2016
I tried this issue on Windows beta version 52.0.2743.75, it works without hanging. The timers get going as expected. (I am a developer working with Helen in Tinkercad)
,
Jul 15 2016
Thanks for the datapoint. I'll have another go at trying to reproduce this today. If I do manage to even semi-reliably reproduce it I'll be able to find out what it is and hopefully add some testing to make sure it stays fixed.
,
Jul 18 2016
I'm trying to track down what's going wrong with an instrumented version of M51. Unfortunately I'm getting 500s when I try to log in. https://accounts.acg.autodesk.com/TokenAPI/login.cfm?afc=IG1ONP&locale=en-US®ister=false&relative_redirect_path=%2Fapi%2Fv1%2Fauth%2Fget-token%2Fhttps%3A%2F%2Fprojectignite.autodesk.com%2Fapp%2FloggedIn%2F%2F&social_providers=FB%2CGO%2CYH%2CWL%2CDA Doesn't happen with a stock version of M51, not sure why.
,
Jul 18 2016
Never mind I got it working in an incognito window. I'm not sure why yet, but the DomTimer backing the setInterval got deleted. I'll try and find out why. NOTE this is doesn't seem to be a scheduler bug.
,
Jul 18 2016
I spoke too soon there's an unfortunate interaction between an optimization in TimerBase::setNextFireTime (the check to see if m_nextFireTime != newTime) and virtualization of time for throttled timers. The TimerBase logic assumes time is monotonically increasing and normally it is, however the task queue was in a transition from unthrottled to throttled and it seems now went backwards! [0x3cb051e08a60] TimerBase::setNextFireTime now 2421367.119952 delay 0.037971 (2421367.157923) on queue 0x11282b5949f0 [0x3cb051e08a60] TimerBase::setNextFireTime now 2421367.119952 delay 0.037971 (2421367.157923) on queue 0x11282b5949f0 [0x3cb051e08a60] TimerBase::setNextFireTime now 2421367.009109 delay 0.148814 (2421367.157923) on queue 0x11282b5949f0 That happened because ThrottlingHelper's timebase only gets advanced one per second, and the value being used for the timer computation was stale. Normally that's not a problem, the timer still fires and when the time is unthrottled everything proceeds as normal. However in this case the computed runtime was exactly the same as the immediate run before so the timer didn't get scheduled and the DOMTimer got deleted. To fix this I propose we do two things: 1. Remove the optimization in TimerBase::setNextFireTime. It's almost never hit and in light of this bug it seems dangerous. (this would fix the bug) 2. Change the design of scheduler::TimeDomain to allow the throttled time domain to report the real time to blink while still using the virtual time base for internal scheduling decisions. (this would also fix the bug) This underlying bug probably existed in M50, it exists in M51 onwards (it's probably not visible in M53 for this page because the policy that was triggering the throttling changed). I'll get this fixed in ToT and get it merged into M52 and M53. I don't know if this would qualify for a M51 refresh although I'll ask.
,
Jul 18 2016
,
Jul 18 2016
,
Jul 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716 commit d8ef45f5f3b2d1cdfe863d346acda5da03ba6716 Author: alexclarke <alexclarke@chromium.org> Date: Mon Jul 18 16:56:48 2016 Fix a bug that could occasionaly cause setInterval to stop There was a bad interaction between throttled task queues and logic in TimerBase::setNextFireTime that could cause setInterval to stop firing. If a timer was posted to a throttled task queue after running on a normal task queue, it was possible for NOW to be in the past. Logic existed in TimeDomain::ComputeDelayedRunTime to compensate for this. Usually that worked fine, however the anti-drift logic in TimerBase for repeating tasks could cause the scheduled run time (as visible to TimerBase) to fall on the exact same time as the previous run. This would cause TimerBase::setNextFireTime to nop out, which would cause the setInterval to stop firing. This patch prevents that by changing ThrottledTimeDomain to be based on RealTimeDomain. This means Now() is the real time rather than a periodically updated virtual time. I.e. it's monotonically increasing. At the same time I've remove the now useless TimeDomain::ComputeDelayedRunTime BUG= 625041 Review-Url: https://codereview.chromium.org/2155143002 Cr-Commit-Position: refs/heads/master@{#406017} [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/real_time_domain.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/real_time_domain.h [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/task_queue.h [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/task_queue_impl.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/task_queue_impl.h [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/task_queue_manager_unittest.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/time_domain.h [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/time_domain_unittest.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/virtual_time_domain.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/base/virtual_time_domain.h [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/child/compositor_worker_scheduler.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/child/idle_helper.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/child/scheduler_helper_unittest.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/renderer/throttled_time_domain.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/renderer/throttled_time_domain.h [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/renderer/throttling_helper.cc [modify] https://crrev.com/d8ef45f5f3b2d1cdfe863d346acda5da03ba6716/components/scheduler/renderer/throttling_helper_unittest.cc
,
Jul 18 2016
,
Jul 18 2016
Before we approve merge to M52, Could you please confirm whether this change is baked/verified in Canary and safe to merge?
,
Jul 18 2016
Assuming there's time, would it make sense to merge into M53 and wait a couple of days to make sure there's no unexpected side effects before merging into M52?
,
Jul 18 2016
I spoke to govind@ offline and we've agreed we will try to get this for next stable refresh release. This bug breaks web apps that reply on setInterval (at least two I know of, there will undoubtably be more) for some users, so the merges are justified my opinion.
,
Jul 18 2016
Thank you alexclarke@. Would it be possible for you to test/verify on ToT?
,
Jul 18 2016
Sorry M53 merge is not approved yet.
,
Jul 18 2016
Approving merge to M53 branch 2785 based on offline chat with alexclarke@ so it gets picked for this week dev release tomorrow (Although change is not baked in canary & don't have a repro for the bug that occurs in ToT that can test/verify) alexclarke@ will test/verify the change in tomorrow's canary and update the bug.
,
Jul 18 2016
(Un)fortunately https://projectignite.autodesk.com gets fixed in M52 by an unrelated scheduler change. It should be possible to construct a js-fiddle case for this to verify this however and I'll do that tomorrow.
,
Jul 18 2016
Per conversation with govind@ I'm not expecting a perf impact with this patch, but we need to check telemetry to ensure there are no power or smoothness regressions.
,
Jul 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a6348ac8c0afbe3db0991574e905bdec1e265999 commit a6348ac8c0afbe3db0991574e905bdec1e265999 Author: Alex Clarke <alexclarke@chromium.org> Date: Mon Jul 18 21:16:13 2016 (Merge) Fix a bug that could occasionaly cause setInterval to stop There was a bad interaction between throttled task queues and logic in TimerBase::setNextFireTime that could cause setInterval to stop firing. If a timer was posted to a throttled task queue after running on a normal task queue, it was possible for NOW to be in the past. Logic existed in TimeDomain::ComputeDelayedRunTime to compensate for this. Usually that worked fine, however the anti-drift logic in TimerBase for repeating tasks could cause the scheduled run time (as visible to TimerBase) to fall on the exact same time as the previous run. This would cause TimerBase::setNextFireTime to nop out, which would cause the setInterval to stop firing. This patch prevents that by changing ThrottledTimeDomain to be based on RealTimeDomain. This means Now() is the real time rather than a periodically updated virtual time. I.e. it's monotonically increasing. At the same time I've remove the now useless TimeDomain::ComputeDelayedRunTime BUG= 625041 Review-Url: https://codereview.chromium.org/2155143002 Cr-Commit-Position: refs/heads/master@{#406017} (cherry picked from commit d8ef45f5f3b2d1cdfe863d346acda5da03ba6716) Review URL: https://codereview.chromium.org/2158993002 . Cr-Commit-Position: refs/branch-heads/2785@{#200} Cr-Branched-From: 68623971be0cfc492a2cb0427d7f478e7b214c24-refs/heads/master@{#403382} [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/real_time_domain.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/real_time_domain.h [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/task_queue.h [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/task_queue_impl.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/task_queue_impl.h [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/task_queue_manager_unittest.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/time_domain.h [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/time_domain_unittest.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/virtual_time_domain.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/base/virtual_time_domain.h [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/child/compositor_worker_scheduler.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/child/idle_helper.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/child/scheduler_helper_unittest.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/renderer/throttled_time_domain.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/renderer/throttled_time_domain.h [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/renderer/throttling_helper.cc [modify] https://crrev.com/a6348ac8c0afbe3db0991574e905bdec1e265999/components/scheduler/renderer/throttling_helper_unittest.cc
,
Jul 18 2016
https://chromeperf.appspot.com/report?sid=29d65de163b448a26e54c6214d81dc8f438e9d5bdae4f91e0d5836bbcfc5aa63 looks good. I'll want to make sure the various smoothness bots don't show any regression, they've not cycled yet so I'll check in the morning.
,
Jul 19 2016
Thanks all for your great help to confirm the issue and get it to be fixed! Much appreciate! alexclarke@ sorry for the error 500 issue when you signing in, it might because the username is conflict in our www.tinkercad.com and accounts.autodesk.com due to some reason, if you would like to use Tinkercad.com to create 3D modeling in the future, possible solution could be found here: https://support.tinkercad.com/hc/en-us/articles/214090638-I-m-logged-out-automatically-or-get-a-403-or-410-Error- If you still have problem about Tinkercad, please feel free to send us a ticket in our Help Center for Tinkercad related issues anytime. :) Thanks. Regards, Helen
,
Jul 19 2016
And could you please share with us more about how to get the correct bisect Chrome in case we need to bisect it again in the future for other things? We use the bisect script: python tools/bisect-builds.py -a win -g good-revision -b bad-revision. The revision is selected from: https://src.chromium.org/viewvc/chrome/branches/?sortby=date#dirlist Are they correct? Thanks for your help! Regards, Helen
,
Jul 19 2016
I've never used that bisect script myself so I don't know the answer sorry. I'd suggest asking on the chromium-dev mailing list, somebody there will know. RE #44 looking at the perf bots this morning, it looks like the patch had no perf impact (as expected). I'll try and devise a repro to check it's fixed on ToT.
,
Jul 19 2016
Tested the issue on windows 7 using chrome version 53.0.2785.21 with the below steps 1.Sign into site https://projectignite.autodesk.com 2.Start a 3D project 3.Open chrome://tracing and record 4.Able to save the changes to the project 5.Stop the recording 6.Able to save the changes when importing the recording alexclarke@ Please find the attached screen cast and confirm if anything required to verify here?. If so please provide us the any sample test steps to verify the issue from test team end. Thank you!
,
Jul 19 2016
RE #48 an unrelated scheduler change masks this bug since M52. I'm working on a jsfiddle that should hopefully illustrate the bug in M53 pre-patch.
,
Jul 19 2016
I should be more precise: an unrelated scheduler change masks the bug for tinkercad since M52 but I believe that didn't fix the problems reported in http://crbug.com/624077
,
Jul 19 2016
I spent this morning trying to write a repro to illustrate the bug, and I can make setInterval reliably break in M51 (see attachment*). Due to unrelated changes, it's rather awkward to get the page to trigger an affected scheduler policy in M52+. You can do it with my repro if you scroll using the mouse scroll wheel. However even my patch locally reverted (https://codereview.chromium.org/2155143002) I can't make a ToT build show the bug unless I comment out task_queue->SetQueueEnabled(false); in ThrottlingHelper::IncreaseThrottleRefCount. If I do that then it's easy to reproduce the bug. So the bug has been fixed by an earlier patch: https://chromium.googlesource.com/chromium/src/+/82a129abeafa1bd8184b5c3e61ab2c7568edd3f8 This means we probably don't need the M52 merge. However I think the new patch leaves the codebase safer, since ThrottledTimeDomain::Now() always returns the right time, so we should keep the patch in ToT. * How to use repro: 1. Download the file somewhere 2. Load it with file:// 3. You should see two colored bars, a solid red one on the left and another on the right that has an animation applied to it's color 4. In M51 use the left mouse button to scroll the page and shake the mouse a few times. Usually after a few attempts the color will stop animating, this is the bug!
,
Jul 19 2016
RE M53, I tried that this afternoon and I wasn't able to reproduce the bug. I believe it's fixed there.
,
Jul 19 2016
Thank you alexclarke@ for verifying on M53.
,
Jul 20 2016
@alexclarke, thanks for your update:)Could you give us the detail mail address of the "chromium-dev mailing list", and I can send mail to them to ask the bisect script usage.
,
Jul 21 2016
Chrome v52 has been released, and we tested in this version, the issue can't be reproduced, thanks for your support:)
,
Jul 21 2016
re #54: https://groups.google.com/a/chromium.org/forum/#!forum/chromium-dev Thanks for letting us know you've verified the fix. I'll mark this as fixed. |
||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||
Comment 1 Deleted