Gmail chat moles show up without style |
||||||||||||||||||||||||||
Issue descriptionVersion: 51.0.2687.0/dev OS: Ubuntu What steps will reproduce the problem? (1) Visit gmail with chat enabled (2) ??? (3) Open a chat window by clicking on a person's name, and it will sometimes show up without styles. Once in this state, the chat window is useless. Both esprehn and I have hit this but have been unable to debug further. Each chat mole is an iframe. This started 1-2 weeks ago. cc'ing folks who may have a clue when this started. I'm not sure if this is a change in gmail or not.
,
Mar 25 2016
No styles like no <link rel=stylesheet> and no <style> ?
,
Mar 26 2016
esprehn, chrishtr, and I looked at this in more detail and we think it's a paint bug after all, possibly related to issue 593758 . The repro steps are flaky but opening and closing chat windows seems to eventually trigger it. Comment #1 and comment #3 are incorrect: there are in fact styles when this occurs and changing them does not cause a repaint.
,
Mar 26 2016
Also possibly related to go/b598043 (internal gmail bug)?
,
Mar 28 2016
,
Mar 28 2016
,
Mar 30 2016
Test team, please try to repro and bisect, thanks.
,
Mar 30 2016
Test team, please try to repro and bisect, thanks.
,
Mar 30 2016
Unable to reproduce the issue on Ubuntu 14.04, Windows 7, Mac 10.10.5 using 51.0.2687.0, latest dev 51.0.2693.2 with below steps: 1.Opened gmail.com and logged in. 2.Opened a chat window by clicking on a person's name. 3.Observed that it is shown with styles same as earlier. Please find attached screenshot. pdr@Could you please check this issue on clean profile and update the thread if issue still persists.
,
Mar 30 2016
This still reproduces in 51.0.2693.2/dev. Unfortunately, I think it more reliably reproduces with a nontrivial amount of chat history.
,
Mar 31 2016
Unable to reproduce the issue on Ubuntu 14.04, Windows 7, Mac 10.10.5 using 51.0.2687.0, latest dev 51.0.12693.2 as per steps in comment #10 with nontrivial amount of chat history. Please find attached screencast.
,
Apr 4 2016
Unable to reproduce the issue on Mac 10.11.3,Win 7 and Ubuntu 14.04 using latest canary 51.0.2699.0.Hence removing the Blocker for now, please add the same if you still see the issue. pdr@ : Could you please check and update the same.
,
Apr 4 2016
This one is difficult to hit but it is correctly filed as a beta blocker. There's a very serious issue hiding in here that makes gmail barely usable if you have a non-trivial number of contacts (i.e., all enterprise customers).
,
Apr 4 2016
FWIW, I just reproduced this on 51.0.2693.2/dev/linux.
,
Apr 4 2016
> Comment #1 and comment #3 are incorrect: there are in fact styles when this occurs and changing them does not cause a repaint. I did a few changes related to updating stylesheets in March, but if you by changing them mean edit the stylesheet text in inspector, it sounds less likely that it's one of my changes.
,
Apr 6 2016
I was able to reproduce this in 51.0.2700.0/dev on osx but I spent 30m trying to find a reliable repro and haven't found any way to trigger this. I spent a lot of time loading chat moles in gmail in a debug build and there were no asserts. Without a reliable way to reproduce this I don't think we can make progress on this issue. @esprehn, do you have any ideas about what to do here?
,
Apr 6 2016
@pdr: one thing that can be tried is to make a build which force-disables the pending style modes in the paint code (by making Document::didLayoutWithPendingStylesheets() always return false) and see if it repros. That is, assuming we could get a repro on a ToT build at all - have you tried that?
,
Apr 6 2016
@chrishtr, I haven't been able to get this to repro in a debug build, even logging into my corp gmail which has many massive group chats. I tried making didLayoutWithPendingStylesheets return false but didn't see a change. Maybe this has been fixed at ToT? It's also curious that we don't have other people reporting this, only Elliott and me. I definitely have default about:flags and no extensions (other than corp-required ones). I wonder if we're in a gmail experiment or something? Removing RBB because there are only two people reporting this and we don't have a way to make progress on this bug without more information.
,
Apr 6 2016
Did you try a release build? Other than that I am flummoxed also.
,
Apr 8 2016
I *think* I saw something similar on G+ this week, although it was the menu pane that was transparent, not a chat mole, and it looked like it still had an otherwise up-to-date style. I haven't seen it since, and it may be completely unrelated. For paranoia if anyone can repro this, please check if it goes away with --disable-blink-features=RenderingPipelineThrottling
,
Apr 11 2016
+cc boliu@ I've seen this on my coworker's computer. I asked him to save the page, and I tried to load the saved page on Firefox. (with JS disabled. Otherwise the JS will attempt reload the page.) The same glitch still occurs. I don't think it is a style recalc/layout/paint bug because of this. I suspect one of the stylesheets failed to load. Or it could be that we have multiple bugs with gmail. boliu@ still have the snapshot in hand. I don't know if he minds to share because it may contain sensitive information. (Personal/work email, session token, that kind of stuff.)
,
Apr 12 2016
Issue 582294 has been merged into this issue.
,
Apr 27 2016
We are seeing this issue especially when the browser history state is being manipulated. A relatively consistent repro is: 1) Open a couple chat moles 2) Navigate to a couple of different labels inside of gmail to populate browser history. 3) Open some more moles. 4) Hit back Effectively what we are seeing is the chat iframes to be refreshed, causing it to lose state.
,
Apr 27 2016
Thank you pawong, I was able to reproduce this when doing those steps a few times. I think we may be able to make progress on this now.
,
Apr 27 2016
I was able to bisect using pawong's hint that this is related to history. Simpler repro that seems to be reliable: 1) open gmail 2) open a chat to someone 3) switch to another label (like "starred") 4) open a second chat to someone 5) switch to another label (like "drafts") 6) open a third chat to someone 7) hit back twice, then forward twice If the bug repros, some chat moles will turn invisible. You are probably looking for a change made after 371017 (known good), but no later than 371038 (first known bad). CHANGELOG URL: https://chromium.googlesource.com/chromium/src/+log/9a5d1848c533a279ef3b88d71ef5cb080455f1c5..a0023224241728134e43f1b3306da0ba3f78f1e1 I think this is: https://chromium.googlesource.com/chromium/src/+/b3b47ab25d16ae178cd926495681205bc6285a4c I manually reverted this change and the bug no longer reproduces.
,
Apr 27 2016
What's actually different with and without that change though? That doesn't explain the paint system failing to issue repaints, or the bug where <style> elements are missing from the head. (There's actually two bugs here, one is about the frame not having the <style>, the other is the paint system becomes wedged and never repaints.)
,
Apr 27 2016
I'm out on vacation until May 2. I'll look at this then. Note that the CL that you point to is a genuine bug fix to make Chromium behave correctly and restore frames that it wasn't. The last time that there was a bug filed against Chromium that was bisected to that CL, I closed it as a WontFix. Until I return, please note that having a page with an iframe, and doing - frame.url = /* some URL */ - history.pushState() is a RACE. pushState only pulls the URLs of frames once their loading is complete, and if you race the load of frames like this, you RISK CAPTURING BLANK FRAMES. If you want progress before I return, contact the Gmail team and verify that they are not doing this.
,
Apr 27 2016
,
Apr 27 2016
Re comment 26: Simpler repro that seems to be reliable: 2) open a chat to someone 3) switch to another label (like "starred") This is almost guaranteed to be the issue. In step 2, you are creating a whole bunch of frames and loading them. In step 3, you pushState(). This is a race. The Gmail chat system should ensure that all of its frames are loaded. Navigating backwards and forwards WILL overwrite the URLs of the frames, and WILL RESTORE frames to being about:blank if they were before. My CL made Chromium comply with the standards in this regard. When I return I will verify this and re-file this bug in the internal bug tracker.
,
Apr 27 2016
Thanks avi@, A few things, 1) I'm not sure gmail is even calling pushstate, they are simply tweaking the hash. The proposed fix of delaying/blocking # navigation based on child iframe loading state seems intractable. 2) Gmail doesn't even have good insight over the loading states of Hangout's internal iframes; bubbling that state up to the gmail frame via postmessages (if I understand your proposed fix correctly), is a huge amount of work. I understand if your steps 2 and 3 were something we were programmatically doing, but steps 2 and 3 are explicit user actions across different domains, and synchronizing them is just nearly impossible. I don't have permission to see the original bug that this was intended to fix, but I'm a little surprised that this is WAI since AFAICT this does doesn't seem to affect other browsers.
,
Apr 27 2016
That seems like a pretty big web exposed change too, was there an Intent to Ship? I feel like we should have added some counters to see how much content we're breaking. I don't think sites assume pushState() + Back is going to navigate all the iframes in the page. This also makes us divergent with Safari, by matching Firefox.
,
Apr 27 2016
I thought there were cases where the back button causes iframes to navigate in all browsers and cases where it ignored iframes. I didn't realize this wasn't interoperably implemented across browsers. IMO, the goal here should be interoperability. So the first step needs to be seeing what browsers actually do. I'd compare Safari, Chrome, Edge and Firefox for this. Once we've done that analysis, we can figure out the right incremental steps towards interoperability.
,
Apr 27 2016
Ah. I see some analysis was already done at https://github.com/whatwg/html/issues/546.
,
Apr 27 2016
It seems to me that this change isn't web compatible and we should revert and come up with a solution that solves the original problem without breaking gmail. Then we can get that specced hopefully and work towards interoperability.
,
Apr 27 2016
I also agree with Elliott that there may be a paint bug that is triggered somehow by this change. We should investigate that also.
,
Apr 27 2016
It sounds like this has only been reproduced in Chrome 51, but the original change (issue 542299) made it into Chrome 50, right? I'm just trying to understand to what extent we may have a compat emergency on our hands here. It's probably too late to change anything for M50, but is it possible something else is causing this problem to be worse in M51? Issue 600534 looks like the previous discussion about this. The spec discussion is here: https://github.com/whatwg/html/issues/546. But big picture, it doesn't much matter what the spec says (or what is more logical) if we're breaking real websites (like Gmail) - we need to make a thoughtful/informed decision around the compat risk. > I don't think sites assume pushState() + Back is going to navigate all the iframes in the page. This also makes us divergent with Safari, by matching Firefox. This issue is just about the special case of frames and navigation to about:blank, right? If the frame had already navigated, then back() in the top level document absolutely would navigate the iframe history as well in all browsers (the spec says the history operates on the "joint session history"), right? From the discussion in https://github.com/whatwg/html/issues/546 it sounds like the change here actually made us match Safari's behavior more closely (matching Firefox exactly) and only diverged from Edge's behavior.
,
Apr 27 2016
I'm on M50 and repros pretty consistently for me. I'm not sure what the intended behavior is but this might be aggravated by a paint behavior? When doing the repro steps as mentioned above, the iframe src url of the chat window doesn't actually change, but it still triggers a navigation/reload event. I feel like 95% of this bug is because navigation events on iframes to the exact same url is not a noop.
,
Apr 27 2016
Re comment 32, > That seems like a pretty big web exposed change too, was there an Intent to Ship? Elliott, this was intended as a bug fix. The logic behind Chrome's behavior was hard to make a rational case for, and it caused a renderer kill that was encountered by users. I couldn't imagine that any reasonable web page would rely on a snapshot of about:blank *not* being restored, given that half the browsers out there do something different. As I noted earlier, I'm on vacation until Tuesday, so I only stated my best guess as to what's going on. If we can wait, I can get a precise answer for you as to what's going on. If we can't wait then someone else will have to dig.
,
Apr 29 2016
I did some testing in the paint code, and the didLayoutWithPendingStylesheets mechanism never seems to hit. Also, commenting out the call sites to IgnoringPendingStylesheet also did nothing. Must be something else.
,
May 2 2016
Can someone specify which OS this occurs on? Seems like OS-All based on discussion but only repro appears limited to Linux. Will try OS-All, correct me if I'm wrong. In the future, please ensure any bug tagged as a release block has an OS, as that's how release managers track them.
,
May 2 2016
All OSes.
,
May 4 2016
This is super bad lately, I don't know what changed but I saw it 10 times today. We really need to revert that patch.
,
May 4 2016
Just to update: Able to reproduce the issue on mac 10.11 chrome version canary 52.0.2724.0 following the steps in comment #26 - Please find the screenshot
,
May 4 2016
,
May 4 2016
It's not exactly the same as the other bug, but it's close. In this case, we have the same race:
2) open a chat to someone
3) switch to another label (like "starred")
Rather than snapshotting the state of the page with pushState, the page is doing a hash navigation to create a new navigation entry. But the effect is the same: we snapshot the chat mole before it's loaded:
> {target “<!--framePath //<!--frame3-->-->”, url “about:blank”, state {}, children []}
and with the bugfix, we're properly restoring that frame's URL....
If this is critical enough, we can revert the fix, but this doesn't solve the much deeper problem. Besides the fact that the fix is a fix for a real crasher, we're hitting an even deeper problem with how navigation works in browsers.
https://github.com/whatwg/html/issues/546 was quoted, but it doesn't even _start_ to plumb the depths of the disaster that HTML loading is. Let me pose a question: why doesn't Firefox suffer from this issue, even though it resets frames like I do in the fix?
I'm willing to bet (though I haven't tested it) that when it snapshots a page for navigation (during pushState and hash change), it grabs a loading URL even if it hasn't yet committed. That's almost certainly a violation of the spec (which says that we have to grab the URLs from the document), but that means that a web author can sanely order subframe loads with respect to navigation.
The basic question boils down to: Do we want
- frame.url = URL // frame was about:blank
- history.pushState() or window.location.hash = x
to be a race?
It seems like all browsers say no. Firefox (untested but it appears this way) says no by violating the spec and capturing the url in the pushState. Chrome used to say no by capturing the blank but not restoring any subframes.
This is a nasty corner case and I don't know what the right thing to do here is.
,
May 4 2016
The Firefox behavior sounds more sane to me from a developer's point of view. If the frame is viewed as a model object, then it's natural for history.pushState() to save the url currently on the model, even if that URL hasn't committed yet. Maybe I'm missing a subtlety? What do you mean exactly by "capturing the blank but not restoring any subframes"? Trying to understand the difference between that and the current state at ToT in Chromium.
,
May 4 2016
In any case, it is clear to me that this CL needs to be reverted, and a discussion had about the semantics of frame loading on the standards lists.
,
May 4 2016
> What do you mean exactly by "capturing the blank but not restoring any subframes"? See https://github.com/whatwg/html/issues/546 . In brief: 1. Start at a page. 2. Create an iframe with no src. (It is blank.) 3. Do history.pushState(). (It captures the URL of the iframe as "about:blank".) 4. Load the iframe. 5. Go backwards. Before the fix for bug 542299, going backwards would fail to restore the "about:blank" URL to the iframe. This is unintuitive and feels wrong (when you go back, you should put things back to the way they were), and causes NC_IN_PAGE_NAVIGATION kills (because the iframe was supposed to be blank and wasn't, the origin for the navigation was wrong). On the other hand, it papers over the race mentioned in comment 46 by not resetting the URL of the iframe to "about:blank" on a back navigation.
,
May 4 2016
Yeah that does seem wrong. Thanks for explaining.
,
May 4 2016
No problem. The standard seems broken, and someone needs to champion getting this sorted out, as this doesn't seem like a reasonable thing that developers can rely on. But I'm rather busy with other stuff, like the dialog work. I'm going to chat with a standards peep later, but I'm likely going to revert the fix with a fifty line comment explaining it.
,
May 4 2016
This is M50. :(
,
May 5 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b48cb31b164c516c109e751f1bd8d9f6eb24b147 commit b48cb31b164c516c109e751f1bd8d9f6eb24b147 Author: avi <avi@chromium.org> Date: Thu May 05 21:35:00 2016 Don't update subframes on parent frame commits. This reverts r371031. This reverts a genuine bugfix, but the bug is needed to mask a much deeper and severe problem. Discussion is ongoing there. BUG= 598043 , 542299, 600534, https://github.com/whatwg/html/issues/1191 TEST=disabling CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_site_isolation Review-Url: https://codereview.chromium.org/1949213002 Cr-Commit-Position: refs/heads/master@{#391906} [modify] https://crrev.com/b48cb31b164c516c109e751f1bd8d9f6eb24b147/content/browser/frame_host/navigation_controller_impl.cc [modify] https://crrev.com/b48cb31b164c516c109e751f1bd8d9f6eb24b147/content/browser/frame_host/navigation_controller_impl_browsertest.cc [modify] https://crrev.com/b48cb31b164c516c109e751f1bd8d9f6eb24b147/content/renderer/history_controller.cc
,
May 5 2016
,
May 6 2016
Just FYI: We're cutting M50 Stable RC on Monday @ 1:00 PM PST for M50 Stable refresh release on Tuesday (05/10). It will be safe to approve M50 merge once CL landed in trunk, baked/verified in Canary.
,
May 6 2016
Can we merge? It's a simple fix and puts behavior back to the way it used to be.
,
May 6 2016
Confirmed in 52.0.2726.0 canary.
,
May 6 2016
Approving merge to M50 branch 2661, based on comment #56 and #57. Please merge before 1:00 PM PST on Monday (05/09) so we can take it for next week stable refresh.
,
May 6 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/836596d5da4458b675160abf535db4d074152d4e commit 836596d5da4458b675160abf535db4d074152d4e Author: Avi Drissman <avi@chromium.org> Date: Fri May 06 18:39:36 2016 Don't update subframes on parent frame commits. This reverts r371031. This reverts a genuine bugfix, but the bug is needed to mask a much deeper and severe problem. Discussion is ongoing there. BUG= 598043 , 542299, 600534, https://github.com/whatwg/html/issues/1191 TEST=disabling CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_site_isolation Review-Url: https://codereview.chromium.org/1949213002 Cr-Commit-Position: refs/heads/master@{#391906} (cherry picked from commit b48cb31b164c516c109e751f1bd8d9f6eb24b147) Review URL: https://codereview.chromium.org/1954213003 . Cr-Commit-Position: refs/branch-heads/2661@{#662} Cr-Branched-From: ef6f6ae5e4c96622286b563658d5cd62a6cf1197-refs/heads/master@{#378081} [modify] https://crrev.com/836596d5da4458b675160abf535db4d074152d4e/content/browser/frame_host/navigation_controller_impl.cc [modify] https://crrev.com/836596d5da4458b675160abf535db4d074152d4e/content/browser/frame_host/navigation_controller_impl_browsertest.cc [modify] https://crrev.com/836596d5da4458b675160abf535db4d074152d4e/content/renderer/history_controller.cc
,
May 6 2016
[Automated comment] Reverts referenced in bugdroid comments, after merge request, needs manual review.
,
May 10 2016
Merge approved for M51 (branch 2704)
,
May 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d3db3aaf9ea914e4c2a4df55a8c7448defe8ddb5 commit d3db3aaf9ea914e4c2a4df55a8c7448defe8ddb5 Author: Avi Drissman <avi@chromium.org> Date: Tue May 10 18:30:28 2016 Don't update subframes on parent frame commits. This reverts r371031. This reverts a genuine bugfix, but the bug is needed to mask a much deeper and severe problem. Discussion is ongoing there. BUG= 598043 , 542299, 600534, https://github.com/whatwg/html/issues/1191 TEST=disabling CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_site_isolation Review-Url: https://codereview.chromium.org/1949213002 Cr-Commit-Position: refs/heads/master@{#391906} (cherry picked from commit b48cb31b164c516c109e751f1bd8d9f6eb24b147) Review URL: https://codereview.chromium.org/1965003002 . Cr-Commit-Position: refs/branch-heads/2704@{#479} Cr-Branched-From: 6e53600def8f60d8c632fadc70d7c1939ccea347-refs/heads/master@{#386251} [modify] https://crrev.com/d3db3aaf9ea914e4c2a4df55a8c7448defe8ddb5/content/browser/frame_host/navigation_controller_impl.cc [modify] https://crrev.com/d3db3aaf9ea914e4c2a4df55a8c7448defe8ddb5/content/browser/frame_host/navigation_controller_impl_browsertest.cc [modify] https://crrev.com/d3db3aaf9ea914e4c2a4df55a8c7448defe8ddb5/content/renderer/history_controller.cc
,
May 11 2016
Tested the issue on Windows 7, Mac 10.11.4, Ubuntu 14.04 using 50.0.2661.102 as per steps in comment #26.Observed that chat window is shown with styles same as earlier. Please find attached screencast. Marking it as TE-Verified. |
||||||||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||||||
Comment 1 by pdr@chromium.org
, Mar 25 2016