Chrome 63 prerender, implemented as prefetch, needs an identifying HTTP header
Reported by
ari1...@gmail.com,
Dec 20 2017
Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36

Steps to reproduce the problem:
In Chrome 63, HTML like:
<link href="http://www.yourdomain.com" rel="prerender">
will cause Chrome to generate a request to http://www.yourdomain.com. This request is indistinguishable from a link click. No request was sent for Chrome 58-62.

Previous versions of Chrome allowed for JavaScript execution when the browser received the page during prerender. In combination with the visibility API, this allowed for identifying the request as a prerender, by issuing an additional tracking record. The visibility API allowed for detection of visibility state changes as well, if the page was actually displayed to the user.

What is the expected behavior?
The expected behavior is the ability to identify a prerender request from an HTTP Header, now that it is implemented as a prefetch.

Prefetch requests include an HTTP Header, Purpose:prefetch

The NoState Prefetch implementation of prerender in Chrome 63 (http://goo.gl/EJjTCM) does not include that header. Prior to the switch to a prefetch-like implementation of prerender, the visibility API was sufficient as a workaround for server side tracking.

Adding a header was considered as part of the design of NoState Prefetch, and discarded. This decision should be revisited. It was predicated on a previous design decision not to implement a prerender header, in crbug.com/86175. However, a part of the reason that crbug.com/86175 was marked WontFix was the ability to use the Visibility API to distinguish prerender requests. This is no longer possible because of the change in prerender implementation, since prefetch does not execute JavaScript.

Note that google search results that return a top level site include a prerender link in the omnibar, so this problem is causing current issues with server side tracking. e.g. https://www.google.com/search?q=cnn includes <link href="http://www.cnn.com/" rel="prerender">, which causes Chrome 63 to prerender www.cnn.com

For server side tracking, this new behavior is a serious problem. In Chrome 57, if we received a prerender, we could use the visibility API to cause the browser to send an additional HttpRequest, e.g.
/ParentRequestWasAPrerender
We then knew the parent request wasn't a user initiated request. If the user then did follow the link, we could use the visibility API to issue an additional HTTP Request, e.g.
/ParentRequestWasViewedByUser
This allows for accurate statistics and attribution based on traffic source -- we could identify which page views were user initiated. This ability was why an HTTP Header, while convenient, was not necessary.

With Chrome 63, you can no longer issue an additional request with the visibility API on prerender, since JavaScript is not executed. You would need to issue a request on every page view via JavaScript, and then discard those pageviews without such a request.

The problem here is volume -- a typical site might have .1% prerender requests. Sending special indicator requests, just for prerenders, doesn't increase tracking volume or request volume significantly. However, sending an additional request on every page view via JavaScript, and then discarding page views without that request, effectively doubles the number of requests.

What went wrong?
Chrome prerenders are included in analytics as a user initiated action.

Did this work before? Yes Chrome 57 (though it worked differently), also Chrome 62, where prerender was disabled

Does this work in other browsers? Yes

Chrome version: 63.0.3239.108 Channel: stable
OS Version: OS X 10.13.2
Flash Version:
,
Dec 21 2017
,
Dec 21 2017
,
Dec 22 2017
ari1974@ - Thanks for filing the issue...!! Could you please provide consistent reproducible steps to test the issue from TE-end. This will help us in triaging the issue further. A screenshot/screencast explaining the issue will be more helpful. Thanks...!!
,
Dec 22 2017
@krajshree: This is a known issue, see also the blocking bug. We can consider this triaged; I forgot to change the status when I was organizing things.
@ari1927: Thanks for the details, they are very useful. This is the expected behavior for a <link rel=prerender> fetch. We are considering whether a header should be added, as is done for <link rel=prefetch>, even though there is no clear specification on the matter.
,
Dec 22 2017
,
Dec 26 2017
> You would need to issue a request on every page view via JavaScript, and then
> discard those pageviews without such a request.
Agreed that to count pages displayed, one would need to send pings from JS.
sendBeacon() seems to be the recommended mechanism for this (it survives
renderer deaths, is low priority, and is friendly to other corner cases).
> The problem here is volume -- a typical site might have .1% prerender
> requests. Sending special indicator requests, just for prerenders, doesn't
> increase tracking volume or request volume significantly.
Yes, this looks handy. Let me understand better: if the prerender is promoted
into visible content, the page visibility sends another ping to the server to
say that the view is in fact real. Is this something common to do or it is OK to
discard everything that originated from prerender?
> However, sending an additional request on every page view via JavaScript, and
> then discarding page views without that request, effectively doubles the
> number of requests.
Agreed that this is a lot.
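
For illustration only, the "beacon on every page view, discard the rest" scheme quoted above could be reconciled on the server roughly as follows. This is a minimal sketch, not anything Chrome or a particular analytics product does: the per-response IDs and the function name count_viewed_pages are hypothetical, and it assumes the page's JavaScript echoes the ID back in its beacon.

```python
from typing import Iterable, Set


def count_viewed_pages(page_request_ids: Iterable[str],
                       beacon_ids: Iterable[str]) -> int:
    """Count page loads that were later confirmed visible by a beacon.

    Both ID streams are hypothetical: the server is assumed to stamp each
    page response with a unique ID that the page's JavaScript sends back
    in a beacon once the page is shown to the user.
    """
    served: Set[str] = set(page_request_ids)
    confirmed: Set[str] = set(beacon_ids)
    # Only loads that were served AND reported visibility count as views.
    return len(served & confirmed)


# Three page loads were served; "p2" was fetched but never displayed,
# so no beacon arrived for it and it is discarded.
views = count_viewed_pages(["p1", "p2", "p3"], ["p1", "p3"])
print(views)
```

The cost discussed in the thread is visible here: every counted view needs its own beacon, so the beacon stream is nearly as large as the page-request stream.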
Suppose we have NoState that fetches the main resource with an extra header.
How do we count the real page load that starts and reuses the cache entry
without server revalidation? Would we need to count it as a real view? If so,
how? AFAIR request initiator is not very reliable and would gladly tell 'served
from cache' even if there was server revalidation.
I can see an argument like:
prerender/nostate-prefetch are a small portion of traffic, and promotion
from them to visible is even smaller, so we do not need to count the latter.
However, according to our measurements, on search result pages the amount of
wasted prerenders is about equal to the amount of used ones.
Also a few quotes about the X-Purpose from eseidel@ in
https://bugs.webkit.org/show_bug.cgi?id=46529 which makes me extra skeptical:
eseidel> Why would we want to let servers know it's a prefetch request? We
eseidel> don't let them know other things about how we're going to display or
eseidel> not-display their content.
eseidel>
eseidel> For example, we don't tell servers that we're about to display their
eseidel> content in a display:none iframe. Or on what screen size, etc.
eseidel> Proxies also don't necessarily tell servers that they're a proxy and
eseidel> thus not displaying the content... This seems like an invalid bug to
eseidel> me.
eseidel>
eseidel> A request is a request. If servers feel they need to filter out
eseidel> requests based on what was done with that request, it seems they should
eseidel> be gathering that information differently (like via JavaScript?).
eseidel>
eseidel> I think we should follow whatever the relevant working group decides.
eseidel> So my comments may be invalid. I can only assume that more
eseidel> knowledgeable folks than I have thought about this issue. But on the
eseidel> surface, exposing this in a header seems at best a very slippery slope
eseidel> and likely just plain wrong. :)
,
Dec 26 2017
@pasko
The intent is to accurately measure user initiated page views. That information can then be used to measure and improve the effectiveness of the page in delivering value to the user, supporting metrics like bounce rate and linger time.
The unidentified prefetch being issued in Chrome 63 interferes with those metrics. In order to support those metrics, the initial request must be identifiable as a browser (rather than user) initiated page load, and there must be a mechanism to identify when the user actually intentionally requests the page. Both cases were previously supported in Chrome prerenders through the JavaScript visibility API.
This is especially important given the use of prerender in search results page from Google.
As you point out, if >50% of prerender requests are followed by an actual user initiated request, it is important to be able to tell when the user initiates the intentional request for the page. It isn't okay to just unconditionally disregard every prerender request.
The ability to see the state change should be made possible by compliance with the w3 spec:
{
"https://w3c.github.io/resource-hints/#dfn-prerender"
When prerendering a document the user agent MUST set the document's visibilityState ([PAGE-VISIBILITY]) value to prerender.
}
I'm not seeing the visibility state set with the NoState Prefetch implementation, though I might be missing the correct pathway for page reuse that would trigger this visibility setting.
When you say, "However, according to our measurements, on search result pages the amount of wasted prerenders is about equal to the amount of used ones." -- this might be true in aggregate, but for a particular target domain, the percentage of converted requests could be zero, and completely disregarding any requests made via prerender might be appropriate. For example, from experimentation, when I set the target page not to cache, Chrome 63 always rerequests the prerendered page when I click through on the link. Ideally, from a server side analytics standpoint, the logic would have sufficient information to handle any behavior, even if there is no diversity of behavior for prerenders for that particular domain because of cache settings used by that domain.
In terms of the quotes about the X-Purpose header from eseidel@ in https://bugs.webkit.org/show_bug.cgi?id=46529
Note that the header was implemented, and all prefetch requests send a Purpose:Prefetch request header in Chrome.
So despite this line of reasoning, the case for an identifying header for prefetch was considered compelling.
The Chrome 63 prerender is implemented as a prefetch, so the same logic and reasoning for a header applies.
The comment also refers to the use of JavaScript as a fallback, which is explicitly prevented, unique to the Chrome 63 prerender implementation.
,
Dec 27 2017
> I'm not seeing the visibility state set with the NoState Prefetch
> implementation, though I might be missing the correct pathway for page reuse
> that would trigger this visibility setting.
Your reading is correct. Running no JS when prefetched is a feature of NoState Prefetch, hence there is no way to use the page visibility API in this state.
> When you say, "However, according to our measurements, on search result pages
> the amount of wasted prerenders is about equal to the amount of used ones." --
> this might be true in aggregate, but for a particular target domain, the
> percentage of converted requests could be zero, and completely disregarding
> any requests made via prerender might be appropriate.
I agree that for some origins the amount of used prefetches can be very low compared to overall prefetches. These sites can be common, even though not the majority. We do not have per-origin stats for privacy reasons, so it is hard to tell how widespread the issue is.
> For example, from experimentation, when I set the target page not to cache,
> Chrome 63 always rerequests the prerendered page when I click through on the
> link.
Right, this is by design. NoState Prefetch just feeds the cache and disables the first revalidation if requested within 5 minutes (except for Vary mismatch, just like with regular <link rel=prefetch>), the real load then has a chance to get resources from cache.
> Ideally, from a server side analytics standpoint, the logic would have
> sufficient information to handle any behavior, even if there is no diversity
> of behavior for prerenders for that particular domain because of cache
> settings used by that domain.
I am not following. With 'any' are you referring to any caching behavior or something else?
> In terms of the quotes about the X-Purpose header from eseidel@ in
> https://bugs.webkit.org/show_bug.cgi?id=46529
>
> Note that the header was implemented, and all prefetch requests send a
> Purpose:Prefetch request header in Chrome. So despite this line of reasoning,
> the case for an identifying header for prefetch was considered compelling.
> The Chrome 63 prerender is implemented as a prefetch, so the same logic and
> reasoning for a header applies. The comment also refers to the use of
> JavaScript as a fallback, which is explicitly prevented, unique to the Chrome
> 63 prerender implementation.
This reminds me that we discussed (on net-dev@/loading-dev@) the usefulness of setting NavigationType to "prefetch" for the prefetched loads (and a workaround that approximates it: transferSize being == 0 for prefetched resources). Let me restart the discussion.
,
Dec 27 2017
,
Dec 27 2017
,
Dec 27 2017
@pasko
In regard to:
> Ideally, from a server side analytics standpoint, the logic would have
> sufficient information to handle any behavior, even if there is no diversity
> of behavior for prerenders for that particular domain because of cache
> settings used by that domain.
I'm saying that, for the purposes of server side analytics, the target domain would need sufficient logic to handle both the case of a prerender request that is not then displayed to the user, and one that is displayed to the user. That is true even in the case where currently, the page is always rerequested from the target domain on user request, since that behavior could change with the prerender implementation.
In the presence of an identifying header for the initial request, this is fairly easy. The target domain can add additional JavaScript to the page when we see the prefetch (or prerender) header. That logic will only trigger if the page is actually displayed to the user (since JavaScript is only executed when the user accesses the page), sending an additional server request (e.g. /ParentRequestWasViewedByUser). For proofing against future implementation changes, we can leverage the visibility API to make sure that extra JavaScript is only implemented with visibility state = visible.
Take the case where .2% of all requests made to a given domain are prerender, and half of those are then followed by the user. .2% is actually very high, but makes the math below easier to follow.
With the header, the request count/logic goes:
1000 page view requests to the server.
2 have a NoStatePrefetch header, and the target domain responds by adding special JavaScript to indicate if the page becomes visible.
1 of the 2 prerendered pages is displayed to the user.
1 /ParentRequestWasViewedByUser request sent to the server for that user.
Total page views: 999 (998 without header, 1 with header but with follow on viewed request)
Total server requests: 1001
Without the header, the request count/logic goes:
1000 page view requests to the server.
999 beacon requests sent to the server indicating the page was visible (998 immediately, 1 eventually when the prerendered page was shown to the user).
Total page views: 999 (999 beacon requests)
Total server requests: 1999
The header logic described above is fairly similar to the previous logic that leveraged the visibility API (and which Chrome 63 prevented due to the NoState Prefetch implementation).
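
The arithmetic above can be checked with a few lines of Python. This is only a worked version of the example numbers in this comment; the function name request_totals and its parameters are made up for illustration.

```python
from typing import Tuple


def request_totals(page_requests: int, prefetches: int,
                   prefetches_viewed: int) -> Tuple[int, int, int]:
    """Return (page_views, requests_with_header, requests_without_header)."""
    # Page views are the same under both schemes: loads that were not
    # prefetches, plus prefetched loads the user ended up seeing.
    page_views = (page_requests - prefetches) + prefetches_viewed

    # With an identifying header, only viewed prefetches send a follow-up
    # request (e.g. /ParentRequestWasViewedByUser).
    with_header = page_requests + prefetches_viewed

    # Without it, every viewed page has to send its own beacon.
    without_header = page_requests + page_views

    return page_views, with_header, without_header


page_views, with_header, without_header = request_totals(1000, 2, 1)
print(page_views, with_header, without_header)  # 999 1001 1999
```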
,
Dec 27 2017
ah, I did not think about the server being intellectual enough to respond with a modified resource for visibility accounting. It is possible, though it does not allow revalidating the main resource lightly on repeated visits to the same page after a single nostate-prefetch, right?
,
Dec 27 2017
@pasko That is a good point. In any scenario where the target domain allows significant page caching, there needs to be some mechanism in JavaScript to account for repeated page views. The header wouldn't change that need. I think any site that allowed significant page caching would be relying on client side tracking, rather than server side tracking -- and would issue a tracking request for each view by the user via JavaScript. I suspect (but don't know) that many sites have no cache policies enabled for the main page, as they get more sophisticated about offering personalized content to users based on previous interactions. That is the case I'm familiar with -- no cache for the pages themselves, with CDN for every peripheral resource used by the page (css, js, images). Note that the ability to correctly identify when a user has previously viewed content is important in the personalized content case as well.
,
Dec 28 2017
OK OK, one can do intellectual localStorage/whatever-based accounting for repeated visits. As for cacheability of the main resource that nostate-prefetch sees, the cacheable/nostore split is about 3/2, see: https://docs.google.com/document/d/16VCYGGWau483IMSxODpg5faZny1FJ6vNK2v-BuM5EhU/edit#bookmark=id.blthm5okzuig
,
Dec 28 2017
of course, cacheable does not mean usable .. for example, amazon.com always responds with 200 to validation requests, sending back the whole frontpage each time.
,
Sep 10
I have a tiny bit of good news. As of M69 (in Stable now) we send Purpose:prefetch headers with all nostate-prefetch requests, just as we do with <link rel=prefetch>. (Nostate-prefetch is what replaced Prerender as a more lightweight alternative that's easier to support in front of growing web platform capabilities). More details: https://chromium.googlesource.com/chromium/src/+/d52474c3202ace50d3a849ec2b19f7ae4191a3d5
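
For anyone doing server side tracking, a check for this header might look roughly like the sketch below. It assumes the server exposes request headers as a simple name-to-value mapping; the function name is_prefetch_request is hypothetical, and the case-insensitive value comparison is a defensive choice (the thread writes the value as both "prefetch" and "Prefetch"), not something the Chrome change specifies.

```python
from typing import Mapping


def is_prefetch_request(headers: Mapping[str, str]) -> bool:
    """Return True when the request carries a `Purpose: prefetch` header."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive; the value is compared
        # loosely as a precaution (an assumption, see the note above).
        if name.lower() == "purpose" and value.strip().lower() == "prefetch":
            return True
    return False


print(is_prefetch_request({"Purpose": "prefetch"}))
print(is_prefetch_request({"User-Agent": "Mozilla/5.0"}))
```

A request flagged this way can be kept out of user-initiated page view counts until a follow-up request confirms the page was actually shown.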
,
Sep 10
@pasko Thanks. We tested our internal analytics to make sure we could distinguish the omnibar use case, using both the beta for M69 and the released version, and we now can distinguish. This change solves our issues.
,
Sep 10
\o/
Comment 1 by mattcary@chromium.org, Dec 21 2017
Cc: pasko@chromium.org mattcary@chromium.org