New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 742397 link

Starred by 1 user

Issue metadata

Status: Duplicate
Merged: issue 807640
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug



Sign in to add a comment

Share FrameIsAd data with content consumers

Project Member Reported by jkarlin@chromium.org, Jul 13 2017

Issue description

TopDocumentIsolation would like to know which frames are ads. We should make the information that PageLoadMetrics has available to others, including content/.

As a first thought, we could create a NavigationHandle::AdDetected() method. See https://docs.google.com/document/d/1fZj6g5Xjhk9ddu3ljyxCCitiNxsVd8E8H3vN7sFtYSw/edit for more detail.
 
Description: Show this description
Cc: lukasza@chromium.org creis@chromium.org
CC'ing related people. What do you think of adding NavigationHandle::AdDetected?
I support the idea of associating an |is_this_an_ad| bit with a NavigationHandle.

I think it would be great if //content didn't have to know about the |is_this_an_ad| bit specifically, but instead could provide a generic way of associating extra data with a NavigationHandle.  I think this can be accomplished by having NavigationHandle inherit from base::SupportsUserData.  I think this decoupling is desirable, because 1) different //content embedders (Chrome / Opera / Brave / ...) might want to provide different ad-detection heuristics for TDI and 2) the mechanism can be used for other scenarios where extra data needs to be associated with a NavigationHandle (without having to teach the //content layer about them via new delegates and/or methods).

If NavigationHandle inherited from base::SupportsUserData, then ads-aware code could use the following class to store the |is_this_an_ad| bit:

    class NavigationHandleAdsData : public base::SupportsUserData::Data {
     public:
      static NavigationHandleAdsData* FromNavigationHandle(
          NavigationHandle* navigation_handle) {
        return static_cast<NavigationHandleAdsData*>(
            navigation_handle->GetUserData(kUserDataKey));
      }

      static NavigationHandleAdsData* CreateFor(NavigationHandle* navigation_handle) {
        DCHECK(!FromNavigationHandle(navigation_handle));
        navigation_handle->SetUserData(kUserDataKey, new NavigationHandleAdsData());
        return FromNavigationHandle(navigation_handle);
      }

      ~NavigationHandleAdsData() override {};

      bool is_navigating_to_an_ad() { return is_navigating_to_an_ad_; }

      void set_is_navigating_to_an_ad(bool new_value) {
        is_navigating_to_an_ad_ = new_value;
      }

     private:
      static const char kUserDataKey[] = "AdsData";
      bool is_navigating_to_an_ad_ = false;
    };

The code that wants to check if a navigation will end up in an ad (e.g. code in AdsPageLoadMetricsObserver or the TopDocumentIsolation code) could do this like so:

    bool IsAdFrame(content::NavigationHandle* nav_handle) {
      if (IsGoogleAd(nav_handle))  // url + frame_name checks here
        return true;

      NavigationHandleAdsData* ads_data =
          NavigationHandleAdsData::FromNavigationHandle(nav_handle);
      if (ads_data)
        return ads_data->is_navigating_to_an_ad();

      return false;
    }

The code in AdsPageLoadMetricsObserver that wants to mark a navigation as associated with an ad could do it like this:

    NavigationHandleAdsData* ads_data =
        NavigationHandleAdsData::CreateFor(nav_handle);
    ads_data->set_is_navigating_to_an_ad(...);


AFAIK, AdsPageLoadMetricsObserver already has access to a NavigationHandle both when 1) it wants to read the ads bit and 2) when it wants to set the ads bit.  OTOH, I note that currently the TopDocumentIsolation decision is made in RenderFrameHostManager::DetermineSiteInstanceForURL which doesn't have direct access to a NavigationHandle.  I am a little bit surprised by this - it seems that there should be a handle present at this point (at commit time) but I have a bit of trouble trying to figure out how to get hold of it.
I think user data makes sense. Happy to take that approach. Would like to hear from //content owners if they have a preference.
Cc: lpz@chromium.org
I've just realized that RenderFrameHostManager::DetermineSiteInstanceForURL can extract a NavigationHandleImpl via RenderFrameImpl::navigation_handle accessor.  So - there should be no problem with getting a hold of a NavigationHandle in the TopDocumentIsolation path.
Let me add my current understanding of why a *synchronous*
    bool IsAdFrame(base::StringPiece frame_name, const GURL& frame_url);
is insufficient:

1. SubframeNavigationFilteringThrottle::WillStartRequest and WillRedirectRequest asynchronously check the URL and store the reponse in load_policy_ field from SubframeNavigationFilteringThrottle::OnCalculatedLoadPolicy.

2. SubframeNavigationFilteringThrottle::WillProcessResponse notifies observers what the current policy is.  This includes AdsPageLoadMetricsObserver::OnSubframeNavigationEvaluated which marks the frame as an ad.

3. At commit time, TopDocumentIsolation or AdsPageLoadMetricsObserver want to know about the ad-status calculated in 1 and 2 above.

+nasko@ in case he wants to chime in while creis@ is OOO - from a //content OWNER perspective the main decision here is whether content::NavigationHandle should be a base::SupportsUserData.
Cc: nasko@chromium.org
really adding nasko@...
jkarlin@ - I see a comment in AdsPageLoadMetricsObserver::OnSubframeNavigationEvaluated that says that WOULD_DISALLOW is only seen in dry runs.  I guess that also means that AdsPageLoadMetricsObserver::DetectSubresourceFilterAd would only work in dry runs.  Does that mean that TopDocumentIsolation cannot reliably depend on this kind of detection being available?
FWIW, I've updated https://codereview.chromium.org/2946113002 to:
- use class NavigationHandleAdsData : public base::SupportsUserData::Data
  that subresource filters use for marking navigations as ads
- use NavigationHandle in the TopDocumentIsolation API
  (instead of separately passing the frame name and URL)
Re #10, yes you'll only get SubResourceFilter detection in dry runs. It's good for metrics, but not so great for TDI to rely on.


Comment 13 by creis@chromium.org, Jul 19 2017

Components: UI>Browser>Navigation Internals>Sandbox>SiteIsolation
The SupportsUserData approach sounds good to me.  I'll take a look at the CL.
Before relanding https://codereview.chromium.org/2946113002, I'd like to propose moving the ad bits to RenderFrameHost. It's data that consumers will want (I can think of one in development off the top of my head) after navigation has completed and the NavigationHandle is long gone.
creis@, what do you think about comment 14? Lukasza@ is out for a few weeks and I'd be interested in starting on that work soon.


Comments 14-15: There's kind of a chicken and egg problem there, isn't there?  With PlzNavigate, we don't really have the RenderFrameHost until after the process model decision has been made: we need to know the ad status before we pick which RFH to use.  (Plus RFH doesn't have SupportsUserData either, but I guess that can probably be changed if needed.)

Also, is this bit always sticky for a frame, or can it sometimes be document-specific?  In theory a frame could be navigated from an ad to a non-ad page, but I'm not sure if you're intentionally excluding that possibility from your tracking.
Re 16: Good point that we won't be able to write the ad bits to the RFH until the RFH is actually created, which is too late for the TDI case. 

In terms of stickiness, it would be document specific. So the ad bits associated with the RFH would be for the last committed url. But, as mentioned above, that's too late for TDI. 
Okay, I can see keeping it in NavigationHandle. However, there is need to make the bits available to content:: as well. I'll look into an implementation of that and add you (Charlie) as a reviewer.
Sounds good-- I'll be curious to see the content/ use cases for it.
Happy to chat about it. I've found a way to signal from chrome:: to content:: as necessary, so no need to have it in content:: for now.
Any update here?
I don't believe TDI ever used the ads data. We do have plans to move the detection and tagging into the render process: https://docs.google.com/document/d/1ELsbnZiLK3VYxAJkwHYnfi6F32F2BqRVkPPR4JH7flM/edit

Comment 23 by dproy@chromium.org, Mar 15 2018

100 day bug triage ping: any update here? 
Mergedinto: 807640
Status: Duplicate (was: Assigned)

Sign in to add a comment