New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 762705 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Feature

Blocked on:
issue 785397

Blocking:
issue 773359
issue 762230



Sign in to add a comment

Snap-it should capture external images.

Project Member Reported by wkorman@chromium.org, Sep 6 2017

Issue description

Breakout from:

https://chromium-review.googlesource.com/c/chromium/src/+/653601/2/tools/perf/page_sets/static_top_25_pages.py#50

many images show as broken even though they work when viewing the static html file locally in a browser, but not when serving through whatever proxy server telemetry is starting up to serve the static data files.

As an example, try opening:

file:///.../tools/perf/page_sets/static_top_25/googleplus.html

locally, and then try running (with this patch):

% ./tools/perf/run_benchmark --browser=system rasterize_and_record_micro.top_25

and watch the googleplus story when it comes up. 
 
Blocking: 762230
Cc: pdr@chromium.org
Continuing to use the googleplus page as a specific example, two reference images:

1. Obama's profile icon. An external image, with src='https://lh3.googleusercontent.com/-2lJYGtfXKwQ/AAAAAAAAAAI/AAAAAAAB2kQ/bw8UipwgoJE/s60-p-rw-no/photo.jpg'

2. Obama's profile page background image. Inspecting is strange as it gives a div with no background-image. That div's child is another div which includes a data-cpu attribute whose value is 'https://lh3.googleusercontent.com/-OQ6-R5H3sM0/UzyITLj0WZI/AAAAAAABjFs/TpxAySkAMqUFHmYJeJv_ugotKEL5QuLXwCJkCGAYYCw/s1000-fcrop64=1,00000000de04bc14/10928287974_132a6d890a_b.jpg' which is in fact the image we should be seeing.

Thus I expect that both of these reduce in some manner to https://github.com/progers/snap-it/issues/1 though (2) seems like it may involve some wacky js/style issue as well.

Ned's thought there from chat discussion is something along the lines of:

(a) snap-it annotates external images with some pre-ordained identifiable attribute
(b) telemetry snapshot wrapper script looks for those, fetches the images, stores them locally, and rewrites html to reference them in the new local position.

Technically we probably don't need (a) since the wrapper script can look at the img src urls and make some effort to determine whether they're external since we know the url we requested. Possibly famous last words.
We probably can't really use these static pages for the benchmark effectively until we fix this. See attached Amazon page amazon-telemetry.png (compare to amazon-local.png) for an example. Unless vmpstr@ thinks it would still be cool benchmark wise to benchmark raster with pretty much all images missing.
amazon-local.png
476 KB View Download
amazon-telemetry.png
63.7 KB View Download
The tablet image in the amazon hero image thing is an external image, src='https://images-na.ssl-images-amazon.com/images/G/01/kindle/merch/2017/837028456567/k-gw2-2pack-intel-1500x300._CB517351461_.jpg'.
I agree that missing images should be treated as a high-prioty problem to fix.

Brainstormed offline with Walter, and we felt the most complete way to solve
it is within snap-it itself, as follows:

1. Snap-it collects all cross-domain image URLs (it's already doing this)
2. From output of #1, collect all origins
3. For each origin from #2, make an iframe
4. Inject a snap-it script into each iframe from #3 which will inject <img> elements and create base64
encodings of them, and send back to the root script
5. Emplace the base64 results into the holes for the cross-domain images

This should work in all cases, and also has the correct referer and cookie arguments
when fetching said images.

The above algorithm requires an API for snap-it to request injecting things into new
iframes. The same API could also be used to do the multi-iframe processing generally.
Cc: -nednguyen@chromium.org nedngu...@google.com
Labels: -Type-Task Type-Feature
Summary: Snap-it should capture external images. (was: Some images in static html snapshots don't render when served through telemetry web server.)
The proposal in c#6 may be overkill for a first step. Pursuing Ned's thought in c#3 is probably sufficient for most cases.
Cc: nednguyen@chromium.org khushals...@chromium.org chrishtr@chromium.org wkorman@chromium.org
 Issue 769059  has been merged into this issue.
Project Member

Comment 11 by bugdroid1@chromium.org, Oct 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/6e1b9da33639fab613841190571911102f9b527f

commit 6e1b9da33639fab613841190571911102f9b527f
Author: Walter Korman <wkorman@chromium.org>
Date: Wed Oct 04 19:52:41 2017

Simplify asDict field copy for maintainability.

Bug:  chromium:762705 
Change-Id: I7c8229d4d773f85b4313a4200d0677356001ebe9
Reviewed-on: https://chromium-review.googlesource.com/698804
Commit-Queue: Walter Korman <wkorman@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/6e1b9da33639fab613841190571911102f9b527f/third_party/snap-it/HTMLSerializer.js

Status: Started (was: Assigned)
Brief update and notes -- I have a hack patch that fetches external images and stores them locally. I'll break it out into a sequence of smaller patches. However, I am OOO from tomorrow until Oct 11.

Current approach involves passing the intended local image subdirectory path in to the HTMLSerializer, and as we encounter external image urls, rewriting the url for that img element to the local path. The file portion of each image url is renamed to be based on the unique snap-it-generated element id such as 'snap-it1207.png'.

When images are subsequently fetched by the snap_page script for say --snapshot-path=page.html they're placed in a subdirectory at same level as page.html with the suffix stripped, so in this case 'page'.

I think this will also serve successfully via run_benchmark which will end up using something like 'file://static_top_25/page/snap-it1207.png' as a typical external img src value, but have not yet tested this.

I also discussed how to upload these to GCS with nednguyen@. Current plan for short/medium term is to just upload_to_google_storage.py each individual file. Eventually say for 10K sites with ctpixeldiff we may want to upload as a .tar.gz archive, which upload_to_google_storage.py supports. However, telemetry doesn't currently support automatically unpacking such an archive when it fetches from GCS. We can extend it to do so.

I noted that the simple techcrunch home page has near-200 images, many of which are 1x1 tracking pixels. Intent is not to add heuristics to filter what we fetch for now, since doing so is prone to error in one way or another.
Cc: -nednguyen@chromium.org
Blocking: 773359
Project Member

Comment 15 by bugdroid1@chromium.org, Oct 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca

commit 9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca
Author: Walter Korman <wkorman@chromium.org>
Date: Thu Oct 12 17:26:19 2017

Ensure snapshot-path ends with .html suffix.

Precursor to using the suffix-stripped snapshot-path as a directory
into which external images will be fetched.

We could alternately, or in addition, add an option to specify the
directory for external image storage, but it seems like the user
should always write output to a file ending with .html in any case.

Bug:  chromium:762705 
Change-Id: I6e24792296e4362cf6540bb3496aa71187d75e8e
Reviewed-on: https://chromium-review.googlesource.com/714451
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Walter Korman <wkorman@chromium.org>

[modify] https://crrev.com/9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca/telemetry/telemetry/internal/snap_page_util.py
[modify] https://crrev.com/9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca/telemetry/telemetry/internal/snap_page_util_unittest.py
[modify] https://crrev.com/9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca/telemetry/bin/snap_page

Project Member

Comment 16 by bugdroid1@chromium.org, Oct 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/77da38694233bd9407c3edbdd944a01404288746

commit 77da38694233bd9407c3edbdd944a01404288746
Author: Walter Korman <wkorman@chromium.org>
Date: Fri Oct 13 23:14:27 2017

Process image data urls explicitly.

Bug:  chromium:762705 
Change-Id: Id972839be1dd5ec95a52f4d031a2457c2afcee4a
Reviewed-on: https://chromium-review.googlesource.com/719422
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Walter Korman <wkorman@chromium.org>

[modify] https://crrev.com/77da38694233bd9407c3edbdd944a01404288746/third_party/snap-it/tests/tests.js
[modify] https://crrev.com/77da38694233bd9407c3edbdd944a01404288746/third_party/snap-it/HTMLSerializer.js

Project Member

Comment 17 by bugdroid1@chromium.org, Oct 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/03a5bff8121a3887a3948dbd4c7e31b9f5341cca

commit 03a5bff8121a3887a3948dbd4c7e31b9f5341cca
Author: Walter Korman <wkorman@chromium.org>
Date: Fri Oct 13 23:51:57 2017

Allow rewrite of external image urls to host locally.

Bug:  chromium:762705 
Change-Id: Ic91e09245abee9803059c6aba60d25a3d461e642
Reviewed-on: https://chromium-review.googlesource.com/719353
Commit-Queue: Walter Korman <wkorman@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/03a5bff8121a3887a3948dbd4c7e31b9f5341cca/third_party/snap-it/tests/tests.js
[modify] https://crrev.com/03a5bff8121a3887a3948dbd4c7e31b9f5341cca/third_party/snap-it/HTMLSerializer.js

Project Member

Comment 18 by bugdroid1@chromium.org, Oct 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/733ecb1825d08c999a7fabb8a85383c5c87095b6

commit 733ecb1825d08c999a7fabb8a85383c5c87095b6
Author: Walter Korman <wkorman@chromium.org>
Date: Mon Oct 16 17:48:59 2017

Strip any query params from file suffix.

Bug:  chromium:762705 
Change-Id: I5266ec2a6e360c28b1a823c1808443bbc1ae8f82
Reviewed-on: https://chromium-review.googlesource.com/720059
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Walter Korman <wkorman@chromium.org>

[modify] https://crrev.com/733ecb1825d08c999a7fabb8a85383c5c87095b6/third_party/snap-it/tests/tests.js
[modify] https://crrev.com/733ecb1825d08c999a7fabb8a85383c5c87095b6/third_party/snap-it/HTMLSerializer.js

https://chromium-review.googlesource.com/c/catapult/+/721725 is in flight and should allow us to re-snapshot the top 25 static pages with most images.

Note that it does not yet support images set via 'background-image: url(...)'.
Project Member

Comment 21 by bugdroid1@chromium.org, Oct 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/0ba7405a1f44277040c4d960d6fd00c10b0067ca

commit 0ba7405a1f44277040c4d960d6fd00c10b0067ca
Author: Juan Antonio Navarro Pérez <perezju@chromium.org>
Date: Wed Oct 18 16:16:51 2017

Revert "Fetch external images after processing DOM."

This reverts commit 0e1491041002bd91ef4ab1dc8da5f9d7412f754b.

Reason for revert: Blocking catapult roll

See e.g.:
https://chromium-review.googlesource.com/c/chromium/src/+/724272

Error is:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 252, in _find_tests
    module = self._get_module_from_name(name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 230, in _get_module_from_name
    __import__(name)
  File "/b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/snap_page_util_unittest.py", line 13, in <module>
    from telemetry.internal import snap_page_util
  File "/b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/snap_page_util.py", line 10, in <module>
    import requests
ImportError: No module named requests

Original change's description:
> Fetch external images after processing DOM.
> 
> Bug:  chromium:762705 
> Change-Id: I0e6769fba105c734dea0316a1c0b60e1991d8297
> Reviewed-on: https://chromium-review.googlesource.com/721725
> Reviewed-by: Ned Nguyen <nednguyen@google.com>
> Commit-Queue: Walter Korman <wkorman@chromium.org>

TBR=nednguyen@google.com,wkorman@chromium.org

Change-Id: I23506d03d2ad0425695599cfd8de48bd7f82355d
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  chromium:762705 
Reviewed-on: https://chromium-review.googlesource.com/726219
Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>

[modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/telemetry/telemetry/internal/snap_page_util_unittest.py
[delete] https://crrev.com/3aa40a56e96598884631fbf4aa5368159f2a2168/telemetry/telemetry/internal/testing/image.html
[modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/telemetry/telemetry/internal/snap_page_util.py
[modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/third_party/snap-it/tests/tests.js
[modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/third_party/snap-it/HTMLSerializer.js
[modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/telemetry/bin/snap_page

Project Member

Comment 22 by bugdroid1@chromium.org, Oct 18 2017

WIP for processing background-image at https://chromium-review.googlesource.com/c/catapult/+/736904

I will clean it up and add tests soon.
Project Member

Comment 24 by bugdroid1@chromium.org, Oct 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/5da4837140ae16586b2b9097efbb08a7f4726a05

commit 5da4837140ae16586b2b9097efbb08a7f4726a05
Author: Walter Korman <wkorman@chromium.org>
Date: Wed Oct 25 23:42:59 2017

Light method restructuring in prep for background-image work.

Rename fullyQualifiedXxx methods to qualifiedXxx for brevity.

Rename fullyQualifiedURL method to qualifiedUrlForElement for
specificity, and to enable the next item:

Break out part of what was fullyQualifiedUrl into a separate
qualifiedUrl method so that we can use it with a raw string url
rather than an element.

Create wrapUrl helper method to wrap a raw url string with the
CSS required 'url("x")' boilerplate for use in upcoming work.

Use Url rather than URL in our own names following common style
guide precepts though it flies in the face of at least some
JavaScript naming as URL.

Bug:  chromium:762705 
Change-Id: I04e516987a5feba61638bbb5f357b1c68f9b9b50
Reviewed-on: https://chromium-review.googlesource.com/738824
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Walter Korman <wkorman@chromium.org>

[modify] https://crrev.com/5da4837140ae16586b2b9097efbb08a7f4726a05/third_party/snap-it/tests/tests.js
[modify] https://crrev.com/5da4837140ae16586b2b9097efbb08a7f4726a05/third_party/snap-it/HTMLSerializer.js

Project Member

Comment 25 by bugdroid1@chromium.org, Oct 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/52d748d48b4d5b334c0416954ac73f4c352b6627

commit 52d748d48b4d5b334c0416954ac73f4c352b6627
Author: Walter Korman <wkorman@chromium.org>
Date: Thu Oct 26 23:34:09 2017

Support snapping images defined in background-image style.

Does not yet correctly handle multiple images in a single element's
background-image style due to conflicting generated local filenames.

Bug:  chromium:762705 
Change-Id: Ibb06e89dbd1a7888b8bb13affdec376ac2456e5c
Reviewed-on: https://chromium-review.googlesource.com/739228
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Walter Korman <wkorman@chromium.org>

[modify] https://crrev.com/52d748d48b4d5b334c0416954ac73f4c352b6627/third_party/snap-it/tests/tests.js
[modify] https://crrev.com/52d748d48b4d5b334c0416954ac73f4c352b6627/third_party/snap-it/HTMLSerializer.js

Project Member

Comment 26 by bugdroid1@chromium.org, Oct 30 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/eec0bded09a9804b9437ee2921b14965025a3b2d

commit eec0bded09a9804b9437ee2921b14965025a3b2d
Author: Walter Korman <wkorman@chromium.org>
Date: Mon Oct 30 22:30:44 2017

Increase timeout for expensive JS operations.

Noted timeout on a number of sites while attempting to re-snapshot the
top 25 sites. For example: facebook, linkedin, wikipedia.

Bug:  chromium:762705 
Change-Id: Ie14730ae69daa2222928fd4e1613e343f8b07bb4
Reviewed-on: https://chromium-review.googlesource.com/744482
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Walter Korman <wkorman@chromium.org>

[modify] https://crrev.com/eec0bded09a9804b9437ee2921b14965025a3b2d/telemetry/telemetry/internal/snap_page_util.py

Status: Fixed (was: Started)
Closing this as general external image support is now present. We can consider filing
separate issues to track remaining work which I currently have in mind as:

- fetch and embed non-external background-image images directly
- handle multiple background-image images correctly
- potential edge cases around blob: urls, long file names, frame set changing
  during snapshot
- consider updating telemetry infra to handle fetching/unpacking archives
  from GCS to more performantly handle thousands of static images

Blockedon: 785397

Sign in to add a comment