Snap-it should capture external images. |
||||||||||
Issue descriptionBreakout from: https://chromium-review.googlesource.com/c/chromium/src/+/653601/2/tools/perf/page_sets/static_top_25_pages.py#50 many images show as broken even though they work when viewing the static html file locally in a browser, but not when serving through whatever proxy server telemetry is starting up to serve the static data files. As an example, try opening: file:///.../tools/perf/page_sets/static_top_25/googleplus.html locally, and then try running (with this patch): % ./tools/perf/run_benchmark --browser=system rasterize_and_record_micro.top_25 and watch the googleplus story when it comes up.
,
Sep 6 2017
For code pointer, static files are served by https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/core/memory_cache_http_server.py
,
Sep 6 2017
Continuing to use the googleplus page as a specific example, two reference images: 1. Obama's profile icon. An external image, with src='https://lh3.googleusercontent.com/-2lJYGtfXKwQ/AAAAAAAAAAI/AAAAAAAB2kQ/bw8UipwgoJE/s60-p-rw-no/photo.jpg' 2. Obama's profile page background image. Inspecting is strange as it gives a div with no background-image. That div's child is another div which includes a data-cpu attribute whose value is 'https://lh3.googleusercontent.com/-OQ6-R5H3sM0/UzyITLj0WZI/AAAAAAABjFs/TpxAySkAMqUFHmYJeJv_ugotKEL5QuLXwCJkCGAYYCw/s1000-fcrop64=1,00000000de04bc14/10928287974_132a6d890a_b.jpg' which is in fact the image we should be seeing. Thus I expect that both of these reduce in some manner to https://github.com/progers/snap-it/issues/1 though (2) seems like it may involve some wacky js/style issue as well. Ned's thought there from chat discussion is something along the lines of: (a) snap-it annotates external images with some pre-ordained identifiable attribute (b) telemetry snapshot wrapper script looks for those, fetches the images, stores them locally, and rewrites html to reference them in the new local position. Technically we probably don't need (a) since the wrapper script can look at the img src urls and make some effort to determine whether they're external since we know the url we requested. Possibly famous last words.
,
Sep 6 2017
We probably can't really use these static pages for the benchmark effectively until we fix this. See attached Amazon page amazon-telemetry.png (compare to amazon-local.png) for an example. Unless vmpstr@ thinks it would still be cool benchmark wise to benchmark raster with pretty much all images missing.
,
Sep 6 2017
The tablet image in the amazon hero image thing is an external image, src='https://images-na.ssl-images-amazon.com/images/G/01/kindle/merch/2017/837028456567/k-gw2-2pack-intel-1500x300._CB517351461_.jpg'.
,
Sep 7 2017
I agree that missing images should be treated as a high-prioty problem to fix. Brainstormed offline with Walter, and we felt the most complete way to solve it is within snap-it itself, as follows: 1. Snap-it collects all cross-domain image URLs (it's already doing this) 2. From output of #1, collect all origins 3. For each origin from #2, make an iframe 4. Inject a snap-it script into each iframe from #3 which will inject <img> elements and create base64 encodings of them, and send back to the root script 5. Emplace the base64 results into the holes for the cross-domain images This should work in all cases, and also has the correct referer and cookie arguments when fetching said images. The above algorithm requires an API for snap-it to request injecting things into new iframes. The same API could also be used to do the multi-iframe processing generally.
,
Sep 11 2017
,
Sep 27 2017
The proposal in c#6 may be overkill for a first step. Pursuing Ned's thought in c#3 is probably sufficient for most cases.
,
Sep 27 2017
Issue 769059 has been merged into this issue.
,
Oct 4 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/8894309be5ce6daa80d0fccdc4afacec21f472ce commit 8894309be5ce6daa80d0fccdc4afacec21f472ce Author: Walter Korman <wkorman@chromium.org> Date: Wed Oct 04 19:05:55 2017 Export external image element ids and urls. Bug: chromium:762705 Change-Id: I46818ada5d39cceb3c7ee64bb87e60a440305274 Reviewed-on: https://chromium-review.googlesource.com/698299 Commit-Queue: Walter Korman <wkorman@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> [modify] https://crrev.com/8894309be5ce6daa80d0fccdc4afacec21f472ce/third_party/snap-it/tests/tests.js [modify] https://crrev.com/8894309be5ce6daa80d0fccdc4afacec21f472ce/third_party/snap-it/HTMLSerializer.js [modify] https://crrev.com/8894309be5ce6daa80d0fccdc4afacec21f472ce/third_party/snap-it/content_script.js
,
Oct 4 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/6e1b9da33639fab613841190571911102f9b527f commit 6e1b9da33639fab613841190571911102f9b527f Author: Walter Korman <wkorman@chromium.org> Date: Wed Oct 04 19:52:41 2017 Simplify asDict field copy for maintainability. Bug: chromium:762705 Change-Id: I7c8229d4d773f85b4313a4200d0677356001ebe9 Reviewed-on: https://chromium-review.googlesource.com/698804 Commit-Queue: Walter Korman <wkorman@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> [modify] https://crrev.com/6e1b9da33639fab613841190571911102f9b527f/third_party/snap-it/HTMLSerializer.js
,
Oct 4 2017
Brief update and notes -- I have a hack patch that fetches external images and stores them locally. I'll break it out into a sequence of smaller patches. However, I am OOO from tomorrow until Oct 11. Current approach involves passing the intended local image subdirectory path in to the HTMLSerializer, and as we encounter external image urls, rewriting the url for that img element to the local path. The file portion of each image url is renamed to be based on the unique snap-it-generated element id such as 'snap-it1207.png'. When images are subsequently fetched by the snap_page script for say --snapshot-path=page.html they're placed in a subdirectory at same level as page.html with the suffix stripped, so in this case 'page'. I think this will also serve successfully via run_benchmark which will end up using something like 'file://static_top_25/page/snap-it1207.png' as a typical external img src value, but have not yet tested this. I also discussed how to upload these to GCS with nednguyen@. Current plan for short/medium term is to just upload_to_google_storage.py each individual file. Eventually say for 10K sites with ctpixeldiff we may want to upload as a .tar.gz archive, which upload_to_google_storage.py supports. However, telemetry doesn't currently support automatically unpacking such an archive when it fetches from GCS. We can extend it to do so. I noted that the simple techcrunch home page has near-200 images, many of which are 1x1 tracking pixels. Intent is not to add heuristics to filter what we fetch for now, since doing so is prone to error in one way or another.
,
Oct 4 2017
,
Oct 11 2017
,
Oct 12 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca commit 9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca Author: Walter Korman <wkorman@chromium.org> Date: Thu Oct 12 17:26:19 2017 Ensure snapshot-path ends with .html suffix. Precursor to using the suffix-stripped snapshot-path as a directory into which external images will be fetched. We could alternately, or in addition, add an option to specify the directory for external image storage, but it seems like the user should always write output to a file ending with .html in any case. Bug: chromium:762705 Change-Id: I6e24792296e4362cf6540bb3496aa71187d75e8e Reviewed-on: https://chromium-review.googlesource.com/714451 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Walter Korman <wkorman@chromium.org> [modify] https://crrev.com/9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca/telemetry/telemetry/internal/snap_page_util.py [modify] https://crrev.com/9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca/telemetry/telemetry/internal/snap_page_util_unittest.py [modify] https://crrev.com/9108e3d0a47e6a4a96f5f0c0cf22f2f63f4661ca/telemetry/bin/snap_page
,
Oct 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/77da38694233bd9407c3edbdd944a01404288746 commit 77da38694233bd9407c3edbdd944a01404288746 Author: Walter Korman <wkorman@chromium.org> Date: Fri Oct 13 23:14:27 2017 Process image data urls explicitly. Bug: chromium:762705 Change-Id: Id972839be1dd5ec95a52f4d031a2457c2afcee4a Reviewed-on: https://chromium-review.googlesource.com/719422 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Walter Korman <wkorman@chromium.org> [modify] https://crrev.com/77da38694233bd9407c3edbdd944a01404288746/third_party/snap-it/tests/tests.js [modify] https://crrev.com/77da38694233bd9407c3edbdd944a01404288746/third_party/snap-it/HTMLSerializer.js
,
Oct 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/03a5bff8121a3887a3948dbd4c7e31b9f5341cca commit 03a5bff8121a3887a3948dbd4c7e31b9f5341cca Author: Walter Korman <wkorman@chromium.org> Date: Fri Oct 13 23:51:57 2017 Allow rewrite of external image urls to host locally. Bug: chromium:762705 Change-Id: Ic91e09245abee9803059c6aba60d25a3d461e642 Reviewed-on: https://chromium-review.googlesource.com/719353 Commit-Queue: Walter Korman <wkorman@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> [modify] https://crrev.com/03a5bff8121a3887a3948dbd4c7e31b9f5341cca/third_party/snap-it/tests/tests.js [modify] https://crrev.com/03a5bff8121a3887a3948dbd4c7e31b9f5341cca/third_party/snap-it/HTMLSerializer.js
,
Oct 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/733ecb1825d08c999a7fabb8a85383c5c87095b6 commit 733ecb1825d08c999a7fabb8a85383c5c87095b6 Author: Walter Korman <wkorman@chromium.org> Date: Mon Oct 16 17:48:59 2017 Strip any query params from file suffix. Bug: chromium:762705 Change-Id: I5266ec2a6e360c28b1a823c1808443bbc1ae8f82 Reviewed-on: https://chromium-review.googlesource.com/720059 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Walter Korman <wkorman@chromium.org> [modify] https://crrev.com/733ecb1825d08c999a7fabb8a85383c5c87095b6/third_party/snap-it/tests/tests.js [modify] https://crrev.com/733ecb1825d08c999a7fabb8a85383c5c87095b6/third_party/snap-it/HTMLSerializer.js
,
Oct 16 2017
https://chromium-review.googlesource.com/c/catapult/+/721725 is in flight and should allow us to re-snapshot the top 25 static pages with most images. Note that it does not yet support images set via 'background-image: url(...)'.
,
Oct 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/0e1491041002bd91ef4ab1dc8da5f9d7412f754b commit 0e1491041002bd91ef4ab1dc8da5f9d7412f754b Author: Walter Korman <wkorman@chromium.org> Date: Tue Oct 17 22:26:43 2017 Fetch external images after processing DOM. Bug: chromium:762705 Change-Id: I0e6769fba105c734dea0316a1c0b60e1991d8297 Reviewed-on: https://chromium-review.googlesource.com/721725 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Walter Korman <wkorman@chromium.org> [modify] https://crrev.com/0e1491041002bd91ef4ab1dc8da5f9d7412f754b/telemetry/telemetry/internal/snap_page_util_unittest.py [add] https://crrev.com/0e1491041002bd91ef4ab1dc8da5f9d7412f754b/telemetry/telemetry/internal/testing/image.html [modify] https://crrev.com/0e1491041002bd91ef4ab1dc8da5f9d7412f754b/telemetry/telemetry/internal/snap_page_util.py [modify] https://crrev.com/0e1491041002bd91ef4ab1dc8da5f9d7412f754b/third_party/snap-it/tests/tests.js [modify] https://crrev.com/0e1491041002bd91ef4ab1dc8da5f9d7412f754b/third_party/snap-it/HTMLSerializer.js [modify] https://crrev.com/0e1491041002bd91ef4ab1dc8da5f9d7412f754b/telemetry/bin/snap_page
,
Oct 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/0ba7405a1f44277040c4d960d6fd00c10b0067ca commit 0ba7405a1f44277040c4d960d6fd00c10b0067ca Author: Juan Antonio Navarro Pérez <perezju@chromium.org> Date: Wed Oct 18 16:16:51 2017 Revert "Fetch external images after processing DOM." This reverts commit 0e1491041002bd91ef4ab1dc8da5f9d7412f754b. Reason for revert: Blocking catapult roll See e.g.: https://chromium-review.googlesource.com/c/chromium/src/+/724272 Error is: Traceback (most recent call last): File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 252, in _find_tests module = self._get_module_from_name(name) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 230, in _get_module_from_name __import__(name) File "/b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/snap_page_util_unittest.py", line 13, in <module> from telemetry.internal import snap_page_util File "/b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/snap_page_util.py", line 10, in <module> import requests ImportError: No module named requests Original change's description: > Fetch external images after processing DOM. > > Bug: chromium:762705 > Change-Id: I0e6769fba105c734dea0316a1c0b60e1991d8297 > Reviewed-on: https://chromium-review.googlesource.com/721725 > Reviewed-by: Ned Nguyen <nednguyen@google.com> > Commit-Queue: Walter Korman <wkorman@chromium.org> TBR=nednguyen@google.com,wkorman@chromium.org Change-Id: I23506d03d2ad0425695599cfd8de48bd7f82355d No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: chromium:762705 Reviewed-on: https://chromium-review.googlesource.com/726219 Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org> Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org> [modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/telemetry/telemetry/internal/snap_page_util_unittest.py [delete] https://crrev.com/3aa40a56e96598884631fbf4aa5368159f2a2168/telemetry/telemetry/internal/testing/image.html [modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/telemetry/telemetry/internal/snap_page_util.py [modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/third_party/snap-it/tests/tests.js [modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/third_party/snap-it/HTMLSerializer.js [modify] https://crrev.com/0ba7405a1f44277040c4d960d6fd00c10b0067ca/telemetry/bin/snap_page
,
Oct 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/79f62e6c96d8dcc4e25424941690c9d64db60562 commit 79f62e6c96d8dcc4e25424941690c9d64db60562 Author: Walter Korman <wkorman@chromium.org> Date: Wed Oct 18 18:30:02 2017 Reland 'Fetch external images in snap_page.' Updated to use urllib2 module rather than requests as the requests module is not installed on target systems. Reland of original change as: Fetch external images after processing DOM. https://chromium-review.googlesource.com/c/catapult/+/721725 Bug: chromium:762705 Change-Id: I9b914e3e2d9e9ad6e7be4d3908a09080d02ef731 Reviewed-on: https://chromium-review.googlesource.com/726246 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Walter Korman <wkorman@chromium.org> [modify] https://crrev.com/79f62e6c96d8dcc4e25424941690c9d64db60562/telemetry/telemetry/internal/snap_page_util_unittest.py [add] https://crrev.com/79f62e6c96d8dcc4e25424941690c9d64db60562/telemetry/telemetry/internal/testing/image.html [modify] https://crrev.com/79f62e6c96d8dcc4e25424941690c9d64db60562/telemetry/telemetry/internal/snap_page_util.py [modify] https://crrev.com/79f62e6c96d8dcc4e25424941690c9d64db60562/third_party/snap-it/tests/tests.js [modify] https://crrev.com/79f62e6c96d8dcc4e25424941690c9d64db60562/third_party/snap-it/HTMLSerializer.js [modify] https://crrev.com/79f62e6c96d8dcc4e25424941690c9d64db60562/telemetry/bin/snap_page
,
Oct 25 2017
WIP for processing background-image at https://chromium-review.googlesource.com/c/catapult/+/736904 I will clean it up and add tests soon.
,
Oct 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/5da4837140ae16586b2b9097efbb08a7f4726a05 commit 5da4837140ae16586b2b9097efbb08a7f4726a05 Author: Walter Korman <wkorman@chromium.org> Date: Wed Oct 25 23:42:59 2017 Light method restructuring in prep for background-image work. Rename fullyQualifiedXxx methods to qualifiedXxx for brevity. Rename fullyQualifiedURL method to qualifiedUrlForElement for specificity, and to enable the next item: Break out part of what was fullyQualifiedUrl into a separate qualifiedUrl method so that we can use it with a raw string url rather than an element. Create wrapUrl helper method to wrap a raw url string with the CSS required 'url("x")' boilerplate for use in upcoming work. Use Url rather than URL in our own names following common style guide precepts though it flies in the face of at least some JavaScript naming as URL. Bug: chromium:762705 Change-Id: I04e516987a5feba61638bbb5f357b1c68f9b9b50 Reviewed-on: https://chromium-review.googlesource.com/738824 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Walter Korman <wkorman@chromium.org> [modify] https://crrev.com/5da4837140ae16586b2b9097efbb08a7f4726a05/third_party/snap-it/tests/tests.js [modify] https://crrev.com/5da4837140ae16586b2b9097efbb08a7f4726a05/third_party/snap-it/HTMLSerializer.js
,
Oct 26 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/52d748d48b4d5b334c0416954ac73f4c352b6627 commit 52d748d48b4d5b334c0416954ac73f4c352b6627 Author: Walter Korman <wkorman@chromium.org> Date: Thu Oct 26 23:34:09 2017 Support snapping images defined in background-image style. Does not yet correctly handle multiple images in a single element's background-image style due to conflicting generated local filenames. Bug: chromium:762705 Change-Id: Ibb06e89dbd1a7888b8bb13affdec376ac2456e5c Reviewed-on: https://chromium-review.googlesource.com/739228 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Walter Korman <wkorman@chromium.org> [modify] https://crrev.com/52d748d48b4d5b334c0416954ac73f4c352b6627/third_party/snap-it/tests/tests.js [modify] https://crrev.com/52d748d48b4d5b334c0416954ac73f4c352b6627/third_party/snap-it/HTMLSerializer.js
,
Oct 30 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/eec0bded09a9804b9437ee2921b14965025a3b2d commit eec0bded09a9804b9437ee2921b14965025a3b2d Author: Walter Korman <wkorman@chromium.org> Date: Mon Oct 30 22:30:44 2017 Increase timeout for expensive JS operations. Noted timeout on a number of sites while attempting to re-snapshot the top 25 sites. For example: facebook, linkedin, wikipedia. Bug: chromium:762705 Change-Id: Ie14730ae69daa2222928fd4e1613e343f8b07bb4 Reviewed-on: https://chromium-review.googlesource.com/744482 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Walter Korman <wkorman@chromium.org> [modify] https://crrev.com/eec0bded09a9804b9437ee2921b14965025a3b2d/telemetry/telemetry/internal/snap_page_util.py
,
Nov 6 2017
Closing this as general external image support is now present. We can consider filing separate issues to track remaining work which I currently have in mind as: - fetch and embed non-external background-image images directly - handle multiple background-image images correctly - potential edge cases around blob: urls, long file names, frame set changing during snapshot - consider updating telemetry infra to handle fetching/unpacking archives from GCS to more performantly handle thousands of static images
,
Nov 15 2017
|
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by wkorman@chromium.org
, Sep 6 2017