Add --dump-html flag to headless Chrome to dump the full HTML, not only the body as with --dump-dom
Reported by
liesislu...@gmail.com,
Aug 5 2017
Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36

Steps to reproduce the problem:
1. Run the command from the image: headless Chrome with the --dump-dom flag

What is the expected behavior?
Get the full document HTML, including the <title> tag, which is in the <head> part.

What went wrong?
Got only the <body> part, without <head> and <title>. There is no way to get the title's value.

Did this work before? No
Does this work in other browsers? Yes

Chrome version: 60.0.3112.78 Channel: n/a
OS Version: OS X 10.12.6
Flash Version:

Since --dump-dom may already be in use by people who assume it gives only the <body> part, I suggest using another flag, --dump-html, to dump the full rendered HTML.
,
Aug 6 2017
,
Aug 6 2017
This issue seems easy to solve, since --dump-dom already does almost the same thing; only the JavaScript string has to be changed.
https://cs.chromium.org/chromium/src/headless/app/headless_shell.cc?dr=C&l=346
The JS string should be `document.documentElement.innerHTML` to get the full HTML.

Related code:
https://cs.chromium.org/chromium/src/headless/app/headless_shell_switches.h?dr=C&l=17
https://cs.chromium.org/chromium/src/headless/app/headless_shell_switches.cc?dr=C&l=27
https://cs.chromium.org/chromium/src/headless/app/headless_shell.cc?dr=C&l=328
https://cs.chromium.org/chromium/src/headless/app/headless_shell.cc?dr=C&l=546
,
Aug 7 2017
Thanks for the feature request! Is there a reason why this cannot be done through DevTools' Runtime.evaluate command? Our policy is to keep headless flags to a minimum, and to implement only those that aren't possible through DevTools or that can serve as examples of using the C++ bindings (such as --dump-dom or --screenshot).
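For readers unfamiliar with the DevTools route mentioned here, the following is a minimal sketch of the JSON message that `Runtime.evaluate` expects and of how the reply could be unpacked. The helper names `buildEvaluateRequest` and `extractValue` are hypothetical, and the transport (typically a WebSocket to the page's debugger URL) is deliberately left out.

```javascript
// Hypothetical helper: builds a DevTools protocol request that evaluates
// an expression in the page and asks for the result by value.
function buildEvaluateRequest(id, expression) {
  return JSON.stringify({
    id,
    method: 'Runtime.evaluate',
    params: { expression, returnByValue: true },
  });
}

// Hypothetical helper: pulls the evaluated value out of a reply, or throws
// if the protocol or the evaluated script reported an error.
function extractValue(rawReply) {
  const reply = JSON.parse(rawReply);
  if (reply.error) {
    throw new Error('Protocol error: ' + reply.error.message);
  }
  if (reply.result.exceptionDetails) {
    throw new Error('Script error: ' + reply.result.exceptionDetails.text);
  }
  return reply.result.result.value;
}

// Expression that returns the whole document rather than only <body>.
const FULL_HTML_EXPRESSION = 'document.documentElement.outerHTML';
const request = buildEvaluateRequest(1, FULL_HTML_EXPRESSION);
```

The string in `request` is what would be sent over the debugging connection; the connection handling itself is the part the thread describes as fragile, and this sketch does not address it.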
,
Aug 7 2017
In theory it could be done, but I can't find a stable solution. I fully understand that it's better to have fewer flags, but I think a bad decision was made when --dump-dom was designed, and now we need either --dump-html or an update to --dump-dom, which might break something. By the way, --dump-dom could in theory also be done with the DevTools API, and from what I see in the code at https://cs.chromium.org/chromium/src/headless/app/headless_shell.cc?dr=C&l=346 it is using DevTools.

I have talked with three different people who tried to use DevTools to get the full HTML, and all had some issues. I personally ran the sample code from the Google blog post https://developers.google.com/web/updates/2017/04/headless-chrome; sometimes it works, but from time to time I got random connection reset/refused or similar errors, and random bugs are super hard to debug. Probably all of this could be solved, but after x hours it becomes frustrating to deal with it just to get the full HTML when we have --dump-dom, which gives *almost* the full HTML. Currently, getting the title requires a lot more code, which has to be managed and tested, and a lot more dependencies.

I wonder why --dump-dom doesn't give the full HTML in the first place. Chromium has all kinds of tests, and a flag would just work for everyone every time. It's an easy addition. It would never break. It is critical and will probably be one of the most used parts of headless Chrome. All UI testing environments would benefit, and all HTML parsers would benefit.

I see a couple of ways people are using headless Chrome, and the developer experience is OK/great for all of them except getting the full HTML:
1. Get the full final HTML (run JS etc. and get the final HTML). Developer experience: just run a command line. But to get the head part: learn the DevTools API, make the connection work, test, monitor. All crazy stuff for a trivial task. Frustrating.
2. Run any automation inside Chrome using the DevTools API. Developer experience: learn the whole DevTools API and make the connection work, tested, monitored. You can expect to learn more when you want to manage a page automatically inside Chrome, so it's OK.
3. Screenshot output. DX: just run a command line. Great.
4. PDF output. DX: just run a command line. Great.
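The command-line workflow described above can be sketched in Node.js as follows. This is an illustration under assumptions: the function names are hypothetical, the browser binary name or path passed as `chromeBinary` depends on the system, and the regular expression is a naive stand-in for a real HTML parser.

```javascript
// Arguments for a one-shot headless run that prints the serialized DOM.
function dumpDomArgs(url) {
  return ['--headless', '--dump-dom', url];
}

// Naive title extraction for illustration; a real HTML parser is more robust.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Runs the browser and returns the page title, or null if no <title> was
// dumped. `chromeBinary` is an assumed command name or path.
function fetchTitle(chromeBinary, url) {
  const { execFileSync } = require('node:child_process');
  const html = execFileSync(chromeBinary, dumpDomArgs(url), { encoding: 'utf8' });
  return extractTitle(html);
}
```

With the body-only output reported in this issue, `extractTitle` would return `null`, because the `<title>` element lives in `<head>` and never reaches the dump.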
,
Aug 7 2017
Swapping document.body.outerHTML to document.documentElement.innerHTML sounds like a good change to me -- that's what the flag is meant to do anyway.
,
Aug 7 2017
It sounds good to me too, but it might introduce issues for other developers if they assume that the head is not there and never will be. I would suggest introducing this change with a deprecation warning. The full roadmap could be:
1. Introduce --dump-html and add a deprecation warning for --dump-dom.
2. After several full release cycles, drop --dump-dom completely.

--dump-html is more intuitive than --dump-dom because:
1. The thing the flag returns is not a DOM model but text. HTML is text. The DOM is a model with all its APIs and so on, not just plain-text HTML.
2. HTML is more recognizable to all developers, experienced or not.
,
Aug 8 2017
I'll argue against adding a new flag for the same reasons dvallet@ mentioned: these flags are really just meant to be quick debugging aids and anyone doing anything serious with headless is assumed to be using the DevTools protocol. I suggest we make --dump-dom return the full document and announce the change on headless-dev@ (and also make headless_example.cc do the same).
,
Aug 8 2017
I think breaking backward compatibility is never a good idea. People are doing all kinds of things, and I wonder whether it's OK to assume that the flags are not used in production. Anyway, I look forward to having any flag, --dump-dom or --dump-html, that gets the full HTML without any extra dependencies.
,
Aug 22 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/1328136ea1a5a11aa43c008a9314df00033d6cba

commit 1328136ea1a5a11aa43c008a9314df00033d6cba
Author: Sami Kyostila <skyostil@chromium.org>
Date: Tue Aug 22 11:46:18 2017

headless: Dump entire document with --dump-dom

Change from using document.body.outerHTML to document.documentElement.outerHTML and added the doctype when dumping the document (--dump-dom) so that things like the head tag are also produced.

BUG= 752747
Change-Id: I2fc383bb68097c3f25ecd91494a3f92e8aacb545
Reviewed-on: https://chromium-review.googlesource.com/623731
Commit-Queue: Sami Kyöstilä <skyostil@chromium.org>
Reviewed-by: Peter Beverloo <peter@chromium.org>
Cr-Commit-Position: refs/heads/master@{#496282}

[modify] https://crrev.com/1328136ea1a5a11aa43c008a9314df00033d6cba/headless/app/headless_example.cc
[modify] https://crrev.com/1328136ea1a5a11aa43c008a9314df00033d6cba/headless/app/headless_shell.cc
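The effect of this change can be sketched in JavaScript. This is not the string used in headless_shell.cc; it is an illustration that assumes a document-like object exposing the standard `doctype` (`name`, `publicId`, `systemId`) and `documentElement.outerHTML` properties. The function names `doctypeString` and `dumpDocument` are hypothetical.

```javascript
// Rebuilds the <!DOCTYPE ...> declaration from a DocumentType-like object.
// Returns an empty string when the document has no doctype.
function doctypeString(doctype) {
  if (!doctype) return '';
  let s = '<!DOCTYPE ' + doctype.name;
  if (doctype.publicId) {
    s += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    s += ' SYSTEM';
  }
  if (doctype.systemId) {
    s += ' "' + doctype.systemId + '"';
  }
  return s + '>';
}

// Serializes the whole document: the doctype followed by the <html> element,
// which includes both <head> and <body>.
function dumpDocument(doc) {
  return doctypeString(doc.doctype) + doc.documentElement.outerHTML;
}
```

In a page context this would be called as `dumpDocument(document)`. Unlike `document.body.outerHTML`, the result keeps the `<head>` section, so the `<title>` is available.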
,
Aug 22 2017
Comment 1 by phistuck@chromium.org, Aug 6 2017
Labels: -Type-Bug -Hotlist-Interop Type-Feature
Status: Untriaged (was: Unconfirmed)