Issue 752747

Starred by 13 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Aug 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac , Fuchsia
Pri: 2
Type: Feature



Add --dump-html flag to headless chrome to dump all html not only body like with --dump-dom

Reported by liesislu...@gmail.com, Aug 5 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36

Steps to reproduce the problem:
1. Run the command shown in the attached image: headless Chrome with the --dump-dom flag.

What is the expected behavior?
Get the full document HTML, including the <title> tag, which is in the <head> part.

What went wrong?
Got only the <body> part, without <head> and <title>. There is no way to get the title's value.

Did this work before? No 

Does this work in other browsers? Yes

Chrome version: 60.0.3112.78  Channel: n/a
OS Version: OS X 10.12.6
Flash Version: 

Since people may already be using --dump-dom and relying on it returning only the <body> part, I suggest adding a separate flag, --dump-html, to dump the full rendered HTML.
 
Attachment: img.png (79.3 KB)
Components: Internals>Headless
Labels: -Type-Bug -Hotlist-Interop Type-Feature
Status: Untriaged (was: Unconfirmed)
Labels: OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Windows
Status: Available (was: Untriaged)
Thanks for the feature request!

Is there a reason why this cannot be done through DevTools' Runtime.Evaluate command?

Our policy is to keep headless flags to a minimum, and to implement only those that aren't possible through DevTools or that can serve as examples of using the C++ bindings (such as --dump-dom or --screenshot).
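
For reference, the DevTools route asked about above comes down to sending a Runtime.evaluate message to the page's debugging WebSocket. The following is only a minimal sketch of building that message in Python. The helper name build_evaluate_message is hypothetical, and the transport is deliberately left out because it needs a third-party WebSocket client and a running Chrome started with --remote-debugging-port.

```python
import json

# JavaScript expression that serializes the whole document element,
# including <head>, rather than only document.body.
FULL_DOCUMENT_EXPRESSION = "document.documentElement.outerHTML"


def build_evaluate_message(expression: str, message_id: int = 1) -> str:
    """Serialize a DevTools Protocol Runtime.evaluate request as JSON.

    Hypothetical helper for illustration; it only builds the payload and
    does not open a connection to the browser.
    """
    return json.dumps(
        {
            "id": message_id,  # echoed back in the matching response
            "method": "Runtime.evaluate",
            "params": {
                "expression": expression,
                # Ask for the resulting string itself, not a remote object handle.
                "returnByValue": True,
            },
        }
    )


message = build_evaluate_message(FULL_DOCUMENT_EXPRESSION)
```

Sending this string over the WebSocket and matching the reply by id is the part that adds the extra dependencies discussed in this thread.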

In theory it could be done, but I can't find a stable solution.

I fully understand that it's better to have fewer flags, but I think a bad decision was made when --dump-dom was designed, and now we need either --dump-html or an update to --dump-dom, which might break something. By the way, --dump-dom could in theory also be done with the DevTools API, and from the code at https://cs.chromium.org/chromium/src/headless/app/headless_shell.cc?dr=C&l=346 I can see that it does use DevTools.

I have talked with three different people who tried to use DevTools to get the full HTML, and all of them had some issues. I personally ran the sample code from the Google blog post https://developers.google.com/web/updates/2017/04/headless-chrome. Sometimes it works, but from time to time I get random connection reset, connection refused or similar errors, and random bugs are very hard to debug. Probably all of this could be solved, but after x hours it becomes frustrating to deal with it just to get the full HTML when we have --dump-dom, which gives *almost* the full HTML.
Getting the title currently requires a lot more code, which has to be maintained and tested, and a lot more dependencies. I wonder why --dump-dom doesn't give the full HTML in the first place.

Chromium has all kinds of tests, so a flag would just work for everyone, every time. It's an easy addition. It would never break. It is critical and will probably be one of the most used parts of headless Chrome. All UI testing environments would benefit, and so would all HTML parsers.

I see a few ways people are using headless Chrome, and the developer experience (DX) is OK or great for all of them except getting the full HTML:

1. Get the full final HTML (run JS etc. and get the final HTML). DX: just run a command line. But to get the head part you have to learn the DevTools API, make the connection work, test it and monitor it. That is a crazy amount of work for a trivial task. Frustrating.
2. Run any automation inside Chrome using the DevTools API. DX: learn the whole DevTools API and get the connection working, tested and monitored. You expect to learn more when you want to drive a page automatically inside Chrome, so that's OK.
3. Screenshot output. DX: just run a command line. Great.
4. PDF output. DX: just run a command line. Great.



Swapping document.body.outerHTML to document.documentElement.innerHTML sounds like a good change to me -- that's what the flag is meant to do anyway.
It sounds good to me too, but it might cause problems for other developers if they assume that the head is not there and never will be.

I would suggest introducing this change with a deprecation warning. The full roadmap could be:

1. Introduce --dump-html and add a deprecation warning to --dump-dom.
2. After several full release cycles, drop --dump-dom completely.

--dump-html is more intuitive than --dump-dom because:

1. What the flag returns is not a DOM but text. HTML is text. The DOM is an object model with all its APIs, not just plain-text HTML.
2. HTML is more recognizable to all developers, experienced or not.
I'll argue against adding a new flag for the same reasons dvallet@ mentioned: these flags are really just meant to be quick debugging aids and anyone doing anything serious with headless is assumed to be using the DevTools protocol.

I suggest we make --dump-dom return the full document and announce the change on headless-dev@ (and also make headless_example.cc do the same).

Comment 9 Deleted

I think breaking backward compatibility is never a good idea. People are doing all kinds of things, and I wonder whether it's OK to assume that the flags are not used in production.

Anyway, I look forward to having any flag, --dump-dom or --dump-html, that returns the full HTML without any extra dependencies.
Project Member

Comment 11 by bugdroid1@chromium.org, Aug 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1328136ea1a5a11aa43c008a9314df00033d6cba

commit 1328136ea1a5a11aa43c008a9314df00033d6cba
Author: Sami Kyostila <skyostil@chromium.org>
Date: Tue Aug 22 11:46:18 2017

headless: Dump entire document with --dump-dom

Change from using document.body.outerHTML to
document.documentElement.outerHTML and added the doctype when dumping
the document (--dump-dom) so that things like the head tag are
also produced.

BUG= 752747 

Change-Id: I2fc383bb68097c3f25ecd91494a3f92e8aacb545
Reviewed-on: https://chromium-review.googlesource.com/623731
Commit-Queue: Sami Kyöstilä <skyostil@chromium.org>
Reviewed-by: Peter Beverloo <peter@chromium.org>
Cr-Commit-Position: refs/heads/master@{#496282}
[modify] https://crrev.com/1328136ea1a5a11aa43c008a9314df00033d6cba/headless/app/headless_example.cc
[modify] https://crrev.com/1328136ea1a5a11aa43c008a9314df00033d6cba/headless/app/headless_shell.cc
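
With --dump-dom emitting the whole document, the title can be read from the command-line output without a DevTools connection. Below is a rough sketch in Python using only the standard library. The binary name google-chrome and the helper names dump_dom and extract_title are assumptions for illustration, and the Chrome build used must include the revision above.

```python
import subprocess
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it encounters."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)


def extract_title(html):
    """Return the document title, or None if the markup has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def dump_dom(url, chrome_binary="google-chrome"):
    """Run headless Chrome with --dump-dom and return what it prints to stdout.

    The binary name is an assumption; adjust it to the local install.
    """
    result = subprocess.run(
        [chrome_binary, "--headless", "--dump-dom", url],
        capture_output=True,
        text=True,
        check=True,
        timeout=60,
    )
    return result.stdout


# Usage (needs a local Chrome install, so it is left as a comment):
#   print(extract_title(dump_dom("https://example.com")))
```

On a build from before this change the same call would return only the body, and extract_title would return None for a page whose title lives in the head.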

Status: Fixed (was: Available)