New issue
Advanced search Search tips

Issue 672313 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Bug



Sign in to add a comment

MHTML tests are not strongly validating the generated file

Project Member Reported by carlosk@chromium.org, Dec 8 2016

Issue description

Most of the existing tests that cover MHTML generation are asserting their results based on loading the created file as simple text and asserting on its contents. This makes the tests less robust and don't guarantee that the generated file is an actually valid MHTML file.

This almost allowed a change I was working on to land as tests passed but the created file was missing its footer, being an invalid MHTML file.

My current suggestion is to at least validate that the MHTML file is minimally valid as part of the tests checks. Going a bit further we could load the file as MHTML as there are tests that could enforce greater guarantees on their checks if they switched from sub-string matching to using DOM inspections.
 
content/browser/download/mhtml_generation_browsertest.cc indeed don't do a thorough verification.  The tests in chrome/browser/download/save_page_browsertest.cc OTOH, load the saved/generated MHTML file and verify (via find-in-page / Ctrl-F) that specific things still appear on the page after saving.
I didn't know about the existence of that test file. But the problem there is that the file is still being read as simple text. The content checks might be more thorough but, finally, the generated file could still be malformed.

But this could be a solution too: create one or more shared/common MHTML "validation" methods that check contents thoroughly but generically, so that we're as certain as possible that the file is valid MHTML.

This doesn't solve the text versus DOM analysis issue though. For instance, some tests check the number of occurrences of certain attributes in generated MHTML. But this doesn't assert that those are in the right places/elements.

Extra note: I noticed a few non-MHTML related tests in that file compare the full contents of the generated file with an expected file. I don't think we want to do the same for MHTML as its contents and ordering could more wildly vary. Also keeping all expected files up to date would be troublesome with the ongoing changes in MHTML saving logic.

Comment 3 by dim...@chromium.org, Dec 14 2016

Cc: jianli@chromium.org
Owner: carlosk@chromium.org
Status: Assigned (was: Available)
What would be the concrete possible steps to fix the situation?

Comment 4 by chili@chromium.org, Jan 25 2017

Labels: Hotlist-Fixit
Fixit update: good one for the fixit; hopefully can be done in less than 3 days.

Concrete steps:
- List all tests that check MHTML writing correctness.
- For each of them try implementing the loading step with our actual MHTML reader code and use the loaded page instead of plain text.
-- If that's impossible or impractical consider also loading the MHTML file using our reader and just validate it's well formed.
-- If even that's impossible or impractical then see how to improve the checks to better cover potentially malformed files.

Extra note: The motivation for filing this issue was that I was working on a change that had a side effect of sometimes creating invalid MHTML files (no footer) and our current automatic tests did not catch that problem.
Suggested codesearch for finding all MHTML tests:
https://cs.chromium.org/search/?q=mhtml+file:test%5B-/%5D*%5C.&type=cs
Using the query from #6 I built the following list of MHTML related test suites/classes:

content_browsertests:
 * MHTMLGenerationTest

webkit_unit_tests:
 * MHTMLTest
 * WebFrameSerializerTest
 * WebFrameSerializerSanitizationTest

browser_tests:
 * SavePageAsMHTMLBrowserTest (NOT Android)
 * ExtensionPageCaptureApiTest (NOT Android; some tests are ChromeOS only)

unit_tests:
 * OfflinePageMHTMLArchiverTest (Android only)

android_webview_test_apk: (Android only, obviously)
 * ArchiveTest


And to try to further justify this issue, while I was working on  issue 672292  I disabled the writing of the MHTML footer from the generation code and only two specific tests:
 * MHTMLGenerationTest.ViewedMHTMLContainsNoStoreContentIfNoCacheControlPolicy
 * MHTMLGenerationTest.ViewedMHTMLDoesNotContainNoStoreContent 

I worked a little with our MHTML parser and it is a robust parser that will not clearly fail if the file is malformed. I haven't investigated yet if this is a configurable setting but I can assert we cannot simply use it as is.

Next steps would be to investigate this possibility and if that's not possible then come up with other ways to check for well formed files. Some ideas for the latter case:
- check for the number of expected boundaries in the file
- check for precise or approximate file size.

Even if these new checks are not added to all tests listed in #7 they must at the very least be added to MHTMLGenerationTest.
Status: Started (was: Assigned)
Project Member

Comment 10 by bugdroid1@chromium.org, Apr 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8201c9cbc87f806a3fce3df667b36221c600caf0

commit 8201c9cbc87f806a3fce3df667b36221c600caf0
Author: carlosk <carlosk@chromium.org>
Date: Tue Apr 25 21:47:11 2017

MHTMLTest::TestMHTMLEncoding checks generated MTHML is well formed.

After generating MHTML data in memory checks that it is well formed by
parsing it using MHTMLParser.

BUG= 672313 

Review-Url: https://codereview.chromium.org/2841483007
Cr-Commit-Position: refs/heads/master@{#467125}

[modify] https://crrev.com/8201c9cbc87f806a3fce3df667b36221c600caf0/third_party/WebKit/Source/web/tests/MHTMLTest.cpp

Project Member

Comment 11 by bugdroid1@chromium.org, May 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0179de0d5ca89c2bbdcf24f7a583b304fc053502

commit 0179de0d5ca89c2bbdcf24f7a583b304fc053502
Author: carlosk <carlosk@chromium.org>
Date: Tue May 02 01:07:13 2017

WebFrameSerializerSanitizationTest validates generated MHTML.

Changes GenerateMHTMLPats into GenerateMHTML and makes it generate
full and validated MHTML contents by default. For one test it made
more sense to only generate the body parts as before so there's an
optional argument to replicate the old behavior.

BUG= 672313 

Review-Url: https://codereview.chromium.org/2842653005
Cr-Commit-Position: refs/heads/master@{#468520}

[modify] https://crrev.com/0179de0d5ca89c2bbdcf24f7a583b304fc053502/third_party/WebKit/Source/web/tests/WebFrameSerializerTest.cpp

Project Member

Comment 12 by bugdroid1@chromium.org, May 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/183d5a9be5ecd0321996d17e84e6352a96113ee5

commit 183d5a9be5ecd0321996d17e84e6352a96113ee5
Author: carlosk <carlosk@chromium.org>
Date: Tue May 02 04:04:21 2017

Removed unusual ASSERT_FALSE(HasFailure()) calls from MHTMLTest.cpp

This change reverts the previous additions of this unusual check
calls.

BUG= 672313 

Review-Url: https://codereview.chromium.org/2858443002
Cr-Commit-Position: refs/heads/master@{#468552}

[modify] https://crrev.com/183d5a9be5ecd0321996d17e84e6352a96113ee5/third_party/WebKit/Source/web/tests/MHTMLTest.cpp

Project Member

Comment 13 by bugdroid1@chromium.org, May 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6d1d4e9207ec3f17854b037d1582b7568b42d4d4

commit 6d1d4e9207ec3f17854b037d1582b7568b42d4d4
Author: carlosk <carlosk@chromium.org>
Date: Thu May 11 06:37:57 2017

Split WebFrameSerializerSanitizationTest into its own file.

As WebFrameSerializerSanitizationTest grew much and in isolation of its parent
WebFrameSerializerTest it makes more sense to have it on its own separate test
file.

BUG= 672313 

Review-Url: https://codereview.chromium.org/2858873007
Cr-Commit-Position: refs/heads/master@{#470848}

[modify] https://crrev.com/6d1d4e9207ec3f17854b037d1582b7568b42d4d4/third_party/WebKit/Source/web/BUILD.gn
[add] https://crrev.com/6d1d4e9207ec3f17854b037d1582b7568b42d4d4/third_party/WebKit/Source/web/tests/WebFrameSerializerSanitizationTest.cpp
[modify] https://crrev.com/6d1d4e9207ec3f17854b037d1582b7568b42d4d4/third_party/WebKit/Source/web/tests/WebFrameSerializerTest.cpp

Project Member

Comment 14 by bugdroid1@chromium.org, May 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/656b9cae0b536ee2f9b8a2777f76ba4a4d775173

commit 656b9cae0b536ee2f9b8a2777f76ba4a4d775173
Author: carlosk <carlosk@chromium.org>
Date: Thu May 18 21:46:00 2017

Validate generated MHTML in mhtml_generation_browsertest.cc tests.

Attempts to load the generated MHTML file after creating it for all tests in
mhtml_generation_browsertest.cc. This allows for validation of its contents
being well formed by verifying that the newly added console message is not
reported.

Also simplifies the checking for the MHTML generation succeeding by asserting
from inside the generation call instead of having checks for test failure in
multiple tests.

BUG= 672313 

Review-Url: https://codereview.chromium.org/2866873002
Cr-Commit-Position: refs/heads/master@{#472939}

[modify] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/content/browser/download/mhtml_generation_browsertest.cc
[add] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/LayoutTests/mhtml/malformed_mhtml_no_footer-expected.png
[add] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/LayoutTests/mhtml/malformed_mhtml_no_footer-expected.txt
[add] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/LayoutTests/mhtml/malformed_mhtml_no_footer.mht
[modify] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/LayoutTests/mhtml/mhtml_in_iframe-expected.txt
[modify] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/Source/platform/loader/fetch/ResourceFetcher.cpp

Status: Fixed (was: Started)
Looking at MHTMLArchive and MHTMLParser I noticed we can trust that the MHTML file is well formed by using them to parse it and checking that the returned result is not empty. The same is done when normally loading an MHTML file. The changes landed all rely on this method.

Not all MHTML related tests initially listed actually needed this validation to be added. All where it made sense and it was practical to run MHTML validation are now updated to have it done:

content_browsertests:
 * MHTMLGenerationTest

webkit_unit_tests:
 * MHTMLTest
 * WebFrameSerializerSanitizationTest (that now has its own file)

Project Member

Comment 16 by bugdroid1@chromium.org, Jun 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a6ac84d4a0cbf393c2360a704e4cce8192ebf633

commit a6ac84d4a0cbf393c2360a704e4cce8192ebf633
Author: Vaclav Brozek <vabr@chromium.org>
Date: Wed Jun 21 09:13:50 2017

Disable the flaky MHTMLGenerationTest.InvalidPath on Win

TBR=carlosk@chromium.org
BUG=735320, 672313 

Change-Id: I2e5edd020e46b4b6ace4ca946f750d34885b723f
Reviewed-on: https://chromium-review.googlesource.com/542836
Reviewed-by: Vaclav Brozek <vabr@chromium.org>
Commit-Queue: Vaclav Brozek <vabr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#481154}
[modify] https://crrev.com/a6ac84d4a0cbf393c2360a704e4cce8192ebf633/content/browser/download/mhtml_generation_browsertest.cc

Sign in to add a comment