MHTML tests are not strongly validating the generated file |
|||||
Issue descriptionMost of the existing tests that cover MHTML generation are asserting their results based on loading the created file as simple text and asserting on its contents. This makes the tests less robust and don't guarantee that the generated file is an actually valid MHTML file. This almost allowed a change I was working on to land as tests passed but the created file was missing its footer, being an invalid MHTML file. My current suggestion is to at least validate that the MHTML file is minimally valid as part of the tests checks. Going a bit further we could load the file as MHTML as there are tests that could enforce greater guarantees on their checks if they switched from sub-string matching to using DOM inspections.
,
Dec 8 2016
I didn't know about the existence of that test file. But the problem there is that the file is still being read as simple text. The content checks might be more thorough but, finally, the generated file could still be malformed. But this could be a solution too: create one or more shared/common MHTML "validation" methods that check contents thoroughly but generically, so that we're as certain as possible that the file is valid MHTML. This doesn't solve the text versus DOM analysis issue though. For instance, some tests check the number of occurrences of certain attributes in generated MHTML. But this doesn't assert that those are in the right places/elements. Extra note: I noticed a few non-MHTML related tests in that file compare the full contents of the generated file with an expected file. I don't think we want to do the same for MHTML as its contents and ordering could more wildly vary. Also keeping all expected files up to date would be troublesome with the ongoing changes in MHTML saving logic.
,
Dec 14 2016
What would be the concrete possible steps to fix the situation?
,
Jan 25 2017
,
Mar 1 2017
Fixit update: good one for the fixit; hopefully can be done in less than 3 days. Concrete steps: - List all tests that check MHTML writing correctness. - For each of them try implementing the loading step with our actual MHTML reader code and use the loaded page instead of plain text. -- If that's impossible or impractical consider also loading the MHTML file using our reader and just validate it's well formed. -- If even that's impossible or impractical then see how to improve the checks to better cover potentially malformed files. Extra note: The motivation for filing this issue was that I was working on a change that had a side effect of sometimes creating invalid MHTML files (no footer) and our current automatic tests did not catch that problem.
,
Mar 1 2017
Suggested codesearch for finding all MHTML tests: https://cs.chromium.org/search/?q=mhtml+file:test%5B-/%5D*%5C.&type=cs
,
Mar 8 2017
Using the query from #6 I built the following list of MHTML related test suites/classes: content_browsertests: * MHTMLGenerationTest webkit_unit_tests: * MHTMLTest * WebFrameSerializerTest * WebFrameSerializerSanitizationTest browser_tests: * SavePageAsMHTMLBrowserTest (NOT Android) * ExtensionPageCaptureApiTest (NOT Android; some tests are ChromeOS only) unit_tests: * OfflinePageMHTMLArchiverTest (Android only) android_webview_test_apk: (Android only, obviously) * ArchiveTest And to try to further justify this issue, while I was working on issue 672292 I disabled the writing of the MHTML footer from the generation code and only two specific tests: * MHTMLGenerationTest.ViewedMHTMLContainsNoStoreContentIfNoCacheControlPolicy * MHTMLGenerationTest.ViewedMHTMLDoesNotContainNoStoreContent
,
Apr 22 2017
I worked a little with our MHTML parser and it is a robust parser that will not clearly fail if the file is malformed. I haven't investigated yet if this is a configurable setting but I can assert we cannot simply use it as is. Next steps would be to investigate this possibility and if that's not possible then come up with other ways to check for well formed files. Some ideas for the latter case: - check for the number of expected boundaries in the file - check for precise or approximate file size. Even if these new checks are not added to all tests listed in #7 they must at the very least be added to MHTMLGenerationTest.
,
Apr 24 2017
,
Apr 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8201c9cbc87f806a3fce3df667b36221c600caf0 commit 8201c9cbc87f806a3fce3df667b36221c600caf0 Author: carlosk <carlosk@chromium.org> Date: Tue Apr 25 21:47:11 2017 MHTMLTest::TestMHTMLEncoding checks generated MTHML is well formed. After generating MHTML data in memory checks that it is well formed by parsing it using MHTMLParser. BUG= 672313 Review-Url: https://codereview.chromium.org/2841483007 Cr-Commit-Position: refs/heads/master@{#467125} [modify] https://crrev.com/8201c9cbc87f806a3fce3df667b36221c600caf0/third_party/WebKit/Source/web/tests/MHTMLTest.cpp
,
May 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0179de0d5ca89c2bbdcf24f7a583b304fc053502 commit 0179de0d5ca89c2bbdcf24f7a583b304fc053502 Author: carlosk <carlosk@chromium.org> Date: Tue May 02 01:07:13 2017 WebFrameSerializerSanitizationTest validates generated MHTML. Changes GenerateMHTMLPats into GenerateMHTML and makes it generate full and validated MHTML contents by default. For one test it made more sense to only generate the body parts as before so there's an optional argument to replicate the old behavior. BUG= 672313 Review-Url: https://codereview.chromium.org/2842653005 Cr-Commit-Position: refs/heads/master@{#468520} [modify] https://crrev.com/0179de0d5ca89c2bbdcf24f7a583b304fc053502/third_party/WebKit/Source/web/tests/WebFrameSerializerTest.cpp
,
May 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/183d5a9be5ecd0321996d17e84e6352a96113ee5 commit 183d5a9be5ecd0321996d17e84e6352a96113ee5 Author: carlosk <carlosk@chromium.org> Date: Tue May 02 04:04:21 2017 Removed unusual ASSERT_FALSE(HasFailure()) calls from MHTMLTest.cpp This change reverts the previous additions of this unusual check calls. BUG= 672313 Review-Url: https://codereview.chromium.org/2858443002 Cr-Commit-Position: refs/heads/master@{#468552} [modify] https://crrev.com/183d5a9be5ecd0321996d17e84e6352a96113ee5/third_party/WebKit/Source/web/tests/MHTMLTest.cpp
,
May 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6d1d4e9207ec3f17854b037d1582b7568b42d4d4 commit 6d1d4e9207ec3f17854b037d1582b7568b42d4d4 Author: carlosk <carlosk@chromium.org> Date: Thu May 11 06:37:57 2017 Split WebFrameSerializerSanitizationTest into its own file. As WebFrameSerializerSanitizationTest grew much and in isolation of its parent WebFrameSerializerTest it makes more sense to have it on its own separate test file. BUG= 672313 Review-Url: https://codereview.chromium.org/2858873007 Cr-Commit-Position: refs/heads/master@{#470848} [modify] https://crrev.com/6d1d4e9207ec3f17854b037d1582b7568b42d4d4/third_party/WebKit/Source/web/BUILD.gn [add] https://crrev.com/6d1d4e9207ec3f17854b037d1582b7568b42d4d4/third_party/WebKit/Source/web/tests/WebFrameSerializerSanitizationTest.cpp [modify] https://crrev.com/6d1d4e9207ec3f17854b037d1582b7568b42d4d4/third_party/WebKit/Source/web/tests/WebFrameSerializerTest.cpp
,
May 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/656b9cae0b536ee2f9b8a2777f76ba4a4d775173 commit 656b9cae0b536ee2f9b8a2777f76ba4a4d775173 Author: carlosk <carlosk@chromium.org> Date: Thu May 18 21:46:00 2017 Validate generated MHTML in mhtml_generation_browsertest.cc tests. Attempts to load the generated MHTML file after creating it for all tests in mhtml_generation_browsertest.cc. This allows for validation of its contents being well formed by verifying that the newly added console message is not reported. Also simplifies the checking for the MHTML generation succeeding by asserting from inside the generation call instead of having checks for test failure in multiple tests. BUG= 672313 Review-Url: https://codereview.chromium.org/2866873002 Cr-Commit-Position: refs/heads/master@{#472939} [modify] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/content/browser/download/mhtml_generation_browsertest.cc [add] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/LayoutTests/mhtml/malformed_mhtml_no_footer-expected.png [add] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/LayoutTests/mhtml/malformed_mhtml_no_footer-expected.txt [add] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/LayoutTests/mhtml/malformed_mhtml_no_footer.mht [modify] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/LayoutTests/mhtml/mhtml_in_iframe-expected.txt [modify] https://crrev.com/656b9cae0b536ee2f9b8a2777f76ba4a4d775173/third_party/WebKit/Source/platform/loader/fetch/ResourceFetcher.cpp
,
May 22 2017
Looking at MHTMLArchive and MHTMLParser I noticed we can trust that the MHTML file is well formed by using them to parse it and checking that the returned result is not empty. The same is done when normally loading an MHTML file. The changes landed all rely on this method. Not all MHTML related tests initially listed actually needed this validation to be added. All where it made sense and it was practical to run MHTML validation are now updated to have it done: content_browsertests: * MHTMLGenerationTest webkit_unit_tests: * MHTMLTest * WebFrameSerializerSanitizationTest (that now has its own file)
,
Jun 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a6ac84d4a0cbf393c2360a704e4cce8192ebf633 commit a6ac84d4a0cbf393c2360a704e4cce8192ebf633 Author: Vaclav Brozek <vabr@chromium.org> Date: Wed Jun 21 09:13:50 2017 Disable the flaky MHTMLGenerationTest.InvalidPath on Win TBR=carlosk@chromium.org BUG=735320, 672313 Change-Id: I2e5edd020e46b4b6ace4ca946f750d34885b723f Reviewed-on: https://chromium-review.googlesource.com/542836 Reviewed-by: Vaclav Brozek <vabr@chromium.org> Commit-Queue: Vaclav Brozek <vabr@chromium.org> Cr-Commit-Position: refs/heads/master@{#481154} [modify] https://crrev.com/a6ac84d4a0cbf393c2360a704e4cce8192ebf633/content/browser/download/mhtml_generation_browsertest.cc |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by lukasza@chromium.org
, Dec 8 2016