I think this test has always been a little flaky on Windows, even before your patch. It looks like your patch picked up the flaky baseline, which now causes the test to fail 90% of the time (instead of 10% as before).
The flakiness looks like a problem loading the font for the teletype text inside the <tt> tags. WDYT of removing the <tt> tags from the test, since they are unrelated to what the test actually checks?
Comment 1 by pdr@chromium.org, Jul 7