According to https://github.com/w3c/web-platform-tests/issues/10246#issuecomment-379026578, failure messages aren't guaranteed to be stable. The only stable identifier is (test_id, subtest_name).
This is a problem for us, as we include failure messages in the baselines. If a message contains randomly generated IDs, the test may still pass upstream's stability checks (i.e. meet the upstream definition of "stable"), yet be flaky in Chromium and impossible to rebaseline. I have seen at least a couple of examples in the past month. (All were resolved by reaching out to the test authors, nice folks at Mozilla, who changed the tests to not include IDs in the failure messages.)
We may consider excluding the messages from baselines (by changing testharnessreport.js), or adding some magic to r-w-t's output comparison. Both would be non-trivial work (the former requires rebaselining almost all existing WPT tests, while the latter is perhaps a bit too hacky).
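To make the idea concrete, here is a rough sketch of a comparison that looks only at the subtest name and status and ignores the message. It is not how r-w-t works today; `SubtestResult`, `stable_view` and `matches_baseline` are hypothetical names, and it assumes results are already available as structured records rather than baseline text.

```python
from typing import Dict, Iterable, NamedTuple


class SubtestResult(NamedTuple):
    """Hypothetical structured form of one subtest result."""
    name: str     # subtest name (part of the stable identifier)
    status: str   # e.g. "PASS", "FAIL", "TIMEOUT"
    message: str  # failure message (not guaranteed to be stable)


def stable_view(results: Iterable[SubtestResult]) -> Dict[str, str]:
    """Keep only the fields expected to be stable: name -> status."""
    return {r.name: r.status for r in results}


def matches_baseline(actual: Iterable[SubtestResult],
                     baseline: Iterable[SubtestResult]) -> bool:
    """Compare two runs of one test while ignoring failure messages."""
    return stable_view(actual) == stable_view(baseline)
```

With this kind of comparison, two runs whose failure messages differ only by a random ID would match, while a change in status or in the set of subtests would still be reported. The cost is that a genuinely different failure reason would go unnoticed.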
Comment 1 by ajuma@chromium.org, Apr 20 2018