Copying some notes here from an email thread:
Track down the source of each entry between the Google sections at the top and the new sections (BULK and MANUAL) at the bottom.
That would entail the following:
- Look at the history of commits for net/http/transport_security_state_static.json, and associate each entry with the latest commit that modified the entry (for example, adding expect_ct counts as a modification, but moving an entry around on its own does not).
- Note: the data used to live at net/base/transport_security_state_static.json and net/base/hsts_preloaded.json, so you'll have to look at those to get the full history for all entries.
- The entries already in the bulk and manual sections don't need annotation, since we've kept those diligently.
- Classify all the commits by type: added through hstspreload.appspot.com vs. added manually.
- Associate the type of commit to each entry (based on the latest commit that modified it).
- Split the old entries (everything between the Google entries and the new bulk domains) into sections with comments delimiting the start and end of each. Each section should still have the domains in the order that they appear in the file today.
- OLD MANUAL CUSTOM ENTRIES: Domains that have any setting except the exact values of {"include_subdomains": true, "mode": "force-https"}.
- OLD MANUAL HSTS ENTRIES: Domains with {"include_subdomains": true, "mode": "force-https"} that were not added through hstspreload.appspot.com
- OLD BULK HSTS ENTRIES: Domains with {"include_subdomains": true, "mode": "force-https"} that were added through hstspreload.appspot.com
- Put any large groups of domains (I think this includes only Yahoo! domains and Facebook domains?) in their own sections.
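A rough Python sketch of the section split described above, in case it helps whoever picks this up. It assumes each entry has already been parsed into a dict with a "name" key, and that the commit classification is available as a mapping from domain name to a label. The labels "bulk" (added through hstspreload.appspot.com) and "manual", and the function names, are hypothetical. The large-group sections (Yahoo!, Facebook) are not handled here.

```python
# The exact settings that mark a plain HSTS entry, as quoted in the notes.
PLAIN_HSTS = {"include_subdomains": True, "mode": "force-https"}


def classify_entry(entry, commit_type):
    """Return the section name for one entry.

    commit_type is a hypothetical label ("bulk" or "manual") for the
    latest commit that modified the entry.
    """
    # Compare every setting except the domain name itself.
    settings = {key: value for key, value in entry.items() if key != "name"}
    if settings != PLAIN_HSTS:
        return "OLD MANUAL CUSTOM ENTRIES"
    if commit_type == "bulk":
        return "OLD BULK HSTS ENTRIES"
    return "OLD MANUAL HSTS ENTRIES"


def split_into_sections(entries, commit_types):
    """Group entries by section, keeping the order they have in the file."""
    sections = {}
    for entry in entries:
        section = classify_entry(entry, commit_types[entry["name"]])
        sections.setdefault(section, []).append(entry)
    return sections
```

Because the entries are appended in iteration order, each section keeps the domains in the order they appear in the file today.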
In the end, a sorted, canonicalized version of the JSON file should still match the old values. Outside of the source itself, it would also be good to have the classification of commits, so that we can easily sanity check the classification if something looks off.
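One possible way to do the sorted, canonicalized comparison, sketched in Python. This assumes the file is JSON apart from whole-line // comments, and that the top-level object holds the domains in an "entries" list of objects with a "name" key. Both are assumptions about the file layout, and the function names are made up for illustration.

```python
import json


def strip_comment_lines(text):
    """Drop whole-line // comments (assumed to be the only non-JSON content)."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return "\n".join(kept)


def canonicalize(text):
    """Return a canonical string form: entries sorted by name, keys sorted."""
    data = json.loads(strip_comment_lines(text))
    # Sorting by name makes the result independent of section order.
    data["entries"] = sorted(data.get("entries", []),
                             key=lambda entry: entry["name"])
    return json.dumps(data, sort_keys=True, indent=1)


def same_values(old_text, new_text):
    """True if the two file versions differ only in ordering and comments."""
    return canonicalize(old_text) == canonicalize(new_text)
```

Calling same_values() on the file contents before and after the reshuffle should return True if only the ordering and the section comments changed.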
The most important part here is to know which ones are the old bulk entries, so that we can whitelist them for https://hstspreload.org/removal/
I also experimented with adding explicit annotations at https://chromium-review.googlesource.com/c/chromium/src/+/588344 , which would make the scripts more robust. But I don't think that's as important until we have a fully automated roller (Issue 736188), and I've been holding off on landing it so that I don't risk painting us into a corner.
martijn@: Not a hurry, but you're welcome to pick this up if you'd like. :-)
Comment 1 by bugdroid1@chromium.org, Oct 24 2017