Here is the current status:
llvm-profdata has built-in multi-threading support, which was disabled in Chromium. Once I enabled it, performance became good enough: llvm-profdata now takes minutes, compared to the hours taken by llvm-cov. I didn't look closely at its implementation, so there may well be low-hanging fruit, though the total win would probably be pretty small.
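For illustration, here is a minimal sketch of how a script could assemble a multi-threaded merge invocation. It assumes the upstream "llvm-profdata merge" options -sparse, -num-threads and -o; the helper name and file names are hypothetical and are not taken from the Chromium scripts.

```python
import os


def profdata_merge_cmd(raw_profiles, output, threads=None):
    """Build an argv list for a multi-threaded `llvm-profdata merge`.

    Hypothetical helper: it only assembles the command line and does not
    run the tool.
    """
    # Default to the local CPU count when no thread count is given.
    n = threads if threads is not None else (os.cpu_count() or 1)
    return [
        "llvm-profdata", "merge",
        "-sparse",               # keep only counters that were actually hit
        f"-num-threads={n}",     # enable the built-in multi-threading
        "-o", output,
        *raw_profiles,
    ]


cmd = profdata_merge_cmd(["a.profraw", "b.profraw"], "merged.profdata", threads=16)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where llvm-profdata is on PATH.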
llvm-cov is a more complicated beast. Its execution consists of 3 main steps:
1) 10-20% of execution time: reading and parsing a given .profdata file. This part is single-threaded and I didn't look into it, because back then the time spent on it was insignificant (steps 2 and 3 were an order of magnitude slower, so step 1 took a few % of execution time at most). It might be worth taking a closer look now.
2) 40-45% of execution time: preparing file reports (a function called prepareFileReports). This part was not parallelized, so I implemented a multi-threaded version in https://reviews.llvm.org/rL321871 which worked out pretty well (~14.8x speedup on a machine with 16 physical CPU cores). However, it might still be helpful to take another look at the underlying implementation.
3) 40-45% of execution time: writing HTML files to disk. This part had a multi-threaded implementation which was disabled in Chromium. Once I enabled it, the run time became much more reasonable. I tried to improve it further (experimented with different types of output buffers), but that didn't help much. It also feels like the actual bottleneck here might be disk write speed. For instance, even with an NVMe SSD, switching from a 32 vCPU instance to a 64 vCPU instance gave us only a small improvement (<10%), so we stick with 32 vCPU + NVMe SSD (more here: https://chrome-internal.googlesource.com/chrome/tools/code-coverage/+/master/docs/GCE.md). I don't think it would make sense to try to optimize this part. Instead, we need some other representation for coverage data which would be fast to output and easy to read + render as HTML alongside the corresponding source.
The 3 steps above apply to the "llvm-cov show" command. Besides that, we also use "llvm-cov export" to produce machine-readable stats which can be stored in a database, and we use it to build a per-directory report view. By default, it was extremely slow, but I've added a flag to skip most of the computations and output per-file summary numbers only (https://reviews.llvm.org/rL320435 + https://reviews.llvm.org/rL322397). That made the command an order of magnitude or two faster than it was, but there might still be other potential improvements.
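As a rough sketch of how the multi-threaded HTML output in step 3 is switched on, the snippet below assembles an "llvm-cov show" command line. It assumes the upstream options -instr-profile, -format=html, -output-dir and -num-threads (the thread option only takes effect together with -output-dir); the helper name and paths are hypothetical.

```python
def cov_show_html_cmd(binary, profdata, output_dir, threads):
    """Build an argv list for `llvm-cov show` writing per-file HTML reports.

    Hypothetical helper: it only assembles the command line and does not
    run the tool.
    """
    return [
        "llvm-cov", "show", binary,
        f"-instr-profile={profdata}",
        "-format=html",
        f"-output-dir={output_dir}",   # one HTML file per source file
        f"-num-threads={threads}",     # threads used to write file reports
    ]


show_cmd = cov_show_html_cmd("out/unit_tests", "merged.profdata", "report", 32)
print(" ".join(show_cmd))
```

Given the disk-bound behaviour described above, raising the thread count beyond what the storage can absorb is unlikely to help much.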
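To make the per-directory view concrete, here is a minimal sketch that rolls per-file line summaries up into directory totals. It assumes the JSON produced by "llvm-cov export -summary-only" has the shape data[*].files[*] with "filename" and "summary.lines.{count,covered}" fields, and that -summary-only is the flag referred to above; the function name and sample data are hypothetical.

```python
import posixpath
from collections import defaultdict


def per_directory_line_coverage(export_json):
    """Return {directory: (covered_lines, total_lines)}.

    Each file is credited to its own directory and to every ancestor
    directory, so a parent's totals include all of its subdirectories.
    """
    totals = defaultdict(lambda: [0, 0])  # directory -> [covered, count]
    for entry in export_json["data"]:
        for file_info in entry["files"]:
            lines = file_info["summary"]["lines"]
            directory = posixpath.dirname(file_info["filename"])
            while True:
                totals[directory][0] += lines["covered"]
                totals[directory][1] += lines["count"]
                parent = posixpath.dirname(directory)
                if parent == directory:  # reached "/" or ""
                    break
                directory = parent
    return {d: (covered, count) for d, (covered, count) in totals.items()}


# Hypothetical sample shaped like the assumed export output.
sample = {"data": [{"files": [
    {"filename": "/src/base/a.cc", "summary": {"lines": {"count": 10, "covered": 5}}},
    {"filename": "/src/base/b.cc", "summary": {"lines": {"count": 20, "covered": 20}}},
    {"filename": "/src/net/c.cc", "summary": {"lines": {"count": 10, "covered": 0}}},
]}]}

report = per_directory_line_coverage(sample)
for directory in sorted(report):
    covered, count = report[directory]
    print(directory, covered, count)
```

Rolling totals up to every ancestor is a design choice for this sketch; a flat per-directory table would only credit the immediate parent.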
Thanks Max for the summary!