Nicholas Nethercote: How to speed up the Rust compiler one last time
Due to recent changes at Mozilla my time working on the Rust compiler is drawing to a close. I am still at Mozilla, but I will be focusing on Firefox work for the foreseeable future.
So I thought I would wrap up my “How to speed up the Rust compiler” series, which started in 2016.
I wrote ten “How to speed up the Rust compiler” posts. One of the earlier posts in the series described several of my improvements, including some that avoided unnecessary memcpys and several that improved the ObligationForest data structure. It also discussed some PRs by others that reduced library code bloat, and it included a table of overall performance changes since the previous post, something that I continued doing in subsequent posts.

Beyond those, I wrote several other posts related to Rust compilation.
As well as sharing the work I’d been doing, a goal of the posts was to show that there were people who cared about Rust compiler performance and that it was being actively worked on.
Boiling down compiler speed to a single number is difficult, because there are so many ways to invoke a compiler, and such a wide variety of workloads. Nonetheless, I think it’s not inaccurate to say that the compiler is at least 2-3x faster than it was a few years ago in many cases. (This is the best long-range performance tracking I’m aware of.)
When I first started profiling the compiler, it was clear that it had not received much in the way of concerted profile-driven optimization work. (It’s only a small exaggeration to say that the compiler was basically a stress test for the allocator and the hash table implementation.) There was a lot of low-hanging fruit to be had, in the form of simple and obvious changes that had significant wins. Today, profiles are much flatter and obvious improvements are harder for me to find.
My approach has been heavily profiler-driven. The improvements I made are mostly what could be described as “bottom-up micro-optimizations”. By that I mean they are relatively small changes, made in response to profiles, that didn’t require much in the way of top-down understanding of the compiler’s architecture. Basically, a profile would indicate that a piece of code was hot, and I would try to either (a) make that code faster, or (b) avoid calling that code.
It’s rare that a single micro-optimization is a big deal, but dozens and dozens of them are. Persistence is key.
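To make the pattern concrete, here is a minimal sketch of what such a micro-optimization often looks like in Rust. It is an invented example, not taken from an actual compiler PR: a profile shows a hot loop spending its time in allocation, and the fix is to write into a single pre-sized buffer instead.

```rust
// Hypothetical example of a bottom-up micro-optimization; the function
// names are invented and this is not from an actual rustc PR.

// Before: a profile shows this function hot, with much of its time in
// malloc/free, because format! allocates a temporary String per word.
fn join_slow(words: &[&str]) -> String {
    let mut s = String::new();
    for w in words {
        s += &format!("{} ", w); // one temporary allocation per iteration
    }
    s
}

// After: pre-size one buffer and write into it directly, avoiding the
// per-iteration temporaries and the repeated reallocation of `s`.
fn join_fast(words: &[&str]) -> String {
    let len: usize = words.iter().map(|w| w.len() + 1).sum();
    let mut s = String::with_capacity(len);
    for w in words {
        s.push_str(w);
        s.push(' ');
    }
    s
}

fn main() {
    let words = ["speed", "up", "the", "compiler"];
    assert_eq!(join_slow(&words), join_fast(&words));
}
```

Neither version is interesting on its own; the point is the shape of the work: the profile identifies the hot spot, and the change either makes that code cheaper or avoids running it at all.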
I spent a lot of time poring over profiles to find improvements. I have measured a variety of different things with different profilers. In order of most to least useful:

- Instruction counts (Cachegrind and Callgrind)
- Allocations (DHAT)
- All manner of custom path and execution counts via ad hoc profiling (counts)
- Memory use (DHAT and Massif)
- Lines of LLVM IR generated by the front end (cargo-llvm-lines)
- memcpys (DHAT)

Every time I did a new type of profiling, I found new things to improve. Often I would use multiple profilers in conjunction. For example, the improvements I made to DHAT for tracking allocations and memcpys were spurred by Cachegrind/Callgrind’s outputs showing that malloc/free and memcpy were among the hottest functions for many benchmarks. And I used counts many times to gain insight about a piece of hot code.
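As an illustration of the ad hoc style of profiling that counts supports, here is a minimal sketch. The idea, as described in the counts documentation, is to print one line per event of interest and then tally the distinct lines; the function and label names below are invented.

```rust
// Ad hoc profiling sketch: emit one line per event on stderr, then
// tally the lines with `counts`. The event labels here are invented.
fn process(kind: &str, size: usize) {
    // Bucket sizes to powers of two so that similar events collapse
    // into a single line, and therefore a single count.
    let bucket = size.next_power_of_two();
    eprintln!("process: kind={} size<={}", kind, bucket);
    // ... the hot code being investigated would go here ...
}

fn main() {
    for (kind, size) in [("expr", 3), ("expr", 5), ("item", 120)] {
        process(kind, size);
    }
}
```

Running the program with stderr piped through counts (e.g. ./prog 2>&1 | counts) then prints a frequency table of the distinct lines, which quickly shows which paths or size classes dominate.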
Off the top of my head, I can think of some profiling territories that I left unexplored: self-profiling/queries, threading (e.g. lock contention, especially in the parallel front-end), cache misses, branch mispredictions, syscalls, and I/O (e.g. disk activity). Also, there are lots of profilers out there, each with its own strengths and weaknesses, and each person has their own areas of expertise, so I’m sure there are still improvements to be found even for the profiling metrics that I did consider closely.
I also did two larger “architectural” or “top-down” changes: pipelined compilation and LLVM bitcode elision. These kinds of changes are obviously great to do when you can, though they require top-down expertise and can be hard for newcomers to contribute to. I am pleased that there is an incremental compilation working group being spun up, because I think that is an area where there might be some big performance wins.
Good benchmarks are important because compiler inputs are complex and highly variable. Different inputs can stress the compiler in very different ways. I used rustc-perf almost exclusively as my benchmark suite and it served me well. That suite changed quite a bit over the past few years, with various benchmarks being added and removed. I put quite a bit of effort into getting all the different profilers to work with its harness. Because rustc-perf is so well set up for profiling, any time I needed to profile some new code I would simply drop it into my local copy of rustc-perf.
Compilers are really nice to profile and optimize because they are batch programs that are deterministic or almost-deterministic. Profiling the Rust compiler is much easier and more enjoyable than profiling Firefox, for example.
Contrary to what you might expect, instruction counts have proven much better than wall times when it comes to detecting performance changes on CI, because instruction counts are much less variable than wall times (e.g. ±0.1% vs ±3%; the former is highly useful, the latter is barely useful). Using instruction counts to compare the performance of two entirely different programs (e.g. GCC vs clang) would be foolish, but it’s reasonable to use them to compare the performance of two almost-identical programs (e.g. rustc before PR #12345 and rustc after PR #12345). It’s rare for instruction count changes to not match wall time changes in that situation. If the parallel version of the rustc front-end ever becomes the default, it will be interesting to see if instruction counts continue to be effective in this manner.
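A quick back-of-the-envelope sketch of why the variance matters, using the noise figures above and an invented 1% regression:

```rust
// Illustration of the signal-to-noise argument: a change is only
// clearly detectable when it exceeds the run-to-run noise. The 1%
// regression is invented; the noise figures are the ones quoted above.
fn detectable(change_pct: f64, noise_pct: f64) -> bool {
    change_pct.abs() > noise_pct
}

fn main() {
    let regression = 1.0; // a hypothetical 1% slowdown
    assert!(detectable(regression, 0.1)); // instruction counts: stands out
    assert!(!detectable(regression, 3.0)); // wall times: lost in the noise
    println!("1% exceeds ±0.1% noise but not ±3% noise");
}
```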
I was surprised by how many people said they enjoyed reading this blog post series. (The positive feedback partly explains why I wrote so many of them.) The appetite for “I squeezed some more blood from this stone” tales is high. Perhaps this relates to the high level of interest in Rust, and also the pain people feel from its compile times. People also loved reading about the failed optimization attempts.
Many thanks to all the people who helped me with this work. In particular, thanks to Mark Rousskov, for maintaining rustc-perf and the CI performance infrastructure, and for helping me with many rustc-perf changes.

Rust’s existence and success is something of a miracle. I look forward to being a Rust user for a long time. Thank you to everyone who has contributed, and good luck to all those who will contribute to it in the future!
https://blog.mozilla.org/nnethercote/2020/09/08/how-to-speed-up-the-rust-compiler-one-last-time/