Robert O'Callahan: Some Dynamic Measurements Of Firefox On x86-64 |
This follows up on my previous measurements of static properties of Firefox code on x86-64 with some measurements of dynamic properties obtained by instrumenting code. These are mostly for my own amusement but intuitions about how programs behave at the machine level, grounded in data, have sometimes been unexpectedly useful.
Dynamic properties are highly workload-dependent. Media codecs are more SSE/AVX intensive than regular code so if you do nothing but watch videos you'd expect qualitatively different results than if you just load Web pages. I used a mixed workload that starts Firefox (multi-process enabled, optimized build), loads the NZ Herald, scrolls to the bottom, loads an article with a video, plays the video for several seconds, then quits. It ran for about 30 seconds under rr and executes about 60 billion instructions.
I repeated my register usage result analysis, this time weighted by dynamic execution count and taking into account implicit register usage such as push using rsp. The results differ significantly on whether you count the consecutive iterations of a repeated string instruction (e.g. rep movsb) as a single instruction execution or one instruction execution per iteration, so I show both. Unlike the static graphs, these results for all instructions executed anywhere in the process(es), including JITted code, not just libxul.
I was also interested in exploring the distribution of instruction execution frequencies:
A dot at position x, y on this graph means that fraction y of all instructions executed at least once is executed at most x times. So, we can see that about 19% of all instructions executed are executed only once. About 42% of instructions are executed at most 10 times. About 85% of instructions are executed at most 1000 times. These results treat consecutive iterations of a string instruction as a single execution. (It's hard to precisely define what it means for an instruction to "be the same" in the presence of dynamic loading and JITted code. I'm assuming that every execution of an instruction at a particular address in a particular address space is an execution of "the same instruction".)
Interestingly, the five most frequently executed instructions are executed about 160M times. Those instructions are for this line, which is simply filling a large buffer with 0xff000000. gcc is generating quite slow code:
132e7b2: cmp %rax,%rdxThat's five instructions executed for every four bytes written. This could be done a lot faster in a variety of different ways --- rep stosd or rep stosq would probably get the fast-string optimization, but SSE/AVX might be faster.
132e7b5: je 132e7d1
132e7b7: movl $0xff000000,(%r9,%rax,4)
132e7bf: inc %rax
132e7c2: jmp 132e7b2
http://robert.ocallahan.org/2016/06/some-dynamic-measurements-of-firefox-on.html
Комментировать | « Пред. запись — К дневнику — След. запись » | Страницы: [1] [Новые] |