Nicholas Nethercote: An even slimmer pdf.js

Понедельник, 16 Июня 2014 г. 05:49 + в цитатник

TL;DR: Firefox’s built-in PDF viewer is on track to benefit from additional large reductions in memory consumption when Firefox 33 is released in mid-October.

Earlier this year I made some major improvements to the memory usage and speed of pdf.js, Firefox’s built-in PDF viewer. At the time I was quite pleased with myself. I had picked all the low-hanging fruit and reached a point of diminishing returns.

Some unexpected measurements

Shortly after that, while on a plane I tried pdf.js on my Mac laptop. I was horrified to find that memory usage was much higher than on the Linux desktop machine on which I had done my optimizations. The first document I measured made Firefox’s RSS (resident set size, a.k.a. physical memory usage) peak at over 1,000 MiB. The corresponding number on my Linux desktop had been 400 MiB! What was going on?

The short answer is “canvases”. pdf.js uses HTML canvas elements to render each PDF page. These can be millions of pixels in size, and each canvas pixel takes 32-bits of RGBA data. But on Linux this data is not stored within the Firefox process. Instead, it gets passed to the X server. Firefox’s about:memory page does report the canvas usage under the “canvas-2d-pixels” and “gfx-surface-xlib” labels, but I hadn’t seen those — during my optimization work I had only measured the RSS of the Firefox process. RSS is usually the best single number to focus on, but in this case it was highly misleading.

In contrast, on Mac the canvas data is stored within the process, which is why Firefox’s RSS is so much higher on Mac. (There’s also the fact that my MacBook has a retina screen, which means the canvases used have approximately four times as many pixels as the canvases used on my Linux machine.)

Fixing the problem

It turns out that pdf.js was intentionally caching an overly generous number of canvases (20) and then unintentionally failing to dispose of them in a timely manner. This could result in hundreds of canvases being held onto unnecessarily if you scrolled quickly through a large document. On my MacBook each canvas is over 20 MiB, so the resultant memory spike could be enormous.

Happily, I was able to fix this behaviour with four tiny patches. pdf.js now caches only 10 canvases, and disposes of any excess ones immediately.

Measurements

I measured the effects of both rounds of optimization on the following three documents on my Mac.

TraceMonkey: This is an academic paper about the TraceMonkey JIT compiler. It’s a 14 page Latex document containing several pictures and graphs.
Telef'onica: This is the 2009 annual report from Telef'onica. It’s a 122 page full-colour document with many graphs, tables and pictures.
Decontamination: This is an old report about the decontamination of an industrial site in New Jersey. It’s a 226 page scan of a paper document that mostly consists of black and white typewritten text and crude graphs, with only few pages featuring colour.

For each document, I measure the peak RSS while scrolling quickly from the first page to the last by holding down the “fn” and down keys at the same time. Although this is a stress test, it’s not unrealistic; it’s easy to imagine people quickly scrolling through 10s or even 100s of pages of a PDF to find a particular page.

I did this three times for each document and picked the highest value. The results are shown by the following graph. The “Fx28'' bars show the original peak RSS values; the “Fx29'' bars show the peak RSS values after my first round of optimizations; and the “Fx33'' bars show the peak RSS values after my second round of optimizations.

Comparing Firefox 28 and Firefox 33, the reductions in peak RSS for the three documents are 1.2x, 4.4x, and 7.8x. It makes sense that the relative improvements increase with the length of the documents.

Despite these major improvements, pdf.js still uses substantially more memory than native PDF viewers. So we need to keep chipping away… but it’s also worth recognizing how far we’ve come.

Status

These patches have landed in the master pdf.js repository. They have not yet imported into Firefox’s code, but this will happen at some point during the development cycle for Firefox 33. Firefox 33 is on track to be released in mid-October.

https://blog.mozilla.org/nnethercote/2014/06/16/an-even-slimmer-pdf-js/