David Rajchenbach Teller: Designing the Firefox Performance Monitor (2): Monitoring Add-ons and Webpages

Friday, 6 November 2015, 16:56

In part 1, we discussed the design of time measurement within the Firefox Performance Monitor. Contrary to what intuition might suggest, the Performance Monitor had neither the same set of objectives as the Gecko Profiler, nor the same set of constraints, and we ended up picking a design that was not a sampling profiler. In particular, instead of capturing performance data on stacks, the Monitor captures performance data on Groups, a notion that we have not discussed yet. In this part, we will focus on bridging the gap between our low-level instrumentation and actual add-ons and webpages, as may be seen by the user.

I. JavaScript compartments

The main objective of the Performance Monitor is to let users and developers quickly find out which add-ons or webpages are slowing down Firefox. The main tool of the Performance Monitor is an instrumentation of SpiderMonkey, the JavaScript VM used by Firefox, to detect slowdowns caused by code taking too long to execute.

SpiderMonkey is a general-purpose VM, used in Firefox, Thunderbird, but also in Gnome, as a command-line scripting tool, as a test suite runner and more. Out of the box, SpiderMonkey knows nothing about webpages or add-ons.

However, SpiderMonkey defines a notion of JavaScript Compartment. Compartments were designed to provide safe and manageable isolation of code and memory between webpages, as well as between webpages and the parts of Firefox written in JavaScript. In terms of JavaScript, each compartment represents a global object (typically, in a webpage, the window object), all the code parsed as part of this object, and all the memory owned by either. In particular, if a compartment A defines an event listener and attaches it to an event target offered through some API by another compartment B, the listener is still considered part of A.

Compartments do not offer a one-to-one mapping to add-ons or webpages, but they are close. We just need to remember a few things:

some compartments belong neither to an add-on, nor to a webpage (e.g. the parts of Firefox written in JavaScript);
each add-on can define any number of modules and worker threads, each of which has its own compartment;
each webpage can define any number of frames and worker threads, each of which has its own compartment;
there are a number of ways to create compartments dynamically.

In addition, while Firefox is executing JS code, it is possible to find out whether this code belongs to a window, using xpc::CurrentWindowOrNull(JSContext*). This information is not available to SpiderMonkey, but it is available to the embedding of SpiderMonkey, i.e. Firefox itself. Using a different path, one can find out whether an object belongs to an add-on (and, in particular, whether the global object of a compartment belongs to an add-on) using JS::AddonIdOfObject(JSObject*).

Putting all of this together, in terms of JavaScript, both add-ons and web pages are essentially groups of compartments. We call these groups Performance Groups.

II. Maintaining Performance Groups

We extend SpiderMonkey with a few callbacks to let it grab Performance Groups from its embedding. Whenever SpiderMonkey creates a new Compartment, whether during the load of a page, during that of an add-on, or in more sophisticated dynamic cases, it requests the list of Performance Groups to which it belongs.

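// Callback installed by the embedding. SpiderMonkey calls it whenever it
// creates a new compartment, to obtain the Performance Groups of that compartment.
static bool
GetPerformanceGroupsCallback(JSContext* cx,
                             Vector&,        // out: the groups of the new compartment
                             void* closure); // opaque data supplied by the embedding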

Attaching performance groups to a compartment during creation lets us ensure that we can update the performance cost of a compartment in constant-time, without complex indirections.

In the current implementation, a compartment typically belongs to a subset of the following groups:

its own group, which may be used to track performance of the single compartment;
a group shared by all compartments in the add-on on the current thread (typically, several modules);
a group shared by all compartments in the webpage on the current thread (typically, several iframes);
the “top group”, shared by all compartments in the VM, which may be used to track the performance of the entire JavaScript VM – while this has not always been the case, this currently maps to a single JavaScript thread.
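
To make the list above more concrete, here is a minimal sketch, in plain JavaScript, of how an embedding could answer the question "which groups does this new compartment belong to?". It is only a model: the names `PerformanceGroup`, `GroupRegistry`, `groupsFor` and the `addonId`/`windowId` fields are hypothetical and are not part of SpiderMonkey or Firefox.

```javascript
// Hypothetical model of Performance Groups; not the Firefox implementation.
class PerformanceGroup {
  constructor(kind, key, isActive) {
    this.kind = kind;         // "own", "addon", "window" or "top"
    this.key = key;
    this.isActive = isActive; // inactive groups do not get a stopwatch
    this.totalTime = 0;
  }
}

class GroupRegistry {
  constructor() {
    this.shared = new Map();  // "kind:key" -> PerformanceGroup
    this.top = new PerformanceGroup("top", "top", true);
  }

  // Return the group shared by all compartments with the same kind and key,
  // creating it on first use.
  getShared(kind, key) {
    const id = `${kind}:${key}`;
    if (!this.shared.has(id)) {
      this.shared.set(id, new PerformanceGroup(kind, key, true));
    }
    return this.shared.get(id);
  }

  // Called once, when the compartment is created. The result is stored on
  // the compartment, so later updates need no lookup.
  groupsFor(compartment) {
    const groups = [new PerformanceGroup("own", compartment.name, false)];
    if (compartment.addonId != null) {
      groups.push(this.getShared("addon", compartment.addonId));
    }
    if (compartment.windowId != null) {
      groups.push(this.getShared("window", compartment.windowId));
    }
    groups.push(this.top);
    return groups;
  }
}

// Two modules of the same (hypothetical) add-on.
const registry = new GroupRegistry();
const moduleA = { name: "moduleA", addonId: "foo@bar" };
const moduleB = { name: "moduleB", addonId: "foo@bar" };
moduleA.groups = registry.groupsFor(moduleA);
moduleB.groups = registry.groupsFor(moduleB);
```

In this sketch, both modules end up sharing the add-on group and the top group, while each keeps its own, inactive, group.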

Note that a compartment can theoretically belong to both a webpage and an add-on, although I haven’t encountered this situation yet.

As we saw in part 1 of this series, we start and stop a stopwatch to measure the duration of code execution whenever we enter/leave a Performance Group that does not have a stopwatch yet. Consequently, each JavaScript stack has a single “top” stopwatch, which serves both to measure the performance of the “top group” and the performance of whichever JS code lies on top of the stack.

For performance reasons, groups can be marked as active or inactive, where inactive groups do not need a stopwatch. In a typical run of Firefox, all the “own groups”, specific to a single compartment each, are inactive to avoid having to start/stop too many stopwatches at once and to commit too many results at the end of the event, while all the other groups are active. Own groups can be activated individually when investigating a performance issue, or to help track the effect of a module.
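
The interplay between stopwatches and active/inactive groups can be sketched as follows. This is a simplified model rather than the actual SpiderMonkey code: `StopwatchGroup`, `enter` and `leave` are hypothetical names, and a fake clock (`now`) stands in for a real timer so that the example is deterministic.

```javascript
// Fake clock, in milliseconds. A real monitor would read a monotonic timer.
let now = 0;

class StopwatchGroup {
  constructor(name, isActive) {
    this.name = name;
    this.isActive = isActive;
    this.hasStopwatch = false;
    this.totalTime = 0;
  }
}

// Entering a compartment: start a stopwatch for each active group that is
// not already being measured further down the stack.
function enter(groups) {
  const started = [];
  for (const group of groups) {
    if (group.isActive && !group.hasStopwatch) {
      group.hasStopwatch = true;
      started.push({ group, start: now });
    }
  }
  return started;
}

// Leaving the compartment: stop and commit only the stopwatches that the
// matching call to `enter` started.
function leave(started) {
  for (const { group, start } of started) {
    group.totalTime += now - start;
    group.hasStopwatch = false;
  }
}

const topGroup = new StopwatchGroup("top", true);
const addonGroup = new StopwatchGroup("add-on foo@bar", true);
const ownA = new StopwatchGroup("own: module A", false);
const ownB = new StopwatchGroup("own: module B", false);
ownB.isActive = true; // activated to investigate module B

const outer = enter([ownA, addonGroup, topGroup]); // module A starts running
now += 10;
const inner = enter([ownB, addonGroup, topGroup]); // module A calls module B
now += 5;
leave(inner);
now += 3;
leave(outer);
```

Here the nested call into module B starts a single new stopwatch, the one for B's own group, because the add-on group and the top group are already being measured. The inactive own group of module A never gets a stopwatch at all.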

Note that we do not have to limit ourselves to the above kinds of groups. Indeed, we have plans to provide additional groups in the future, to be able to:

turn on/off monitoring of entire features implemented in JavaScript;
inspect the performance effect of entire content domains (e.g. all “facebook.com” pages, or all Google “+1” buttons);
…

In a different embedding, for instance an operating system, one could envision a completely different partition into performance groups, such as a group shared by all services acting on behalf of a single user.

III. Threads and processes

Nowadays, Firefox Nightly is a multi-threaded, multi-process application. Firefox Release has not reached that point yet, but should within a few versions. As defined above, performance groups cross neither threads nor processes.

As of this writing, we have not implemented collection of data from various threads, as the information is not as interesting as one could think. Indeed, in SpiderMonkey, a single non-main thread can only contain a single compartment, and it is difficult to impact the framerate with a background thread. Other tools dedicated to monitoring threads would therefore be better suited than the mechanism of Performance Groups.

On the other hand, activity across processes can cause user-visible jank, so we need to be able to track it. In particular, a single add-on can have performance impact on several processes at once. For this reason, the Performance Monitor is executed on each process. Higher-level APIs provide two ways of accessing application-wide information.

1/ Polling

The first API implements polling, as follows:

// Assuming privileged (chrome) code, where `Components` is available.
Components.utils.import("resource://gre/modules/Task.jsm");
Components.utils.import("resource://gre/modules/PerformanceStats.jsm");

Task.spawn(function*() {
  // We are interested in jank and blocking cross-process communications.
  // Other probes do not need to be activated on behalf of this monitor.
  let monitor = PerformanceStats.getMonitor(["jank", "cpow"]);

  // Collect data from all processes. Dead or frozen processes are ignored.
  let snapshot = yield monitor.promiseSnapshot();

  // … wait

  // Collect data, once again. Again, dead or frozen processes are ignored.
  let snapshot2 = yield monitor.promiseSnapshot();

  // Compute the resource usage between the two timestamps.
  let delta = snapshot2.subtract(snapshot);

  let myAddon = delta.addons.get("foo@bar");

  // `durations` recapitulates the frame impact of `myAddon` during the interval
  // as an array containing the number of times we have missed 1 frame, 2 successive frames, 4 successive frames, ...
  let durations = myAddon.durations;
  console.log("Jank info", durations);
});
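
One way to picture the `durations` array is as a histogram with power-of-two buckets. The sketch below is only an illustration under several assumptions that do not come from the implementation: a frame budget of 1000/60 ms, eight buckets, a cumulative reading in which bucket `i` counts every event that missed at least 2^i successive frames, and a hypothetical helper named `recordDuration`.

```javascript
const FRAME_MS = 1000 / 60; // assumed frame budget, at 60 frames per second
const BUCKET_COUNT = 8;     // assumed: 1, 2, 4, ..., 128 successive frames

// Update the histogram after an event that blocked the thread for `elapsedMs`.
// Assumption: the number of missed frames is approximated by rounding down.
function recordDuration(durations, elapsedMs) {
  const missedFrames = Math.floor(elapsedMs / FRAME_MS);
  for (let i = 0; i < durations.length; ++i) {
    if (missedFrames >= (1 << i)) {
      durations[i] += 1;
    }
  }
}

const histogram = new Array(BUCKET_COUNT).fill(0);
recordDuration(histogram, 5);  // shorter than a frame: nothing recorded
recordDuration(histogram, 40); // 2 missed frames: buckets 0 and 1
recordDuration(histogram, 70); // 4 missed frames: buckets 0, 1 and 2
console.log(histogram);
```

With this reading, a large value in the first bucket and zeros elsewhere indicates many short hiccups, while non-zero values in the later buckets indicate long, user-visible freezes.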

The underlying implementation of this API is relatively straightforward:

in each process, the Performance Stats Service collects all the data at the end of each event, updating `durations` accordingly;
when `promiseSnapshot` is called, we broadcast to all processes, requesting the latest data collected by the Performance Stats Service;
if an add-on appears in several processes, we sum the resource impact and collapse the add-on data into a single item.
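
The last step, and the subtraction of two snapshots, can be sketched as follows. This is a model rather than the code of the Performance Stats Service: the report shape (a `Map` from add-on id to `{ totalTime, durations }`), the function names `mergeProcessReports` and `subtractSnapshots`, and the add-on id "baz@qux" are all hypothetical, and every `durations` array is assumed to have the same length.

```javascript
// Collapse the reports of several processes into a single snapshot,
// summing the data of any add-on that appears in more than one process.
function mergeProcessReports(reports) {
  const merged = new Map();
  for (const report of reports) {
    for (const [addonId, stats] of report) {
      const existing = merged.get(addonId);
      if (!existing) {
        merged.set(addonId, {
          totalTime: stats.totalTime,
          durations: stats.durations.slice(),
        });
        continue;
      }
      existing.totalTime += stats.totalTime;
      stats.durations.forEach((count, i) => {
        existing.durations[i] += count;
      });
    }
  }
  return merged;
}

// Resource usage between two snapshots. Add-ons missing from the earlier
// snapshot are treated as having used nothing at that point.
function subtractSnapshots(later, earlier) {
  const delta = new Map();
  for (const [addonId, stats] of later) {
    const before = earlier.get(addonId);
    delta.set(addonId, {
      totalTime: stats.totalTime - (before ? before.totalTime : 0),
      durations: stats.durations.map(
        (count, i) => count - (before ? before.durations[i] : 0)
      ),
    });
  }
  return delta;
}

// First poll: the add-on runs in both the parent and a child process.
const firstPoll = mergeProcessReports([
  new Map([["foo@bar", { totalTime: 30, durations: [2, 1, 0] }]]),
  new Map([
    ["foo@bar", { totalTime: 12, durations: [1, 0, 0] }],
    ["baz@qux", { totalTime: 4, durations: [0, 0, 0] }],
  ]),
]);

// Second poll, some time later.
const secondPoll = mergeProcessReports([
  new Map([["foo@bar", { totalTime: 50, durations: [3, 2, 0] }]]),
  new Map([
    ["foo@bar", { totalTime: 20, durations: [2, 0, 0] }],
    ["baz@qux", { totalTime: 4, durations: [0, 0, 0] }],
  ]),
]);

const usage = subtractSnapshots(secondPoll, firstPoll);
console.log(usage.get("foo@bar"));
```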

Polling is useful to get an overview of the resource usage between two instants for the entire system. At the time of this writing, however, it is somewhat oversized if the objective is simply to follow one add-on/webpage (as it always collects and processes data from all add-ons and webpages), or one process (as it always collects data from all processes). In addition, polling is not appropriate to generate performance alerts, as it needs to communicate with all processes, even if these processes are idle. This prevents the processes from sleeping, which is bad for both battery life and virtual memory usage.

2/ Events

For these reasons, we have developed a second, event-based API, which is expected to land on Firefox Nightly within a few days.

PerformanceWatcher.addPerformanceListener({addonId: "foo@bar"}, function(source, details) {
  // This callback is triggered whenever the add-on causes too many consecutive frames to be skipped.
  console.log("Highest Jank (µs)", details.highestJank);
});

This same API can be used to watch tabs, or to watch all add-ons or all tabs at once.

The implementation of this API is slightly more sophisticated, as we wish to avoid saturating API clients with alerts, in particular if some of these clients may themselves be causing jank:

in each process, the Performance Stats Service collects all the data at the end of each event;
if the execution duration of at least one group has exceeded some threshold (typically 64ms), we add it to the list of “performance alerts”, unless it is already in that list;
performance alerts are collected after ~100ms – the timer is active only if at least one collection is needed;
each performance alert for an add-on is then dispatched to any observer for this add-on and to the universal add-on observers (if any);
each performance alert for a window is then dispatched to any observer for this window and to the universal window observers (if any);
each child process buffers alerts, to minimise IPC cost, then propagates them to the parent process;
the parent process collects all alerts and dispatches them to observers.
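
The single-process part of these steps can be sketched as follows. This is a simplified model, not the actual implementation: `AlertCollector` and its methods are hypothetical names, only add-on alerts are modelled, durations are in milliseconds, keeping the largest duration when a group is already in the list is an assumption, and `collect()` is called by hand where the real mechanism would use a timer of about 100ms.

```javascript
const JANK_THRESHOLD_MS = 64; // threshold quoted in the text

class AlertCollector {
  constructor() {
    this.pendingAlerts = new Map();      // add-on id -> alert; one entry per group
    this.observersById = new Map();      // add-on id -> Set of callbacks
    this.universalObservers = new Set(); // callbacks watching all add-ons
    this.collectionScheduled = false;    // stands in for "the timer is active"
  }

  // Pass `null` as `addonId` to observe all add-ons. The add-on does not
  // need to be running, or even installed, at registration time.
  addObserver(addonId, callback) {
    if (addonId === null) {
      this.universalObservers.add(callback);
      return;
    }
    if (!this.observersById.has(addonId)) {
      this.observersById.set(addonId, new Set());
    }
    this.observersById.get(addonId).add(callback);
  }

  // Called at the end of each event with the execution duration of a group.
  onEventEnd(addonId, durationMs) {
    if (durationMs <= JANK_THRESHOLD_MS) {
      return;
    }
    const alert = this.pendingAlerts.get(addonId);
    if (alert) {
      alert.highestJankMs = Math.max(alert.highestJankMs, durationMs);
    } else {
      this.pendingAlerts.set(addonId, { highestJankMs: durationMs });
    }
    this.collectionScheduled = true;
  }

  // Dispatch the pending alerts, first to the observers of each add-on,
  // then to the universal observers.
  collect() {
    if (!this.collectionScheduled) {
      return;
    }
    this.collectionScheduled = false;
    const alerts = this.pendingAlerts;
    this.pendingAlerts = new Map();
    for (const [addonId, details] of alerts) {
      for (const callback of this.observersById.get(addonId) || []) {
        callback(addonId, details);
      }
      for (const callback of this.universalObservers) {
        callback(addonId, details);
      }
    }
  }
}

const collector = new AlertCollector();
const received = [];
collector.addObserver("foo@bar", (id, details) =>
  received.push(["specific", id, details.highestJankMs]));
collector.addObserver(null, (id, details) =>
  received.push(["universal", id, details.highestJankMs]));

collector.onEventEnd("foo@bar", 20);  // below the threshold: ignored
collector.onEventEnd("foo@bar", 80);  // first alert for this add-on
collector.onEventEnd("foo@bar", 120); // already in the list: no second alert
collector.collect();
```

However many slow events the add-on causes between two collections, each observer is called once per add-on, which is what keeps a janky client from being flooded with alerts.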

There are a few subtleties, as we may wish to register observers for add-ons that have not started yet (or even that have not been installed or have been uninstalled), and similarly for windows that are not open yet, or that have already been closed. Other subtleties ensure that, once again, most operations are constant-time, with the exception of dispatching to observers, which is linear in the number of alerts (deduplicated) + observers.

Future versions may extend this to watching specific Firefox features, or watching specific processes, or the activity of the VM itself, and possibly more. We also plan to extend the API to improve the ability to detect whether the jank may actually be noticed by the user, or is somehow invisible, e.g. because the janky process was not visible at the time, or was neither interactive nor animated.

To be continued

At this stage, I have presented most of the important design of the Performance Monitor. In a followup post, I intend to explain some of the work we have done to weed out false positives and present the user with actionable results.