Rob Wood: Raptor on Gaia CI
What is Raptor?
No, I’m not talking about Jurassic World (although the movie looks great), I’m actually referring to Raptor – the Firefox OS Performance Framework, created by Eli Perelman.
If you work on Gaia you’ve probably heard about Raptor already. If not, I’ll give you a quick intro. Raptor is the next-generation performance measurement tool for Firefox OS. Raptor-based tests perform automated actions on Firefox OS devices while listening for performance markers (generated by Gaia via the User Timing Gecko API). The performance data is gathered, then the results are calculated and summarized in a nifty table in the console. For more details about the Raptor tool itself, please see the excellent article “Raptor: Performance Tools for Gaia” that Eli authored on MDN.
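To make the marker side of this concrete, here is a rough sketch of how an app can report launch milestones with the standard User Timing API. The surrounding code is hypothetical; only the marker names come from the real Gaia markers discussed below:

```js
// Hypothetical app code showing how launch milestones can be reported
// through the standard User Timing API; Raptor listens for these markers.
window.addEventListener('load', function () {
  // The app chrome is painted and visible to the user.
  performance.mark('visuallyLoaded');

  // ...once any remaining lazy-loaded work has finished...
  performance.mark('fullyLoaded');
});
```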
Raptor Launch Test
The Raptor launch test automatically launches the specified Firefox OS application, listens for specific end markers, and gathers launch-related performance data. In the case of the launch test, once the ‘fullyLoaded’ marker has been heard, the test iteration is considered complete.
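Conceptually, “hearing” an end marker looks something like the PerformanceObserver sketch below. Raptor’s real marker collection happens through Gecko on the device, so treat this purely as an illustration:

```js
// Illustration only: watch for the 'fullyLoaded' User Timing mark and treat
// it as the end of a launch-test iteration. Raptor's actual marker collection
// happens through Gecko, not in-page JavaScript.
var observer = new PerformanceObserver(function (list) {
  list.getEntries().forEach(function (entry) {
    if (entry.entryType === 'mark' && entry.name === 'fullyLoaded') {
      console.log('Iteration complete at', entry.startTime, 'ms');
      observer.disconnect();
    }
  });
});
observer.observe({ entryTypes: ['mark'] });
```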
Each time the launch test runs, the application is launched 30 times. Raptor then calculates results based on the data from all 30 launches and displays them in the console like this:
Currently the main value that we are concerned with for application launches is the 95th percentile (p95) value for the ‘visuallyLoaded’ metric. In the above example the value is 38,316 ms. The ‘visuallyLoaded’ metric is the one to keep your eye on when concerned about application launch time.
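For the curious, this is roughly how a p95 value falls out of the 30 samples. It is a minimal sketch using the nearest-rank method; the helper and variable names are hypothetical, and Raptor computes its own statistics internally:

```js
// Minimal sketch of a p95 calculation over per-launch 'visuallyLoaded' times.
// The helper and data are hypothetical; Raptor has its own statistics code.
function percentile(values, p) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  // Nearest-rank method: smallest sample at or above the p-th percentile rank.
  var index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[index];
}

var visuallyLoadedTimes = [/* 30 per-launch 'visuallyLoaded' values in ms */];
var p95 = percentile(visuallyLoadedTimes, 95);
console.log('visuallyLoaded p95:', p95, 'ms');
```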
Where Raptor is Running
I am pleased to note that Raptor performance tests are now running on Gaia on a per-check-in basis. The test suite runs on Gaia try whenever a Gaia pull request is created. The same suite also runs on gaia-master after each pull request is merged. The test suite is powered by TaskCluster, runs on the B2G emulator, and reports results on Treeherder.
Currently the Raptor suite running on Gaia check-in consists of the application launch test running on the Firefox OS Settings app. In the near future the launch test will be extended to run on several more Firefox OS apps, and the suite will also be expanded to include the B2G restart test.
The Raptor suite is also running on real Firefox OS devices against several commits per day. That suite is maintained by Eli and reports to an awesome Raptor dashboard. This blog article concerns the Raptor suite running on a per check-in basis, not the daily device suite.
Why Run on Gaia CI?
The purpose of the Raptor test suite running on Gaia check-in is to find Gaia performance regressions before or at the time they are introduced. Pull requests that result in a flagged performance regression can be denied merge, or reverted.
Why the Emulator?
We use the B2G emulator for the Raptor tests running on Gaia CI because of scalability. There are potentially a large number of Gaia pull requests and merges submitted on a daily basis, and it wouldn’t be feasible to use a real Firefox OS device to run the tests on each check-in.
The B2G emulator is sufficient for this task because we are concerned with the change in the performance numbers, not the actual numbers themselves. The Raptor suite mentioned previously, which runs on real Firefox OS device hardware, provides the real (and historical) performance numbers.
The latest successful TaskCluster emulator-ics build is used for all of the tasks, so the same emulator and Gecko build is used for every test.
Raptor Base Tasks vs Patch Tasks
The Raptor launch test is run six times on the emulator using the Gaia “base” build, and six times using the Gaia “patch” build, all concurrently. When the tests are triggered by a pull request (Gaia try), the “base” build is the current base branch (master) at the time the pull request is created, and the “patch” build is the base build plus the pull request code. The test therefore measures performance both without and with the pull request code.
When the tests are triggered by a Gaia merge (gaia-master), the “base” build is the current base/master minus one commit (the revision just before the code merge), and the “patch” build is the base revision itself (which contains the merged code). The test therefore measures performance both without and with the merged code.
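Put another way, the two builds under comparison can be thought of like this. This is a hedged sketch of the selection logic only; the real setup lives in the TaskCluster task configuration:

```js
// Hedged sketch of how the 'base' and 'patch' revisions relate in each case;
// the actual selection is done by the TaskCluster task configuration.
function selectRevisions(trigger, headRev, baseRev) {
  if (trigger === 'pull-request') {
    // Gaia try: compare branch master at PR creation vs master + PR code.
    return { base: baseRev, patch: headRev };
  }
  // gaia-master: compare the commit before the merge vs the merged commit.
  return { base: baseRev + '~1', patch: baseRev };
}
```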
Interpreting the Treeherder Dashboard
There is a new Raptor group now appearing on Treeherder for Gaia. At the time of writing this article, the launch test is running on the Settings app only. After the launch test suite is expanded, the explanation of how to interpret the dashboard can be applied to other apps in the same manner.
The Raptor test tasks appear on Treeherder in the “Rpt” group. The task name has a two letter prefix, which matches the first two letters of the Firefox OS application under test (in the above example “se”, which is for the Settings app).
The first set of numbered tasks (se1, se2, se3, se4, se5, se6) are the tasks that ran the Raptor launch test on the ‘base’ build. On Gaia try (testing a pull request), these tasks run the launch test on Gaia BASE_REV (the code without the PR). On gaia-master (not testing a PR), these tasks run the launch test on Gaia BASE_REV minus one commit (the commit right before the merged PR).
The second set of numbered tasks (se7, se8, se9, se10, se11, se12) are the tasks that ran the Raptor launch test on the ‘patch’ build. On Gaia try (testing a PR), these tasks run the launch test on Gaia HEAD_REV (the code with the PR). On gaia-master (not testing a PR), these tasks run the launch test on BASE_REV (the commit with the merged PR).
The task with no number suffix (se) is the launch test results task for the Settings app. This is the task that determines whether a performance regression has been introduced. If this task shows orange on Treeherder, a performance regression was detected.
How the Results are Calculated
The Raptor launch test suite results are determined by comparing the results from the tests run on the base build vs the same tests run on the patch build.
Each individual Raptor launch test has a single result that we are concerned with: the p95 value for the ‘visuallyLoaded’ performance metric. The six results from the base tasks (se1 to se6) are taken and their median is used as the base build result. The same is done for the patch builds: the six results from the patch tasks (se7 to se12) are taken and their median is used as the patch build result.
If the median patch build result is greater than the median base build result, that means that the median app launch time in the patch has increased. If this increase is greater than 15%, then it is flagged as a performance regression, and the result task (se) is marked as a failure on Treeherder.
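Here is a compact sketch of that comparison. The function and variable names are illustrative; the actual logic lives in the Raptor results task:

```js
// Illustrative sketch of the regression check described above; the real
// implementation lives in the Raptor results task.
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function checkRegression(baseP95s, patchP95s, thresholdPercent) {
  var base = median(baseP95s);   // median of the six base-task p95 values
  var patch = median(patchP95s); // median of the six patch-task p95 values
  var increase = ((patch - base) / base) * 100;
  return {
    regression: increase > thresholdPercent, // e.g. thresholdPercent = 15
    increasePercent: increase
  };
}
```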
To view the output of the results task (se), click on its symbol on Treeherder. In the panel that appears towards the bottom, click on “Job details” and then “Inspect Task”. The Task Inspector opens in a new tab. Scroll down and you will see the console output for the results task. This includes the summary table for each base and patch task, the final median for each set, and the overall result (whether or not a regression has been detected). If a regression has been detected, the regression percentage is displayed; this represents the increase in application launch time.
Questions or comments?
For more information feel free to join the #raptor channel on Mozilla IRC, or send me a message via the contact form on the About page of this blog. I hope you found this post useful, and don’t forget to go see Jurassic World in theatres June 12.