Chris Cooper: RelEng & RelOps Weekly Highlights - September 18, 2015

Saturday, September 19, 2015, 00:00

Pending job numbers continued to be a concern this week. Investigations are underway to look for slowdowns unrelated to the enabling of e10s tests, which on its own has doubled the number of tests run in many cases. More information below.

Modernize infrastructure: Dustin participated in the TaskCluster work-week, discussing plans for TaskCluster itself and for Releng’s work to port the CI and release processes to run on the TaskCluster platform.

Morgan gave a fantastic presentation on Air Mozilla describing how GitHub / TaskCluster integration works: https://air.mozilla.org/taskcluster-github-continuous-integration-for-mozillians-by-mozillians-2/

Improve CI pipeline: We’re ready to un-hide the OS X and Linux64 builds produced via TaskCluster in Treeherder, elevating them to “tier 2” status. This is a necessary precursor to replacing the buildbot-generated versions of these builds.

Jordan landed a patch to enable bundleclone for mock-based builds, which may help fix problems with the Android nightly builds. (https://bugzil.la/1191859)

Alin and Vlad are working on releng configs to add new OS X 10.10 hardware to the test pool. (https://bugzil.la/1203128)

Release: Ben continues to work out a plan to cope with SHA-1 certificate deprecation. (https://bugzilla.mozilla.org/show_bug.cgi?id=1079858#c64)

We are entering the end-game for Firefox 41. Release candidate builds are underway.

Operational: Kim and Vlad increased the size of the tst-emulator-64 pool by 200 instances, which has significantly reduced the wait times for Android tests that use this instance type. (https://bugzil.la/1205409)

Kim is also in the process of bringing up four new buildbot masters to serve these expanding pools and reduce some of the buildbot lag we have seen in our monitoring tools. (https://bugzil.la/1205409)

We have had high pending counts for the past few weeks, which have significantly increased wait times, especially for Windows tests on Try. Joel Maher (from the Developer Productivity team) and Kim analyzed the end-to-end test times on Windows for the past month. They discovered that total compute time per push has increased by around 13%, or 2.5 compute hours, on Windows, primarily driven by the addition of new e10s tests. Given that our pool of Windows machines has a fixed size, we are looking at ways to reduce the wait times within existing hardware constraints.
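To put those two figures in context, here is a rough back-of-the-envelope sketch of the per-push compute time they imply. It assumes the 13% is measured against the earlier baseline, which the analysis does not state explicitly. The function and variable names are made up for illustration.

```python
def implied_baseline_hours(increase_hours: float, increase_fraction: float) -> float:
    """Return the per-push compute time before the increase.

    Assumes increase_fraction is relative to the earlier baseline
    (an assumption; the analysis does not say which base was used).
    """
    return increase_hours / increase_fraction


increase_hours = 2.5      # extra compute hours per push, from the analysis
increase_fraction = 0.13  # "around 13%", so the results are approximate

baseline = implied_baseline_hours(increase_hours, increase_fraction)
current = baseline + increase_hours

print(f"before: ~{baseline:.1f} h per push, after: ~{current:.1f} h per push")
```

Under that assumption, a Windows push went from roughly 19 compute hours to roughly 22. Both inputs are rounded, so treat the results as ballpark figures only.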

See you again next week!

http://coopcoopbware.tumblr.com/post/129371383000