Laura Thomson: 2014: Engineering Operations Year in Review
Saturday, 20 December 2014, 05:00
On the first day of Mozlandia, Johnny Stenback and Doug Turner presented a list of key accomplishments in Platform Engineering/Engineering Operations in 2014.
I have been told a few times recently that people don’t know what my teams do, so in the interest of addressing that, I thought I’d share our part of the list. It was a pretty damn good year for us, all things considered, and especially given the level of organizational churn and other distractions.
We had a bit of organizational churn ourselves. I started the year managing Web Engineering, and between March and September ended up also managing the Release Engineering teams, Release Operations, SUMO and Input Development, and Developer Services. It’s been a challenging but very productive year.
Here’s the list of what we got done.
Web Engineering
- Migrate crash-stats storage off HBase and into S3
- Launch Crash-stats “hacker” API (access to search, raw data, reports)
- Ship fully-localized Firefox Health Report on Android
- Many new crash-stats reports including GC-related crashes, JS crashes, graphics adapter summary, and modern correlation reports
- Crash-stats reporting for B2G
- Pluggable processing architecture for crash-stats, and alternate crash classifiers
- Symbol upload system for partners
- Migrate l10n.mozilla.org to modern, flexible backend
- Prototype services for checking health of the browser and a support API
- Solve scaling problems in MozTrap to reduce pain for QA
- New admin UI for Balrog (new update server)
- Bouncer: correctness testing, continuous integration, a staging environment, and multi-homing for high availability
- Grew Air Mozilla community contributions from 0 to 6 non-staff committers
- Many new features for Air Mozilla including: direct download for offline viewing of public events, tear out video player, WebRTC self publishing prototype, Roku Channel, multi-rate HLS streams for auto switching to optimal bitrate, search over transcripts, integration with Mozilla Popcorn functionality, and access control based on Mozillians groups (e.g. “nda”)
DXR
- Modeless, explorable UI with all-new JS
- Case-insensitive searching
- Proof-of-concept Rust analysis
- Improved C++ analysis, with lots of new search types
- Multi-tree support
- Multi-line selection (linkable!)
- HTTP API for search
- Line-based searching
- Multi-language support (Python already implemented, Rust and JS in progress)
- Elasticsearch backend, bringing speed and features
- Completely new plugin API, enabling binary file support and request-time analysis
SUMO
- Offline SUMO app in Marketplace
- SUMO Community Hub
- Improved SUMO search with synonyms
- Instant search for SUMO
- Redesigned and improved SUMO support forums
- Improved support for more products in SUMO (Thunderbird, Webmaker, Open Badges, etc.)
- BuddyUP app (live support for Firefox OS) (in progress, TBC Q1 2015)
Input
- “Dashboards for everyone” infrastructure: allowing anyone to build charts/dashboards using Input data
- Backend for heartbeat v1 and v2
- Overhauled the feedback form to support multiple products, streamline user experience and prepare for future changes
- Support for Loop/Hello, Firefox Developer Edition, Firefox 64-bit for Windows
- Infrastructure for automated machine and human translations
- Massive infrastructure overhaul to improve overall quality
Release Engineering
- Cut AWS costs by over 70% during 2014 by switching builds to spot instances and using intelligent bidding algorithms
- Migrated all hardware out of SCL1 and closed datacenter to save $1 million per year (with Relops)
- Optimized network transfers for build/test automation between datacenters, decreasing bandwidth usage by 50%
- Halved build time on b2g-inbound
- Parallelized verification steps in release automation, saving over an hour off the end-to-end time required for each release
- Decommissioned legacy systems (e.g. tegras, tinderbox) (with Relops)
- Enabled build slave reboots via API
- Self-serve arbitrary builds via API
- b2g FOTA updates
- Builds for open H.264
- Built flexible new update service (Balrog) to replace legacy system (will ship first week of January)
- Support for Windows 64 as a first-class platform
- Supported FX10 builds and releases
- Release support for switch to Yahoo! search
- Update server support for OpenH264 plugins and Adobe’s CDM
- Implement signing of EME sandbox
- Per-checkin and nightly Flame builds
- Moved desktop Firefox builds to mach+mozharness, improving reproducibility and hackability for devs
- Helped mobile team ship different APKs targeted by device capabilities rather than a single, monolithic APK
Release Operations
- Decreased operating costs by $1 million per year by consolidating infrastructure from one datacenter into another (with Releng)
- Decreased operating costs and improved reliability by decommissioning legacy systems (kvm, redis, r3 mac minis, tegras) (with Releng)
- Decreased operating costs for physical Android test infrastructure through a 30% reduction in hardware
- Decreased MTTR by developing a simplified releng self-serve reimaging process for each supported build and test hardware platform
- Increased security for all releng infrastructure
- Increased stability and reliability by consolidating single-point-of-failure releng web tools onto a highly available cluster
- Increased network reliability by developing a tool for continuous validation of firewall flows
- Increased developer productivity by updating Windows platform developer tools
- Increased fault and anomaly detection by auditing and augmenting releng monitoring and metrics gathering
- Simplified the build/test architecture by creating a unified releng API service for new tools
- Developed a disaster recovery and business continuation plan for 2015 (with RelEng)
- Researched bare-metal private cloud deployment and produced a POC
Developer Services
- Ship MozReview, a new review architecture integrated with Bugzilla (with A-team)
- Massive improvements in hg stability and performance
- Analytics and dashboards for version control systems
- New architecture for try to make it stable and fast
- Deployed Treeherder (TBPL replacement) to production
- Assisted A-team with Bugzilla performance improvements
I’d like to thank the team for their hard work. You are amazing, and I look forward to working with you next year.
At the start of 2015, I’ll share our vision for the coming year. Watch this space!
http://www.laurathomson.com/2014/12/2014-engineering-operations-year-in-review/