Mark C^ot'e: A Vision for Engineering Workflow at Mozilla (part one) |
The OED’s second definition of “vision” is “the ability to think about or plan the future with imagination or wisdom.” Thus I felt more than a little trepidation when I was tasked with creating a vision for my team. What should this look like? How do I scope it? What should it cover? The Internet was of surprisingly little help; it seems that either no one thinks about tooling and engineering processes at this level, or (perhaps more likely) they keep it a secret when they do. The best article I found was from Microsoft Research in which they studied how tools are adopted at Microsoft, and their conclusion was essentially that they had no overarching strategy.
Around six months later, I presented a Vision for Engineering Workflow at our fortnightly managers' meeting. But first, some context: a bit about Mozilla’s Engineering Workflow team, and about the challenges we face.
The Engineering Workflow team was created in the Great Reorg of 2017, when, amongst other large changes, its predecessor, the Automation & Tools team (aka the A-Team) was split into two, with the part focussed more on test automation joining the newly formed Product Integrity org. The other half remained in the Engineering Operations org, along with a related team, managed by coop, that worked on the build and version-control systems. In February, these two teams were consolidated into a single team, with kmoir joining the team as a new manager while coop headed off to manage the Taskcluster team.
We named this new team “Engineering Workflow” to reflect that it is focussed on the first stages of the Firefox engineering pipeline, that is, tools and processes that most developers use on a day-to-day basis. Our mission is as follows:
The engineering workflow team exists to improve the quality, clarity, and efficiency of Firefox development through the integration and development of tools and automation.
More specifically, the major pieces of the engineering pipeline that we work on are
Just as importantly, there are many related systems that we don’t own. These include
Of course, we can and do work with many of these other teams on joint ventures. Over time I would like to better coordinate our respective road maps to deliver even more impact to engineering at Mozilla.
Mozilla is a unique place. Not only are we a nonprofit that works in the open, but we develop a massive application with contributors, both paid and volunteer, located all around the world. This means we also face unique challenges when it comes to figuring out what tools and automation to integrate, build, and/or improve to maximize impact. I’ll touch on a few here.
Tales of “religious wars” within software develop stretch back decades, so it is no surprise that many Mozillians have strong opinions about the way they prefer to work. What is less common is that Mozilla has generally shied away from defining official (or even officially supported) tools and processes. I won’t get into the merits of this approach, but it does impact tooling teams, who have to either support multiple workflows in their tools or unilaterally decide to prioritize some over others.
A few examples:
I am happy to report, however, that there is more and more support for consolidating workflows at Mozilla.
Firefox is a huge application. A full Mercurial clone currently takes up 3.6 GB of disk space. Without Mercurial metadata the codebase, including build, test, and third-party libs and apps, contains over 245 000 files in more than 17 000 directories totalling almost 20 million lines of code. There aren’t too many projects the size of Firefox, open source or otherwise.
Unsurprisingly, since Firefox remains very active, there are a lot of changes going into the codebase: about 180 per day in April 2018. Contrast this with about 25-30 per day going into the Linux kernel. This also doesn’t count pushes to the try server for testing works in progress. April saw 210 compute years in our CI system.
Finally, we have complicated security requirements. Mozilla is open by design, with many tools and processes exposed to the public (and the public Internet!). Our approach to governance does not restrict positions of authority and responsibility to employees. These complexities and subleties can be seen in BMO’s permission system, which is much more fine-grained than what is built into most issue-tracking and code-review tools.
All these factors create difficult problems when integrating third-party tools into our systems. In addition, although we do this less than we used to, our scale means that there are not always existing solutions out there that meet our needs, requiring us to write custom applications that need to be highly scalable and secure.
Related to the scale of Firefox development is its long legacy. The Mozilla Foundation is 20 years old, with the Netscape code dating back even further. Although Mozilla has grown dramatically as an entity, many workflows persist over the years, including the reviewing of patches in Bugzilla and the use of Mercurial queues. Understandably, when developers have used a certain workflow for many years, they are often skeptical of change. Yet newer contributors are more familiar with modern workflows, so modern tooling can help attract and retain both employees and volunteers, in addition to the various other advantages in terms of ergonomics, usability, and efficiency. Contending with these two perspectives can be difficult.
In addition to legacy workflows, we also have a number of legacy systems. Many of these systems continue to serve us well, and we are constantly making improvements to them. However, large-scale changes can be difficult, both because of the age of these systems and codebases, but also because over time they have been integrated with many other applications and used in ways we aren’t aware of and sometimes don’t expect. This makes planning challenging and requires a lot of communication.
I’m happy to say that this set of challenges has seen the most improvement of the ones I’ve highlighted. I’ll mention them regardless as we can always be improving, and an understanding of our history helps.
Decision-making at Mozilla has been challenging for a number of reasons, mainly due to its history and rapid growth. In particular, there was a common view that we aimed for consensus on all major decisions, which, while well-intentioned, did not scale, and in fact was contradicted by both our management and module systems. This has led both to stalled decisions and sudden decisions that avoided discussions altogether. I’ve previously written about my perspectives and experiences in making decisions at Mozilla. Thankfully, as I also noted, this culture is changing, and making effective, reasoned decisions is getting easier.
Within my own team, or at least its previous incarnation as the A-Team, we experienced our own difficulties making decisions and prioritizing work. There were no clear lines of authority and responsibility when it came to tooling, which also contributed to our team becoming too service oriented. Again, this is changing for the better with existence of both Engineering Operations and Product Integrity, whose directors are peers of those of the product-focussed departments.
Thus ends my preamble on the context of developing a Vision for Engineering Workflow. In my next post, I’ll delve into the Vision itself.
https://mrcote.info/blog/2018/05/31/a-vision-for-engineering-workflow-at-mozilla-part-one/
Комментировать | « Пред. запись — К дневнику — След. запись » | Страницы: [1] [Новые] |