Maja Frydrychowicz: Untangling WebDriver and the Browser Automation Landscape I Live In |
This piece is about too few names for too many things, as well as a kind of origin story for a web standard. For the past year or so, I’ve been contributing to a Mozilla project broadly named Marionette — a set of tools for automating and testing Gecko-based browsers like Firefox. Marionette is part of a larger browser automation universe that I’ve managed to mostly ignore so far, but the time has finally come to make sense of it.
The main challenge for me has been nailing down imprecise terms that have changed over time. From my perspective, “Marionette” may refer to any combination of two to four things, and it’s related to equally vague names like “Selenium” and “WebDriver”… and then there are things like “FirefoxDriver” and “geckodriver”. Blargh. Untangling needed.
Aside: integrating a new team member (like, say, a volunteer contributor or an intern) is the best! They ask big questions and you get to teach them things, which leads to filling in your own knowledge. Everyone wins.
Okay, so let’s work our way backwards, starting from the future. (“The future is now.”) We want to remotely control browsers so that we can do things like write automated tests for the content they run or tests for the browser UI itself. It sucks to have to write the same test in a different way for each browser or each platform, so let’s have a common interface for testing all browsers on all platforms. (Yay, open web standards!) To this end, a group of people from several organizations is working on the WebDriver Specification.
The main idea in this specification is the WebDriver Protocol, which provides a platform- and browser- agnostic way to send commands to the browser you want to control, commands like “open a new window” or “execute some JavaScript.” It’s a communication protocol1 where the payload is some JSON data that is sent over HTTP. For example, to tell the browser to navigate to a url, a client sends a POST request to the endpoint /session/{session id of the browser instance you're talking to}/url
with body {"url": "http://example.com/"}
.
The server side of the protocol, which might be implemented as a browser add-on or might be built into the browser itself, listens for commands and sends responses. The client side, such as a Python library for automating browsers, send commands and processes the responses.
This broad idea is already implemented and in use: an open source project for browser automation, Selenium WebDriver, became widely adopted and is now the basis for an open web standard. Awesome! (On the other hand, oh no! The overlapping names begin!)
Where does this WebDriver concept come from? You may have noticed that lots of web apps are tested across different browsers with Selenium — that’s precisely what it was built for back in 2004-20092. One of its components today is Selenium WebDriver.
(Confusingly3, the terms “Selenium Webdriver, “Webdriver”, “Selenium 2” and “Selenium” are often used interchangeably, as a consequence of the project’s history.)
Selenium WebDriver provides APIs so that you can write code in your favourite language to simulate user actions like this:
client.get("https://www.mozilla.org/") link = client.find_element_by_id("participate") link.click()
Underneath that API, commands are transmitted via JSON over HTTP, as described in the previous section. A fair name for the protocol currently implemented in Selenium is Selenium JSON Wire Protocol. We’ll come back to this distinction later.
As mentioned before, we need a server side that understands incoming commands and makes the browser do the right thing in response. The Selenium project provides this part too. For example, they wrote FirefoxDriver which is a Firefox add-on that takes care of interpreting WebDriver commands. There’s also InternetExplorerDriver, AndroidDriver and more. I imagine it takes a lot of effort to keep these browser-specific “drivers” up-to-date.
A while after Selenium 2 was released, browser vendors started implementing the Selenium JSON Wire Protocol themselves! Yay! This makes a lot of sense: they’re in the best position to maintain the server side and they can build the necessary behaviour directly into the browser.
It started with OperaDriver in 2009-2011, and then others followed such as ChromeDriver and Mozilla’s geckodriver with Marionette.4 This is where the motivation for a WebDriver standard comes from.
Selenium Webdriver (a.k.a. Selenium 2, WebDriver) provides a common API, protocol and browser-specific “drivers” to enable browser automation. Browser vendors started implementing the Selenium JSON Wire Protocol themselves, thus gradually replacing some of Selenium’s browser-specific drivers. Since WebDriver is already being implemented by all major browser vendors to some degree, it’s being turned into a rigorous web standard, and some day all browsers will implement it in a perfectly compatible way and we’ll all live happily ever after.
Is the Selenium JSON Wire Protocol the same as the W3C WebDriver protocol? Technically, no. The W3C spec is describing the future of WebDriver5, but it’s based on what Selenium WebDriver and browser vendors are already doing. The goal of the spec is to coordinate the browser automation effort and make sure we’re all implementing the same interface; each command in the protocol should mean the same thing across all browsers.
Now that I understand the context, my view of Marionette’s components is much clearer.
As you can see, “Marionette” may refer to many different things. I think this ambiguity will always make me a little nervous… Words are hard, especially as a loose collection of projects evolves and becomes unified. In a few years, the terms will firm up. For now, let’s be extra careful and specify which piece we’re talking about.
Thanks to David Burns for patiently answering my half-baked questions last week, and to James Graham and Andreas Tolfsen for providing detailed and delightful feedback on a draft of this article. Bonus high-five to Anjana Vakil for contributions to Marionette Test Runner this year and for inspiring me to write this post in the first place.
Terminology lesson: the WebDriver protocol is a wire protocol because it’s at the application level and requires several applications working together.
Комментировать | « Пред. запись — К дневнику — След. запись » | Страницы: [1] [Новые] |