Gregory Szorc: Serving Mercurial Clones from a CDN |
For the past few months, Mozilla has been serving Mercurial clones from Amazon S3. We upload snapshots (called bundles) of large and/or high-traffic repositories to S3. We have a custom Mercurial extension on the client and server that knows how to exchange the URLs for these snapshots and to transparently use them to bootstrap a clone. The end result is drastically reduced Mercurial server load and faster clone times. The benefits are seriously ridiculous when you operate version control at scale.
Amazon CloudFront is a CDN. You can easily configure it to be backed by an S3 bucket. So we did.
https://hg.cdn.mozilla.net/ is Mozilla's CDN for hosting Mercurial data. Currently it's just bundles to be used for cloning.
As of today, if you install the bundleclone Mercurial extension and hg clone a repository on hg.mozilla.org such as mozilla-central (hg clone https://hg.mozilla.org/mozilla-central), the CDN URLs will be preferred by default. (Previously we preferred S3 URLs that hit servers in Oregon, USA.)
This should result in clone time reductions for Mozillians not close to Oregon, USA, as the CloudFront CDN has servers all across the globe and your Mercurial clone should be bootstrapped from the closest and hopefully therefore fastest server to you.
Unfortunately, you do need the aforementioned bundleclone extension installed for this to work. But this should only be temporary: I've proposed integrating this feature into the core of Mercurial, so that if a client talks to a server advertising pre-generated bundles, the clone offload just works. I already have tentative buy-in from one Mercurial maintainer. So hopefully I can land this feature in Mercurial 3.6, which will be released November 1. After that, I imagine some high-traffic Mercurial servers (such as Bitbucket) will be very keen to deploy this so CPU load on their servers is drastically reduced.
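If you want to try it today, enabling the extension is the usual Mercurial extension configuration in your ~/.hgrc; the path below is only a placeholder for wherever your local copy of bundleclone lives (it is distributed as part of Mozilla's version-control-tools repository):
[extensions]
bundleclone = /path/to/version-control-tools/hgext/bundleclone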
http://gregoryszorc.com/blog/2015/09/01/serving-mercurial-clones-from-a-cdn
|
Byron Jones: happy bmo push day! |
the following changes have been pushed to bugzilla.mozilla.org:
today’s push lands initial support for two-factor authentication on BMO. we currently support time-based one-time passwords (totp) with protection around just logging in. 2fa protection will be extended to protect other actions in the upcoming weeks.
visit the ‘two-factor authentication’ section under your user preferences to enable 2fa.
discuss these changes on mozilla.tools.bmo.
https://globau.wordpress.com/2015/09/01/happy-bmo-push-day-158/
|
QMO: Firefox 41 Beta 7 Testday, September 4th |
I’m writing to let you know that this Friday, September 4th, we’ll be hosting the Firefox 41.0 Beta 7 Testday. The main focus of this event will be Flash on 64-bit Firefox builds and plug-in testing. Detailed participation instructions are available in this etherpad.
No previous testing experience is required so feel free to join us on the #qa IRC channel and our moderators will make sure you’ve got everything you need to get started.
Hope to see you all on Friday! Let’s make Firefox better together!
https://quality.mozilla.org/2015/09/firefox-41-beta-7-testday-september-4th/
|
Daniel Stenberg: Blog refresh |
Dear reader,
If you ever visited my blog in the past, you should’ve noticed a pretty significant difference in appearance here the other day.
When I kicked off my blog here on the site back in August 2007 and moved my blogging from advogato to self-hosting, I installed WordPress, and I’ve been happy with it from a usability standpoint ever since. I crafted a look based on an existing theme and left it at that.
Over time, WordPress has had a hefty amount of security problems, over and over again, and I’ve suffered from them myself a couple of times, ending up patching it manually more than once. At one point, when I decided to bite the bullet and upgrade to the latest version, the upgrade no longer worked, and I postponed it for later.
Time passed, I tried again without success, and then more time passed.
I finally fixed the issues I had with upgrading. After a series of manual fiddles I managed to upgrade to the latest WordPress, and when doing so my old theme was considered broken/incompatible, so I threw that out and started fresh with a new theme. This new one is based on one of the simple default themes WordPress ships for free. I’ve mostly just made it slightly wider and edited the looks somewhat. I don’t need fancy. Hopefully I’ll be able to keep up with WordPress better this time.
Additionally, I added a captcha that now forces users to solve an easy math problem to submit anything to the blog, to help me fight spam, and perhaps even more to solve a problem I have with spambots creating new users. Yesterday I removed over 3300 users who never had a single post accepted.
Enjoy. Now back to our regular programming!
|
Seif Lotfy: Counting flows (Semi-evaluation of CMS, CML and PMC) |
Assume we have a stream of events coming in one at a time, and we need to count the frequency of the different types of events in the stream.
In other words: We are receiving fruits one at a time in no given order, and at any given time we need to be able to answer how many of a specific fruit did we receive.
The most naive implementation is a dictionary mapping each event type to its count; it is the most accurate approach and is suitable for streams with a limited number of event types.
Let us assume a unique item consists of 15 bytes and has a dedicated uint32 (4 bytes) counter assigned to it.
At 10 million unique items we end up using roughly 190 MB, which is a bit much, but on the plus side it's as accurate as it gets.
But what if we don't have the 190 MB? Or what if we have to keep track of several streams?
Maybe save to a DB? Well, querying the DB upon request means something along the lines of:
SELECT count(*) FROM events WHERE event = ?
The more items we add, the more resource intensive the query becomes.
Thankfully solutions come in the form of probabilistic data structures (sketches).
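To give a rough feel for how such a sketch works, here is a minimal, untuned Count-Min Sketch in JavaScript. The width and depth values are arbitrary and the hash is a simple FNV-1a variant; the implementations evaluated below are real, tuned libraries, so treat this purely as an illustration of the idea:

function CountMinSketch(width, depth) {
  this.width = width;
  this.depth = depth;
  this.rows = [];
  for (var i = 0; i < depth; i++) {
    this.rows.push(new Uint32Array(width)); // depth hash rows of width counters each
  }
}
// FNV-1a style string hash, salted with the row index.
CountMinSketch.prototype.index = function (item, row) {
  var h = (2166136261 ^ row) >>> 0;
  for (var i = 0; i < item.length; i++) {
    h ^= item.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h % this.width;
};
CountMinSketch.prototype.add = function (item) {
  for (var row = 0; row < this.depth; row++) {
    this.rows[row][this.index(item, row)]++;
  }
};
// Take the minimum over all rows: never under-estimates, collisions can only over-estimate.
CountMinSketch.prototype.count = function (item) {
  var estimate = Infinity;
  for (var row = 0; row < this.depth; row++) {
    estimate = Math.min(estimate, this.rows[row][this.index(item, row)]);
  }
  return estimate;
};
var cms = new CountMinSketch(2048, 4);
cms.add("banana");
cms.add("banana");
cms.add("apple");
console.log(cms.count("banana")); // 2 (or slightly more, never less)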
I won't get into details, but to solve this problem I semi-evaluated the following data structures:
- Count-Min Sketch (CMS)
- Count-Min-Log sketch (CML)
- Probabilistic Multiplicity Counting sketch (PMC)
Test details:
For each sketch I linearly added new flows with a linearly increasing number of events. So the first flow got 1 event inserted, the second flow got 2 events inserted, all the way up to the 10,000th flow with 10,000 events inserted.
flow 1: 1 event
flow 2: 2 events
...
flow 10000: 10000 events
All three data structures were configured to have a size of 217KB (exactly 1739712 bits).
A couple dozen runs yielded the following results (based on my unoptimized code, especially for PMC and CML):
CMS: 7s for 50,005,000 insertions (fill rate: 31%)
CML: 42s for 50,005,000 insertions (fill rate: 9%)
PMC: 18s for 50,005,000 insertions (fill rate: 54%)
CMS with
http://geekyogre.com/counting-flows-semi-evaluation-of-cms-cml-and-pmc/
|
QMO: An open letter about Mozilla QA |
Dear people of the web,
As some of you may already be aware, Mozilla has experienced a lot of change over the years. Most teams and projects within Mozilla have felt this change in some way, either directly or indirectly. The QA Team is no exception.
As a microcosm of the Mozilla Project, with people involved in many disparate projects, QA has changed course many times. To many of you, these changes may have passed by unnoticed. Perhaps you noticed something was different about QA but were not able to understand how or why things had changed. Perhaps it was a feeling that some of us seemed more distant, or that it just felt different.
This may come as a surprise to some, but there is no longer a single, unified QA team at Mozilla. After going through a few re-organizations, we are spread across the organization, embedded with — and reporting to — various product teams.
Those teams have benefited from having a dedicated QA person on staff full time. However, with so few of us to go around, many teams find themselves without any QA. In this state, we’ve lost the distinguished central QA organization that once was, and in doing so we’ve lost a central QA voice.
As a result of these changes and a sense of perpetual reorganization, we have reached a tipping point. We’ve lost some very talented and passionate people. Change in itself isn’t a bad thing. The loss of cohesion is. It is time to break this pattern, regain our cohesion, and regain our focus on the community.
The core group of QA community members, paid and volunteer, will soon be getting together to formulate a mission statement. We’ll do this with a series of one-on-one conversations between core individuals who are interested in architecting a new QA community. This will serve as the guiding light of our journey toward a more optimistic future together.
In recognition of those who might feel excluded from this process, we want to assure you that there will be opportunities to contribute very early on. Conducting these one-on-ones is just the first step in a very long journey. We plan to bring everyone along who wants to be here, but this process requires great care and it will take time. If you’d like to help us build the future, please get in touch with us.
Please read our wiki page to find out more about what we’re doing and where we’re going.
Sincerely,
Anthony Hughes and Matt Brandt
https://quality.mozilla.org/2015/08/an-open-letter-about-mozilla-qa/
|
Matt Thompson: What we’re working on |
Stuff we’re working on for the Sep 8 community call:
|
Manish Goregaokar: Designing a GC in Rust |
For a while I’ve been working on a garbage collector for Rust with Michael Layzell. I thought this would be a good time to talk of our design and progress so far.
“Wait”, you ask, “why does Rust need a garbage collector”? Rust is supposed to work without a GC, that’s one of its main selling points!
True. Rust does work pretty well without a GC. It’s managed to do without one so far, and we still have all sorts of well-written crates out there (none of which use a GC).
But Rust is not just about low-cost memory safety. It’s also about choosing your costs and guarantees. Box and stack allocation are not always sufficient; sometimes one needs to reach for something like Rc (reference counting). But even Rc is not perfect; it can’t handle cycles between pointers. There are solutions to that issue like using Weak, but that only works in limited cases (when you know what the points-to graph looks like at compile time), and isn’t very ergonomic.
Cases where one needs to maintain a complicated, dynamic graph are where a GC becomes useful. Similarly, if one is writing an interpreter for a GCd language, having a GC in Rust would simplify things a lot.
Not to say that one should pervasively use a GC in Rust. Similar to Rc, it’s best to use regular ownership-based memory management as much as possible, and sprinkle Rc/Gc in the places where your code needs it.
This isn’t the first GC in Rust. Automatic memory management has existed before in various forms, but all were limited.
Besides the ones listed below, Nick Fitzgerald’s cycle collector based on this paper exists and is something that you should look into if you’re interested. There’s also an RFC by Peter Liniker which sketches out a design for an immutable GC.
Rust itself had a garbage collector until a bit more than a year ago. These “managed pointers” (@T) were part of the language. They were removed later with a plan to make GC a library feature. I believe these were basically reference counted (cycle collected?) pointers with some language integration, but I’m not sure.
Nowadays, the only form of automatic memory management in Rust is via Rc and Arc, which are nonatomic and atomic reference counted pointers respectively. In other words, they keep track of the number of shared references via a reference count (incremented when the pointer is cloned, decremented when destructors run). If the reference count reaches zero, the contents are cleaned up.
This is a pretty useful abstraction, however, as mentioned above, it doesn’t let you create cycles without leaking them.
You can read more about Servo’s Spidermonkey bindings in this blog post (somewhat outdated, but still relevant)
In Servo we use bindings to the Spidermonkey Javascript engine. Since Javascript is a garbage collected language, the Rust representations of Javascript objects are also garbage collected.
Of course, this sort of GC isn’t really useful for generic use since it comes bundled with a JS runtime. However, the Rust side of the GC is of a design that could be used in an independent library.
The Rust side of the Spidermonkey GC is done through a bunch of smart pointers, and a trait called JSTraceable. JSTraceable is a trait which can “trace” recursively down some data, finding and marking all GC-managed objects inside it. This is autoderived using Rust’s plugin infrastructure, so a simple #[jstraceable] annotation will generate trace hooks for the struct it is on.
Now, we have various smart pointers. The first is JS<T>. This is opaque, but can be held by other GC-managed structs. To use this on the stack, it must be explicitly rooted, via .root(). This produces a Root<T>, which can be dereferenced to get the inner object. When the Root<T> is created, the contained object is listed in a collection of “roots” in a global. A root indicates that the value is being used on the stack somewhere, and the GC starts tracing usage from these roots. When the Root<T> is destroyed, the root is removed.
The problem with this is that JS<T> doesn’t work on the stack. There is no way for the GC to know that we are holding on to JS<T> on the stack. So, if I copy a JS<T> to the stack, remove all references to it from objects in the GC heap, and trigger a collection, the JS<T> will still be around on the stack after collection since the GC can’t trace to it. If I attempt to root it, I may get a panic or a segfault depending on the implementation.
To protect against this, we have a bunch of lints. The relevant one here protects against JS<T> being carried around on the stack; but like most lints, it’s not perfect.
To summarize: Spidermonkey gives us a good GC. However using it for a generic Rust program is ill advised. Additionally, Servo’s wrappers around the GC are cheap, but need lints for safety. While it would probably be possible to write safer wrappers for general usage, it’s pretty impractical to carry around a JS runtime when you don’t need one.
However, Spidermonkey’s GC did inspire me to think more into the matter.
For quite a while I’d had various ideas about GCs. Most were simplifications of Servo’s wrappers (there’s some complexity brought in there by Spidermonkey that’s not necessary for a general GC). Most were tracing/rooting with mark-and-sweep collection. All of them used lints. Being rather busy, I didn’t really work on it past that, but planned to work on it if I could find someone to work with.
One day, Michael pinged me on IRC and asked me about GCs. Lots of people knew that I was interested in writing a GC for Rust, and one of them directed him to me when he expressed a similar interest.
So we started discussing GCs. We settled on a tracing mark-and-sweep GC. In other words, the GC runs regular “sweeps” where it first “traces” the usage of all objects and marks them and their children as used, and then sweeps up all unused objects.
This model on its own has a flaw. It doesn’t know about GC pointers held on the stack as local variables (“stack roots”). There are multiple methods for solving this. We’ve already seen one above in the Spidermonkey design – maintain two types of pointers (one for the stack, one for the heap), and try very hard using static analysis to ensure that they don’t cross over.
A common model (used by GCs like Boehm, called “conservative GCs”) is to do something called “stack scanning”. In such a system, the GC goes down the stack looking for things which may perhaps be GC pointers. Generally the GC allocates objects in known regions of the memory, so a GC pointer is any value on the stack which belongs to one of these regions.
Of course, this makes garbage collection rather inefficient, and will miss cases like Box<Gc<T>> where the GC’d pointer is accessible, but only through a non-GC pointer.
We decided rather early on that we didn’t want a GC based on lints or stack scanning. Both are rather suboptimal solutions in my opinion, and very hard to make sound. We were also hoping that Rust’s type system and ownership semantics could help us in designing a good, safe API.
So, we needed a way to keep track of roots, and we needed a way to trace objects.
The latter part was easy. We wrote a compiler plugin (well, we stole Servo’s tracing plugin which I’d written earlier) which autoderives an implementation of the Trace trait on any given struct or enum, using the same internal infrastructure that #[derive(PartialEq)] and the rest use. So, with just the following code, it’s easy to make a struct or enum gc-friendly:
#[derive(Trace)]
struct Foo {
x: u8,
y: Bar,
}
#[derive(Trace)]
enum Bar {
Baz(u8), Quux
}
For a foo of type Foo, foo.trace() will expand to a call of foo.x.trace() and foo.y.trace(). bar.trace() will check which variant it is and call trace() on the u8 inside if it’s a Baz. For most structs this turns out to be a no-op and is often optimized away by inlining, but if a struct contains a Gc, the special implementation of Trace for Gc will “mark” the traceability of the Gc. Types without Trace implemented cannot be used in types implementing Trace or in a Gc, which is enforced with a T: Trace bound on Gc<T>.
So, we have a way of walking the fields of a given object and finding inner Gcs. Splendid. This lets us write the mark & sweep phase easily: take the list of known reachable Gcs, walk their contents until you find more Gcs (marking all you find), and clean up any which aren’t reachable.
Of course, now we have to solve the problem of keeping track of the known reachable Gcs, i.e. the roots. This is a hard problem to solve without language support, and I hope that eventually we might be able to get the language hooks necessary to solve it. LLVM has support for tracking GC things on the stack, and some day we may be able to leverage that in Rust.
As noted above, Spidermonkey’s solution was to have non-rooted (non-dereferencable) heap pointers, which can be explicitly converted to rooted pointers and then read.
We went the other way. All Gc pointers, when created, are considered “rooted”. The instance of Gc has a “rooted” bit set to true, and the underlying shared box (GcBox, though this is not a public interface) has its “root count” set to one.
When this Gc is cloned, an identical Gc (with rooted bit set to true) is returned, and the underlying root count is incremented. Cloning a Gc does not perform a deep copy.
let a = Gc::new(20); // a.root = true, (*a.ptr).roots = 1, (*a.ptr).data = 20
// ptr points to the underlying box, which contains the data as well as
// GC metadata like the root count. `Gc::new()` will allocate this box
let b = a.clone(); // b.root = true, (*a.ptr).roots++, b.ptr = a.ptr
This is rather similar to how Rc works; however, there is no root field, and the roots counter is called a “reference counter”.
For regular local sharing, it is recommended to just use a borrowed reference to the inner variable (borrowing works fine with rust-gc!) since there is no cost to creating this reference.
When a GC thing is put inside another GC thing, the first thing no longer can remain a root. This is handled by “unrooting” the first GC thing:
struct Foo {
bar: u32,
baz: Gc<u32>,
}
let a = Gc::new(20); // why anyone would want to GC an integer I'll never know
// but I'll stick with this example since it's simple
let b = Gc::new(Foo {bar: 1, baz: a});
// a.root = false, (*a.ptr).roots--
// b initialized similar to previous example
// `a` was moved into `b`, so now `a` cannot be accessed directly here
// other than through `b`, and `a` is no longer a root.
// To avoid moving a, passing `a.clone()` to `b` will work
Of course, we need a way to traverse the object passed to the Gc, in this case Foo, and look for any contained Gcs to unroot. Sound familiar? This needs the same mechanism that trace() needed! We add struct-walking root() and unroot() methods to the Trace trait which are auto-derived exactly the same way, and continue. (We don’t need root() right now, but we will need it later on.)
Now, during collection, we can just traverse the list of GcBoxes and use the ones with a nonzero root count as roots for our mark traversal.
So far, so good. We have a pretty sound design for a GC that works … for immutable data.
Like Rc, Gc is by default immutable. Rust abhors aliasable mutability, even in single-threaded contexts, and both these smart pointers allow aliasing.
Mutation poses a problem for our GC, beyond the regular problems of aliasable mutability: It’s possible to move rooted things into heap objects and vice versa:
let x = Gc::new(20);
let y = Gc::new(None);
*y = Some(x); // uh oh, x is still considered rooted!
// and the reverse!
let y = Gc::new(Some(Gc::new(20)));
let x = y.take(); // x was never rooted!
// `take()` moves the `Some(Gc)` out of `y`, replaces it with `None`
Since Gc doesn’t implement DerefMut, none of this is possible; one cannot mutate the inner data. This is one of the places where Rust’s ownership/mutability system works out awesomely in our favor.
Of course, an immutable GC isn’t very useful. We can’t even create cycles in an immutable GC, so why would anyone need this in the first place?
So of course, we needed to make it somehow mutable. People using Rc solve this problem by using RefCell, which maintains something similar to the borrow semantics at runtime and is internally mutable. RefCell itself can’t be used by us since it doesn’t guard against the problem illustrated above (and hence won’t implement Trace, but a similar cell type would work).
So we created GcCell. This behaves just like RefCell, except that it will root() before beginning a mutable borrow, and unroot() before ending it (well, only if it itself is not rooted, which is tracked by an internal field similar to Gc). Now, everything is safe:
#[derive(Trace)]
struct Foo {
a: u8,
b: GcCell<Gc<u8>>,
}
let x = Gc::new(20);
let y = Gc::new(Foo {a: 10, b: GcCell::new(Gc::new(30))});
{
*y.b.borrow_mut() = x; // the `Gc(30)` from `y.b` was rooted by this call
// but since we don't actually use it here,
// the destructor gets rid of it.
// We could use swap() to retain access to it.
// ...
// x unrooted
}
// and the reverse case works too:
let y = Gc::new(GcCell::new(Some(Gc::new(20))));
let x = y.borrow_mut().take(); // the inner `Some(Gc(20))` gets rooted by `borrow_mut()`
// before `x` can access it
So now, mutation works too! We have a working garbage collector!
I believe this can be solved without lints, but it may require some upcoming features of Rust to be implemented first (like specialization).
In essence, destructors implemented on a value inside Gc can be unsafe. This will only happen if they try to access values within a Gc; if they do, they may come across a box that has already been collected, or they may lengthen the lifetime of a box scheduled to be collected.
The basic solution to this is to use “finalizers” instead of destructors. Finalizers, like in Java, are not guaranteed to run. However, we may need further drop hooks or trait specialization to make an airtight interface for this. I don’t have a concrete design for this yet, though.
Our model mostly just works in a concurrent situation (with thread safety tweaks, of course); in fact it’s possible to make it so that the concurrent GC will not “stop the world” unless someone tries to do a write to a GcCell. We have an experimental concurrent GC in this pull request. We still need to figure out how to make interop between both GCs safe, though we may just end up making them such that an object using one GC cannot be fed to an object using the other.
So far we haven’t really focused on performance; we’ve worked on ensuring safety. Our collection-triggering algorithm, for example, was horribly inefficient, though we planned on improving it. The wonderful Huon fixed this, though.
Similarly, we haven’t yet optimized storage. We have some ideas which we may work on later. (If you want to help, contributions welcome!)
Currently, an object deriving Trace should have Traceable children. This isn’t always possible when members from another crate (which does not depend on rust-gc) are involved. At the moment, we allow an #[unsafe_ignore_trace] annotation on fields which are of this type (which excludes them from being traced; if that crate doesn’t transitively depend on rust-gc, its members cannot contain GC things anyway unless generics are involved). It should be possible to detect whether or not this is safe, and/or autoderive Trace using the opt-in builtin traits framework (needs specialization to work), but at the moment we don’t do anything other than expose that annotation. Stdlib support for a global Trace trait that everyone derives would be awesome.
Designing a GC was a wonderful experience! I didn’t get to write much code (I was busy and Michael was able to implement most of it overnight because he’s totally awesome), but the long design discussions followed by trying to figure out holes in the GC design in every idle moment of the day were quite enjoyable. GCs are very hard to get right, but it’s very satisfying when you come up with a design that works! I’m also quite happy at how well Rust helped in making a safe interface.
I encourage everyone to try it out and/or find holes in our design. Contributions of all kind welcome, we’d especially love performance improvements and testcases.
http://manishearth.github.io/blog/2015/09/01/designing-a-gc-in-rust/
|
David Humphrey: Introducing a New Thimble and Bramble |
This week we're shipping something really cool with Mozilla, and I wanted to pause and tell you about what it is, and how it works.
The tl;dr is that we took the Mozilla Foundation's existing web code editor, Thimble, and rewrote it to use Bramble, our forked version of the Brackets editor, which runs in modern web browsers. You can try it now at https://thimble.mozilla.org/
If you're the type who prefers animated pictures to words, I made you a bunch over on the wiki, showing what a few of the features look like in action. You can also check out Luke's great intro video.
If you're the type who likes words, the rest of this is for you.
I started working on this project two years ago. While at MozFest 2013 I wrote about an idea I had for a new concept app that merged Thimble and Brackets; at the time I called it Nimble.
I was interested in merging these two apps for a number of reasons. First, I wanted to eliminate the "ceiling" users had when using Thimble, wherein they would graduate beyond its abilities, and be forced to use other tools. In my view, Thimble should be able to grow and expand along with a learner's abilities, and a teacher's needs.
Second, people were asking for lots of new features in Thimble, and I knew from experience that the best code is code you don't have to write. I wanted to leverage the hard work of an existing community that was already focused on building a great web coding platform. Writing a coding environment is a huge challenge, and our team wasn't equipped to take it on by ourselves. Thankfully the Brackets project had already solved this.
Brackets was an easy codebase to get started on, and the community was encouraging and willing to help us with patches, reviews, and questions (I'm especially thankful for @randyedmunds and @busykai).
Brackets is written in an AMD module system, and uses requirejs, react, CodeMirror, LESS, jQuery, Bootstrap, lodash, acorn, tern, etc. One of the things I've loved most about working with the Brackets source is that it uses so much of the best of the open web. Its ~1.3 million lines of code offer APIs for things like:
In short, Brackets isn't an editor so much as a rich platform for coding and designing front-end web pages and apps. Brackets' killer feature is its ability to render a live preview of what's in your editor, including dynamic updates as you type, often without needing to save. The preview even has an awareness of changes to linked files (e.g., external stylesheets and scripts).
Another thing I loved was that Brackets wasn't trying to solve code editing in general: they had a very clear mandate that favoured web development, and front-end web development in particular. HTML, CSS, and JavaScript get elevated status in Brackets, and don't have to fight with every other language for features.
All of these philosophies and features melded perfectly with our goal of making a great learning and teaching tool for web programming.
Obviously there are a ton of code editing tools available. If we start with desktop editors, there are a lot to choose from; but they all suffer from the same problem: you have to download tens of megs of installer, and then you have to install them, along with a web server, in order to preview your work. Consider what's involved in installing each of these (on OS X):
Thimble, on the other hand, is ~1M (877K for Bramble, the rest for the front-end app). We worked extremely hard to get Brackets (38.5M if you install it) down to something that fits in the size of an average web page. If we changed how Brackets loads more significantly, we could get it smaller yet, but we've chosen to keep existing extensions working. The best part is that there is no install: the level of commitment for a user is the URL.
In addition to desktop editors, there are plenty of popular online options, too:
The list goes on. They are all great, and I use and recommend them all. Each of these tools has a particular focus, and none of them does exactly what the new Thimble does; specifically, none of them tries to deal with trees of files and folders. We don't need to do what these other tools do, because they already do it well. Instead, we focused on making it possible for users to create a rich and realistic environment for working with arbitrary web site/app structures without needing to install and run a web server.
I've always been inspired by @jswalden's httpd.js. It was written back before there was node.js, back in a time when it wasn't yet common knowledge that you could do anything in JS. The very first time I saw it I knew that I wanted to find some excuse to make a web server in the browser. With nohost, our in-browser web server, we've done it.
In order to run in a browser, Bramble has to be more than just a code editor; it also has to include a bunch of stuff that would normally be provided by the Brackets Shell (similar to Electron.io) and node.js. This means providing a filesystem, a web server, and a browser for the live preview, plus glue to connect those three. Brackets uses Chrome's remote debugging protocol and node.js to talk between the editor, browser, and server. This works well, but ties it directly to Chrome.
At first I wasn't sure how we'd deal with this. But then an experimental implementation of the Brackets LiveDevelopment code landed, which switched away from using Chrome and the remote dev tools protocol to any browser and a WebSocket. Then, in the middle of the docs, we found an offhand comment that someone could probably rewrite it to use an iframe and postMessage...a fantastic idea! So we did.
Making it possible for an arbitrary web site to work in a browser-based environment is a little like Firefox's Save Page... feature. You can't just deal with the HTML alone--you also have to get all the linked assets.
Consider an example web page:
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="styles/style.css">
</head>
<body>
<img src="images/cat.png">
<script src="script.js"></script>
</body>
</html>
In this basic web page we have three external resources referenced by URL. The browser needs to be able to request styles/style.css, images/cat.png, and script.js in order to fully render this page. And we're not done yet.
The stylesheet might also reference other stylesheets using @import, or might use other images (e.g., background-image: url(...)).
It gets worse. The script might need to XHR a JSON file from the server in order to do whatever it needs to do.
Bramble tries hard to deal with these situations through a combination of static and dynamic rewriting of the URLs. Eventually, if/when all browsers ship it, we could do a lot of this with ServiceWorkers. Until then, we made do with what we already have cross browser.
First, Bramble's nohost server recursively rewrites the HTML, and its linked resources, in order to find relative filesystem paths (images/cat.png) and replace them with Blobs and URL objects that point to cached memory resources read out of the browser filesystem.
Parsing HTML with regex is a non-starter. Luckily browsers have a full parser built in: DOMParser. Once we have an in-memory DOM instead of an HTML text string, we can accurately querySelectorAll to find things that might contain URLs (img, link, video, iframe, etc., avoiding elements that would introduce circular references) and swap those for generated Blob URLs from the filesystem. When we're done, we can extract rewritten HTML text from our live in-memory DOM via documentElement.outerHTML, obtaining something like this:
<link rel="stylesheet" href="blob:https://mozillathimblelivepreview.net/346526f5-3c14-4073-b667-997324a5bfa9">
All external resources now use URLs to cached memory resources. This HTML can then itself be turned into a Blob and URL object, and used as the src for our iframe browser (this works everywhere except IE, where you have to document.write the HTML, but can use Blob URLs for everything else).
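A stripped-down sketch of that rewriting step looks roughly like the following. This is not the actual nohost code (which handles many more element types, recursion into linked resources, and caching); filesystemURL is a hypothetical helper that returns the cached Blob URL for a project path:

function rewriteHTML(htmlText, filesystemURL) {
  // Let the browser's parser build a DOM instead of regexing HTML.
  var doc = new DOMParser().parseFromString(htmlText, "text/html");
  // Elements whose src/href may point at files in the in-browser filesystem.
  var nodes = doc.querySelectorAll("img[src], script[src], link[href]");
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    var attr = node.hasAttribute("src") ? "src" : "href";
    var blobURL = filesystemURL(node.getAttribute(attr)); // path -> Blob URL, or null
    if (blobURL) {
      node.setAttribute(attr, blobURL);
    }
  }
  // Serialize the rewritten DOM and expose it as a Blob URL for the iframe.
  var blob = new Blob([doc.documentElement.outerHTML], {type: "text/html"});
  return URL.createObjectURL(blob);
}
// e.g. iframe.src = rewriteHTML(indexHtmlText, filesystemURL);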
For CSS we do use regex, looking for url(...) and other places where URLs can lurk. Thankfully there aren't a lot, and it's just a matter of reading the necessary resources from disk, caching to a Blob URL, and replacing the filesystem paths with URLs, before generating a CSS Blob URL that can be used in the HTML.
Despite what everyone tells you about the DOM being slow, the process is really fast. And because we own the filesystem layer, whenever the editor does something like a writeFile(), we can pre-generate a URL for the resource, and maintain a cache of such URLs keyed on filesystem paths for when we need to get them again in the future during a rewrite step. Using this cache we are able to live refresh the browser quite often without causing any noticeable slowdown on the main thread.
As an aside, it would be so nice if we could move the whole thing to a worker and be able to send an HTML string and get back a URL. Workers can already access IndexedDB, so we could read from the filesystem there, too. This would mean having access to DOMParser (even if we can't touch the main DOM from a worker, being able to parse HTML is still incredibly useful for rewriting, diff'ing, etc).
Finally, we do dynamic substitutions of relative paths for generated Blob URLs at runtime by hijacking XMLHttpRequest and using our postMessage link from the iframe to the editor in order to return response data for a given filename.
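As a rough illustration of that idea (this is not Bramble's actual implementation, which round-trips through postMessage to the editor; blobURLForPath is a hypothetical synchronous lookup into the Blob URL cache described above):

var RealXHR = window.XMLHttpRequest;
window.XMLHttpRequest = function () {
  var xhr = new RealXHR();
  var realOpen = xhr.open;
  xhr.open = function (method, url, async, user, password) {
    // If the URL is a relative project path we know about, serve it from a Blob URL.
    var rewritten = blobURLForPath(url) || url;
    return realOpen.call(xhr, method, rewritten,
                         async === undefined ? true : async, user, password);
  };
  return xhr;
};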
And it all works! Sure, there's lots of things we won't ever be able to cope with, from synchronous XHR to various types of DOM manipulation by scripts that reference URLs as strings. But for the general case, it works remarkably well. Try downloading and dragging a zipped web site template from http://html5up.net/ into the editor. Bramble doesn't claim to be able to replace a full, local development environment for every use case; however, it makes it unnecessary in most common cases. It's amazing what the modern web can do via storage, file, drag-and-drop, parser, and worker APIs.
I talk about Thimble and Bramble as different things, and they are, especially at runtime. Bramble is an embeddable widget with an iframe API, and Thimble hosts it and provides some UI for common operations.
I've put a simple demo of the Bramble API online for people to try (source is here). Bramble uses, but doesn't own, its filesystem; nor does it have any notion of where the files came from or where they are going. It also doesn't have opinions about how the filesystem should be laid out.
This is all done intentionally so that we can isolate the editor and preview from the hosting app, running each on a different domain. We want users to be able to write arbitrary code, execute and store it; but we don't want to mix code for the hosting app and the editor/preview. The hosting app needs to decide on a filesystem layout, get and write the files, and then "boot" Bramble.
I've written previously about how we use MessageChannel to remotely host an IndexedDB-backed filesystem in a remote window running on another domain: Thimble owns the filesystem and database and responds to proxied requests to do things via postMessage.
In the case of Thimble, we store data in a Heroku app using postgres on the server. Thimble listens for filesystem events, and then queues and executes file update requests over the network to sync the data upstream. Published projects are written to S3, and we then serve them on a secure domain. Because users can upload files to their filesystem in the editor, it makes it easier to transition to an https:// only web.
When the user starts Thimble, we request a project as a gzipped tarball from the publishing server, then unpack it in a Worker and recreate the filesystem locally. Bramble then "mounts" this local folder and begins working with the local files and folders, with no knowledge of the servers (all data is autosaved, and survives refreshes).
Now that we've got the major pieces in place, I'm interested to see what people will do with both Thimble and Bramble. Because we're in a full browser vs. an "almost-browser" shell, we have access to all the latest toys (for example, WebRTC and the camera). Down the road we could use this for some amazing pair programming setups, so learners and mentors could work with each other directly over the web on the same project.
We can also do interesting things with different storage providers. It would be just as easy to have Bramble talk to Github, Dropbox, or some other cloud storage provider. We intentionally kept Thimble and Bramble separate in order to allow different directions in the future.
Then there's all the possibilities that custom extensions open up (did I mention that Bramble has dynamic extension loading? because it does!). I'd love to see us use bundles of extensions to enable different sorts of learning activities, student levels, and instructional modes. I'm also really excited to see what kind of new curriculum people will build using all of this.
In the meantime, please try things out, file bugs, chat with us on irc #thimble on moznet and have fun making something cool with just your browser. Even better, teach someone how to do it.
Let me close by giving a big shout out to the amazing students (current and former) who hacked on this with me. You should hire them: Gideon Thomas, Kieran Sedgwick, Kenny Nguyen, Jordan Theriault, Andrew Benner, Klever Loza Vega, Ali Al Dallal, Yoav Gurevich, as well as the following top notch Mozilla folks, who have been amazing to us: Hannah Kane, Luke Pacholski, Pomax, Cassie McDaniel, Ashley Williams, Jon Buckley, and others.
|
Air Mozilla: Mozilla Weekly Project Meeting |
The Monday Project Meeting
https://air.mozilla.org/mozilla-weekly-project-meeting-20150831/
|
This Week In Rust: This Week in Rust 94 |
Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us an email! Want to get involved? We love contributions.
This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.
104 pull requests were merged in the last week.
Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week:
- catch_panic.
Every week the team announces the ‘final comment period’ for RFCs and key PRs which are reaching a decision. Express your opinions now. This week's FCPs are:
- [Op]Assign traits to allow overloading assignment operations like a += b.
- std::net module to bind more low-level interfaces.
- x...y expression to create an inclusive range.
- Box::leak to leak Box<T> to &'static mut T.
- ToOpt trait and have bool implement it.
- macro_rules! should support gensym for creating items.
- .drain(range) and .drain() respectively as appropriate on collections.
If you are running a Rust event please add it to the calendar to get it mentioned here. Email Erick Tryzelaar or Brian Anderson for access.
No jobs listed for this week. Tweet us at @ThisWeekInRust to get your job offers listed here!
"And God said, Noah you must transport these animals across a large body of water... but they are not Send. And Noah replied, I shall build a great Arc!" — durka42 on #rust
Thanks to tomprogrammer for the tip. Submit your quotes for next week!
http://this-week-in-rust.org/blog/2015/08/31/this-week-in-rust-94/
|
Wladimir Palant: Why you probably want to disable jQuery.parseHTML even though you don't call it |
TL;DR: jQuery.parseHTML is a security hazard and will be called implicitly in a number of obvious and not so obvious situations.
Hey, jQuery is great! It’s so great that Stack Overflow users will recommend it no matter what your question is. And now they have two problems. Just kidding, they will have the incredible power of jQuery:
$("#list").append('' + item.name + ' ');
The above is locating a list in the document, creating a new list item with dynamic content and adding it to the list — all that in a single line that will still stay below the 80 columns limit. And we didn’t even lose readability in the process.
Life is great until some fool comes along and mumbles “security” (yeah, that’s me). Can you tell whether the code above is safe to be used in a web application? Right, it depends on the context. Passing HTML code to jQuery.append will use the infamous innerHTML property implicitly. If you aren’t careful with the HTML code you are passing there, this line might easily turn into a Cross-Site Scripting (XSS) vulnerability.
Does item.name or item.info contain data from untrusted sources? Answering that question might be complicated. You need to trace the data back to its source, decide who should be trusted (admin user? localizer?) and make sure you didn’t forget any code paths. And even if you do all that, some other developer (or maybe even yourself a few months from now) might come along and add another code path where item.name is no longer trusted. Do you want to bet on this person realizing that they are making an entirely different piece of code insecure?
It’s generally better to give jQuery structured data and avoid taking any chances. The secure equivalent of the code above would be:
$("#list").append($("", {title: item.info}).text(item.name));
Not quite as elegant any more but now jQuery will take care of producing a correct HTML structure and you don’t need to worry about that.
There is one remarkable thing about jQuery APIs: each function can take all kinds of parameters. For example, the .append() function we used above can take a DOM element, a CSS selector, HTML code or a function returning any of the above. This keeps function names short, and you only need to remember one function name instead of four.
The side effect is however: even if you are not giving jQuery any HTML code, you still have to keep in mind that the function could accept HTML code. Consider the following code for example:
$(tagname + " > .temporary").remove();
This will look for elements of class temporary within a given tag and remove them, right? Except that the content of tagname had better be trusted here. What will happen if an attacker manages to set the value of tagname to a string like "<img src='dummy' onerror='alert(1)'>"? You probably guessed it, the “selector” will be interpreted as HTML code and will execute arbitrary JavaScript code.
There are more than a dozen jQuery functions that will happily accept both selectors and HTML code. Starting with jQuery 1.9.0 security issues here got somewhat less likely: the string has to start with < in order to be interpreted as HTML code. Older versions will accept anything as HTML code as long as it doesn’t contain #; the versions before jQuery 1.6.1 didn’t even have that restriction.
To sum up: you better use jQuery 1.9.0 or above, otherwise your dynamically generated selector might easily end up being interpreted as an HTML string. And even with recent jQuery versions you should be careful with dynamic selectors, the first part of the selector should always be a static string to avoid security issues.
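For instance, with an untrusted tagname it is safer not to build the selector string at all; a sketch like the following sidesteps the HTML interpretation entirely (assuming tagname holds a lowercase tag name):

// Risky with old jQuery versions: tagname becomes part of the string passed to $().
$(tagname + " > .temporary").remove();
// Safer: never let untrusted input into the selector string.
$(document.getElementsByTagName(tagname)).children(".temporary").remove();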
With almost all of the core jQuery functionality potentially problematic, evaluating security of jQuery-based code is tricky. Ideally, one would simply disable unsafe functionality so that parsing HTML code by accident would no longer be possible. Unfortunately, there doesn’t seem to be a supported way yet. The approach I describe here seems to work in the current jQuery versions (jQuery 1.11.3 and jQuery 2.1.4) but might not prevent all potential issues in older or future jQuery releases. Use at your own risk! Oh, and feel free to nag jQuery developers into providing supported functionality for this.
There is a comment in the source code indicating that the jQuery.parseHTML function being missing is an expected situation. However, removing this function doesn’t resolve all the issues, and it disables safe functionality as well. Removing jQuery.buildFragment on the other hand doesn’t seem to have any downsides:
delete jQuery.buildFragment;
// Safe element creation still works
$('<img>', {src: "dummy"});
// Explicitly assigning or loading HTML code for an element works
$(document.body).html('<img src="dummy">');
$(document.body).load(url);
// These will throw an exception however
$('<img src="dummy">');
$(document.body).append('<img src="dummy">');
$.parseHTML('<img src="dummy">');
Of course, you have to adjust all your code first before you disable this part of the jQuery functionality. And even then you might have jQuery plugins that will stop working with this change. There are some code paths in the jQuery UI library for example that rely on parsing non-trivial HTML code. So this approach might not work for you.
The example creating a single list item is nice of course but what if you have to create some complicated structure? Doing this via dozens of nested function calls is impractical and will result in unreadable code.
One approach would be placing this structure in your HTML document, albeit hidden. Then you would need to merely clone it and fill in the data:
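A minimal sketch of that approach, with the #item-template id and the markup in the comment invented for illustration:

// Markup kept hidden somewhere in the page:
//   <li id="item-template" hidden><span class="name"></span></li>
var row = $("#item-template").clone().removeAttr("id").removeAttr("hidden");
row.attr("title", item.info);       // structured data only, no HTML strings
row.find(".name").text(item.name);  // .text() escapes, unlike .html()
$("#list").append(row);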
Other templating approaches for JavaScript exist as well of course. It doesn’t matter which one you use as long as you don’t generate HTML code on the fly.
|
Christian Heilmann: Quickie: Fading in a newly created element using CSS |
Update: I got an email from James at VIDesignz who found another solution to this problem using the :empty selector. I added it at the end of the article.
As part of our JSFoo workshop today I was asked to look into an issue a team had: you cannot apply a CSS transition to a newly created element when you change its CSS properties in JavaScript. As I was dealing with professionals, they created a simple JSFiddle to show the problem:
As you can see, just changing the property (in this case the opacity) is not enough to trigger a transition. There are a few solutions to this shown in the Fiddle, too, like forcing a reflow, which of course could be a terrible idea.
I played with this and found the solution to be to not change the properties in JavaScript (which is kind of dirty anyways) but leave it all to CSS instead. The biggest part of the solution is not to use a transition but an animation instead and trigger it by applying a class to the newly created element right after adding it to the DOM:
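A minimal sketch of the idea (the class and keyframe names are made up for illustration):

/* Assumed CSS:
   .fade-in { animation: appear 0.5s ease-in; }
   @keyframes appear { from { opacity: 0; } to { opacity: 1; } }
*/
var el = document.createElement("p");
el.textContent = "I fade in";
document.body.appendChild(el);
// Unlike a transition, an animation needs no previous computed style,
// so adding the class right after insertion still plays it.
el.classList.add("fade-in");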
Update: As mentioned by Kyle Simpson on Twitter, there is a way to do the same with a transition, but you need to wrap the creation and applying the class into requestAnimationFrame calls which in turn means some polyfilling:
Update2 You can also use the :empty selector in CSS to achieve the same when you add the new element as a child:
http://christianheilmann.com/2015/08/30/quicky-fading-in-a-newly-created-element-using-css/
|
Aaron Klotz: On WebExtensions |
There has been enough that has been said over the past week about WebExtensions that I wasn’t sure if I wanted to write this post. As usual, I can’t seem to help myself. Note the usual disclaimer that this is my personal opinion. Further note that I have no involvement with WebExtensions at this time, so I write this from the point of view of an observer.
I shall begin with the proposition that the legacy, non-jetpack environment for addons is not an API. As ridiculous as some readers might consider this to be, please humour me for a moment.
Let us go back to the acronym, “API.” Application Programming Interface. While the usage of the term “API” seems to have expanded over the years to encompass just about any type of interface whatsoever, I’d like to explore the first letter of that acronym: Application.
An Application Programming Interface is a specific type of interface that is exposed for the purposes of building applications. It typically provides a formal abstraction layer that isolates applications from the implementation details behind the lower tier(s) in the software stack. In the case of web browsers, I suggest that there are two distinct types of applications: web content, and extensions.
There is obviously a very well defined API for web content. On the other hand, I would argue that Gecko’s legacy addon environment is not an API at all! From the point of view of an extension, there is no abstraction, limited formality, and not necessarily an intention to be used by applications.
An extension is imported into Firefox with full privileges and can access whatever it wants. Does it have access to interfaces? Yes, but are those interfaces intended for applications? Some are, but many are not. The environment that Gecko currently provides for legacy addons is analogous to an operating system running every single application in kernel mode. Is that powerful? Absolutely! Is that the best thing to do for maintainability and robustness? Absolutely not!
Somewhere a line needs to be drawn to demarcate this abstraction layer and improve Gecko developers’ ability to make improvements under the hood. Last week’s announcement was an invitation to addon developers to help shape that future. Please participate and please do so constructively!
When I first heard rumors about WebExtensions in Whistler, my source made it very clear to me that the WebExtensions initiative is not about making Chrome extensions run in Firefox. In fact, I am quite disappointed with some of the press coverage that seems to completely miss this point.
Yes, WebExtensions will be implementing some APIs to be source compatible with Chrome. That makes it easier to port a Chrome extension, but porting will still be necessary. I like the Venn Diagram concept that the WebExtensions FAQ uses: Some Chrome APIs will not be available in WebExtensions. On the other hand, WebExtensions will be providing APIs above and beyond the Chrome API set that will maintain Firefox’s legacy of extensibility.
Please try not to think of this project as Mozilla taking functionality away. In general I think it is safe to think of this as an opportunity to move that same functionality to a mechanism that is more formal and abstract.
|
Francois Marier: Letting someone ssh into your laptop using Pagekite |
In order to investigate a bug I was running into, I recently had to give my colleague ssh access to my laptop behind a firewall. The easiest way I found to do this was to create an account for him on my laptop and setup a pagekite frontend on my Linode server and a pagekite backend on my laptop.
Setting up my Linode server in order to make the ssh service accessible and proxy the traffic to my laptop was fairly straightforward.
First, I had to install the pagekite package (already in Debian and Ubuntu) and open up a port on my firewall by adding the following to both /etc/network/iptables.up.rules and /etc/network/ip6tables.up.rules:
-A INPUT -p tcp --dport 10022 -j ACCEPT
Then I created a new CNAME for my server in DNS:
pagekite.fmarier.org. 3600 IN CNAME fmarier.org.
With that in place, I started the pagekite frontend using this command:
pagekite --clean --isfrontend --rawports=virtual --ports=10022 --domain=raw:pagekite.fmarier.org:Password1
After installing the pagekite and openssh-server packages on my laptop and creating a new user account:
adduser roc
I used this command to connect my laptop to the pagekite frontend:
pagekite --clean --frontend=pagekite.fmarier.org:10022 --service_on=raw/22:pagekite.fmarier.org:localhost:22:Password1
Finally, my colleague needed to add the following entry to ~/.ssh/config:
Host pagekite.fmarier.org
CheckHostIP no
ProxyCommand /bin/nc -X connect -x %h:10022 %h %p
and install the netcat-openbsd package since other versions of netcat don't work.
On Fedora, we used netcat-openbsd-1.89 successfully, but this newer package may also work.
He was then able to ssh into my laptop via ssh roc@pagekite.fmarier.org.
I was quite happy settings things up temporarily on the command-line, but it's also possible to persist these settings and to make both the pagekite frontend and backend start up automatically at boot. See the documentation for how to do this on Debian and Fedora.
http://feeding.cloud.geek.nz/posts/letting-someone-ssh-into-your-laptop-using-pagekite/
|
Cameron Kaiser: 38.2.1 is available |
Don't forget to test 38.2.1 with incremental GC disabled. See the previous post. Enjoy our new sexy wiki, too. Sexy. Yes.
http://tenfourfox.blogspot.com/2015/08/3821-is-available.html
|
Emma Irwin: Participation Leadership Framework 0.1 |
In the last heartbeat, as part of our Q3 goals for leadership development, I interviewed a diverse set of people across Mozilla, asking what they think the skills, knowledge and attitudes of effective Participation Leadership at Mozilla are. Two things really stood out during this process. The first was how many people (staff, contributors and alumni) are truly, truly dedicated to the success of each other and Mozilla’s mission, which was really inspiring and helped inform the quality of this Framework. The second was how many opportunities and resources already exist (or are being created) for leadership development; bundled together with more specifically targeted curriculum and focused outcomes, these will provide powerful learning-by-participating experiences.
This Heartbeat iterated on themes that emerged during those interviews. I thank those who provided feedback on Discourse, and in Github, all of which brought us to this first 0.1 version.
Foundations of Participation Leadership are the core skills, knowledge and attitudes that contribute to success on both personal goals and goals for Participation at Mozilla.
Building Blocks of Participation Leadership are units of learning that together provide a whole vision for leadership, but individually build the skills, attitudes and knowledge that inform specific learning outcomes as needed.
Examples of skills, leadership and knowledge for each:
Personal Leadership
Essential Mozilla
Building for Action and Impact
Empowering Teams and People
Working Open
Developing Specialization
We would love your comments, suggestions and ideas on where we are so far. In the next heartbeat we’ll begin building and running workshops with these as guide, and further iterating towards 1.0.
Image Credit: Lead Type by jm3
|
Mozilla Addons Blog: AMO T-shirt Update |
Just want to give a quick update on the snazzy t-shirts designed by Erick León Bolinaga. They were finally done printing this week, and are headed to a fulfillment center for shipping. We expect them to begin shipping by the end of next week.
Thanks for your patience!
https://blog.mozilla.org/addons/2015/08/28/amo-t-shirt-update/
|
Air Mozilla: Webmaker Demos August 28 2015 |
Webmaker Demos August 28 2015
|
Dan Minor: MozReview Montréal Work Week |
Under the watchful gaze (and gentle cooing) of the pigeons, the MozReview developers gathered in Montréal for a work week. My main goal for the week was to make substantial progress towards Autoland to Inbound, my primary project for this quarter, maybe even a deployment of an initial iteration to the development server.
While we didn’t get quite that far, we did make a lot of progress on a lot of fronts, including finally getting Bugzilla API key support deployed. This is the first work week I’ve done where we just stayed and worked together in an Airbnb apartment rather than getting hotel rooms and making use of shared space in a Mozilla office. I really enjoyed this; it was a nice casual work environment and we got a lot of focused work done.
Some things I worked on this week, in varying degrees of completion:
Bug 1198086 This adds an endpoint for making Autoland to “non-Try” tree requests which will allow us to build the UI for Autoland to Inbound. A while back I fixed Bug 1183295 which added support for non-Try destinations in the Autoland service itself. This means, outside of bug fixes, the backend support for Autoland to Inbound is implemented and we can focus on the UI.
Bug 1196263 is my other main project this quarter. We want to add a library which enables people to write their own static analysis bots that run against MozReview. This is based on work that GPS did in the winter to create a static analysis bot for Python. We still need to rework some of the messages we’re sending out on Pulse when a review is published, at the moment we’ll end up re-reviewing unchanged commits and spamming lots of comments. This was a problem with the original Python bot and needs to be fixed before bots can be enabled.
Bug 1168486 involves creating a “Custom Hosting Service” for review repositories. This will let us maintain metadata about things like whether or not a repo has an associated Try repository so we can disable triggering try runs on reviews where this doesn’t make sense.
Bug 1123139 is a small UI fix to remove unnecessary information from the Description field. We’ve decided to reserve the Description field for displaying the Mercurial commit message which will hopefully encourage people to write more descriptive messages for their changes. This will also move the “pull down these commits” hint to the Information section on the right of the page. Like most small UI fixes, this consumed an embarrassing amount of time. I’ve come to realize that no matter how many bad UIs I leave under my pillow at night, the UI fairy will not come and fix them, so I’ll just have to get better at this sort of thing.
http://www.lowleveldrone.com/mozilla/mozreview/2015/08/28/mozreview-workweek.html
|