Updated on Feb 1, 2017

In the first half of 2016, I participated in GSoC (Google Summer of Code) and helped implement the major part of File Support in Servo. Initially, this topic didn’t sound very cool to me (I mean … a moon shot, for example, is cool), but the real experience turned out to be very fruitful, involving browser architecture, API design, concurrency, and resource management, to name a few. So I believe it is worth a writeup, in which I will try to summarise the technical aspects of the project, plus some personal thoughts.

BTW, I have been maintaining a project tracker, which is a detailed record of the whole process (but don’t expect to make much sense of it :P).

Overview

The File API is a collection of JavaScript APIs for managing blob [1] and file resources. Note that "file" here does not have its usual day-to-day operating system meaning; it represents a resource with file-ish attributes, e.g. a modified date and a file name. The File API also defines the blob, whose content can live either in memory or in the file system. Another interesting part of the File API is the Blob URL, which makes it possible to refer to these blob resources as hyperlinks in HTML.

However, since my project name is File Support, it is not enough to just make FileAPI happen. There are also other important functionalities like file upload, in which we pick a file through

Regarding file upload, I wrote a Rust binding for a file dialog UI library (although, for other reasons, I ended up using another library in Servo). Next, I needed to implement the click event handler of the <input> element to capture the user action and route the request to the actual file picker routine. The file handles returned from the picker are then registered in the DOM.

Another aspect is file submission. I fixed some glue code in the <form> element’s submission algorithm, and fixed multipart encoding [2] and the FormData object in Servo. So now you know how hard your browser works just to send a puppy’s picture to the server.

That is basically what the project is about (though a bunch of technical details are elided). I won’t touch on many of them in the following text either, but will rather focus on the stories behind these APIs and user interfaces (good for casual reading). I will also discuss an abstract model of the key problems in our resource management architecture (good for a system hacker).

Assuming you are using a browser…

Now, you see something like this on a web page:

What on earth is that magic button? Answer: it is one kind of input element, like

  • text box
  • radio button

They are all designed to accept data input from the user. Among them, the file input prompts you to select one or more files from the local file system and (possibly) send them to a remote server along with the other input fields. Note that these fields are usually grouped as a form, and the sending is done through the form submission mechanism.

Okay, so now you know what this button is. You click on it, and the click event handler of the file-type input element handles the event by activating the file picking routine. Normally, the native UI library of your OS, whether it is Windows, OS X, or something else, provides a file picking dialog like this:

When you select one (or more) files, the actual file-system paths are sent back to the browser. But note that the API caller in the web page is not given the full path, only a reference to the file and its filename. Why? First, privacy: you don’t always want to reveal your file system structure to the server. Second, security: the web page might contain a malicious script. We must separate the ability to access the file system by path from web page script execution. If we did not, a malicious script might be able to inspect everything on your computer. With this segregation, a web page script can only access whatever the user actually picks through the dialog UI.
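To make the path-hiding idea concrete, here is a minimal sketch in Rust. It is not Servo's actual code: `PathRegistry`, `SelectedFile`, and the counter-based ids are hypothetical names I made up for illustration (a real implementation would want ids that are hard to guess). The trusted side keeps the full path; the page only ever sees an opaque id and the bare file name.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// What the page script is allowed to see: an opaque id and the bare
/// file name, never the full path. (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
struct SelectedFile {
    id: u64,
    filename: String,
}

/// Trusted side: remembers which full path each id stands for.
#[derive(Default)]
struct PathRegistry {
    next_id: u64,
    paths: HashMap<u64, PathBuf>,
}

impl PathRegistry {
    /// Called with a path returned by the file picker dialog.
    /// Returns None if the path has no final file-name component.
    fn register(&mut self, picked: &Path) -> Option<SelectedFile> {
        let filename = picked.file_name()?.to_string_lossy().into_owned();
        let id = self.next_id;
        self.next_id += 1;
        self.paths.insert(id, picked.to_path_buf());
        Some(SelectedFile { id, filename })
    }

    /// Only the trusted side can turn an id back into a path.
    fn resolve(&self, id: u64) -> Option<&Path> {
        self.paths.get(&id).map(|p| p.as_path())
    }
}

fn main() {
    let mut registry = PathRegistry::default();
    let file = registry
        .register(Path::new("/home/alice/pics/puppy.jpg"))
        .expect("path has a file name");
    // The script-visible value carries no directory information.
    println!("script sees: {:?}", file);
    println!("trusted side resolves: {:?}", registry.resolve(file.id));
}
```

The point of the design is that `resolve` is simply not reachable from the script side, so a malicious script can hold an id but cannot turn it into a path.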

Now, the input element holds a reference to the actual file. When you click the submit button, the whole form, including the file input element, is encoded into a stream of bytes and sent to the server. The server decodes the byte stream and learns that you uploaded a file. Yay!
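For the curious, the encoding used for forms with file inputs is multipart/form-data (RFC 7578). Below is a minimal sketch in Rust of what such an encoder does, under simplifying assumptions: `Part` and `encode_multipart` are hypothetical names (not Servo's code), the boundary is passed in by the caller, and escaping of special characters in field names and file names is left out.

```rust
/// One form entry: either a plain text field or a picked file.
/// (Hypothetical types for illustration; not Servo's actual code.)
enum Part<'a> {
    Text { name: &'a str, value: &'a str },
    File { name: &'a str, filename: &'a str, content_type: &'a str, bytes: &'a [u8] },
}

/// Encode the parts as a multipart/form-data body using `boundary`.
/// Assumption: names and file names need no escaping, and the boundary
/// does not occur inside any part's content.
fn encode_multipart(parts: &[Part], boundary: &str) -> Vec<u8> {
    let mut body = Vec::new();
    for part in parts {
        // Every part starts with the delimiter line "--<boundary>".
        body.extend_from_slice(format!("--{}\r\n", boundary).as_bytes());
        match part {
            Part::Text { name, value } => {
                let header =
                    format!("Content-Disposition: form-data; name=\"{}\"\r\n\r\n", name);
                body.extend_from_slice(header.as_bytes());
                body.extend_from_slice(value.as_bytes());
            }
            Part::File { name, filename, content_type, bytes } => {
                let header = format!(
                    "Content-Disposition: form-data; name=\"{}\"; filename=\"{}\"\r\n\
                     Content-Type: {}\r\n\r\n",
                    name, filename, content_type
                );
                body.extend_from_slice(header.as_bytes());
                body.extend_from_slice(bytes);
            }
        }
        // Each part's content is terminated by CRLF.
        body.extend_from_slice(b"\r\n");
    }
    // The closing delimiter has two extra dashes at the end.
    body.extend_from_slice(format!("--{}--\r\n", boundary).as_bytes());
    body
}

fn main() {
    let parts = [
        Part::Text { name: "caption", value: "my puppy" },
        Part::File {
            name: "photo",
            filename: "puppy.jpg",
            content_type: "image/jpeg",
            bytes: &[0xFF, 0xD8, 0xFF],
        },
    ];
    let body = encode_multipart(&parts, "example-boundary");
    println!("{}", String::from_utf8_lossy(&body));
}
```

Note that only the bare file name travels to the server, which matches the privacy argument above.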

Assuming you are a Web developer…

MDN has an excellent tutorial on using the File API, and I relied on it heavily to build intuition for the implementation and to find testing samples.

Nevertheless, I believe this set of APIs is not easy to comprehend and use correctly, especially if you only have experience developing native applications, which have low-level I/O support (for example, stdio in C). By comparison, everything becomes ridiculously complex in the browser.

But note that all this complexity exists for security and performance. Don’t blame anyone just yet; keep reading to see why :)

When writing a native application, your application usually gets file paths from command-line arguments or some global configuration, and accesses these files directly as long as the OS-level permissions allow it. Your application might also maintain its own files for persistent storage or I/O purposes. When a user runs such an application, it acts on the user's behalf, and thus can do whatever it wants (which is usually what the user wants it to do). So it has more power. Also, since it manages the file resources manually, performance is usually not a problem if it is implemented correctly.

But for a Web application, it is different. The user never wants a single web page to have full access to their computer, because … it is just a web page!!! The user might visit thousands of web pages per day, but might only download and execute one or two native applications on their machine. So the browser has to draw a lot of hard lines around how you can access files and what you can do with them in web scripts, to ensure that a malicious script can’t mess with the user’s computer, and that every request to the file system, even a very mild one, is made known to the user.

Also, JavaScript is a garbage-collected language, and in the browser it runs inside the DOM environment (which has layers of abstraction). JS is also single-threaded by default, and it will freeze user interaction with the web page if it blocks on I/O. All of this requires the browser to provide specialized APIs that support more efficient ways of managing file I/O. For example, FileReader is asynchronous.

Assuming you are a curious system hacker…

Definitions

First, what is a resource? We simply define it as a pile of bytes plus a type tag (like image/jpeg, text/plain).

Next, we define the resource containers that we care about:

  • File: it has metadata like file path etc., and its content is stored on disk and accessed by file path
  • Buffer: its content lives in the memory

Note that a resource container is a kind of abstract object which is tracked by the JavaScript runtime (and thus GC’d).

Then, we view these containers as different backends of a Blob, i.e. a Blob is either File-based or Buffer-based. Here are the ways of constructing a Blob:

  • select_file(): gets a file path from the file picker and creates a File container from it. Note that the file path can be considered a reference into the file system, and we assume that the container doesn’t cache the content
  • from_bytes(bytes): the bytes here can be a JavaScript string object or a typed array. We basically encode its content as a vector of bytes and create a Buffer backend
  • slice(b, pos): here b is another Blob and pos defines how we slice b’s content. The sliced content becomes the content of the new Blob.

Next, we introduce the Blob URL, which, when requested, responds with the content of the associated Blob. We use the following APIs to establish and cancel this association:

  • createURL(b): returns a Blob URL associated with Blob b
  • revokeURL(u): revoke the Blob URL u

Now, we define two operations on them so that they are useful:

  • read(b): read content of Blob b
  • load(u): read content of Blob associated with Blob URL u

Finally, we define a method of Blob that constrains the above operations on it:

  • close(b)

Model and Properties

The things we will describe are mostly about referencing relationships.

First, we define the readability of a Blob and the loadability of a Blob URL:

  • readability: If b is readable, read(b) should return its content; otherwise read(b) should fail
  • loadability: If u is loadable, there should exist a readable Blob b that is associated with u, and load(u) should return b’s content; otherwise load(u) should fail

Lemma (construction): A newly constructed Blob should be readable, and a newly created Blob URL should be loadable.

Lemma (close): After we call close(b), b should be unreadable.

Lemma (revoke): After we call revokeURL(u), u should be unloadable.

Lemma (slice): slice(b, pos) will return a new Blob referencing the sliced content of b, and inheriting b’s readability
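The definitions and lemmas above can be played with in a small executable toy model. This is a sketch under my own simplifying assumptions, not the Servo implementation: the `Model` type and its method names are hypothetical, only the Buffer backend is modelled (the File backend is left out so that the example needs no disk access), pos is taken to be a byte range, and a slice is assumed to copy its parent's readability at the moment it is created.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::rc::Rc;

type BlobId = usize;

/// A Buffer-backed Blob: shared bytes, a window into them, and a flag.
struct Blob {
    content: Rc<Vec<u8>>, // shared between a Blob and its slices
    range: Range<usize>,
    readable: bool,
}

/// Toy model of the Blob / Blob URL operations (hypothetical names).
#[derive(Default)]
struct Model {
    blobs: Vec<Blob>,              // a BlobId is an index into this
    urls: HashMap<String, BlobId>, // Blob URL -> associated Blob
    next_url: u32,
}

impl Model {
    fn from_bytes(&mut self, bytes: &[u8]) -> BlobId {
        let blob = Blob { content: Rc::new(bytes.to_vec()), range: 0..bytes.len(), readable: true };
        self.blobs.push(blob);
        self.blobs.len() - 1
    }

    /// New Blob referencing part of b's content, clamped to b's length.
    fn slice(&mut self, b: BlobId, pos: Range<usize>) -> BlobId {
        let parent = &self.blobs[b];
        let len = parent.range.len();
        let start = pos.start.min(len);
        let end = pos.end.min(len).max(start);
        let child = Blob {
            content: Rc::clone(&parent.content),
            range: parent.range.start + start..parent.range.start + end,
            readable: parent.readable, // assumption: inherited at slice time
        };
        self.blobs.push(child);
        self.blobs.len() - 1
    }

    fn read(&self, b: BlobId) -> Option<Vec<u8>> {
        let blob = &self.blobs[b];
        if blob.readable { Some(blob.content[blob.range.clone()].to_vec()) } else { None }
    }

    fn close(&mut self, b: BlobId) {
        self.blobs[b].readable = false;
    }

    fn create_url(&mut self, b: BlobId) -> String {
        let url = format!("blob:model/{}", self.next_url);
        self.next_url += 1;
        self.urls.insert(url.clone(), b);
        url
    }

    fn revoke_url(&mut self, u: &str) {
        self.urls.remove(u);
    }

    /// Loadable only if u is registered and its Blob is readable.
    fn load(&self, u: &str) -> Option<Vec<u8>> {
        self.urls.get(u).and_then(|&b| self.read(b))
    }
}

fn main() {
    let mut m = Model::default();
    let b = m.from_bytes(b"hello world");
    let u = m.create_url(b);
    // Lemma (construction)
    assert_eq!(m.read(b), Some(b"hello world".to_vec()));
    assert_eq!(m.load(&u), Some(b"hello world".to_vec()));
    // Lemma (slice)
    let s = m.slice(b, 0..5);
    assert_eq!(m.read(s), Some(b"hello".to_vec()));
    // Lemma (close): b becomes unreadable, so u is no longer loadable.
    m.close(b);
    assert_eq!(m.read(b), None);
    assert_eq!(m.load(&u), None);
    // Lemma (revoke)
    let u2 = m.create_url(s);
    m.revoke_url(&u2);
    assert_eq!(m.load(&u2), None);
    println!("all lemmas hold in the toy model");
}
```

In this toy model, closing b leaves the earlier slice s readable, because s took its flag when it was created. Whether a real implementation should behave this way is a design decision; the sketch just picks one reading of "inheriting".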

Separation – One Step Closer to Servo’s Reality

Servo is a browser featuring parallelism, and the most coarse-grained level of parallelism is the process level. This means that Servo is designed as several processes doing different things, among which are two important kinds: the net process and the script process. (more on Servo’s design, also a paper on this [3])

Don’t guess what net and script do solely from their names. The net process is responsible for general resource management, like sending load requests to remote servers, loading images from the image cache, etc. Meanwhile, the script process is responsible for “create and own the DOM and execute the JavaScript engine”. So it can be viewed as an interpreter of the Web page.

Now, due to the security problem mentioned above, we can only let the file path live in the net process. Thus the true file-based Blob lives on the net side and is referenced by a stub file-based Blob on the script side. Another fact is that resource requests, including those for Blob URLs, have to be handled on the net side. So when you call createURL(b) and b is a buffer-based Blob constructed on the script side, the content of b can’t keep living on the script side, but has to be transferred to the net side. So we have an upgraded model incorporating this architectural change, as illustrated by the following graph:

Here are some key observations:

  1. slice, Blob URL and selected file can all be viewed as some form of referencing
  2. On the script side, references are entirely managed by JS runtime GC
  3. On the net side, we have to maintain our own reference count
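Here is a minimal sketch in Rust of how the two sides might keep their reference bookkeeping in sync. All names (`Msg`, `NetStore`, `ScriptRef`) are hypothetical and this is not Servo's actual design: the script-side stub holds only an id, its Clone and Drop implementations stand in for the JS runtime creating and collecting references, and the net side adjusts a manual count as messages arrive. For determinism, the example drains the channel on one thread instead of running a separate net process.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type Id = u64;

/// Messages from the script side to the net side (hypothetical protocol).
enum Msg {
    /// Move a buffer to the net side; the entry starts with one reference.
    Promote(Id, Vec<u8>),
    IncRef(Id),
    DecRef(Id),
}

/// Net side: owns the content and keeps manual reference counts.
#[derive(Default)]
struct NetStore {
    entries: HashMap<Id, (usize, Vec<u8>)>, // id -> (ref count, content)
}

impl NetStore {
    fn handle(&mut self, msg: Msg) {
        match msg {
            Msg::Promote(id, bytes) => {
                self.entries.insert(id, (1, bytes));
            }
            Msg::IncRef(id) => {
                if let Some(entry) = self.entries.get_mut(&id) {
                    entry.0 += 1;
                }
            }
            Msg::DecRef(id) => {
                let last_ref_gone = match self.entries.get_mut(&id) {
                    Some(entry) => {
                        entry.0 -= 1;
                        entry.0 == 0
                    }
                    None => false,
                };
                // Free the content once nothing refers to it any more.
                if last_ref_gone {
                    self.entries.remove(&id);
                }
            }
        }
    }

    /// Process every message currently waiting in the channel.
    fn drain(&mut self, rx: &Receiver<Msg>) {
        while let Ok(msg) = rx.try_recv() {
            self.handle(msg);
        }
    }
}

/// Script side: a stub that holds only an id. Clone and Drop stand in
/// for the GC'd runtime creating and collecting references.
struct ScriptRef {
    id: Id,
    to_net: Sender<Msg>,
}

impl Clone for ScriptRef {
    fn clone(&self) -> Self {
        let _ = self.to_net.send(Msg::IncRef(self.id));
        ScriptRef { id: self.id, to_net: self.to_net.clone() }
    }
}

impl Drop for ScriptRef {
    fn drop(&mut self) {
        // Ignore the error if the net side has already shut down.
        let _ = self.to_net.send(Msg::DecRef(self.id));
    }
}

fn main() {
    let (tx, rx) = channel();
    let mut net = NetStore::default();

    tx.send(Msg::Promote(1, b"puppy".to_vec())).unwrap();
    let blob = ScriptRef { id: 1, to_net: tx.clone() };
    let url = blob.clone(); // e.g. a Blob URL referring to the same entry
    net.drain(&rx);
    assert_eq!(net.entries[&1].0, 2);

    drop(blob); // the Blob object is collected, the URL keeps the entry alive
    net.drain(&rx);
    assert_eq!(net.entries[&1].0, 1);

    drop(url);
    net.drain(&rx);
    assert!(net.entries.is_empty());
    println!("net-side entry freed after the last script-side reference went away");
}
```

Even in this tiny version the correctness depends on message order: the IncRef for a new reference has to reach the net side before the DecRef of the reference it was copied from, otherwise the count could hit zero too early.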

Analysis of Implementation

So we can now frame the problem as a reference-counting problem, and its crux is maintaining consistency between two domains: one assumes a GC-based reference management mechanism, while the other has to be implemented manually. And the messages between them must be interpreted correctly.

And the implementation is hard to get right because:

  1. We use a large degree of sharing to improve efficiency
  2. The underlying model can be concurrent/async

In fact, it took me at least one month from designing the Blob URL implementation to finally being confident that it is sound! This note documents part of my reasoning. It is not very readable, but I think trying to formalize the essential parts of the system helped me improve the implementation substantially.

Final thoughts on GSoC

In this final section, I will share some thoughts on working on File Support in Servo as a GSoC project, and as a Servo community member.

In brief: this was an awesome and memorable experience.

The Servo community is friendly, vigorous, creative, a bit researchy, and expertise-rich! Also, as a part of the Mozilla community, it has a mature open-source culture and very nice dev-infra to get you started quickly and working smoothly.

I have to express my thanks to all the Servo members who encouraged, helped, and guided me, especially my GSoC mentor Manishearth. Participating in discussions, asking questions, and getting comments on my implementations were a vital part of my project experience.

I have now learned more about dev-ops, code quality, peer review, architecture, task management, and technical decision making. These are all, I suppose, the soft skills of a professional programmer, and they can benefit me for a lifetime.

Another thing I like is the somewhat researchy culture of the Servo project (as it should be, since Servo is led by Mozilla Research!). When implementing something, I was always encouraged to think more about the general and abstract problem, and to try to come up with an elegant, well-designed, and foundational solution, rather than “let’s hack together something that works and call it a day”. And I think anyone who is research-oriented will like this culture.

References

  1. blob’s true history
  2. RFC 7578
  3. Brian Anderson, Lars Bergstrom, Manish Goregaokar, Josh Matthews, Keegan McAllister, Jack Moffitt, and Simon Sapin. 2016. Engineering the Servo web browser engine using Rust. In Proceedings of the 38th International Conference on Software Engineering Companion (ICSE ’16).