Thursday, April 1, 2010

GwtQuake: Taking the Web to the Next Level

Back in November of 2009, Joel Webber and I were at a GWT Summit focused on improving the Web and UI latency. Someone was giving a presentation on WebGL, and I saw Joel sitting in the back of the room looking seriously distracted. When I approached him, he said "Did you see this?" and swung his notebook around to show Jake2 -- Quake2 ported to Java running as a Java WebStart Application. I had seen this project years ago, but never knew it had gotten this far.

Right then and there, I knew what he was thinking, I almost didn't need to ask -- he was suggesting that we port Jake2 to the Web using GWT!

The Path to the Web

The idea that you could use GWT to compile something like Jake2 to Javascript seems straightforward, since GWT after all, compiles Java to Javascript, however there is still a lot of work massaging the code to compile, as well as mapping I/O to the appropriate Web APIs.

As you may know, GWT only implements a subset of the Java JRE, and Jake2 relies on many Java classes that GWT just doesn't have:

  • The LWJGL library for 3D

  • and java.nio.Buffer classes

  •* for networking

  • AWT for keyboard stuff

  • File system APIs for loading data

  • OpenAL for audio

More importantly, it relies on a synchronous API paradigm for doing I/O, and as we know, Javascript is built around event driven I/O.

Mapping I/O

Joel reorganized Jake2 I/O system around event driven pumping to load files. The first thing we did was get the level and model files loading, and rendering using a 2D <canvas> wireframe renderer (no WebGL). After that, Stefan Haustein contributed new JRE classes to implement Java NIO Buffers around WebGL's new typed javascript arrays, as well as a GL renderer based around WebGL.

Joel amazingly got multiplayer up and running on WebSockets and we had great fun play testing deathmatches between Atlanta, Mountain View, and even Sydney.

I went ahead and converted all of the sound to MP3 and implemented an initial sound renderer just by using the DOM Audio element. Later, I refactored the OpenAL code into a base class and provided an implementation of the OpenAL Inverse Distance Clamped sound model that adjusted volume gain of sounds based on 3D position, still, using nothing more than the audio.volume property.

Stefan added an implementation of RandomAccessFile based on LocalStorage to save game files, and an implementation of video playback using the <video> tag.

Most of the work was completed by January by part time contributions, but has been tweaked since then. Stefan, for example, went ahead and ported the entire single player game (no server needed) as static content.

A more detailed post-mortem will be presented during one of my Google I/O sessions, be sure to attend!

Is it slow?

I thought it would be, honestly, I never thought we would get this far, but it turns out that playable framerates are achievable. On WebKit/Chrome I can get 20-25fps on a MacBook, and on my Mac Pro desktop, I get about 45fps. Joel reports that a Linux notebook gets up to 60fps.

What this means for the Web

For years, people have assumed the browser was a poor platform for this kind of thing, and that you'd need something like Flash, Silverlight, JavaFX, or native code. While it is true that you should not expect the browser to rival triple-A titles like Far Cry or Call of Duty in the browser, there is no reason why lots of casual games that used to be implemented in Flash, or are now implemented in Objective-C on the iPhone/iPad can't be done using similar techniques we've used.

Moreover, because it's the Web, all you need to do to invite someone to your game, to share it, is to send a link. You could Tweet a link directly to your game, which when clicked, would make someone join the game in progress. No installs necessary. While this can be done in Flash, it just feels "natural" when it's in the browser.

I hope that this port encourages some people to become even bolder and crazier in the types of Web Apps they're trying to build, because if Quake2 is possible in Javascript using browser APIs, then even more amazing applications are waiting for you to develop, so get started!

The source code for our port is at:


Tuesday, December 8, 2009

GWT 2.0: So good it's ridiculous

GWT 2.0 was released today, and even though I've been using it for months and contributing to it, I didn't realize the significance of this release until I saw the Google Campfire event, because I've been exposed to each new feature incrementally, rather than all at once. If you take a step back, look at what has been accomplished since 1.7:

Development Mode

Who else but Google could build a plugin that runs on 3 different browsers on 3 different operating systems, deeply integrated with the Javascript VMs on IE, WebKit, and Firefox, that runs Java in a hybrid mode, where Java code runs in any JVM, and Javascript code is injected and ran in the browser as Javascript over a network connection? Do not underestimate the difficulty of doing this, especially when considering that Javascript can hold references to Java and vice-versa, causing problems for accurate garbage collection. Development Mode is an amazing productivity tool, made possible by some really awesome technology under the hood. Let's get a w00t! for the engineering that went into this.

UI Binder

Following the GWT mantra "Why do at runtime what you can do at compile time?" UIBinder is an extensible declarative language for constructing user interfaces, injecting those constructs into your Java code, and optimizing and bundling resources like stylesheets, images, and localized text. While it may seem like a throwback to use XML/XHTML in an era when HTML5 tagsoup and JSON are emerging as favorites, you have to consider what has been bought in exchange. UiBinder constructs HTML at compile time, which means it minimizes the number of DOM operations needed to get your UI up and running. It treats stylesheets as an intelligence resource, obfuscating and compressing them just as the GWT compiler does for JS, as well as providing local scope for CSS classes so that you can compose stylesheets written by different parties together without worrying about conflicts.

Google Eclipse Plugin

Technically, it's separate from GWT, but I'll list it here because of its tight integration with GWT. It provides intelligent integration between your Java source, JSNI methods, and UiBinder files. Again, this product alone would be a major event unto itself, but it's almost under-the-radar in Google's Campfire event.

Code Splitting

The value that whole-program compilation for JS brings to the table is no better demonstrated than it is here, by the ability of the GWT compiler to statically compute transitive callgraphs and dependencies from any callsite you desire, and split off that chunk of code for deferred loading. Ask yourself, if I handed you a hand-written large JS application like Maps, or GMail, or Y! Mail, how long would it take you to fully modularize all of the libraries used, get all of the dependencies exactly right (if you need feature F, load library F, but F needs G and H, so load libraries G and H). You can point to popular JS libraries using complex JS module loaders with dependency management, but keep in mind, that is all done by hand, on very mature libraries. GWT Code Splitting brings this functionality to mere mortals in just a few lines of code.

Constraint-Based Layout

Many modern JS libraries have punted on making CSS work for general compose-able widgets, and rely on JS to perform layout calculations for widgets. This works well, but has the downside of laggy performance, as the browser rendering engines are optimized for layout calculations, and often performing JS layout can cause excessive browser layouts. You'll see this in split panels or scroll panes sometimes as the divider or scrollbar lagging the mouse position. Joel Webber of the GWT team has finally cracked the code to make CSS based layouts work in a predictable fashion, and as a result, you can now compose arbitrary layouts in GWT with predictability, a productivity enhancement unto itself, even if some flexibility is lost, end-user performance and development speed are gained.


Few people realize that GWT has been doing automated image sprites for a long time. ImageBundle, introduced in GWT 1.4, allowed GWT users to simply refer to images on disk, and the compiler would pack all of those images into a single file, transforming references to the images automatically into sprites. Then with 1.5/1.6, FooBundle, or ImmutableResourceBundle was introduced to the GWT Incubator. FooBundle/IRB took spriting to the next level by allowing all of your resources to be bundled into one or more files, or even inlined in your compiled JS as a gigantic data URI. GWT 2.0 introduces IRB into the core SDK, now called ClientBundle, it has been enhanced and polished. Simply put, ClientBundle allows GWT developers to reduce the number of HTTP requests for resources to just 1 or 2 which are cached forever, pack images optimally, optimize and minify CSS, and much more. Did you know that ClientBundle works with internationalization? That GWT ClientBundle can automatically flip images for Right-To-Left locales like Arabic, and flip CSS rules as well? Crazy awesome!

Optimized for Compression

I'm some what tooting my own horn on this feature, since I contributed it, but GWT 2.0 produces obfuscated Javascript in a way that no other tool on the planet does. In addition to renaming pretty much every variable in your entire program in a few symbols as possible, it actually re-orders code to increase gzip compression ratios. And that's just the beginning. The ability to just keep plugging in optimizations to this toolset is unbounded. Automated PNG optimization in the future? Check. New Deflate algorithms? Check. The list goes on.

HtmlUnit for Testability

You want testing? GWT's got you covered. Java tests? Check. Selenium tests? Check. Web-Mode test in real browser? Yep. But what's really new, is that GWT can run tests in a simulated browser environment thanks to HtmlUnit. This has two immediate benefits: 1) You can run your tests anywhere and not have to worry about prepping a test environment or platform specific details and 2) Tests can startup much quicker because they don't need to use a real browser. Again, it would seem like an almost irrelevent footprint if it weren't for the fact that a lot of effort is going into making this work correctly.

Speed Tracer, 'nuff said!

1. It's awesome. 2. It's actually written in GWT. 3. It enables easy collaboration and sharing of traces. 4. It has a plugin mechanism for analyzing traces and providing automated warnings and suggestions like YSlow!. 5. It's awesome. 6. It's awesome!

But wait, there's more! These are just the main bullet points, there's lots of other changes under the hood that are also important, let's look at a few:

Story of Your Compile, Compile Reports

Think of it as Compile-Time SpeedTracer. SOYC gives you lots of detail about what each part of your app is contributing to the output in terms of size, and what the interdependencies are. This provides valuable feedback as to where to look to reduce bloat, or split-off code better. This would be the equivalent of a JS tool that looked at your jQuery application, and told you exactly how many bytes were used by each method in each jQuery plugin you're using, and which modules and functions pulled in the others. Think about it.

Better stack traces, and stack trace emulation for IE6

The GWT team simply isn't content with the status quo. When it doesn't exist, invent it. Internet Explorer does not provide proper Javascript stack frame traces. This means when something goes wrong in your application, it can be difficult to get a detailed report about what call sequence triggered it. GWT 2.0 contains the ability to switch on stack-trace emulation, which causes the compiler to insert line number, file, and callstack recording into every program method. When an error occurs in a GWT application compiled for IE6, you can actually reverse obfuscated emulated stack trace output back into real Java stack traces that tell you file and line number. Com'on, you're not excited yet?

Schedule Finally

Javascript is an event driven programming language. When the Javascript interpreter runs, it is the context of an event pumped through an event queue that enters the interpreter/VM through some handler callback, or default entry point. GWT 2.0 recognizes this fact by becoming the gatekeeper to every entry/exit of the application code, and by doing this, it can perform quite a few useful things. One is, it can implement the idea of an uncaught exception handler and enforce it, so that any exception that causes the JS interpreter to exit/yield can be diverted to a custom handler of your choice. More importantly, GWT can take control of how code is scheduled for execution, beyond setTimeout()/setInterval(). GWT 2.0 introduces the concept of Schedule Finally? What is it? Simply this: Defer execution of a piece of code to run after all other code in the current execution context is run, but before the next event is pumped. This is used, for example, by the GWT style injector, so that CSS style injection can be deferred to after the current event is handled, but run before the next is. This can allow efficient batching of operations that mutate the DOM, among other things.

I could go on, tons of compiler optimizations, Directly Evalable RPC, Conditional Deferred Binding Properties, HTML Standards Mode Support. One could make the argument, that almost everything in this list can be done, or has been done, elsewhere, or in parts of other tools. But that's why GWT is so beneficial. It unifies where other approaches fragment. Sure, you could combine your own code splitting, with your own handrolled JS loader, with someone else's image sprite tool, with another guy's test framework, and someone else's minifier, and this guy's trace analyzer, and that guy's widget toolset. And if you think you could achieve the same outcome, you'd be absolutely right.

GWT isn't about doing what isn't possible elsewhere. It's about making you productive doing it. It isn't about extolling the beauties of Java, it's about pragmatically recognizing the leverage you get from the existing toolchain, IDEs, test harnesses, and types. It isn't about hiding the DOM, CSS, and the browser from you. It's about allowing you to write your abstractions yourself more productively, just like you'd do in hand-coded JS anyway. Someday, there may even be the ability for GWT to compile other languages to Java, or even process JS with type annotations.

GWT makes productive, large scale web development possible, and enables next-gen applications to be built reliably.


GWT Exporter 2.09 released

Improvements since last version:

Support for runtime method overloading.
Methods on the same class with identical names can be exported as a single Javascript method using run-time type-detection and dispatch. You pay a price in the form of a dispatch table encoded as a JSON object, but only for methods with overloads.
Feeling lazy? exportAll() will export every single exportable type in your classpath so that you don't have to invoke GWT.create() manually.
All exports compiled to a separate JS fragment, loaded asynchronously, and invokes window.onexport() when done.
Exporter.export() deprecrated
Now simply invoking GWT.create(SomeClass.class) is sufficient for export.
Supports GWT 2.0 $entry function
$entry preserves Java exception handler behavior as well as the correct event-loop entry/exit, like running scheduleFinally commands on exit.
Requires GWT 2.0 RC2 or higher
Because $entry and code-splitting are now integrated features.

Structural Typing

I'm also about 90% done implementing Structural Types for exports, which allows idiomatic interoperability between JS and exported APIs. Structural Typing will be available in an upcoming 2.10 release real soon now.

The future

GWT Exporter was originally written for some dire needs I had, and the codebase is now over 2 years old. The latest additions have been difficult, as the codebase is getting messier. The 2.10 release will be the last release, afterwards, I will use the knowledge gained from the library to work on a rewrite for GWT core that may or may not make it into a future SDK distribution.

The goals for the future releases will be (1) Never pay for what you don't use, that is, exports should only bloat code if they absolutely must. 2) Provide flexible way to customize output for various idiomatic Javascript styles. You should be able to generate APIs that look like they were designed by hand. 3) Look at ways to process JSDocs and provide additional ways to automatically bridge JS and Java objects. 4) Improve error checking, diagnostic messages, and quality of auto generated JS docs.

GWT Exporter is available at

Thursday, November 19, 2009

The Traveling Salesman Problem and Javascript Compression

In my last post, On Reducing the Size of Compressed Javascript, I covered a technique I had been researching on re-ordering Javascript code to enhance the efficiency of the gzip/deflate compressors. The basic approach is to come up with some way of measuring the distance between two functions, and then trying to group functions together by shortest distance.

A key question is, what's a good metric? I knew that the ideal metric would take into account how the LZ77 algorithm works, but for expediency, I went with a reasonable and well known string distance: the levenshtein distance.

GZip vs 7-zip

Another finding from my previous experiments was that 7-zip sports a much superior deflate implementation, it can compress a given input using the same algorithm as gzip with non-trivial size improvements. The problem is, 7-zip is implemented in C++, and I wanted to write a GWT Linker to automatically produce super-compressed Javascript.

In digging around for a Java 7-zip deflate implementation, I ran into this paper on Paul Sladen's page. titled: Compressing File Collections with a TSP-Based Approach. This paper published in 2004 details a similar technique of enhancing TAR compression of a collection of files by re-ordering the TAR to maximize compression. More on that later, but the paper itself has an obscure reference to something called the LZ Distance metric. Tracking down that paper "G. Cormode, M. Paterson, S. Sahinalp, and U. Vishkin. Communication complexity of doc- ument exchange. In Proc. of the ACM–SIAM Symp. on Discrete Algorithms, January 2000." leads us to the definition of LZ Distance.

LZ Distance

The LZ Distance between two strings x and y denoted by LZDIST(x,y) is as follows:
The minimum number of single characters or substrings of y or of the partially built string, which are required to build x from left to right.

As an example, consider x=aabcdefgabc and y=aaaaaaaaaaaaaaaa. We try to parse x in terms of y, and keep a running partially built string called 'p'. Here are the steps:

  1. Prefix 'aa' of x is contained in y, so p = (aa), x = bcdefgabc
  2. No prefix of x found in y or p, so just add 'b' p = (aa)(b), x = cdefgabc
  3. No prefix of x found in y or p ,so just add 'c', p = (aa)(b)(c), x = defgabc
  4. No prefix of x found in y or p ,so just add 'd', p = (aa)(b)(c)(d), x = efgabc
  5. No prefix of x found in y or p ,so just add 'e', p = (aa)(b)(c)(d)(e), x = fgabc
  6. No prefix of x found in y or p ,so just add 'f', p = (aa)(b)(c)(d)(e)(f), x = gabc
  7. No prefix of x found in y or p ,so just add 'g', p = (aa)(b)(c)(d)(e)(f)(g), x = abc
  8. 'abc' is found at index '1' in p, so p = (aa)(b)(c)(d)(e)(f)(g)(abc)

Here, the LZDIST(x,y) = 8. But what of LZDIST(y, x)?

  1. Prefix 'aa' of y is contained in x, so p = (aa), y = aaaaaaaaaaaaaa
  2. Prefix 'aa' of y is contained in x, so p = (aa)(aa), y = aaaaaaaaaaaa
  3. Prefix 'aaaa' of y is contained in p, p = (aa)((aa)(aaaa) y = aaaaaaaa
  4. Prefix 'aaaaaaaa' of y is contained in p, so p = (aa)(aa)(aaaa)(aaaaaaaa)

LZDIST(y,x) = 4. Interesting, but what does this buy us?


Recall that GZIP's deflate encoder uses an algorithm called LZ77 which looks at the current input tokens, and searches backwards for previous substrings that match the current input string, replacing them with back references. What LZDIST(x,y) gives us, is a measure of how many back references for X can be found in Y, that is, how X can be compressed by references to substrings in Y or in partially built decompression outputs.

Now you can see why this might help improve code re-ordering. If we can order Javascript functions in order to maximize back-references, we might be able to squeeze a little more out of gzip.

Clustering with LZDIST

Remember the earlier paper's title had the acronym TSP? This paper discusses how re-ordering can be viewed as an optimal tour on a graph which means that solving it is as hard as the Traveling Salesman Problem - NP-Complete. Uh oh, that sounds like trouble, but that's only NP Complete for the optimal solution. A good estimate on the true solution might also produce good results, and this paper shows that a greedy tour heuristic produces up to a 10% benefit in compression.

For expediency reasons, I decided not to implement a TSP model at this point, but went with simple pair-based greedy approach detailed in my previous blog post, substituting LZDIST for Levenshtein Distance. I also implemented an LZ-like greedy algorithm where I keep a running string which is the concatenation of all of the functions which have been added to the output, and use LZDIST to find the next closest function to append within a 32k window. This mimics how a gzip compressor might see the input.

Results for GWT Showcase Application

TechniqueSizeRatioRelative Improvement (vs previous row)
No Compression196778100%
gzip -96440132.7%32.7%
gzip -9 + levensthein6140931.2%4.7%
7zip + levenshtein5853629.7%4.7%
7zip + lzdist5792329.4%1.1%

Looking at relative compression, levenshtein vs lzdist, the difference 57923/58536 = 1.1% which seems worthwhile to me. In a large script, this can save 1-2 kilobytes. I threw in LZMA which boasts 12% better compression in this instance than DEFLATE.

Notes on implementation

The LZDIST function is expensive to compute in a naive fashion. It is similar to the longest common substring problem. The brute force version can be on the order of O(n^3), while a dynamic programming version can run in O(n^2). To make this run efficiently and not irritate your users, you need to use a Generalized Suffix Tree approach. Compressors like 7-zip are already using Suffix trees to implement the LZ77 search algorithm, so those of you looking for hints to do an efficient implementation should take a look at the LZMA SDK (both C++ and Java versions are available)


Utilizing clustering compression techniques, LZ distance, and 7-zip, it is possible to reduce the GWT Showcase application from 64401 bytes to 57923 bytes, a savings of about 10.1%.


Monday, August 17, 2009

On Reducing the Size of Compressed Javascript (by up to 20%)

Recently, I've been studying ways of reducing the download size of Javascript applications produced by Google Web Toolkit, while preserving or improving startup time. There are a number of ways to do this, the first of which is to transform Javascript code into a form that is naturally more succinct, while eliminating unused code, or deferring the load of some code until later.

Unfortunately, there is a tradeoff inherent in some of the fancier ways of making JS more succinct that use run-time code generation. This is put to good use in libraries like jQuery. For a large application, like Google Wave, this would increase startup time. Another technique is to teach a compiler even better optimizations for reducing redundant code, work which is ongoing in Google Web Toolkit.

Other attempts to reduce JS size have looked at ways to remove extraneous whitespace, shorten identifiers, and tokenize/pack JS statements, and while the former two are win-win one-way transforms, the latter has the trade-off requirement of being a reversible transform, meaning that it has to be decoded by an additional JS stage after load, a step that hurts startup performance.

Instead, I was drawn to a reversible transform the browser already includes support for: gzip compression, and decided to ask the question: what effect does the large-scale structure of the JS output code have on the DEFLATE algorithm of GZIP which is used to serve up compressed script? The answer it turns out, is substantial.

Reversible Transforms

If we didn't care about startup performance, we could spend all the time in the world unpacking JS code by including a custom tailored compressor, perhaps the LZMA (Lempel-Ziv Markov Chaining) algorithm or PPM (Prediction by Partial Matching), but unfortunately, these algorithms would be very slow to run in Javascript.

That leaves us with the built in GZIP compression that most web browsers include support for. The question is, can we improve compression while remaining compatible with the browser's decoder? There is an existing example of improving GZIP by injecting a reversible transform: bzip2 and the Burrows-Wheeler Transform.

Which brings up an interesting idea, can we do something like BWT's sort but for Javascript, in a way that doesn't require an extra pass to 'undo' the sorting?

Deflate: A digression

Before answering that question, it would be helpful to look at some of the restrictions of the deflate algorithm, and how code ordering could affect the outcome.

The deflate algorithm is a combination of two compression algorithms: LZ77 and Huffman coding. Huffman coding is a variable length code technique where symbols are replaced with codes based on their frequency of occurrence. So for example, the most common letter in the English alphabet is the letter e, so replacing e with a short code, but giving q, the least frequent character, a longer code, leads to shorter text overall. Huffman encoding is order independent, so that 'eq' is the same length as 'qe'.

LZ77 on the other hand, is a sliding window compression algorithm based on replacing strings with backwards references to previous strings in the input. For example, the string "this is a test" contains the substring 'is' repeated twice in a row, separated by a space, so that the second occurance of 'is' can be replaced with a length (2 characters, and a backwards distance (-3 positions), called the length-distance pair. The compressor typically scans backwards in the input within a certain window (e.g. 8,192 characters or 32,768 characters) looking for matches and then encoding them as length-distance pairs. The compressor has some freedom as to how hard it will search for a match before giving up (something I'll get to later).

One important effect of the sliding window limit is that if two Javascript functions with common substrings are separated by more than this distance, they cannot be matched.

But DEFLATE has another trick up its sleeve. It encodes the output of the LZ77 algorithm using Huffman encoding, but uses one Huffman tree for the character literals and length codes, and another Huffman tree for the backwards distances.

Which suggests another potential gain is to try and arrange for the backwards distances to be both small, and frequently the same, so as it produce shorter Huffman codes.

A sort with no undo

Thus far, intuition would tell us that if we could rearrange the input in order to bring more closely matched text closer together, we might be able to push up compression ratios, but how to do this without something to reverse the sort? Fortunately, unlike BWT, we are not working on plain text, but machine readable program code. We already know of something that rearranges code and moves it around, it's called a compiler!

We would not want a one-way text compression sort which say, brings Hamlet's prologue, climax, and epilogue together and randomly rearranges the rest, but your browser has no problem running your Javascript code if function foo is declared after bar, or before it. Thus, as long as two statements do not have order-dependencies, we can arrange them freely, in fact, all top-level function declarations can be rearranged arbitrarily.

Sort by Clustering

So, we've finally arrived at the point in which we have to devise our sort algorithm. What we want to do is, for any function Foo ensure that the best-match for this function in the whole program appears closest within the sliding window that the compressor will use. But bringing two functions that are most similar together in a greedy algorithm fashion won't necessarily produce the optimal result, as it is possible that by moving function Bar closer to function Foo, you've moved it away from lots of other functions that were good matches as well. As an example, consider these strings:

"Hello World"
"Hello World is a common test string used."
...several thousand strings later including the phrase "hello world" ...
"Common test strings used are metasyntactic variables like Foo and Bar."
"Variables like Foo and Bar are very common."

If we selectively moved the second string close to the first, we might prevent it from matching the other good matches later, especially if the window fills up.

One idea I started to think about was to repurpose Document Clustering techniques towards code. Document Clustering is commonly used in information retrieval systems to find related documents. Typically, a document is encoded using some technique to measure word importance, such as representing each word by its term frequency inverse document frequency. Then, any two documents can be compared by some distance metric, for example, taking the tf-idf weightings of terms as a vector in N-space and computing the cosine between them.

In this case, we'd let each function be a separate document, and the entire program be like the corpus of documents. We'd then choose some encoding to weigh Javascript grammar nodes by importance in a way that would produce good LZ77 matches, and then proceed in a bottom-up clustering fashion. First, we'd construct all the pairs of functions which match best. Pick a function, pair it with its best match, call that Cluster 1. Pick another function, pair it with its best match, call that Cluster 2, and so on. After this procedure is done, pick a Cluster, and find its nearest Cluster (according to some metric) and pair them up in a Cluster of 4 functions. After that's done, pair up 4-Clusters into Clusters of 8, and so on, until the final cluster encompasses the whole program.

What's a good metric?

Ideally, a good metric for comparing two functions would take into account the way the GZIP compressor searches for matches and encodes them. It is an interesting theoretical question, but for practical implementation purposes, I needed something that performs reasonably well, now. One algorithm that is pretty good at finding strings of rough similarity, even in the presence of noise, is the dynamic programming edit-distance algorithm. It's deployed widely in one variant or another in the bioinformatics industry for gene sequence alignment (Smith-Waterman, HMMR, etc), but the version commonly used for general CS work is the Levenshtein Distance.


Taking Levenshtein Distance as my metric, I produced a greedy variant as a patch to the GWT compiler. The greedy variant does not implement bottom-up clustering, but instead, sorts all functions by length first (suggested by GWT team member Lex Spoon), and then performs a linear scan over the sorted functions, picking the best match each time to the previous output. The source code is here. Remarkably, even this simple algorithm produces nice gains? How much? Well, if you do nothing else, it produces a 5-7% gain in GWT's Showcase application when compressed with gzip -9. But there's more that we can do!

Optimizing GWT's Obfuscation

When the GWT compiler is executed in obfuscated mode, it renames every single Javascript identifier in the whole program except for foreign Javascript. Ideally, you want the shortest id possible. Up until recently, GWT limited the first character of an identifier to an alphabet of 32-characters, and for strings of length 2 or more, it used base-64 characters. However, due to a clever patch by GWT community member Andriasyan (Issue #2448), the first character can actually be chosen from a base-54 alphabet. This has the effect of shrinking output size by up to 1.75% prior to compression.

We're not done yet! The GWT compiler has other tricks up its sleeve. It performs the renaming from the bottom-most scopes upwards, letting each scope reuse variable identifiers as they become free. However, it unfortunately did not insure that identifiers were picked in a stable order. Thus, a function of 3 variables could be declared as function foo(a,b,c){ or as function foo(b,c,a){. Obviously, this would lead to suboptimal compression since every function of 3 variables should have the same suffix (a,b,c){. The effect of making obfuscated identifier allocation have a stable sort order combined with the base-54 patch produces an incredible gain of 10.5% when compressed with gzip -9.

Choosing a different GZIP implementation

The deflate algorithm actually gives some leeway to the compressor implementor in terms of how matches are found. ZLIB's implementation on which GZIP is based is actually not the best implementation, although it might be the best patent unencumbered one. Rather, the inventor of the LZMA algorithm has his own DEFLATE implementation in his 7-zip utility, which produces 4% better output than gzip by my estimates.

Combining base-54/base-64 obfuscated identifier encoding, stable sort-order for identifier allocation, my greedy clustering-by-edit-distance sort algorithm, and 7-zip as a gzip-compatible compressor, yields an incredible 21% reduction of the Showcase application. On a large 500k Javascript application, this means an additional 100k bandwidth is saved, with no performance penalty!


A general purpose technique (Cromwell Clustering Transform? (CCT) :)) for compilers to rearrange code for compression efficiency (vs say, cache locality) has been presented, which achieves non-trivial compression efficiency gains in Javascript output from the GWT compiler. Some of these techniques can also be applied to hand-written Javascript as well and included in 3rd party JS minification utilities.


From reading a description of the algorithm, it may be hard to visualize. Here is sample output from GWT's Showcase application:

function lU(a){gU();while(bU){bU.a.a.a.p=false;$wnd.alert(s5b+a+t5b);bU=bU.b}cU=null}
function vR(a){qR();while(lR){lR.a.b.a.p=false;$wnd.alert(s5b+a+t5b);lR=lR.b}mR=null}
function QR(a){LR();while(GR){GR.a.b.a.p=false;$wnd.alert(s5b+a+t5b);GR=GR.b}HR=null}
function mS(a){hS();while(cS){cS.a.b.a.p=false;$wnd.alert(s5b+a+t5b);cS=cS.b}dS=null}
function KS(a){FS();while(AS){AS.a.b.a.p=false;$wnd.alert(s5b+a+t5b);AS=AS.b}BS=null}
function gT(a){bT();while(YS){YS.a.b.a.p=false;$wnd.alert(s5b+a+t5b);YS=YS.b}ZS=null}
function PT(a){KT();while(FT){FT.a.b.a.p=false;$wnd.alert(s5b+a+t5b);FT=FT.b}GT=null}
function JU(a){EU();while(zU){zU.a.b.a.p=false;$wnd.alert(s5b+a+t5b);zU=zU.b}AU=null}
function fV(a){aV();while(XU){XU.a.a.a.p=false;$wnd.alert(s5b+a+t5b);XU=XU.b}YU=null}
function DV(a){yV();while(tV){tV.a.a.a.p=false;$wnd.alert(s5b+a+t5b);tV=tV.b}uV=null}

For any function, note that the one immediately following it contains large numbers of common substrings of length 3 or greater.

Thursday, May 14, 2009

Gwt Query and Timefire @ Google I/O

If you're attending Google I/O this year (and you should, it's not to late to Register, $50 for Academia!) I'm doing two sessions. The first:
Progressively Enhance Ajax Applications with Google Web Toolkit and GQuery

Google Web Toolkit
Ray Cromwell
Thu 1:15p - 2:15p
Don't throw away your Web 1.0 websites just yet. In this session, you'll learn how to take your existing websites, and layer AJAX on top of them using GQuery, a jQuery style library for GWT. Learn how GQuery benefits performance over existing solutions, increases productivity, and reduces defects by leveraging the GWT tool chain.

The second:
Building Applications With Google APIs

Google Web Toolkit
Ray Cromwell
Thu 3:45p - 4:45p
Google offers a wide variety of APIs in many domains that together form a complete platform, from authentication and authorization, cloud computing, and social networking, to visualization, mobile computing, and Google Web Toolkit. In this talk, we will walk though a complex application that integrates many APIs together, how each can solve a different need in your application, how you can share code between GWT, Android, and App Engine, and how you can monetize your application with Google Checkout.

In the first presentation, I'll detail the revamped and relaunched GWT Query library, a jQuery clone for GWT featuring fast implementation, terse syntax, type safety, and a few other tricks. If you've ever wanted to use GWT, but didn't want to program in a Desktop Widget metaphor, this talk is for you.

The second presentation sounds stale (I couldn't write more exciting copy, honest!) but it is infact, the story of Timefire, a software-as-a-service visualization/analytics platform we are building that leverages a number of Google APIs, from AppEngine, to GWT, to Android, OpenSocial, Checkout, and beyond. If you've got a need, Google has an API for it, and in my opinion, what they have is looking more and more like a complete platform, from client to cloud.

So if this sounds interesting, mark Thursday on your calendars, and come see the amazing things you can do with Google Web Toolkit.


Friday, April 24, 2009

GWT RPC over arbitrary transports: Uber RPC

A frequent request that pops up in the GWT groups is how to run GWT RPC over non-XHR transport mechanisms, like script-tag injection, cross-domain POSTs, or OpenSocial Gadget's makeRequest().

There are a few ways to do this, including patching GWT itself, but the path of least resistance would be a module that you could inherit which would provide this functionality, without compromising or changing your client code in any way.

Step 1: Overwrite GWT ServiceInterfaceProxyGenerator

In order to handle alternate transport mechanisms, we need to take over generation of the RPC client stub that is created by the builtin generators, to do this, we add the following entries to a module file:


This tells GWT to invoke our UberServiceIntefaceProxyGenerator whenever someone calls GWT.create(RemoteService.class). There are some package-protected methods we need access to, so we arrange for our class to be in the package. Next, we want to modify the generator to allow substitution of arbitrary client stub superclasses.

<define-property name="gwt.rpc.proxySuperclass"
<set-property name="gwt.rpc.proxySuperclass"

This construct is used to pass a compile time parameter to the generator as to which class will be used as a client stub, org.timepedia.uberrpc.client.RpcServiceProxy. Here is our proxy generator source, which exists merely to redirect to UberProxyCreator

public class UberServiceInterfaceProxyGenerator extends Generator {

public String generate(TreeLogger logger, GeneratorContext ctx,
String requestedClass) throws UnableToCompleteException {

logger.log(TreeLogger.WARN, "Running UberProxyCreator", null);

TypeOracle typeOracle = ctx.getTypeOracle();
assert (typeOracle != null);

JClassType remoteService = typeOracle.findType(requestedClass);
if (remoteService == null) {
logger.log(TreeLogger.ERROR, "Unable to find metadata for type '"
+ requestedClass + "'", null);
throw new UnableToCompleteException();

if (remoteService.isInterface() == null) {
logger.log(TreeLogger.ERROR, remoteService.getQualifiedSourceName()
+ " is not an interface", null);
throw new UnableToCompleteException();

UberProxyCreator proxyCreator = new UberProxyCreator(remoteService);

TreeLogger proxyLogger = logger.branch(TreeLogger.DEBUG,
"Generating client proxy for remote service interface '"
+ remoteService.getQualifiedSourceName() + "'", null);

return proxyCreator.create(proxyLogger, ctx);

Step 2; Override the superclass of the generated client proxy stub

This source was mostly copied unchanged from the original, except for the line which calls UberProxyCreator. The bulk of the work is done there. Again, I copied the source, but made just a few changes to the routine which creates the SourceWriter

private SourceWriter getSourceWriter(TreeLogger logger, GeneratorContext ctx,
JClassType serviceAsync) {
JPackage serviceIntfPkg = serviceAsync.getPackage();
String packageName = serviceIntfPkg == null ? "" : serviceIntfPkg.getName();
PrintWriter printWriter = ctx
.tryCreate(logger, packageName, getProxySimpleName());
if (printWriter == null) {
return null;

ClassSourceFileComposerFactory composerFactory
= new ClassSourceFileComposerFactory(packageName, getProxySimpleName());

String[] imports = new String[]{RemoteServiceProxy.class.getCanonicalName(),
GWT.class.getCanonicalName(), ResponseReader.class.getCanonicalName(),
for (String imp : imports) {

String rpcSuper = null;
try {
// retrieve user-defined superclass from module file
rpcSuper = ctx.getPropertyOracle()
.getPropertyValue(logger, "gwt.rpc.proxySuperclass");
if (rpcSuper != null) {
rpcSuper = rpcSuper.replaceAll("_", ".");
} catch (Exception e) {

// allow defining a custom superclass to customize the RPC implementation


return composerFactory.createSourceWriter(ctx, printWriter);

Step 3: Write your own Proxy

This is where you come in, since you have to decide how you're going to transport the RPC payload, such as putting it as a URL GET parameter, a POST parameter, or using an OpenSocial container. Here is some skeleton code showing you how to override the doInvoke method. This example is pseudo-code for how you'd do it using in an OpenSocial container.

public class RpcServiceProxy extends RemoteServiceProxy {

protected GadgetRpcServiceProxy(String moduleBaseURL,
String remoteServiceRelativePath, String serializationPolicyName,
Serializer serializer) {
super(moduleBaseURL, remoteServiceRelativePath, serializationPolicyName,

static boolean isReturnValue(String encodedResponse) {
return encodedResponse.startsWith("//OK");

* Return true if the encoded response contains a checked
* exception that was thrown by the method invocation.
* @param encodedResponse
* @return true if the encoded response contains a checked
* exception that was thrown by the method invocation
static boolean isThrownException(String encodedResponse) {
return encodedResponse.startsWith("//EX");

public static final String RPC_PAYLOAD_PARAM="rpcpayload";

protected Request doInvoke(
final RequestCallbackAdapter.ResponseReader responseReader, String methodName, int invocationCount,
String requestData, final AsyncCallback tAsyncCallback) {

try {
makeRequest(getServiceEntryPoint(), "text/x-gwt-rpc; charset=utf-8", requestData, new AsyncCallback() {
public void onFailure(Throwable throwable) {
tAsyncCallback.onFailure(new InvocationException("Unable to initiate the asynchronous service invocation -- check the network connection"));

public void onSuccess(String encodedResponse) {
try {
if(isReturnValue(encodedResponse)) {
else if(isThrownException(encodedResponse)) {

else {
tAsyncCallback.onFailure(new InvocationException("Unknown return value type"));
} catch (SerializationException e) {
tAsyncCallback.onFailure(new InvocationException("Failure deserializing object "+e));
} catch (Exception ex) {
InvocationException iex = new InvocationException(
"Unable to initiate the asynchronous service invocation -- check the network connection",
} finally {
if (RemoteServiceProxy.isStatsAvailable()
&& RemoteServiceProxy.stats(RemoteServiceProxy.bytesStat(methodName,
invocationCount, requestData.length(), "requestSent"))) {
return null;

private native void makeRequest(String serviceEntryPoint, String contentType,
String requestData, AsyncCallback tAsyncCallback) /*-{
var params = {};
params[] =;

params[];"?"+@org.timepedia.uberrpc.client.UberRpcServiceProxy::RPC_PAYLOAD_PARAM+"="+encodeURIComponent(requestData), function(resp) {
if(resp.errors && resp.errors.length > 0) {;)(null)
else {;)(resp.text);
}, params);

Step 4: Modify RemoteServiceServlet

The last step is to modify RemoteServiceServlet so that it understands the new transport formats you've devised. Here's an example of one that would handle the incoming OpenSocial makeRequest(). This one handles GET or POST requests with the incoming payload as a form parameter.

public class GadgetServiceServlet extends RemoteServiceServlet {

protected void doGet(HttpServletRequest httpServletRequest,
HttpServletResponse httpServletResponse)
throws ServletException, IOException {
doPost(httpServletRequest, httpServletResponse);

protected String readContent(HttpServletRequest httpServletRequest)
throws ServletException, IOException {
String str = httpServletRequest.getMethod().equals("POST") ? RPCServletUtils
.readContentAsUtf8(httpServletRequest, false) : httpServletRequest
String ustr = URLDecoder.decode(str);
return ustr;

To use, you'd just subclass this servlet.

The Sky's the Limit

Once you get to this point, you can start imagining all of the cool things you can do. Cross-domain POSTs using the trick. OAuth signed RPC requests, verified in the servlet, done by OpenSocial containers. FaceBook integration. RPC over JSON, Protocol Buffers, Thrift. I'm trying to cobble together some of this stuff for a future UberRPC module, but due to talks I'm giving at the upcoming Google I/O, I'm a little too swamped to make them release worthy at this point.

Much credit goes to Alex Epshteyn for his original proposal on the GWT Contributors list, which I picked up (over an inferior method I had been using to make Gadgets work), and integrated into a more graceful override of the default RPC behavior.

If you're going to Google I/O, I will be doing two talks this year. One on Progressive Enhancement using GWT and a new library I've written, and a second on Building an Application on Google's Open Stack, which is a walkthrough of a sophisticated GWT app which leverages about a dozen Google APIs.