So, Timepedia is building a time machine, right? It sounds pretentious, but for us it's really a geeky moniker of love for our project. After all, is Google's "search engine" really an "engine"? How much horsepower does it have? :)
One part of Timepedia is already familiar to readers of this blog: Chronoscope. With Chronoscope, we are attempting to build an open platform of visualization tools for time-oriented data, in much the same way that Google Maps and Google Earth deal with spatial data.
However, what good is a time machine if you don't know where to go, or don't understand what you're looking at? Timepedia has another platform, aimed at data mining time-related information, called Everett (owned and implemented by another Timepedia founder, Mat). Everett is a collection of many algorithms for both data mining and forecasting, some of them bleeding-edge academic research. When we started, we weren't sure which of them would work, or how well they would work. We only knew that they had promising features, so Everett was less of an end-user product and more of a research platform.
One of the tools in Everett is an algorithm that lets us find hidden recurring patterns in data, even in the presence of noise or scaling. Last week, we tested the algorithm on real-life data for the first time, and had one of those "holy cow!" moments where your own code surprises you, which don't occur too often for me personally.
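For readers wondering what "recurring patterns in the presence of scaling" can mean in practice, here is a minimal sketch. It is not Everett's algorithm, which this post does not describe. It uses a common textbook technique instead: z-normalize each sliding window (zero mean, unit variance) so that shape is compared independently of offset and amplitude, then report windows whose shape lies within a Euclidean distance threshold of a query window. The class name, method names, sample data, and threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Everett's algorithm.
class RecurringPatternSketch {

    // Rescale a window to zero mean and unit variance so that matching
    // is insensitive to vertical offset and amplitude scaling.
    static double[] zNormalize(double[] series, int start, int length) {
        double mean = 0;
        for (int i = 0; i < length; i++) {
            mean += series[start + i];
        }
        mean /= length;
        double variance = 0;
        for (int i = 0; i < length; i++) {
            double d = series[start + i] - mean;
            variance += d * d;
        }
        double std = Math.sqrt(variance / length);
        double[] out = new double[length];
        for (int i = 0; i < length; i++) {
            // A perfectly flat window has no shape; map it to all zeros.
            out[i] = std == 0 ? 0 : (series[start + i] - mean) / std;
        }
        return out;
    }

    // Plain Euclidean distance between two equal-length windows.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Start indices of every window whose normalized shape is within
    // `threshold` of the query window's normalized shape. The query
    // window itself is always included.
    static List<Integer> findOccurrences(double[] series, int queryStart,
                                         int length, double threshold) {
        double[] query = zNormalize(series, queryStart, length);
        List<Integer> hits = new ArrayList<>();
        for (int s = 0; s + length <= series.length; s++) {
            if (distance(query, zNormalize(series, s, length)) <= threshold) {
                hits.add(s);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up data: the bump at index 0 recurs at index 6, shifted up
        // and scaled by a factor of five.
        double[] series = {0, 1, 3, 1, 0, 5, 5, 10, 20, 10, 5, 5};
        System.out.println(findOccurrences(series, 0, 5, 0.2));
    }
}
```

A brute-force scan like this costs time proportional to the series length times the window length per query, which is fine for a toy but is one reason a production system would want something much smarter.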
To give you an example, I fed Everett an 18,000 data point series of federal funds rates over the last few decades, and it identified a pattern that occurred 3 times in history. Visualizing this in another tool we call Timelord (a Chronoscope married to Everett and other server-side services), I was puzzled as to the significance of these three sequences. My co-founder Shawn spent about an hour Googling until he found the correlation: these sequences corresponded to international financial/currency crises (such as the Mexican currency crisis) in which the Fed was forced to take action. The lead-up to the crises appeared identical each time. A fluke? It sure as hell was interesting.
I was worried it was a fluke, so I tried something more mundane: a time series of unemployment benefit expenditures in Indiana. Once again, Everett identified a series of puzzling repetitive sequences. What were they? The dates looked very familiar: 1980-81, 1990-91, 2000-01... were they recessions? To check, I used Timelord to overlay the National Bureau of Economic Research's official measure of economic expansions and contractions, and sure enough, these patterns intersected with NBER recessions. One other interesting property stood out: the patterns returned preceded the recessions. That is, Everett was showing us a pattern that leads to a recession.
How cool is that? Ambition got the best of me, and I went for broke: I tried a historical time series of average hurricane strength (Saffir-Simpson scale), as well as a yearly count. There appears to be good evidence that a 40-60 year hurricane cycle exists, and I was hoping that Everett could find these patterns, but alas, it did not.
Still, the initial results are promising, and we hope that Everett will give average users an ability to query time in ways that have not been previously available.
So, if you're wondering why I haven't released Chronoscope yet, it's because I've been working on integrating Timelord with Everett. :)
-Ray
p.s. Timelord is another GWT application, making it our 4th major GWT application. Everett is C++ coupled via JNI to a Java/GWT RPC interface, since performance is absolutely critical in Everett.
Wednesday, September 12, 2007
When algorithms work better than you dream...
Posted by Timepedia at 4:11 PM (5 comments)
Labels: data mining, everett, google webtoolkit, gwt, timepedia
Saturday, June 2, 2007
GWT Demystified
This will be the first in a series of tutorials and essays digging deeper into some of the more esoteric (but extremely important!) functionality of the Google Web Toolkit. (This article is written by Ray Cromwell, CTO.) I will be talking about some of the nitty-gritty details of the GWT compiler, how to create your own custom generators and the cool stuff you can do with them, and finally, how to plug new optimization algorithms into the GWT compiler itself.
The Google Web Toolkit can be considered a package of three fundamental technologies: The Java to Javascript Compiler, the runtime library/APIs, and the Hosted Mode browser. All three are very important components of GWT in their own right, but because the APIs and the Hosted browser are the most visual pieces (you can SEE THEM operating) they tend to have more impact on people, while the real heavy duty stuff being done by the GWT compiler goes mostly unnoticed.
This is why I am not surprised that many language zealots engaged in language wars often grimace when I tell them I am rewriting my Javascript code in GWT. "Huh!? Why!?" Then the inevitable signs of misunderstanding crop up: "Didn't applets show that running Java in the browser is slow?", "Doesn't GWT produce bloated code?", etc. Now, I am by no means a zealot for the Java language, but there do appear to be frequent misunderstandings about what the GWT Compiler is and does.
The GWT Compiler is a real compiler, not a simple translator that walks an abstract syntax tree turning Java expressions and statements into the nearest Javascript equivalents. In fact, the GWT Compiler performs optimizations that are very hard to do statically on raw Javascript, and hard to do even in regular Java.
For example, static dead code elimination is very hard to do safely in Javascript, and limited in Java as well (at the bytecode level). You can never really be sure that a public method in Java won't be called, because of dynamic class loading, and frankly, you can't even be sure that private methods won't be invoked, due to reflection and interception techniques. Thus the only safe way to do dead code elimination is to defer it to runtime and let HotSpot deal with it.
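To make the reflection point concrete, here is a small sketch in plain Java (not GWT code). The class and method names are hypothetical. The private method `secret()` has no ordinary call site, so a naive static analysis would consider it dead, yet it is still invoked through a method name that is only known at runtime.

```java
import java.lang.reflect.Method;

// Hypothetical names throughout; this is plain Java, not GWT code.
class ReflectionSketch {

    static class Greeter {
        // No ordinary call site references this method, so it looks dead.
        private String secret() {
            return "still reachable";
        }
    }

    // Invoke a no-argument method chosen by name at runtime.
    static String callByName(Object target, String methodName) {
        try {
            Method m = target.getClass().getDeclaredMethod(methodName);
            m.setAccessible(true); // bypass the private modifier
            return (String) m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The name could just as easily come from a config file or the
        // network, which is why a static optimizer cannot rule the call out.
        String name = args.length > 0 ? args[0] : "secret";
        System.out.println(callByName(new Greeter(), name));
    }
}
```

If an optimizer had removed `secret()` ahead of time, the lookup would fail at runtime with a `NoSuchMethodException`.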
In contrast, GWT has very good information on which methods are reachable, and is extremely aggressive at removing unused methods, saving both bandwidth and start up time. The closest equivalent in the Java world would be MIDLets, since the same sorts of closed-world assumptions can be made.
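The closed-world idea can be sketched as a reachability walk over a call graph: start at the entry point, follow every call edge, and treat anything never visited as removable. The sketch below is a toy model under that assumption, not the GWT compiler's actual implementation. `onModuleLoad` is the name of GWT's entry point method; the other method names and the graph itself are made up.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of closed-world pruning; not the GWT compiler's real code.
class ReachabilitySketch {

    // Worklist traversal: everything transitively called from `entry`.
    static Set<String> reachable(Map<String, List<String>> calls, String entry) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            String method = work.pop();
            if (!seen.add(method)) {
                continue; // already visited
            }
            List<String> callees = calls.getOrDefault(method, Collections.<String>emptyList());
            for (String callee : callees) {
                work.push(callee);
            }
        }
        return seen;
    }

    // Made-up call graph: caller -> callees.
    static Map<String, List<String>> demoGraph() {
        Map<String, List<String>> calls = new HashMap<>();
        calls.put("onModuleLoad", Arrays.asList("drawChart"));
        calls.put("drawChart", Arrays.asList("computeAxis"));
        calls.put("exportPdf", Arrays.asList("computeAxis")); // never called
        return calls;
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = demoGraph();
        Set<String> live = reachable(calls, "onModuleLoad");
        Set<String> dead = new TreeSet<>(calls.keySet());
        dead.removeAll(live);
        System.out.println("kept:   " + live);
        System.out.println("pruned: " + dead);
    }
}
```

The walk is only sound because the whole program is visible at compile time. With dynamic class loading or reflection in play, `exportPdf` could not be safely dropped.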
GWT does more than just remove dead code. It performs inlining, polymorphic-to-monomorphic call conversion (devirtualization), a form of type inference which GWT calls Type-Tightening (GWT can infer that a field of type Animal is only ever assigned type Cat; it can even infer that such a field is always null!), and lots of other little tricks. It doesn't appear to have common subexpression elimination, copy propagation, or in-block dead code elimination yet, but in my third tutorial, I will demonstrate a simple naive way to achieve this.
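As a rough picture of what type tightening plus devirtualization buys you, compare the two hand-written classes below. `Before` is the code as a developer might write it. `After` is a conceptual equivalent of what a whole-program optimizer is free to produce once it has proven the field only ever holds a `Cat`. This is an illustration written by hand, not actual GWT compiler output, and all class names are hypothetical.

```java
// Hand-written illustration; not actual GWT compiler output.
class TypeTighteningSketch {

    interface Animal {
        String speak();
    }

    static final class Cat implements Animal {
        public String speak() {
            return "meow";
        }
    }

    // Never instantiated anywhere, so a closed-world compiler can drop it.
    static final class Dog implements Animal {
        public String speak() {
            return "woof";
        }
    }

    // As written: the field has the general type, so the call is polymorphic.
    static class Before {
        private Animal pet = new Cat();

        String greet() {
            return pet.speak(); // dispatched through the interface
        }
    }

    // Conceptually after optimization: the field type is tightened to Cat,
    // the call becomes a direct call, and the tiny method body is inlined.
    static class After {
        private Cat pet = new Cat();

        String greet() {
            return "meow";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Before().greet());
        System.out.println(new After().greet());
    }
}
```

Both versions behave identically; the second simply has less indirection and less code to ship.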
What this means is that in GWT, you don't pay much for what you don't use, and you don't have to worry about figuring out what's used. It means GWT's Javascript output is not bloated. It has a constant overhead related to the bootstrap process, but as your applications grow bigger, this overhead becomes a smaller share of the total.
The GWT compiler already does a good job producing compact Javascript code (Chronoscope is 30,000 lines of Java, 3.2 Megabytes of source, 1.9 Megabytes of compiled byte code, 137k of Javascript after GWT 1.4 compiles it, and 45k after gzip -9), and there is a lot of headroom still left in terms of optimization that it can do.
So why do I use GWT? Besides the fact that we want to run in environments that don't have Javascript, GWT's Hosted Mode is extremely helpful, providing full access to the universe of Java debuggers, and automatic code completion and popup Javadoc are very nice. (Even though I use IntelliJ IDEA, which already has good Javascript code completion and popup documentation.)
GWT is a good platform for building AJAX/RIA applications and allows one to easily port or repurpose existing Java codebases and tools -- another good option for developers. It has a bright future ahead of it.
-Ray
Coming up: GWT Generators HOW-TO: generate classes on the fly at compile time.
Posted by Timepedia at 11:07 PM (16 comments)
Labels: chronoscope, deferred binding, generators, google web toolkit, gwt, gwt demystified, timepedia
Timepedia @ Google Developer Day
We had a blast at Google Developer Day, especially meeting and hanging out with the Google Web Toolkit team. Google's organization of the event was excellent, and they are to be commended for staging a free worldwide event for developers, providing ample (and good) food and snacks, as well as busing developers to the Google campus for an after-hours party. We've been to freebie events before, and they haven't been this good.
At Dev Day, Timepedia launched a preview of one of our components: Chronoscope, which is our visualization platform for time series data, written in GWT.
We got a lot of positive feedback, and had fun hearing people try to guess what our t-shirts meant. Actually, we were surprised how many people got very close. Timepedia has been our personal hobby, our passion, for over two years, and we are both nervous and excited to finally reveal it to the public.
As we come closer and closer to launch time, we'll post developer notes, stumbling blocks, and hurdles leaped, and give more information about some of the other pieces of Timepedia (codenames such as Tardis, Timelord, Geisser, Everett, and Jarocki :) )
Posted by Timepedia at 10:41 PM (0 comments)
Labels: chronoscope, google developer day, google web toolkit, gwt, timepedia