So, Timepedia is building a time machine, right? It sounds pretentious, but for us, it's really a geeky moniker of love for our project, after all, is Google's "search engine" really an "engine"? How much horsepower does it have? :)
One part of Timepedia, readers of this blog are already familar with: Chronoscope. With Chronoscope, we are attempting to build an open platform of visualization tools for time oriented data, in much the same way that Google Maps and Google Earth deal with spatial data.
However, what good is a time machine, if you don't know where to go, or don't understand what you're looking at? Timepedia has another platform, aimed at data mining time related information, called Everett (owned and implemented by another Timepedia founder, Mat). Everett is a collection of many algorithms for both data mining, and forecasting, some of them bleeding edge academic research. When we started, we weren't sure which of them would work, or how well they would work, we only knew that they had promising features, so Everett was less of a end user product, and more of a research platform.
One of the tools of Everett is an algorithm that lets us find hidden recurring patterns in data, even in the presence of noise, or scaling. Last week, we tested the algorithm on real life data for the first time, and had one of those "holy cow!" moments, which don't occur too often for me personally, where your own code surprises you.
To give you an example, I fed Everett an 18,000 data point series of federal funds rates over the last few decades, and it identified a pattern that occured 3 times in history. Visualizing this in another tool we call Timelord (A Chronoscope married to Everett and other server-side services), I was puzzled as to the significance of these three sequences. My co-founder Shawn spent about 1 hour Googling, until he found the correlation: These sequences corresponded to international financial/currency crises (such as the Mexican currency crises), in which the Fed was forced to take action. The leadup to the crises appeared identical each time. A fluke? It sure the hell was very interesting.
I was worried it was a fluke, so I tried something more mundane. A time series of unemployment benefit expenditures in Indiana, and once again, Everett identified a series of puzzling repetitive sequences. What were they? The dates looked very familar, 1980-81, 1990-91, 2000-1...were they recessions? To check, I used Timelord to overlay a National Bureau of Economic Research official measure of economic expansions and contractions, and sure enough, these patterns intersected with NBER recessions. One other interesting property stood out, the patterns returned prefixed the recessions, that is, Everett was showing us a pattern that leads to a recession.
How cool is that? Ambition got the best of me, I went for broke: I tried a historical time series of average hurricane strength (saffir-simpson scale), as well as a yearly count. There appears to be good evidence that a 40-60 cyclical hurricane season exists, and I was hoping that Everett could find these patterns, but alas, it did not.
Still, the initial results are promising, and we hope that Everett will give average users an ability to query time in ways that have not been previously available.
So, if you're wondering why I haven't released Chronoscope yet, it's because I've been working on integrating Timelord with Everett. :)
-Ray
p.s. Timelord is another GWT application, making it our 4th major GWT application. Everett is C++ coupled via JNI a Java/GWT RPC interface, since performance is absolutely critical in Everett.
Wednesday, September 12, 2007
When algorithms work better than you dream...
Posted by
Timepedia
at
4:11 PM
5
comments
Labels: data mining, everett, google webtoolkit, gwt, timepedia
Thursday, June 7, 2007
GWT Canvas: Rendering rotated text
Chronoscope operates on a Canvas abstraction in GWT. Think of it as a mirror of the Safari Javascript Canvas API, with extensions for text rendering, reading back coordinate transforms, layers, fast-clear, and frame/display list capture.
A typical bit of Chronoscope Canvas code will look like:
Canvas canvas = getCanvas();
canvas.beginFrame();
Layer layer = canvas.getLayer("foo");
layer.setStrokeColor(Color.black);
layer.moveTo(sx,sy);
layer.lineTo(ex, ey);
layer.stroke();
Layer textLayer = canvas.getLayer("bar");
textLayer.setStrokeColor(Color.red);
textLayer.drawText("Hello World", x, y,
gssProperties);
canvas.endFrame();
The begin/end frame abstraction permits capture of display list for faster rendering (Opera lockCanvas), as well as shipping a scenegraph to Flash, Silverlight, or SVG. Layers permit compositing and fast-clear.
Rendering Text
Users of Javascript Canvas dream of support for text rendering. For some unknown reason, Apple (who has awesome text rendering capability in the Quartz canvas) left it out. This has forced many developers to roll their own text rendering, until the day when everyone supports the WHATWG Canvas.
Some of the typical hacks used to get text rendering include:
- Placing DIV tags over the Canvas with text
- Using pre-rendered or on-the-fly server-rendered images
- Extracting glyph vectors from true type fonts, sending them encoded to the browser, and rendering them with Canvas moveto/lineto calls.
The latter two options are the only ones capable of dealing with rotated text.
Chronoscope uses the first solution, even for rotated text, however the individual letters do not get rotated, leading to ugliness. The existing Chronoscope demo shows this on the vertical axis chart labels. Fast-clear comes into play when I want to redraw. I can blow away dozens of hundreds of DIV tags with something like "layerElem.innerHTML=''".
GWT1.4 ClippedImage and FontBook Rendering
I recently began experimenting with the idea of rendering an entire font at the desired 2D transformation and font-properties, and then using GWT1.4 ClippedImage to render letters.
A few hours later, I have an implementation that exceeded my wildest dreams in quality, here's how I did it:
- Create a GWT RPC Service called getFontMetrics(transform, fontProperties, color)
- Servlet calculates FontMetrics (ascent, descent, leading, advance, etc) for first 256 characters of the Font (most European languages)
- Returned metrics includes a URL to a generated font book
- FontRenderer servlet renders 16x16 array of characters with specified transform, color, font, etc
- new drawTransformedText() method of Canvas first renders according to the old "ugly" method mentioned above and kicks off RPC and Image load
- On load completed, "ugly" transformed text replaced with text rendered with clipped characters from the font book image
Devil's in the drawTransformedText() details
So how does this method work? The server returns a 256-element array of advance values (essentially character width), along with maxAscent, maxDescent, and leading obtained from Java2D FontMetrics. The returned RenderedFontMetrics RPC object has a method called 'getBounds(char c, Bounds b)' that for any character, returns a clipping bounding box into the second argument (avoiding excessive object creation in the main loop), which is the position of the rendered character in the font book image.
The main loop essentially obtains the bounding box for each character to be drawn, creates a ClippedImage that will display it, and positions the IMG tag along the font-baseline according to the current transform (coordinate system). It uses the 'advance' values for each character to know how far along the baseline to place each successive clipped image.
Screen Shot

Performance
Font Books can be cached, and rendering a font book on the server takes about 10 milliseconds. The resulting image is about 20kb. After that, text rendering performance is pretty much the same as the old "ugly" DIV method.
Caveats
BIDI not supported. Non-ISO-8859 character sets a pain. The issue is, I can't very well send down font-book with 65k characters. However, statistics are on our side. For example, in a language like Chinese, you'll find that most people only use a few thousand characters in day to day use.
We can compute a histogram on a corpus of Chinese text, and render a fontbook containing the most used characters in descending order. We can render several such tiles, in a monotonically descending fashion, on-demand, and cache them. In most applications, you'll probably infrequently request the second-tier characters, and very very infrequently request characters 3 standard deviations away.
Once Chronoscope is out in the wild, I hope some non-ISO-8859-1 native speakers will help contribute improved fontbook rendering for CJK characters and other languages.
p.s. I'll post a demo up on timepedia.org later, after I can verify all of the code is working properly, since I also moved the code base from 1.3 to 1.4.
-Ray
Posted by
Timepedia
at
5:49 PM
1 comments
Labels: clippedimage, fontbook, google webtoolkit, gwt, renderer, rotated text