15 July, 2009

Syntax Highlighting

In my previous post, I described the KAE.query.highlight plugin, which does syntax highlighting in JavaScript. I will now show you how I achieved this relatively easy task.

Naturally the best way is to simply grab a copy of the source, and read the code (there are plenty of comments as well). However, only a small part of the code is doing the actual parsing, so it can be easy to get lost. This post will give a general overview of the steps required.

I'm going to assume that you want to allow users to create their own rules for parsing (brushes), and also supply their own themes (for colors, etc). Here are the basic steps:

  1. Grab the brush and loop over it. Find all the matches in the source text, and store them in an array. You will need to know at least four things: the text, the CSS class, the start index, and the end index. These are all trivial to obtain.

  2. Sort the array of matches by the start index, so the matches are in the correct order.

  3. Loop over the matches. For each match, check to see if the current start index is greater than the previous end index. If so, add the match's text and apply the CSS class. If not, set the current match's end index to be the same as the previous match's end index.

    This is important because you will have matches within matches. For instance, you might have a comment. You want it to ignore the matches that are inside of the comment, obviously. This also handles things like numbers inside of strings, etc.

    Here is a diagram showing why this works:

    [IMAGE] Diagram: the current match (colored area) and the previous match's end index (arrow).

    Let's examine it. The colored area shows the current match. The arrow shows the previous match's end index. One way to think of it is: if the arrow is to the right of the current match, we ignore the match and go to the next one.

    The important thing is that if a match is inside of the previous match, we set the current match's end index to be the same as the previous match's end index. This carries the outer match's end index forward, so every match nested inside it is skipped, rather than only the first one.

  4. Now, you want to use something like slice() to obtain the non-matching text between the current match's end index and the next match's start index. This handles cases like Foo.bar.qux(); where .bar. is not a match, but we still want to include it in the output rather than leaving it out.
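Here is a minimal sketch of the four steps above. It is not the actual KAE.query.highlight source: the brush format (an array of { regex, css } rules) and the names highlight and escapeHtml are assumptions made for illustration. It also assumes end indexes are exclusive, and it tracks the previous end index in a variable instead of writing it back onto each skipped match, which has the same effect.

```javascript
// Hypothetical brush format: an array of { regex, css } rules.
var brush = [
    { regex: /\/\/.*$/gm, css: "comment" },
    { regex: /"(?:[^"\\\n]|\\.)*"/g, css: "string" },
    { regex: /\b\d+\b/g, css: "number" }
];

// Escape the characters that would otherwise be parsed as HTML.
function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function highlight(source, brush) {
    var matches = [], output = "", prevEnd = 0;

    // Step 1: collect every match with its text, class, start, and end.
    brush.forEach(function (rule) {
        var flags = "g" + (rule.regex.ignoreCase ? "i" : "") +
                (rule.regex.multiline ? "m" : ""),
            regex = new RegExp(rule.regex.source, flags),
            found;
        while ((found = regex.exec(source)) !== null) {
            if (found[0] === "") {
                regex.lastIndex += 1; // avoid looping forever on empty matches
                continue;
            }
            matches.push({
                text: found[0],
                css: rule.css,
                start: found.index,
                end: found.index + found[0].length // exclusive
            });
        }
    });

    // Step 2: sort the matches by start index.
    matches.sort(function (a, b) {
        return a.start - b.start;
    });

    matches.forEach(function (match) {
        // Step 3: a match that starts before the previous end index is
        // nested inside the previous match, so it is skipped.
        if (match.start < prevEnd) {
            return;
        }
        // Step 4: keep the non-matching text between two matches.
        output += escapeHtml(source.slice(prevEnd, match.start));
        output += '<span class="' + match.css + '">' +
            escapeHtml(match.text) + "</span>";
        prevEnd = match.end;
    });

    // Whatever follows the last match is plain text too.
    return output + escapeHtml(source.slice(prevEnd));
}

// The number and the string inside the comment are left alone.
console.log(highlight('var s = "a 1"; // 2 "b"\nfoo(3);', brush));
```

When two rules match at the same start index, this sketch keeps whichever one the sort puts first; a real highlighter would probably want an explicit tie-breaking rule.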

That's it! No, really, that's all the parsing that's required. KAE.query.highlight's parsing code is only 38 lines! It also handles a few odd cases, and allows you to apply multiple brushes at the same time to the same element.

Just grab a copy of the source code and search for KAE.query.highlight.parser

That function handles all the parsing. It should also be well commented, so hopefully you won't have any trouble understanding.

KAE.query.highlight

For a long while now I had been using the wonderful SyntaxHighlighter for syntax highlighting on this blog.

I had a (possibly) odd requirement, however: I wanted to be able to syntax highlight code that was inline with non-code. Here's an example:

Blah blah blah var foo = "foo"; blah blah blah.

SyntaxHighlighter won't let you do that. I created a quick stand-alone program that would allow for inline highlighting, but that added even more bloat to an already-big program. (SyntaxHighlighter is 1,984 lines long!)

I then decided to create my own syntax highlighter. One that was designed from the ground up to be very minimal and light-weight, and one that easily supported inline highlighting. The fruit of my labor is the KAE.query.highlight plugin.

Wait. Plugin? Not program? KAE.query.highlight relies on the KAE.query module in order to function. This may seem like a disadvantage at first, but let's look at the benefits:

  1. Allows syntax highlighting on any element, not just those with a special class.
  2. Provides a platform for building other useful plugins.
  3. KAE.query plugins can easily be ported to jQuery, due to the similarities in architecture.

What is KAE.query, anyways? Think of it like jQuery without any features. You pass in a string, and KAE.query will return the DOM elements that match the string, just like jQuery. It supports a plugin system that is very similar to jQuery's plugin system. Unlike jQuery, however, it lacks any useful methods: those must be provided by plugins. In essence, it is a lightweight, stripped-down version of jQuery.
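To make the idea concrete, here is a rough sketch of a selector function that has no methods of its own and gets all of them from a plugin table. This is not KAE.query's actual source or API: createQuery, plugins, and nodes are made-up names, and it assumes the root object provides querySelectorAll (as document does in current browsers).

```javascript
// Hypothetical sketch; the names here are not taken from KAE.query.
function createQuery(root) {
    function query(selector) {
        // The result inherits from query.plugins, so every plugin
        // registered there can be called on it as a method.
        var result = Object.create(query.plugins);
        result.nodes = Array.prototype.slice.call(
            root.querySelectorAll(selector)
        );
        return result;
    }
    // Starts out empty: all useful methods come from plugins.
    query.plugins = {};
    return query;
}

// In a browser you would pass document as the root:
// var query = createQuery(document);
// query.plugins.addClass = function (name) {
//     this.nodes.forEach(function (node) {
//         node.className += " " + name;
//     });
//     return this;
// };
// query("code").addClass("highlight");
```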

I don't harbor any ill feelings toward either jQuery or SyntaxHighlighter. However, both projects are targeted at something I don't need. I just wanted a simple way to extend the DOM NodeList, providing useful methods, like syntax highlighting.

Having said that, you might wonder why anybody would choose KAE.query.highlight over SyntaxHighlighter. Here are some reasons why I decided to create a new project:

  • Can be used to apply syntax highlighting with different options to different elements. For instance, you may want to apply different settings to a <code> tag than you would want to apply to a <pre> tag.
  • Much smaller in terms of code size. This can make a big difference when viewers have not cached the JavaScript file.
  • Uses a library-agnostic brush system that allows all JavaScript highlighters to use the same brushes.
  • Non-destructive: KAE.query.highlight works on the original element, so any styles are preserved.
  • Easily allows for syntax highlighting of inline elements.
  • Much much faster.

An interesting side effect of making it a KAE.query plugin is that it's backwards compatible with SyntaxHighlighter. You shouldn't need to change any HTML markup: it'll just work.

Some of these changes could be merged back into SyntaxHighlighter; in fact I encourage it. Some of the changes, however, may not be accepted.

Please do file any bugs or suggestions on the bug tracker listed below.

[LINK] Bug Tracker

[LINK] KAE.query
[LINK] KAE.query.highlight

02 July, 2009

Timer constructor.

"use strict";
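// Timer: an array-like collection of functions to benchmark.
// iter is the number of times each function is called (defaults to 1).
var Timer = function (iter) {
    // Keep the default consistent with Timer.run, so that manip()
    // never receives an undefined iteration count.
    iter = iter || 1;
    // By default, each measured time is reported unchanged.
    function manip(item) {
        return item;
    }
    // Replace the function used to post-process each measured time.
    this.average = function (func) {
        manip = func;
    };
    // Run every stored function and return an array of processed times.
    this.results = function () {
        var i, length = this.length, times = [];
        for (i = 0; i < length; i += 1) {
            times.push(manip(Timer.run(this[i], iter), iter));
        }
        return times;
    };
};
// Inherit push() and the other array methods.
Timer.prototype = [];
// Call func `length` times and return the elapsed time in milliseconds.
Timer.run = function (func, length) {
    var i, start, end;
    length = length || 1;
    start = new Date();
    for (i = 0; i < length; i += 1) {
        func();
    }
    end = new Date();
    return end - start;
};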

In an earlier post I described a simple function for benchmark testing in JavaScript. There isn't anything wrong with this function (aside from it being a global); however, it doesn't do very much.

Most of the time, when I want to do benchmarks, I'm comparing two or more ways of doing the same thing, so that I can pick whichever is fastest. That is possible with getTime, but it is cumbersome, so I created the Timer constructor to alleviate the problem.

You initialize it with new Timer(), and you can pass in an optional number, indicating the iterations. If you create it with new Timer(1000), then every function will be called 1,000 times.

Timer is similar to an array of functions. You can add new functions with the push() method:

var timer = new Timer(100);
timer.push(function () {
    /* code goes here! */
});

You can use push() to add as many functions as you like. In order to obtain the actual time it takes to run the functions, you call the results() method, which returns an array:

// An array of benchmarks:
timer.results();

You can then call join() to display the array in various ways:

// Use custom separators:
timer.results().join("");
timer.results().join("\n");
timer.results().join(" + ");

Lastly, there's the average() method, which lets you post-process each benchmark result. For instance, to report the mean time per iteration:

timer.average(function (item, iter) {
    return item / iter;
});

You pass in a function, which is run after each benchmark has been computed. The first argument is how long it took to run the function, and the second argument is how many times the function was run (iterations). Whatever the function returns is used instead of the normal time.
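Since the callback can return anything, it doesn't have to compute a mean. As a rough sketch, here is a callback that reports calls per second instead. The opsPerSecond name is made up for this example and isn't part of the Timer source; it assumes the first argument is the total time in milliseconds, as returned by Timer.run.

```javascript
// Hypothetical helper, not part of the Timer source.
// item: total elapsed time in milliseconds
// iter: how many times the function was run
function opsPerSecond(item, iter) {
    iter = iter || 1;
    if (item === 0) {
        // Too fast to measure at millisecond resolution.
        return Infinity;
    }
    return Math.round(iter / (item / 1000));
}

// Usage with a Timer instance:
// timer.average(opsPerSecond);
```

If you see Infinity in the results, the function finished in under a millisecond overall, so increase the iteration count.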

Using these combined, you can create a simple script to compare two or more pieces of code, to determine which is fastest:

var timer = new Timer(10000);

//-- Begin benchmark functions
timer.push(function () {
    $("#test").css("backgroundColor", "black");
});

timer.push(function () {
    $("#test").css({
        backgroundColor: "black"
    });
});

timer.push(function () {
    $("#test").attr("style", "background-color: black;");
});
//-- End benchmark functions

timer.average(function (item, iter) {
    return item / iter;
});
alert(timer.results().join("\n"));

The above runs three functions (that do the same thing), computes how long it takes to run them 10,000 times, averages the result, and lastly displays it. If you don't want it to average the result, simply leave out the timer.average() call.

Finally, you can still access the old getTime under the name Timer.run:

var time = Timer.run(function () {
    /* code goes here! */
}, 100000);

[LINK] The source code.
[LINK] The unit tests.