15 July, 2009

Syntax Highlighting

In my previous post, I described the KAE.query.highlight plugin, which does syntax highlighting in JavaScript. I will now show you how I achieved this relatively easy task.

Naturally, the best way is to grab a copy of the source and read the code (there are plenty of comments as well). However, only a small part of the code does the actual parsing, so it is easy to get lost. This post gives a general overview of the steps required.

I'm going to assume that you want to allow users to create their own rules for parsing (brushes), and also supply their own themes (for colors, etc). Here are the basic steps:

  1. Grab the brush and loop over it. Find all the matches in the source text, and store them in an array. You will need to know at least four things: the text, the CSS class, the start index, and the end index. These are all trivial to obtain.

  2. Sort the array of matches by the start index, so the matches are in the correct order.

  3. Loop over the matches. For each match, check to see if the current start index is greater than the previous end index. If so, add the match's text and apply the CSS class. If not, set the current match's end index to be the same as the previous match's end index.

    This is important because you will have matches within matches. For instance, inside a comment you want the highlighter to ignore every other match. The same check also handles things like numbers inside of strings.

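    Here is a diagram showing why this works:

    [Diagram: the current match drawn as a colored area, with an arrow marking the previous match's end index]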

    Let's examine it. The colored area shows the current match. The arrow shows the previous match's end index. One way to think of it is: if the arrow is to the right of the current match's start, we ignore the match and go to the next one.

    The important thing is that if a match is inside of the previous match, we set the current match's end index to be the same as the previous match's end index. This carries the outer match's end index forward, so we keep skipping every match nested inside it, rather than only the first.

  4. Now, you want to use something like slice() to obtain the non-matching text between the current match's end index and the next match's start index. This handles code such as Foo.bar.qux(); where .bar. is not a match, but we still want to include it in the output rather than leaving it out.
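The four steps above can be sketched roughly as follows. This is only an illustration, not the actual KAE.query.highlight code: the brush format (an array of objects with pattern and css properties), the sample rules, and the highlight() and escapeHtml() helpers are all assumptions made up for this example. It also keeps the previous end index in a single variable instead of rewriting each nested match's end index, which should have the same effect. End indexes here are exclusive, so the check is "greater than or equal to".

```javascript
// Hypothetical brush format: each rule pairs a regular expression with a CSS class.
var brush = [
    { pattern: /\/\/[^\n]*/g,          css: "comment" },
    { pattern: /"(?:[^"\\\n]|\\.)*"/g, css: "string" },
    { pattern: /\b\d+\b/g,             css: "number" }
];

// Escape the characters that would otherwise be read as HTML markup.
function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function highlight(source, brush) {
    var matches = [];

    // Step 1: find every match of every rule and record text, class, start, and end.
    brush.forEach(function (rule) {
        // Fresh global copy so lastIndex starts at 0 (this sketch drops other flags).
        var regex = new RegExp(rule.pattern.source, "g");
        var m;
        while ((m = regex.exec(source)) !== null) {
            if (m[0].length === 0) { regex.lastIndex++; continue; } // skip empty matches
            matches.push({
                text: m[0],
                css: rule.css,
                start: m.index,
                end: m.index + m[0].length // exclusive
            });
        }
    });

    // Step 2: sort by start index (on a tie, the longer match goes first).
    matches.sort(function (a, b) {
        return (a.start - b.start) || (b.end - a.end);
    });

    var output = "";
    var previousEnd = 0;

    matches.forEach(function (match) {
        // Step 3: only use a match that starts at or after the previous end index.
        if (match.start >= previousEnd) {
            // Step 4: include the non-matching text in between.
            output += escapeHtml(source.slice(previousEnd, match.start));
            output += '<span class="' + match.css + '">' + escapeHtml(match.text) + "</span>";
            previousEnd = match.end;
        }
        // Otherwise the match starts inside the previous one, so it is skipped
        // and previousEnd stays where it was.
    });

    // Whatever is left after the last match.
    output += escapeHtml(source.slice(previousEnd));
    return output;
}

console.log(highlight('var x = 42; // answer 7 "no"', brush));
```

With these sample rules, the 7 and the "no" inside the comment are found in step 1 but dropped in step 3, because they start before the comment's end index.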

That's it! No, really, that's all the parsing that's required. KAE.query.highlight's parsing code is only 38 lines! It also handles a few odd cases, and allows you to apply multiple brushes at the same time to the same element.

Just grab a copy of the source code and search for KAE.query.highlight.parser

That function handles all the parsing. It should also be well commented, so hopefully you won't have any trouble understanding.
