I'm writing my own syntax highlighter in javascript for fun and see a couple of approaches but they both have pros and some pretty serious cons that I can't get around. What do you guys think about these approaches and are there better methods that I'm missing?
Assumption
Code to highlight exists in a single string.
Approaches
EDIT: Lexical Analysis So, if I understand it, using Lexical Analysis you break the string into tokens. This somehow sounds a lot like approach number 2? How do you approach reassembling the tokens into the original string?
Treat code in it's string form and use regular expressions to find patterns.
Pros Simple to define and search for patterns Cons Hard to disregard keywords inside of quotes or commentsSplit the string by spaces and linebreaks and 开发者_StackOverflow中文版loop over the array.
Pros Easy to keep track of scope Cons Hard to keep track of spaces and linebreaks after the split
Note: This uses jQuery. It can pretty well be rewritten to work with straight javascript if you want.
I actually wrote a little plugin for fun that does this:
(function($) {
$.fn.codeBlock = function(blockComment) {
// Setup keyword regex
var keywords = /(abstract|boolean|break|byte|case|catch|char|class|const|continue|debugger|default|delete|do|double|else|enum|export|extends|final|finally|float|for|function|goto|if|implements|import|in|instanceof|int|interface|long|native|new|package|private|protected|public|return|short|static|super|switch|synchronized|this|throw|throws|transient|try|typeof|var|void|volatile|while|with|true|false|prototype)(?!\w|=)/gi;
// Booleans to toggle comment, regex, quote exclusions
var comment = false;
var quote = false;
var regex = false;
/* Array used to store values of regular expressions, quotes, etc.
so they can be used to ID locations to be skipped durring keyword
regexing.
*/
var locator = new Array();
var locatorIndex = 0;
if (blockComment) locator[locatorIndex++] = 0;
var text = $(this).html();
var continuation;
var numerals = /[0-9]/;
var arr = ($(this).html()).split("");
var outhtml = "";
for (key in arr) {
// Assign three variables common 'lookup' values for faster aquisition
var keyd = key;
var val = arr[keyd];
var nVal = arr[keyd - 1];
var pVal = arr[++keyd];
if ((val == "\"" || val == "'") && nVal != "\\") {
if (quote == false) {
quote = true;
outhtml += val;
}
else {
outhtml += val;
quote = false;
}
locator[locatorIndex++] = parseInt(key);
}
else if (numerals.test(val) && quote == false && blockComment == false && regex == false) {
outhtml += '<span class="num">' + val + '</span>';
}
else if (val == "/" && nVal != "<") {
var keys = key;
if (pVal == "/") {
comment = true;
continuation = key;
break;
}
else if (pVal == "*") {
outhtml += "/";
blockComment = true;
locator[locatorIndex++] = parseInt(key);
}
else if (nVal == "*") {
outhtml += "/";
blockComment = false;
locator[locatorIndex++] = parseInt(key);
}
else if (pVal == "[" && regex == false) {
outhtml += "<span class='res'>/";
regex = true;
}
else {
outhtml += "/";
}
}
else if (val == "," || val == ";" && regex == true) {
outhtml += "</span>" + val;
regex = false;
}
else {
outhtml += val;
}
}
if (comment == true) {
outhtml = outhtml.replace(keywords, "<span class='res'>$1</span>");
outhtml += '<span class="com">';
outhtml += text.substring(continuation, text.length);
outhtml += '</span>';
}
else {
if ((locator.length % 2) != 0) locator[locator.length] = (text.length - 1);
if (locator.length != 0) {
text = outhtml;
outhtml = text.substring(0, locator[0]).replace(keywords, "<span class=\"res\">$1</span>");
for (var i = 0; i < locator.length;) {
qTest = text.substring(locator[i], locator[i] + 1);
if (qTest == "'" || qTest == "\"") outhtml += "<span class=\"quo\">";
else outhtml += "<span class=\"com\">";
outhtml += text.substring(locator[i], locator[++i] + 1) + "</span>";
outhtml += text.substring(locator[i] + 1, locator[++i]).replace(keywords, "<span class=\"res\">$1</span>");
}
}
else {
outhtml = outhtml.replace(keywords, "<span class=\"res\">$1</span>");
}
}
text = outhtml;
$(this).html(text);
return blockComment;
}
})(jQuery);
I'm not going to claim it is the most efficient way of doing this or the best but it does work. There are still probably a few bugs in there I haven't ID'd yet (and 1 I know about but haven't gotten around to fixing) but this should give you an idea of how you could go about this if you like.
My suggested implementation of this is to create a textarea
or something and have the plugin run when you click a button or something (as far as testing it goes that is a decent idea) and of course you can set the text in the textarea to some starting code to make sure it works (Tip: You can put tags in between the the <textarea>
tag and it will render as text, not HTML).
Also, blockComment is a boolean, make sure to pass false because true will trigger the block quoting. If you decided to parse something line by line, like:
<a>code</a>
<a>some more code</a>
Do something like:
blockComment = false;
$("a").each(function() {
blockComment = $(this).codeBlock(blockComment);
});
精彩评论