I need to match and replace some comments. For example:
$test = "the url is http://www.google.com";// comment "<-- that quote needs to be matched
I want to match the comments outside of the quotes, and replace any "s in the comments with &quot;s.
I have tried a number of patterns and different ways of running them, but with no luck.
The regex will be run in JavaScript to match PHP "//" comments.
UPDATE: I took the regex from borkweb below and modified it, used a function from http://ejohn.org/blog/search-and-dont-replace/, and came up with this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<script type="text/javascript">
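function t_replace(data){
  var ret = "";
  // "Search and don't replace": replace() is used only to visit every match,
  // so ret ends up holding the text of the last one.
  // Note that [\/\/|#] is a character class: it matches one "/", "|" or "#".
  data.replace(/(?:((["'\/]*(("[^"]*")|('[^']*'))?[\s]*)?[\/\/|#][^"|^']*))/g, function(value){
    ret = value;
    return value;
  });
  // No match: return the input untouched (split("") would cut it into characters).
  if (ret === "") return data;
  // Escape quotes from the first occurrence of that text to the end of the line.
  var pos = data.indexOf(ret);
  var out = data.slice(pos);
  out = out.replace(/"/g,"&quot;");
  out = out.replace(/'/g,"&#39;");
  return data.slice(0, pos) + out;
}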
</script>
</head>
<body>
<script type="text/javascript">
document.write(t_replace("$test = \"the url is http://www.google.com\";// c'o\"mment \"\"\"<-- that quote needs to be matched")+"<br>");
document.write(t_replace("$test = 'the url is http://www.google.com';# c'o\"mment \"\"\"<-- that quote needs to be matched"));
</script>
</body>
</html>
It handles all line comments outside single or double quotes. Is there any way I could optimize this function?
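One possible tidy-up, sketched under the assumption of an ES2020 engine (Node 12+ or a current browser): String.prototype.matchAll iterates over matches directly, so replace() is no longer needed as a loop, and the match's own index can stand in for the split() call. The names lastMatch and t_replace_sketch are hypothetical, and the regex is the question's, kept as-is.

```javascript
// Hypothetical helper: return the last match of a global regex, or null.
// matchAll requires the g flag and yields match objects carrying .index.
function lastMatch(data, re) {
  let last = null;
  for (const m of data.matchAll(re)) {
    last = m;
  }
  return last;
}

// Hypothetical rewrite of t_replace: escape " and ' from the start of the
// last match to the end of the string; leave everything before it alone.
function t_replace_sketch(data) {
  // Regex from the question, unchanged.
  const re = /(?:((["'\/]*(("[^"]*")|('[^']*'))?[\s]*)?[\/\/|#][^"|^']*))/g;
  const m = lastMatch(data, re);
  if (m === null) return data; // nothing that looks like a comment
  const head = data.slice(0, m.index);
  const tail = data.slice(m.index)
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
  return head + tail;
}

console.log(t_replace_sketch(
  '$test = "the url is http://www.google.com";// c\'o"mment """<-- that quote needs to be matched'
));
```

This keeps the question's regex and its "last match wins" rule, so it is a tidier version of the same logic rather than a fix for inputs where that rule picks the wrong match.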
UPDATE 2: it does not handle this string
document.write(t_replace("$test //= \"the url is http://www.google.com\"; //c'o\"mment \"\"\"<-- that quote needs to be matched")+"<br>");
You can use one regexp to match all strings and comments at the same time. If a match is a string, replace it with itself, unchanged; only comments need special handling.
I came up with this regex:
"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)
There are 3 parts:
"(\\[\s\S]|[^"])*" for matching double-quoted strings.
'(\\[\s\S]|[^'])*' for matching single-quoted strings.
(\/\/.*|\/\*[\s\S]*?\*\/) for matching both single-line and multiline comments.
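To see why one combined regex is enough, it may help to list the tokens it finds. The engine scans left to right, so a string token is consumed whole before any // inside it is reached. The tokens helper below is hypothetical and assumes an engine with String.prototype.matchAll (ES2020).

```javascript
// Thai's combined pattern: double-quoted string | single-quoted string | comment.
const re = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;

// Hypothetical helper: label every string/comment token the regex finds.
// Group 3 is only set for comments. Strings are told apart by their first
// character, because groups 1 and 2 hold just the last character of the
// string (and stay undefined for an empty string such as '').
function tokens(src) {
  const out = [];
  for (const m of src.matchAll(re)) {
    const kind = m[3] !== undefined ? "comment"
      : m[0][0] === '"' ? "double" : "single";
    out.push([kind, m[0]]);
  }
  return out;
}

console.log(tokens(`$a = "x // not a comment"; // real 'one'`));
```

The // inside "x // not a comment" never starts a comment, because the string match has already swallowed it.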
The replace function checks whether the matched text is a comment. If it isn't, it is returned unchanged. If it is, every " and ' inside it is replaced with its HTML entity (&quot; and &#39;).
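function t_replace(data){
  // The engine scans left to right, so a string is consumed whole before
  // any // or /* inside it is reached.
  var re = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;
  return data.replace(re, function(all, strDouble, strSingle, comment) {
    if (comment) {
      // Comment: escape both kinds of quote.
      return all.replace(/"/g, '&quot;').replace(/'/g, '&#39;');
    }
    // String: return it unchanged.
    return all;
  });
}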
Test run:
Input: $test = "the url is http://www.google.com";// c'o"mment """<-- that quote needs to be matched
Output: $test = "the url is http://www.google.com";// c&#39;o&quot;mment &quot;&quot;&quot;<-- that quote needs to be matched
Input: $test = 'the url is http://www.google.com';# c'o"mment """<-- that quote needs to be matched
Output: $test = 'the url is http://www.google.com';# c'o"mment """<-- that quote needs to be matched
Input: $test //= "the url is http://www.google.com"; //c'o"mment """<-- that quote needs to be matched
Output: $test //= &quot;the url is http://www.google.com&quot;; //c&#39;o&quot;mment &quot;&quot;&quot;<-- that quote needs to be matched
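With the regex above, the second input passes through unchanged: its comment starts with #, which is not one of the three alternatives. A minimal sketch that adds #.* as an extra comment form (an added assumption, not part of the answer above) handles it the same way as //. The function name t_replace_hash is hypothetical.

```javascript
// Thai's regex with one assumed addition: #.* as another single-line comment.
const re = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|#.*|\/\*[\s\S]*?\*\/)/g;

// Hypothetical variant of t_replace: escape quotes in comments only.
function t_replace_hash(data) {
  return data.replace(re, (all, strDouble, strSingle, comment) =>
    comment !== undefined
      ? all.replace(/"/g, "&quot;").replace(/'/g, "&#39;")
      : all);
}

console.log(t_replace_hash(
  "$test = 'the url is http://www.google.com';# c'o\"mment \"\"\"<-- that quote needs to be matched"
));
```

A # inside a string is still skipped, for the same left-to-right reason as //.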
Don't forget that PHP comments can also take the form /* this is a comment */, which can span multiple lines.
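The third alternative in the answer above, \/\*[\s\S]*?\*\/, already covers this: [\s\S] matches newlines (. does not), and the lazy *? stops at the first */, which matches PHP's non-nesting block comments. A short sketch, reusing the answer's t_replace with quotes escaped to &quot; and &#39; (the entity choice is an assumption):

```javascript
// The answer's function: only comments get their quotes escaped.
function t_replace(data) {
  const re = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;
  return data.replace(re, (all, strDouble, strSingle, comment) =>
    comment ? all.replace(/"/g, "&quot;").replace(/'/g, "&#39;") : all);
}

// A /* */ inside a string is left alone; the real block comment spans 2 lines.
const src = '$x = "/* not a comment */";\n/* it\'s a\n   "block" comment */\n$y = 1;';
console.log(t_replace(src));
```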
This site may be of interest to you:
http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript
At the time of writing, JavaScript did not have native lookbehind support in its regular expression engine (ES2018 has since added it). What you may be able to do instead is start at the end of a line and look backward, capturing any characters that follow a semicolon, optional whitespace, and //. So something like:
;\s*\/\/(.+)$
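A sketch of the trailing-comment idea, written with \s* for the "optional whitespace" and the m flag so that $ means end of line. Engines that implement ES2018 also accept lookbehind, including variable-length lookbehind, so the same text can be matched without a capture group. The sample line is the one from the question; the variable names are illustrative.

```javascript
// Question's sample line (one line of PHP).
const line = '$test = "the url is http://www.google.com";// comment "<-- that quote needs to be matched';

// Semicolon, optional whitespace, //, then capture the rest of the line.
// The m flag makes $ match at the end of each line, not only the end of input.
const trailing = /;\s*\/\/(.+)$/m;
const viaGroup = line.match(trailing)[1];

// ES2018+ lookbehind: match the comment text itself, without a capture group.
const withLookbehind = /(?<=;\s*\/\/).+$/m;
const viaLookbehind = line.match(withLookbehind)[0];

console.log(viaGroup);
console.log(viaLookbehind === viaGroup); // true
```

Both versions share the limitation noted in the answer: on $u = "a;//b"; they capture b"; because the ;// inside the string looks like the start of a comment.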
This may not capture everything.
You may also want to look for a PHP syntax checker written in JavaScript (or in another language). I think Komodo Edit's PHP syntax checker may be written in JavaScript. If so, it may give you insight into how to strip out everything but the comments, since a syntax checker needs to ensure the PHP code is valid, comments and all. The same can be said of syntax highlighters. Here are two other links:
http://ecoder.quintalinda.com/
http://www.webdesignbooth.com/9-useful-javascript-syntax-highlighting-scripts/
I have to admit, this regex took me a while to put together, but I'm pretty sure it will do what you are looking for:
<script>
var str = "$test = \"the url is http://www.google.com\";// comment \"\"\"<-- that quote needs to be matched";
var reg = /^(?:(([^"'\/]*(("[^"]*")|('[^']*'))?[^"'\/]*)?\/\/[^"]*))"/g;
// Replace one commented quote per pass until the string stops changing.
while( str !== (str = str.replace( reg, "$1&quot;") ) );
console.log( str );
</script>
Here's what's going on in the regex:
^                 # start with the beginning of the line
(?:               # don't capture the following
  (               # capture group 1: everything before the quote (reused as $1)
    ([^"'\/]*     # start the line with any characters that don't open a string or a comment
      (
        ("[^"]*") # grab a double quoted string
        |         # OR
        ('[^']*') # grab a single quoted string
      )?          # but...we don't HAVE to match a string
      [^"'\/]*    # allow other characters, such as ; or whitespace, after the string
    )?            # but...we don't HAVE to have any characters before the comment begins
    \/\/          # match the start of a comment
    [^"]*         # match any number of characters that aren't a double quote
  )               # end capture group 1
)                 # end the non-capturing declaration
"                 # match your commented double quote
The while loop just keeps finding and replacing until a pass leaves the line unchanged, i.e. there are no more commented quotes to match.
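The while loop is a fixed-point iteration: each pass escapes one quote, and the loop stops when a pass changes nothing. A sketch of the same idea as a reusable function (replaceUntilStable is a hypothetical name). The pattern below uses [^"'\/]* after the optional string, so that characters such as ; can sit between the string and the //, which the question's example needs.

```javascript
// Hypothetical helper: apply a replacement repeatedly until the string
// stops changing (a fixed point), like the while loop above.
function replaceUntilStable(str, re, replacement) {
  let prev;
  do {
    prev = str;
    str = str.replace(re, replacement);
  } while (str !== prev);
  return str;
}

// ^ anchors to the start of the input, so one call escapes at most one quote.
// $1 is everything from the line start up to (not including) that quote.
const reg = /^(?:(([^"'\/]*(("[^"]*")|('[^']*'))?[^"'\/]*)?\/\/[^"]*))"/;

const input = '$test = "the url is http://www.google.com";// comment """<-- that quote needs to be matched';
console.log(replaceUntilStable(input, reg, "$1&quot;"));
```

The loop always terminates here, because each pass removes one literal " from the text after the // and the replacement &quot; contains none.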
To complement @Thai's answer, which I found very good, I would like to add a bit more:
In this example, with the original regex, the capture groups hold only the last character of each quoted string: https://regex101.com/r/CoxFvJ/2
So I modified it slightly to capture the full content of the quotes, and gave a more descriptive and generic example of content: https://regex101.com/r/CoxFvJ/3
So the final regex is:
/"((?:\\"|[^"])*)"|'((?:\\'|[^'])*)'|(\/\/.*|\/\*[\s\S]*?\*\/)/g
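The difference comes down to what a capture group holds. A repeated group such as (\\[\s\S]|[^"])* keeps only its last iteration, while a group wrapped around the whole repetition, ((?:\\"|[^"])*), captures the full content. A sketch comparing them on a string with escaped quotes follows; firstCapture is a hypothetical helper, and the third regex, "wrapped", is an assumed variant that wraps Thai's own repetition instead.

```javascript
// Thai's regex: group 1 is repeated, so it keeps only the last iteration.
const original = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;
// Modified regex from this answer: group 1 wraps the whole repetition.
const modified = /"((?:\\"|[^"])*)"|'((?:\\'|[^'])*)'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;
// Assumed variant: wrap Thai's repetition, keeping its any-escape handling.
const wrapped = /"((?:\\[\s\S]|[^"])*)"|'((?:\\[\s\S]|[^'])*)'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;

// Hypothetical helper: group 1 of the first match, or undefined.
function firstCapture(re, src) {
  const first = src.matchAll(re).next().value;
  return first ? first[1] : undefined;
}

// PHP source: $s = "say \"hi\""; // done
const src = '$s = "say \\"hi\\""; // done';
console.log(firstCapture(original, src)); // \"
console.log(firstCapture(modified, src)); // say \"hi\"
console.log(firstCapture(wrapped, src));  // say \"hi\"
```

One trade-off: \\" only recognizes an escaped quote, whereas \\[\s\S] accepts any escape, so with the modified regex a string ending in an escaped backslash, such as "a\\", can run on into the next string; the wrapped variant avoids that.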
Big thanks to Thai for unlocking me.