I have over 2000 aspx documents that all hold the same heading that I need to remove:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML lang="en">
<HEAD>
<TITLE>External Reference Investopedia</TITLE>
<META NAME="author" CONTENT="DERCHEC">
</HEAD>
<BODY>
<A NAME="topofpagebibliographyitem2aspx"></A>
Both the <TITLE> and <A> tags change in every file.
I need some help creating a regular expression that will select all of the above text. I am currently using TextCrawler to work through these documents in a batch. If better tools or methods are out there, please let me know.
Regards,
CD
Use Visual Studio's Find and Replace in Files. In the find options, tick the checkbox that enables regular expressions.
Find:
{\<Title>{.*}\</title\>}
Replace with nothing, i.e. leave it blank. This should get you started : )
Option 2 - download UltraEdit and do a find and replace in files on the text block - done : )
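If a script is an option, the same idea can be sketched in Python. This is only a rough example, not something from the tools above: it assumes the block always starts at the DOCTYPE and ends at the first empty <A NAME="..."></A> anchor, and the names HEADER_RE and strip_header are made up for illustration.

```python
import re

# Assumption: the header runs from the DOCTYPE to the first empty
# <A NAME="..."></A> anchor. The non-greedy .*? lets the <TITLE> text
# and the anchor name differ from file to file.
HEADER_RE = re.compile(
    r'<!DOCTYPE HTML PUBLIC.*?<A NAME="[^"]*"></A>\s*',
    re.DOTALL | re.IGNORECASE,
)


def strip_header(text: str) -> str:
    """Remove the first header block and leave the rest of the page as it is."""
    return HEADER_RE.sub("", text, count=1)


sample = '''<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML lang="en">
<HEAD>
<TITLE>External Reference Investopedia</TITLE>
<META NAME="author" CONTENT="DERCHEC">
</HEAD>
<BODY>
<A NAME="topofpagebibliographyitem2aspx"></A>
<p>Page content</p>
'''

print(strip_header(sample))
```

To run it over a folder you could loop over the files with pathlib (Path.rglob("*.aspx"), read_text, write_text). Work on a copy of the files first, since the encoding and layout of the real pages are not known here.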
Simple! The regular expression will be exactly the same text you need to remove. So if you want to match:
<HTML lang="en">
your regular expression will be:
<HTML lang="en">
The only time you'll have a problem is when the text contains a character that has a reserved meaning in regular expressions. In that case, you just need to prefix it with a backslash (\).
So if you need to match a question mark (?), the regex would be \?
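As a small illustration of the escaping point, here is how it looks in Python, which is just one possible regex flavour (TextCrawler and Visual Studio may treat some characters differently). The variable names are made up for the example; re.escape is the standard library helper that adds the backslashes for you.

```python
import re

# Unescaped, "?" makes the previous character optional,
# so the pattern "why?" also matches the text "wh".
optional_match = re.fullmatch(r"why?", "wh")

# Escaped, "\?" matches a literal question mark.
literal_match = re.fullmatch(r"why\?", "why?")

# re.escape builds a pattern that matches the given text literally.
header_line = '<HTML lang="en">'
pattern = re.escape(header_line)
found = re.search(pattern, 'before <HTML lang="en"> after')

print(re.escape("?"))
```

Letting re.escape do the work avoids having to remember which characters are reserved.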
If the bit you want to remove always ends with the </A> tag, then you could just use a normal string split function in any language.
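A minimal sketch of the split approach in Python, assuming the first </A> in each file is the one that closes the header and that it is written in upper case exactly like that. The function name drop_through_anchor is hypothetical.

```python
def drop_through_anchor(text: str, marker: str = "</A>") -> str:
    """Return everything after the first occurrence of marker.

    If the marker is not found, the text is returned unchanged so that
    files without the header are not damaged.
    """
    head, sep, tail = text.partition(marker)
    if not sep:
        return text
    # Drop the line break left behind after the marker.
    return tail.lstrip()


page = '<BODY>\n<A NAME="topofpagebibliographyitem2aspx"></A>\n<p>Page content</p>\n'
print(drop_through_anchor(page))
```

str.partition splits only at the first match, so any later </A> tags in the page body are left alone.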