开发者

How to do regex HTML tag replace in SQL Server?

开发者 https://www.devze.com 2022-12-26 18:45 出处:网络
I have a table in SQL Server 2005 with hundreds of rows with HTML content. Some of the content has HTML like:

I have a table in SQL Server 2005 with hundreds of rows with HTML content. Some of the content has HTML like:

<span class=heading-2>Directions</span>开发者_如何学编程

where "Directions" changes depending on page name.

I need to change all the <span class=heading-2> and </span> tags to <h2> and </h2> tags.

I wrote this query to do content changes in the past, but it doesn't work for my current problem because of the ending HTML tag:

Update  ContentManager
Set ContentManager.Content = replace(Cast(ContentManager.Content AS NVARCHAR(Max)), 'old text', 'new text')

Does anyone know how I could accomplish the span to h2 replacing purely in T-SQL? Everything I found showed I would have to do CLR integration. Thanks!


Indeed T-SQL does not natively support regular expressions and this is the sort of problem in which regular expressions would be the tool of choice. First, I'll say that the level of complication in the solution depends greatly on how consistent your data is. For example, suppose we search for items with the heading:

Select ..
From ...
Where HtmlContent Like '<span class="heading-2">%'

This assumes no additional spacing between span and class as well as no additional spacing after the final double quote before the end bracket. We could write '%<span%class="heading-2"%>%' to account for the spaces but that would also find div tags marked as heading-2 in the same content as any span tag. If this later scenario shouldn't happen but you might have varying spaces, then use this revised pattern. Where we will really run into troubles is the closing tag. Suppose our content looks like so:

<span class="heading-2"> Foo <span class="heading-3">Bar</span> And Gamma Too</span> .... <span class="heading-4">Fubar Is the right way!</span>...

It is not so simple to find the correct closing span tag to change to </h2>. You cannot simply find the first </span> and change it to </h2>. If you knew that you had no nested span tags, then you could write a user-defined function that would do it:

Create Function ReplaceSpanToH2( @HtmlContent nvarchar(max) )
Returns nvarchar(max)
As
Begin
    Declare @StartPos int
    Declare @EndBracket int

    Set @StartPos = CharIndex('<span class="heading-2">', @HtmlContent)
    If @StartPos = 0
        Return @HtmlContent

    Set @HtmlContent = Replace(@HtmlContent, '<span class="heading-2">', '<h2>')

    -- find next </span>
    Set @StartPos = CharIndex('</span>', @HtmlContent, @StartPos)

    Set @HtmlContent = Stuff(@HtmlContent, @StartPos, 7, '</h2>')
    Return @HtmlContent
End


If you are positive that all of the HTML is (and will continue to be) valid XHTML and you're using SQL Server 2005 or later, you might be able to cast the columns to an XML data type and use XQuery. See http://msdn.microsoft.com/en-us/library/ms345117%28SQL.90%29.aspx

(Caveat: I haven't tried this.)

I think the best answer, though, is Michael Petito's comment. I would write an application to do this and use the Html Agility Pack. That will provide a permanent, maintainable solution that will work in nearly all cases.

(If this is a one-shot and you don't care about accuracy, then pick your poison.)


I am not real strong in SQL Server but here is how I would try to do this:

UPDATE TableName SET FieldName = REPLACE(FieldName ,'<span class=heading-2>', '<h2>') SET FieldName = REPLACE(FieldName, '</span>', '</h2>')

There might need to be 2 UPDATE statements issued, I am not sure if you can operate on the same field this way. The OP did say ALL occurences of the text. Set me straight if I am missing something.

Of course if there are other <span class=heading-2> or </span> text that you don't want to change, this won't work.


Gah, use jquery! Don't make life difficult.. there's some sample replace code on the jquery homepage and you can just include the jquery-1.4.2.js in the <head> section

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号