I wonder if there is an easy and efficient way in SQL Server 2005 to eliminate consecutive repeated characters in a string, for example converting
'ABBBCDEEFFFFG' to 'ABCDEFG'.
It really sucks that SQL Server has such a poor string library and no ready-to-use regexp feature...
You can use the CLR integration built into SQL Server 2005/2008 to do this in .NET code.
MSDN Magazine wrote about it in its February 2007 issue.
If that is not an acceptable solution, here is a UDF that will do the same. Mind you, it is about two orders of magnitude slower than the CLR solution.
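YMMV. This appears to work for your string above, but not for ABBBCDEEBBBBG: DISTINCT removes every repeated character, not just consecutive repeats, and it does not guarantee that the characters come back in their original order.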
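-- Numbers table: one row per character position.
-- sys.columns is just a convenient row source; it needs at least LEN(Seq) rows.
DECLARE @Numbers TABLE (Num smallint NOT NULL PRIMARY KEY)
INSERT @Numbers (Num)
SELECT TOP 8000
    ROW_NUMBER() OVER (ORDER BY c1.name)
FROM
    sys.columns c1

DECLARE @Stuff TABLE (Seq varchar(100) NOT NULL PRIMARY KEY)
INSERT @Stuff (Seq) VALUES ('ABBBCDEEFFFFG') --works

-- Split each string into single characters, keep one copy of each
-- distinct character, and glue them back together with FOR XML PATH('')
SELECT
    Single
FROM
    (
    SELECT DISTINCT
        CAST(Single AS varchar(100))
    FROM
        @Numbers N
        CROSS APPLY
        (SELECT Seq, SUBSTRING(Seq, Num, 1) AS Single FROM @Stuff) S
    WHERE
        Num <= LEN(Seq)
    FOR XML PATH ('')
    ) foo(Single)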
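If only consecutive repeats should be removed, the same numbers-table idea can be adapted: keep a character when it is at position 1 or differs from the character just before it, then concatenate in position order. This is a sketch, assuming SQL Server 2005 or later (ROW_NUMBER, FOR XML PATH with TYPE) and assuming sys.columns has at least 100 rows, which is normally the case; the table variables mirror the ones above and the second sample string is the one DISTINCT gets wrong.

```sql
-- Character positions 1..100 (Seq is varchar(100))
DECLARE @Numbers TABLE (Num smallint NOT NULL PRIMARY KEY);
INSERT @Numbers (Num)
SELECT Num
FROM (SELECT ROW_NUMBER() OVER (ORDER BY object_id, column_id) AS Num
      FROM sys.columns) AS X
WHERE Num <= 100;

DECLARE @Stuff TABLE (Seq varchar(100) NOT NULL PRIMARY KEY);
INSERT @Stuff (Seq) VALUES ('ABBBCDEEFFFFG');
INSERT @Stuff (Seq) VALUES ('ABBBCDEEBBBBG');  -- the case DISTINCT gets wrong

SELECT
    S.Seq,
    (
        -- Keep position 1 plus every character that differs from the one
        -- immediately before it; ORDER BY Num preserves the original order.
        SELECT SUBSTRING(S.Seq, N.Num, 1)
        FROM @Numbers N
        WHERE N.Num <= LEN(S.Seq)
          AND (N.Num = 1
               OR SUBSTRING(S.Seq, N.Num, 1) <> SUBSTRING(S.Seq, N.Num - 1, 1))
        ORDER BY N.Num
        -- TYPE + .value() avoids XML escaping of characters such as & and <
        FOR XML PATH(''), TYPE
    ).value('.', 'varchar(100)') AS Unduplicated
FROM @Stuff S;
-- Expected: ABBBCDEEFFFFG -> ABCDEFG, ABBBCDEEBBBBG -> ABCDEBG
```

Like the loop-based approaches, the `<>` comparison follows the collation, so under a case-insensitive collation 'a' followed by 'A' counts as a repeat.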
I know about the CLR solution, but as I said, I am neither responsible for nor authorized to implement it in the database in question.
For this particular problem, I decided to write a very simple and kinda silly loop. I am afraid it won't be fast enough for millions of records, but anyway. I wish I could do this stuff in the application layer, but I am bound to T-SQL here.
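DECLARE @input varchar(200);
DECLARE @unduplicated varchar(200);
DECLARE @i int;         -- position in @input
DECLARE @ch char(1);    -- current character
DECLARE @prev char(1);  -- last character appended to @unduplicated

SET @input = 'AAABCDEEFFBBBXYZSSSWWWNT';
-- The first character is always kept (this also covers one-character and empty input)
SET @unduplicated = LEFT(@input, 1);
SET @prev = @unduplicated;
SET @i = 2;

-- DATALENGTH, unlike LEN, does not ignore trailing spaces
WHILE @i <= DATALENGTH(@input)
BEGIN
    SET @ch = SUBSTRING(@input, @i, 1);
    -- If the current char differs from the last kept char, concatenate, else skip.
    -- <> follows the collation: under a case-insensitive one, 'a' and 'A' are equal.
    IF @ch <> @prev
    BEGIN
        SET @unduplicated = @unduplicated + @ch;
        SET @prev = @ch;
    END
    SET @i = @i + 1;
END

SELECT @unduplicated AS unduplicated;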
Result:
unduplicated
ABCDEFBXYZSWNT
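To apply the loop to a column rather than a single variable, one option (if you are allowed to create objects in that database) is to wrap it in a scalar function. This is a sketch: the name dbo.RemoveConsecutiveDuplicates and the @Samples table variable are made up for illustration, and varchar(8000) assumes the values fit in a non-MAX varchar.

```sql
-- Hypothetical function name; requires permission to CREATE FUNCTION.
CREATE FUNCTION dbo.RemoveConsecutiveDuplicates (@input varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @result varchar(8000);
    DECLARE @i int;
    DECLARE @ch char(1);
    DECLARE @prev char(1);

    SET @result = LEFT(@input, 1);  -- NULL in gives NULL out, '' in gives '' out
    SET @prev = @result;
    SET @i = 2;

    WHILE @i <= DATALENGTH(@input)
    BEGIN
        SET @ch = SUBSTRING(@input, @i, 1);
        IF @ch <> @prev  -- depends on collation: case-insensitive treats 'a' = 'A'
        BEGIN
            SET @result = @result + @ch;
            SET @prev = @ch;
        END
        SET @i = @i + 1;
    END

    RETURN @result;
END
GO

-- Applying it per row (made-up sample data)
DECLARE @Samples TABLE (Val varchar(100));
INSERT @Samples (Val) VALUES ('ABBBCDEEFFFFG');
INSERT @Samples (Val) VALUES ('AAABCDEEFFBBBXYZSSSWWWNT');
SELECT Val, dbo.RemoveConsecutiveDuplicates(Val) AS Unduplicated
FROM @Samples;
```

Keep in mind that scalar UDFs are evaluated once per row, so this is no faster than the loop itself on millions of records; a set-based numbers-table query may scale better there.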