I would like to have a TSQL Statement to move Name Suffix (Jr, Sr., IV, etc) into another field.
The suffixes I see are JR SR I II III IV V
Here is a sample of the data
LastName BRUNNING, II BURCH II BUSS, JR. CANI III CHRISTIAN,SR COLVIN Jr COWHERD,JR.
I would like the suffix moved out of the LastName field into another field called Suffix.
LastName Suffix BRUNNING II BURCH I BUSS JR CANI III CHRISTIAN SR COLVIN JR COWHERD JR
I am using SQL Server 2005 and can use SQL#开发者_JAVA百科 functions.
Any help would be greatly appretiated.You can probably do better than this using the SQL# functions, but in straight T-SQL, here you go.
The main idea here is to parse out the last segment/token in the name using REVERSE
and PATINDEX
, and then match it to a list of known suffixes.
First some test data:
IF OBJECT_ID('tempdb..#names') IS NOT NULL DROP TABLE #names
IF OBJECT_ID('tempdb..#suffixes') IS NOT NULL DROP TABLE #suffixes
CREATE TABLE #names (name VARCHAR(32))
CREATE TABLE #suffixes (suffix VARCHAR(32))
GO
INSERT #names VALUES ('BRUNNING, II' )
INSERT #names VALUES ('BURCH II' )
INSERT #names VALUES ('BUSS, JR.' )
INSERT #names VALUES ('CANI III' )
INSERT #names VALUES ('CHRISTIAN,SR' )
INSERT #names VALUES ('COLVIN Jr' )
INSERT #names VALUES ('COWHERD,JR.' )
INSERT #names VALUES ('BILLY BOB' )
INSERT #names VALUES ('JOHNNY' )
INSERT #suffixes VALUES ('II' )
INSERT #suffixes VALUES ('III')
INSERT #suffixes VALUES ('JR' )
INSERT #suffixes VALUES ('SR' )
Then, an inline SELECT
version. Notice the use of NULLIF
to control for SUBSTRING
errors.
SELECT
name
, left_segments
, right_segment
, new_name = CASE WHEN b.suffix IS NOT NULL THEN a.left_segments ELSE a.name END
, b.suffix
FROM (
SELECT
name
, left_segments = CASE WHEN left_segments LIKE '%[ ,]' THEN LEFT(left_segments,LEN(left_segments)-1) ELSE left_segments END
, right_segment = CASE WHEN right_segment LIKE '%[.]' THEN LEFT(right_segment,LEN(right_segment)-1) ELSE right_segment END
FROM (
SELECT *
, left_segments = RTRIM(LEFT(RTRIM(name),LEN(name)-NULLIF(PATINDEX('%[ ,]%',REVERSE(RTRIM(name))),0)))
, right_segment = RIGHT(RTRIM(name),NULLIF(PATINDEX('%[ ,]%',REVERSE(RTRIM(name))),0)-1)
FROM #names
) a
) a
LEFT JOIN #suffixes b ON a.right_segment = b.suffix
Alternately, UPDATE
w/ local vars:
ALTER TABLE #names ADD
left_segments VARCHAR(64)
, right_segment VARCHAR(64)
GO
DECLARE
@name VARCHAR(64)
, @len INT
, @last_delim INT
, @left_segments VARCHAR(64)
, @right_segment VARCHAR(64)
UPDATE #names SET
@name = RTRIM(name)
, @len = LEN(@name)
, @last_delim = @len-NULLIF(PATINDEX('%[ ,]%',REVERSE(@name)),0)
, @left_segments = RTRIM(LEFT(@name,@last_delim))
, @right_segment = RIGHT(@name,@len-@last_delim-1)
, @left_segments = CASE WHEN @left_segments LIKE '%[ ,]' THEN LEFT(@left_segments,LEN(@left_segments)-1) ELSE @left_segments END
, @right_segment = CASE WHEN @right_segment LIKE '%[.]' THEN LEFT(@right_segment,LEN(@right_segment)-1) ELSE @right_segment END
, left_segments = @left_segments
, right_segment = @right_segment
SELECT a.*
, new_name = CASE WHEN b.suffix IS NOT NULL THEN a.left_segments ELSE a.name END
, suffix = b.suffix
FROM #names a LEFT JOIN #suffixes b ON a.right_segment = b.suffix
The inline SELECT
is fairly convenient, but difficult to read and troubleshoot. I prefer the UPDATE
with local vars for anything I might have to return to later. Plus, it makes individual edits easier to apply.
EDIT, SELECT
method, slightly edited, and wrapped in an inline table-valued function. A inline TVF should be more efficient than a scalar UDF, and you get multiple return values to boot.
CREATE FUNCTION dbo.ParseNameAndSuffix (@name VARCHAR(64), @ValidSuffixes VARCHAR(512))
RETURNS TABLE AS RETURN (
SELECT
left_segments
, right_segment
, new_name = CASE WHEN CHARINDEX(';'+right_segment+';',';'+@ValidSuffixes+';') > 0 THEN a.left_segments ELSE a.name END
, suffix = CASE WHEN CHARINDEX(';'+right_segment+';',';'+@ValidSuffixes+';') > 0 THEN a.right_segment END
FROM (
SELECT
name
, left_segments = CASE WHEN left_segments LIKE '%[ ,]' THEN LEFT(left_segments,LEN(left_segments)-1) ELSE left_segments END
, right_segment = CASE WHEN right_segment LIKE '%[.]' THEN LEFT(right_segment,LEN(right_segment)-1) ELSE right_segment END
FROM (
SELECT name
, left_segments = RTRIM(LEFT(name,LEN(name)-NULLIF(PATINDEX('%[ ,]%',REVERSE(name)),0)))
, right_segment = RIGHT(name,NULLIF(PATINDEX('%[ ,]%',REVERSE(name)),0)-1)
FROM (SELECT name = LTRIM(RTRIM(@name))) a
) a
) a
)
GO
SELECT * FROM #names a
CROSS APPLY dbo.ParseNameAndSuffix(a.name,'II;III;JR;SR') b
Off the top of my head, since you've got a small number of replacements, you could do something like this:
UPDATE [TableName] SET LastName = SUBSTRING(LastName,0, CHARINDEX(lastname,'III')), SUFFIX = 'III' WHERE CHARINDEX(lastname,'III') > 0;
You might want to do some pre-processing to make the formats more consistent.
Consider removing the ending period and replacing all commas with a space. After that, your sample should look like:
LastName
BRUNNING II
BURCH II
BUSS JR
CANI III
CHRISTIAN SR
COLVIN Jr
COWHERD JR
Then you can identify rows ending in ' I',' II',' III',' JR', and ' SR' and strip their suffix according to its length and update the Suffix field with the value you want.
If CLR is not an option then what Peter suggested is great. However, since you said that you have SQL# then you can do this in a simpler and more functional manner using the RegEx_MatchSimple function in SQL#. I will explain using Peter's example as the starting point.
We can set up the test using nearly the same SQL that Peter used but in this case I will create the Suffixes table as a real (non-Temp) table so that I can reference it in one of the example TVFs below. Maybe you might want to keep them in a table as opposed to passing in as a parameter, but I will show both styles. I also added two names to the #Names table to show how using RegEx can help capture variations in the data (extra spaces and/or commas):
USE [tempdb]
GO
IF OBJECT_ID('tempdb..#Names') IS NOT NULL DROP TABLE #Names
IF OBJECT_ID('tempdb.dbo.Suffixes') IS NOT NULL DROP TABLE dbo.Suffixes
CREATE TABLE #Names (LastName VARCHAR(32))
CREATE TABLE dbo.Suffixes (Suffix VARCHAR(32))
GO
INSERT #Names VALUES ('BRUNNING, II' )
INSERT #Names VALUES ('BURCH II' )
INSERT #Names VALUES ('BUSS, JR.' )
INSERT #Names VALUES ('CANI III' )
INSERT #Names VALUES ('CHRISTIAN,SR' )
INSERT #Names VALUES ('COLVIN Jr' )
INSERT #Names VALUES ('COWHERD,JR.' )
INSERT #Names VALUES ('BILLY BOB' )
INSERT #Names VALUES ('JOHNNY' )
INSERT #Names VALUES ('BRUNNING, II ' )
INSERT #Names VALUES ('SMITH ,, SR. ' )
INSERT dbo.Suffixes VALUES ('II' )
INSERT dbo.Suffixes VALUES ('III')
INSERT dbo.Suffixes VALUES ('JR' )
INSERT dbo.Suffixes VALUES ('SR' )
The first thing to show is a simple example of it working with the above data. In this case I use a CTE to generate the list of matches against the names and then filter out the rows that did not match anything. I enclosed the [FullMatch] field in colons so it would be easier to see the leading and trailing spaces being captured:
;WITH cte AS (
SELECT names.LastName,
[SQL#].[SQL#].RegEx_MatchSimple(names.LastName, '(([ ]*,+[ ]*)|([ ]+))' + suff.Suffix + '[.]*[ ]*$', 1, 'IgnoreCase') AS [FullMatch],
suff.suffix
FROM #Names names
CROSS JOIN tempdb.dbo.Suffixes suff
)
SELECT cte.LastName, ':' + cte.FullMatch + ':' AS [FullMatch], REPLACE(cte.LastName, cte.FullMatch, '') AS [Replacement], cte.Suffix
FROM cte
WHERE cte.FullMatch <> ''
You can carry this theory over to a direct UPDATE statement:
;WITH cte AS (
SELECT names.LastName,
[SQL#].[SQL#].RegEx_MatchSimple(names.LastName, '(([ ]*,+[ ]*)|([ ]+))' + suff.Suffix + '[.]*[ ]*$', 1, 'IgnoreCase') AS [FullMatch],
suff.Suffix
FROM MyTable names
CROSS JOIN NameSuffixes suff
)
UPDATE mt
SET mt.LastName = REPLACE(cte.LastName, cte.FullMatch, ''),
mt.NameSuffix = cte.Suffix
FROM MyTable mt
INNER JOIN cte
ON cte.LastName = mt.LastName
WHERE cte.FullMatch <> ''
You had requested this as a function so that is as follows:
CREATE FUNCTION dbo.ParseNameAndSuffix (@Name VARCHAR(64))
RETURNS TABLE AS RETURN
(
WITH cte AS (
SELECT @Name AS [LastName],
[SQL#].[SQL#].RegEx_MatchSimple(@Name, '(([ ]*,+[ ]*)|([ ]+))' + suff.Suffix + '[.]*[ ]*$', 1, 'IgnoreCase') AS [FullMatch],
suff.Suffix
FROM tempdb.dbo.Suffixes suff
)
SELECT cte.LastName, cte.FullMatch, REPLACE(cte.LastName, cte.FullMatch, '') AS [Replacement], cte.Suffix
FROM cte
WHERE cte.FullMatch <> ''
)
GO
And can be used like so:
SELECT *
FROM #Names a
CROSS APPLY dbo.ParseNameAndSuffix(a.LastName) b
-- or --
UPDATE mt
SET mt.LastName = REPLACE(parse.LastName, parse.Found, ''),
mt.NameSuffix = parse.Suffix
FROM MyTable mt
CROSS APPLY dbo.ParseNameAndSuffix(mt.LastName) parse
To more closely match the example given by Peter that passed in the suffixes as a parameter, that can be done as follows using the String_Split function in SQL#:
CREATE FUNCTION dbo.ParseNameAndSuffix2 (@Name VARCHAR(64), @Suffixes VARCHAR(MAX))
RETURNS TABLE AS RETURN
(
WITH cte AS (
SELECT @Name AS [LastName],
[SQL#].[SQL#].RegEx_MatchSimple(@Name, '(([ ]*,+[ ]*)|([ ]+))' + suff.Val + '[.]*[ ]*$', 1, 'IgnoreCase') AS [FullMatch],
suff.Val AS [Suffix]
FROM [SQL#].[SQL#].String_Split(@Suffixes, ';', 2) suff
)
SELECT cte.LastName, cte.FullMatch, REPLACE(cte.LastName, cte.FullMatch, '') AS [Replacement], cte.Suffix
FROM cte
WHERE cte.FullMatch <> ''
)
GO
which can then be used as follows:
SELECT * FROM #Names a
CROSS APPLY dbo.ParseNameAndSuffix2(a.LastName, 'II;III;JR;SR') b
-- or --
UPDATE mt
SET mt.LastName = REPLACE(parse.LastName, parse.Found, ''),
mt.NameSuffix = parse.Suffix
FROM MyTable mt
CROSS APPLY dbo.ParseNameAndSuffix2(mt.LastName, 'II;III;JR;SR') parse
I think your best bet is going to be a RegEx match for the last word (excluding punctuation marks) being in a list (JR, Sr, III, etc)
Check out this blog
http://blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx
精彩评论