开发者

TSQL String matching question

开发者 https://www.devze.com 2023-02-06 12:25 出处:网络
I am trying to match 2 strings using TSQL. First string:      ABCD DFHG KLJKL Second string: ABCD DFHG KLJKL - 4536764

I am trying to match 2 strings using TSQL.

First string:      ABCD DFHG KLJKL

Second string: ABCD DFHG KLJKL - 4536764

Matching rule: if second string begins with the first string followed by " - " (that is, space, dash, space) and a set of numbers (and noth开发者_StackOverflow中文版ing else), consider it a match.

Any ideas?


I have two answers for you.

  1. Assuming that your FirstString values do not contain any of the characters %, _, OR [, this will return what you're asking for. Not only does it guarantee that the second string begins with the first and is followed by space-dash-space and a number, it also makes sure that only numbers follow from that point onward.

    If your table is very wide at all, then a nonclustered index that includes FirstString and SecondString as well as whatever other columns you want SELECTed (or they're in the clustered index) will make that index fully cover the query and could greatly improve performance.

    SELECT * 
       FROM Strings
       WHERE
          SecondString LIKE FirstString + ' - [0-9]%'
          AND SecondString NOT LIKE FirstString + ' - %[^0-9]%';
    

    I would also submit that if the FirstString is blank and SecondString starts immediately with ' - ' then it is correct per the specs.

  2. If your FirstString value DOES contain any of the above characters, then here's one way to handle that:

    SELECT * 
       FROM Strings
       WHERE
          Left(SecondString, Len(FirstString) + 3) = FirstString + ' - '
          AND Len(SecondString) > Len(FirstString) + 3
          AND Substring(SecondString, Len(FirstString) + 4, 2147483647) NOT LIKE '%[^0-9]%';
    

    This is kind of strange territory, here, so I would experiment also with this version to see if it performs any better:

    WITH S AS (
       SELECT
          *,
          Replace(Replace(Replace(Replace(
             FirstString,
             '\', '\\'),
             '%', '\%'),
             '_', '\_'),
             '[', '\[' --' just a comment to fix wonky code colorization
          ) FirstStringEscaped
       FROM Strings
    )
    SELECT *
    FROM S
    WHERE
       SecondString LIKE FirstStringEscaped + ' - [0-9]%' ESCAPE '\' --'
       AND SecondString NOT LIKE FirstStringEscaped + ' - %[^0-9]%' ESCAPE '\'; --'
    

Please note that if you want to properly handle spaces at the end of FirstString, some adjustment may be required (the second query which uses Len does not handle this case properly).


select * 
from theTable 
where SecondString like FirstString + ' - %[0-9]'
and SecondString not like FirstString + ' - %[^0-9]%'

This will select anything that has your string, followed by 1 space, followed by a dash, followed by 1 more space, followed by any set of numbers, and nothing but numbers.

EDITED: To filter out results with any cruft, not just letters, following the dash.


This query satisfies all the requirements.

select *
from #strings
where
  -- s2 contains s1 as the prefix.
  -- The addition of '.' is because sql considers ('abc' = 'abc ')
  LEFT(s2,Len(s1))+'.' = s1+'.'

  -- next 4 chars are space-dash-space-digit
  AND SUBSTRING(s2, Len(s1)+1, Len(s2)) LIKE ' - %[0-9]%'

  -- no non-digit letters after that
  AND NOT STUFF(s2, 1, len(s1)+4, '') LIKE '%[^0-9]%' 

  AND s1 > '' -- reject empty string1, added just in case

Here's a test table, showing you all the test cases

create table #strings (s1 varchar(100), s2 varchar(100))
insert into #strings values
    ('ABCD DFHG KLJKL', 'ABCD DFHG KLJKL - abc'), -- no, not number
    ('ABCD DFHG KLJKL', 'ABCD DFHG KLJKL - 123'), -- yes
    ('ABCD ', 'ABCD - 123'), -- no, 2nd string is first + '-' without space
    ('ABCD DFHG KLJKL - 123', 'ABCD DFHG KLJKL'), -- no, reversed
    ('KLJKL', 'KLJKL - 1.234'), -- ?? no, 2nd string is not digits only
    ('KL%', 'KLJKL - 1.234'), -- ?? no, 2nd string is not digits only
    ('', ' - 5234'), -- ?? no, blank string is not a match
    (null, ' - 1234'), -- ?? no, null is not equal to blank, which is not a match anyway
    ('ABCD DFHG KLJKL', null) -- no, of course not


select * 
from theTable 
where (FirstString = SecondString) 
or (FirstString = SUBSTRING(SecondString, 0, CHARINDEX('-', SecondString))
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号