开发者

how to query duplicate string entry in mysql

开发者 https://www.devze.com 2023-03-06 13:10 出处:网络
I have a program which has a function to show AT LEAST 2 duplicated similar string in MySQL records, I have a current implementation which works well BUT too slow for 300K records.

I have a program which has a function to show AT LEAST 2 duplicated similar string in MySQL records, I have a current implementation which works well BUT too slow for 300K records.

Example CONTACT table entries :

id (int)| name (string) | phone (string)
-----------------------------------------
1       | mike          | 开发者_JAVA百科081239812345678
2       | jhon          | 082222212345678
3       | rudy          | 081237712345678
4       | lucy          | 081237712345123
5       | lily          | 081244412345678

I need to get record belong to "mike, jhon, rudy and lily" because 8 of their last phone number digits are the same (duplicated), but record belong to lucy is ignored (because no other record has similar 8 digit ending number)

My current method is to use 2 queries like :

My first query :

"select right(phoner, 8) as myRight
    , count(*) as totdup
     from contact
     group by myRight
     having totdup > 1";

from this query I can get the value of "myRight" then I execute second query to get the detail:

"select * from contact where phone like '%$myRight'";

My question is HOW to speed up the process because I have 300K records and it takes around 20 minutes for this query, also I am looking to simplify the query by only using 1 query but I do not know how, I have been struggle with this problem for couple days now, your help will be very appreciated.


If you store the (8) rightmost characters of the phonenumbers in reverse order then
Your table will have the following fields:

id (int)| name (string) | phone (string)  | phonerev (string)
----------------------------------------------------
1       | mike          | 081239812345678 | 876543218932180
2       | jhon          | 082222212345678 | 876543212222280
3       | rudy          | 081237712345678 | 876543217732180
4       | lucy          | 081237712345123 | 321543217732180
5       | lily          | 081244412345678 | 876543214442180

you can do a query like:

select right(phone,8) as myRight
from contact c1
inner join contract c2 on (left(c1.phonerev,8) = left(c2.phonerev,8) 
                           and c1.id <> c2.id)
group by left(phonerev,8)

Make sure you set an index on phonerev

You can unify the query by doing:

select c1.*
from contact c1
inner join contract c2 on (left(c1.phonerev,8) = left(c2.phonerev,8) 
                           and c1.id <> c2.id)

This will allow the use of an index on the phonenumber

If you just store the 8 rightmost chars in reverse order, then the query becomes:

select right(phone,8) as myRight
from contact c1
inner join contract c2 on (c1.phonerev,8 = c2.phonerev 
                           and c1.id <> c2.id)
group by phonerev;

Which is even faster.


It looks like you're running a loop over the result rows from the first query and executing the second query once for each such row.

I'd suggest putting the first result into a temporary table and then doing

SELECT contact.* FROM contact JOIN temporary
 WHERE right(contact.phoner, 8) = temporary.myRight

Putting an index on the temporary table will probably improve your performance even more.


Depending on which version of MySQL you're using, this might work:

select *
from contact
where RIGHT(phone, 8) IN (
 select right(phone, 8) as myRight
     from contact
     group by myRight
     having COUNT(*) > 1
)


@Johan, you gave me a good direction to speed up the query, I added a new field for rightmost 8 char of phone, this avoid using "right(phone, 8)" in the query.

@Will A, thanks for your help but the query actually slower than my current solution, I dont know why.

@LHMathies, your idea is good, but making temp table for storing myRight is not applicable for my server/database condition.

Thank you guys~

0

精彩评论

暂无评论...
验证码 换一张
取 消