I'm using the following query to try to identify data in a table with around 10 million rows:
SELECT *
FROM db.tablename x
JOIN db.tablename z
  ON x.columnA = z.columnA
WHERE x.columnB > z.columnB
The problem is that the table doesn't have a primary key, because the column that would serve as the primary key contains duplicate values. The query above is hugely slow, and I can't figure out any way to make it more efficient.
Adding LIMIT 100 still doesn't seem to help.

Any ideas?
To find the duplicate values you could:
select columnA
from db.tablename
group by columnA
having count(*) > 1
Depending on what you then want to do, you could put this result in a temporary table.
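As a quick sketch of that temporary-table approach (using an in-memory SQLite database as a stand-in; the table and column names just mirror the ones in the question):

```python
import sqlite3

# In-memory SQLite database standing in for db.tablename (illustrative data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (columnA INTEGER, columnB INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 5), (3, 7), (3, 7), (4, 1)],
)

# Materialise the duplicated keys once, then reuse them in later queries
conn.execute("""
    CREATE TEMPORARY TABLE dup_keys AS
    SELECT columnA
    FROM tablename
    GROUP BY columnA
    HAVING COUNT(*) > 1
""")

dups = [row[0] for row in conn.execute(
    "SELECT columnA FROM dup_keys ORDER BY columnA")]
print(dups)  # columnA values 1 and 3 appear more than once
```

The point of the temporary table is that the expensive GROUP BY runs once; any follow-up query (e.g. joining back to fetch the full duplicate rows) then works against the small key set instead of rescanning all 10 million rows.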
But it seems strange that you have a concept of identity (even if it's not 100% correct all the time) and yet no index on that data. Wouldn't you want to do lookups on this field fairly often? Perhaps you could create a non-unique index on columnA, at least while you run your query.