开发者

How to do a Join in MySql based on partial matches?

开发者 https://www.devze.com 2023-03-11 14:16 出处:网络
I have two tables. Table A Column 1Column 2 CT 3C10.5 -23.12 OT 5A11.2 -24.5 Table B Column 1Column 2 PRIM12.3 -24.51, 10.5 -23.12, 61.24 -78.23

I have two tables.

Table A

Column 1     Column 2
CT 3C        10.5 -23.12
OT 5A        11.2 -24.5

Table B

Column 1     Column 2
PRIM         12.3 -24.51, 10.5 -23.12, 61.24 -78.23
SEC          8.7345 -46.1934, 10.49 -49.1834
TERT         18.98 -28.12, 11.23 -24.78, 1.45 -24.11, 11.2 -24.5

Resulting Table after the join:

Column 1      Column 2      Column 3     Column 4
CT 3C         10.5 -23.12  开发者_Python百科 PRIM         12.3 -24.51, 10.5 -23.12, 61.24 -78.23
OT 5A         11.2 -24.5    TERT         18.98 -28.12, 11.23 -24.78, 1.45 -24.11, 11.2 -24.5

Is there a way to this match without splitting the contents of 'column 2 in Table B' into separate columns? The contents in 'Column 2 of Table B' are delimited by ','.

10.5 -23.12 should match with 12.3 -24.51, 10.5 -23.12, 61.24 -78.23 as it is contained in the list of values present in it.

I can't split them into separate columns because in some cases there are as many as 300 distinct values in it separated by a ','


Short Term Solution

Use the FIND_IN_SET function:

SELECT a.column1, a.column2,
       b.column1, b.column2
  FROM TABLE_A a
  JOIN TABLE_B b ON FIND_IN_SET(a.column2, b.column2) > 0

Long Term Solution

Denormalized data should only be stored as a performance improvement, after it's been determined necessary -- not before.

You need a many-to-many table between TABLE_A and TABLE_B to hold the relationships represented in the comma separated lists.


As an attempt to give you what you're asking for, give the query below a shot. However, if there are tens or hundreds of thousands of values, I am not optimistic about its performance.

select a.col1 as col1, a.col2 as col2, b.col1 as col3, b.col2 as col4
from tableA a
join tableB b
 on (b.col2 like a.col2 + '%' 
      or b.col2 like '%,' + a.col2 + ',%'
      or b.col2 like '%' + a.col2)

The three or conditions are where the value is the first value, the last value, or any middle value. You can't just say '%' + a.col2 + '%' because it theoretically could match part of a number set and not the whole value. i.e. %11.2 -24.5% could match 111.2 -24.56.


Similar to this question, the FIND_IN_SET function suggested by Shakti Singh should work for you. I imagine it will be incredibly slow, though, if you have so many values in a single column.

Better would be to create a child table.


300 distinct values This is clearly not a field. A nested table, might be a better name it.

You are doing it wrong. If a single field contains about 300 records, you should keep it as a different table.


While I agree with the answers that suggest normalizing the database, I would point out that if you are not married to MySQL, Postgresql has an array type with an ANY() predicate that does what you want. However, besides the philosophical issues, if you normalize this situation indexes will probably also give you superior performance. AFAIK, the array version can't be indexed usefully.

0

精彩评论

暂无评论...
验证码 换一张
取 消