SQL for cases with 2 or more hits_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2022-12-11 08:30 出处：网络

I have this table that has a lot of book related fields including keywords. Book_ids are primary keys. Let\'s say I have a book with six keywords describing it, now how do I search all the other books

相关专题：sql

I have this table that has a lot of book related fields including keywords. Book_ids are primary keys. Let's say I have a book with six keywords describing it, now how do I search all the other books that have two or more same keywords?

What I have now is roughly this:

select book_id 
from book_fields
where keyword in ( select keyword from book_fields where book_id='123' )

The purpose is to get rough recommendations of similar books based on keywords. I'd like to do this with SQL if possible.

To clarify: The idea is to match a book to other b开发者_JAVA百科ooks using more than one keyword. All the keywords of all books were in book_fields table. No chance to touch the schema and RDMS was Oracle.

BEGIN;

CREATE SCHEMA books;
SET search_path TO books;

CREATE TABLE book_fields (
  book_id INT NOT NULL
, keyword VARCHAR(30) NOT NULL
, PRIMARY KEY (book_id, keyword) 
);

INSERT INTO book_fields (book_id, keyword)
VALUES
  (10, 'foo')
, (10, 'bar')
, (10, 'baz')
, (20, 'foo')
, (20, 'xxx')
, (20, 'baz')
, (30, 'yyy')
, (30, 'zzz')
;

SELECT
  lhs.book_id AS thisbook
, rhs.book_id AS otherbook
, COUNT(rhs.keyword)
FROM book_fields lhs, book_fields rhs
WHERE lhs.book_id <> rhs.book_id
  AND lhs.keyword = rhs.keyword
GROUP BY lhs.book_id, rhs.book_id
;

ROLLBACK;

Assuming you have a one-to-many table, KeywordsToBooks, having the following schema:

BookID int
Keyword varchar

with both columns as PK, BookID as a FK to your Books table, and passing @BookToMatchID as the ID of the Book you want to match on as a parameter, here's what I'd do:

SELECT BookID, COUNT(BookID) AS KeywordMatches
FROM KeywordsToBooks
WHERE BookID <> @BookToMatchID AND Keyword IN (
    SELECT Keyword
    FROM KeywordsToBooks
    WHERE BookID = @BookToMatchID)
GROUP BY BookID
HAVING COUNT(BookID) >= 2

As other's suggested, if this doesn't help, can you post the relevant bits of your schema please?

Assuming that books are stored in the table named books:

SELECT  *
FROM    books bo
WHERE   (
        SELECT  1
        FROM    book_fields bf
        JOIN    book_fields br
        ON      bf.keyword = br.keyword
        WHERE   br.book_id = 123
                AND bf.book_id = bo.id
        LIMIT 1, 1
        ) = 1

For better performance, do roughly what ristonj suggests, but avoid using the IN clause:

SELECT book_id, COUNT(*)
  FROM book_fields b
 WHERE EXISTS (SELECT 1
                 FROM book_fields a
                WHERE a.keyword = b.keyword
                  AND a.book_id = '123')
 GROUP BY book_id
HAVING COUNT(*) >= 2

This problem can be better addressed by creating a table which holds all keywords.

Is it feasible?