Count distinct co-occurrences

Developer https://www.devze.com 2023-01-21 07:15 Source: web
I have a database with a listing of documents and the words within them. Each row represents a term. What I'm looking to do is to count how many documents a word occurs in.

So, given the following:

+  doc  +  word  +
+-------+--------+
+   a   +  foo   +
+-------+--------+
+   a   +  foo   +
+-------+--------+
+   a   +  bar   +
+-------+--------+
+   b   +  bar   +
+-------+--------+

I'd get a result of

+  word  +  count  +
+--------+---------+
+  foo   +    1    +
+--------+---------+
+  bar   +    2    +
+--------+---------+

Because foo occurs in only one document (even if it occurs twice within that doc) and bar occurs in two documents.

Essentially, what I think I should be doing is a COUNT of the words that the following query spits out:

SELECT DISTINCT word, doc FROM table

..but I can't quite figure it out. Any hints?


You can actually use distinct inside count, like:

select  word
,       count(distinct doc)
from    YourTable
group by
        word
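
To check this against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the column alias doc_count, and the ORDER BY clause are assumptions added for illustration; the question does not say which database engine is in use. It also runs the subquery form the question was reaching for (a COUNT over the SELECT DISTINCT result), which should give the same answer.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
# The table name YourTable follows the answer; doc_count is an illustrative alias.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (doc TEXT, word TEXT)")
conn.executemany(
    "INSERT INTO YourTable (doc, word) VALUES (?, ?)",
    [("a", "foo"), ("a", "foo"), ("a", "bar"), ("b", "bar")],
)

# COUNT(DISTINCT doc) counts each document at most once per word.
direct = conn.execute(
    "SELECT word, COUNT(DISTINCT doc) AS doc_count "
    "FROM YourTable GROUP BY word ORDER BY word"
).fetchall()

# Equivalent form: de-duplicate (word, doc) pairs first, then count rows per word.
via_subquery = conn.execute(
    "SELECT word, COUNT(*) AS doc_count "
    "FROM (SELECT DISTINCT word, doc FROM YourTable) AS pairs "
    "GROUP BY word ORDER BY word"
).fetchall()

print(direct)        # [('bar', 2), ('foo', 1)]
print(via_subquery)  # [('bar', 2), ('foo', 1)]
```

Both forms return foo = 1 and bar = 2 on this data. The COUNT(DISTINCT ...) version is shorter and avoids the derived table.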


This may be an aside, but I'm guessing this is not the best way to do this. Why are you tracking every word in every document? Take a look at Oracle Intermedia. It was built for this sort of thing (specifically text search).
