Count distinct co-occurrences

devze.com (https://www.devze.com), 2023-01-21 07:15, source: web

Related topics: oracle, sql

I have a database with a listing of documents and the words within them. Each row represents one occurrence of a term in a document. What I'm looking to do is to count how many documents a word occurs in.

So, given the following:

+-----+------+
| doc | word |
+-----+------+
| a   | foo  |
| a   | foo  |
| a   | bar  |
| b   | bar  |
+-----+------+

I'd like to get the result:

+------+-------+
| word | count |
+------+-------+
| foo  |     1 |
| bar  |     2 |
+------+-------+

This is because foo occurs in only one document (even though it occurs twice within that document), while bar occurs in two documents.
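What is being asked for is each word's document frequency: the number of distinct documents it appears in, with repeats inside one document ignored. A plain-Python sketch of that rule on the sample rows above may make it concrete; the function name `document_frequency` and the `ROWS` list are illustrative, not part of the original question.

```python
from collections import defaultdict

# The sample rows from the question: one (doc, word) pair per occurrence.
ROWS = [("a", "foo"), ("a", "foo"), ("a", "bar"), ("b", "bar")]


def document_frequency(rows):
    """Return {word: number of distinct documents containing that word}."""
    docs_per_word = defaultdict(set)
    for doc, word in rows:
        # A set ignores repeated (doc, word) pairs, so foo in doc a counts once.
        docs_per_word[word].add(doc)
    return {word: len(docs) for word, docs in docs_per_word.items()}


if __name__ == "__main__":
    print(document_frequency(ROWS))  # {'foo': 1, 'bar': 2}
```

Collecting documents into a set per word is exactly what "count distinct" means here; the SQL below expresses the same idea inside the database.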

Essentially, what I think I should be doing is a COUNT of the words returned by the following query:

SELECT DISTINCT word, doc FROM table

...but I can't quite figure it out. Any hints?
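The idea in the question does work if the distinct query is wrapped in a derived table (inline view): the inner SELECT DISTINCT removes the duplicate (a, foo) row, so the outer COUNT(*) counts each document once per word. A minimal sketch, assuming an in-memory SQLite database driven through Python's sqlite3 module rather than Oracle, and a hypothetical table name `doc_words` (since `table` is a reserved word):

```python
import sqlite3

# "doc_words" is a hypothetical table name; the question's "table" is a reserved word.
DISTINCT_PAIRS_QUERY = """
    SELECT word, COUNT(*) AS doc_count
    FROM (SELECT DISTINCT word, doc FROM doc_words) distinct_pairs
    GROUP BY word
"""


def make_sample_db():
    """Build an in-memory SQLite database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc_words (doc TEXT, word TEXT)")
    conn.executemany(
        "INSERT INTO doc_words (doc, word) VALUES (?, ?)",
        [("a", "foo"), ("a", "foo"), ("a", "bar"), ("b", "bar")],
    )
    return conn


def count_via_distinct_pairs(conn):
    """Count the distinct (word, doc) pairs per word, returning {word: doc_count}."""
    return dict(conn.execute(DISTINCT_PAIRS_QUERY).fetchall())


if __name__ == "__main__":
    print(count_via_distinct_pairs(make_sample_db()))
```

The alias `distinct_pairs` is written without AS because Oracle does not accept AS before a table alias; SQLite accepts both forms.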

You can actually use DISTINCT inside COUNT, like this:

select  word
,       count(distinct doc)
from    YourTable
group by
        word
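To check the answer's query against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database via Python's sqlite3 module (an assumption: the original question is about Oracle, but COUNT(DISTINCT ...) is standard SQL). The table name `doc_words` is a hypothetical stand-in for `YourTable`, and the `doc_count` alias is added only to name the result column.

```python
import sqlite3

# "doc_words" is a hypothetical stand-in for YourTable in the answer.
ANSWER_QUERY = """
    SELECT word, COUNT(DISTINCT doc) AS doc_count
    FROM doc_words
    GROUP BY word
"""


def load_sample(conn):
    """Create the hypothetical doc_words table and fill it with the question's rows."""
    conn.execute("CREATE TABLE doc_words (doc TEXT, word TEXT)")
    conn.executemany(
        "INSERT INTO doc_words (doc, word) VALUES (?, ?)",
        [("a", "foo"), ("a", "foo"), ("a", "bar"), ("b", "bar")],
    )


def count_distinct_docs(conn):
    """Run the COUNT(DISTINCT doc) query and return {word: doc_count}."""
    return dict(conn.execute(ANSWER_QUERY).fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    print(count_distinct_docs(conn))
```

One behavioral difference from counting the rows of SELECT DISTINCT word, doc: COUNT(DISTINCT doc) skips rows whose doc is NULL, so a word that only appears with a NULL doc gets a count of 0, whereas the distinct-pairs approach would count that NULL once.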

This may be an aside, but I'm guessing this isn't the best way to approach the problem. Why are you tracking every word in every document? Take a look at Oracle interMedia (now Oracle Text); it was built for exactly this sort of thing, specifically text search.