How to find the frequency of a phrase (multi-token string) inside a document in Java?

Developer https://www.devze.com 2023-03-27 21:00 Source: Internet
I want to find the frequency of a multiple-token string, or phrase, inside a document. It's not the word/single-term frequency that I am looking for; it will always be multiple terms, and the number of terms is dynamic.

Example: finding the frequency of "words with friends" inside a document.

Any help/pointer will be much appreciated.

Thanks, Debjani


You can read the document line by line using a BufferedReader, and then use the split function to count the occurrences of the phrase on each line. Note that this only finds occurrences that sit entirely on one line.

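int count = 0;
String strLine;
// br is a BufferedReader opened on the document
while ((strLine = br.readLine()) != null) {
    // limit -1 keeps trailing empty strings, so a phrase at the end of a line is counted
    count += strLine.split("words with friends", -1).length - 1;
}
return count;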

EDIT: If you want to perform a case-insensitive search, you can use a compiled Pattern instead:

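// requires: import java.util.regex.Pattern;
Pattern myPattern = Pattern.compile("words with friends", Pattern.CASE_INSENSITIVE);
int count = 0;
String strLine;
// br is a BufferedReader opened on the document
while ((strLine = br.readLine()) != null) {
    // limit -1 keeps trailing empty strings, so a phrase at the end of a line is counted
    count += myPattern.split(strLine, -1).length - 1;
}
return count;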
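
A note on the split-based counting: with the default limit, split discards trailing empty strings, so a line that ends with the phrase is undercounted, while a limit of -1 keeps them. The following is only a small sketch to check that behaviour; the class name SplitCountDemo and the helper countBySplit are made up for illustration and are not part of the answer above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class SplitCountDemo {

    // Counts literal occurrences of phrase in line as (number of pieces - 1).
    // Pattern.quote makes the phrase match literally, even if it contains
    // regex metacharacters.
    static int countBySplit(String line, String phrase, int limit) {
        return line.split(Pattern.quote(phrase), limit).length - 1;
    }

    public static void main(String[] args) {
        String phrase = "words with friends";
        String line = "I play words with friends";

        // limit 0 behaves like split(regex): the trailing empty string is dropped
        System.out.println(countBySplit(line, phrase, 0));   // prints 0

        // limit -1 keeps the trailing empty string, so the match is counted
        System.out.println(countBySplit(line, phrase, -1));  // prints 1
    }
}
```

With limit 0, a line that consists of nothing but the phrase would even yield -1, because every piece is an empty string and all of them are dropped.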


Why not use a regex? The java.util.regex classes are well suited to this sort of task.

http://download.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html
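
As a rough sketch of the Matcher-based idea, assuming the whole document has already been read into a String: the phrase is split into its terms (so the number of terms can vary), each term is quoted, and the terms are joined with \s+ so that a phrase broken across a line still matches. The class name PhraseFrequency and the method count are hypothetical. The word-boundary anchors assume the phrase starts and ends with word characters.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class PhraseFrequency {

    // Counts non-overlapping, case-insensitive occurrences of phrase in text.
    static int count(String text, String phrase) {
        String trimmed = phrase.trim();
        if (trimmed.isEmpty()) {
            return 0; // nothing to search for
        }
        // Quote each term literally and allow any run of whitespace between terms.
        String body = Arrays.stream(trimmed.split("\\s+"))
                .map(Pattern::quote)
                .collect(Collectors.joining("\\s+"));
        // \b keeps "swords with friends" from matching "words with friends".
        Pattern pattern = Pattern.compile("\\b" + body + "\\b", Pattern.CASE_INSENSITIVE);

        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String document = "Words with friends is fun. I like words\nwith friends.";
        System.out.println(count(document, "words with friends")); // prints 2
    }
}
```

Because the search runs over the whole text instead of one line at a time, it also finds occurrences that span a line break, which a per-line split cannot do.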
