R generate bi- and trigrams from column

开发者 https://www.devze.com 2023-03-13 14:43 Source: the web
I have a column containing a word in each row:

 word
 -----
 asdf
 wer
 asdf

Is there a way to get the most frequent bi- and trigrams over all rows? For instance, for bigrams:

aa: 10%
ab: 9%
.....


I have no experience with this particular sort of problem, but a little googling turned up the tau package for "N-Gram Based Text Categorization". Using its textcnt function on your sample looks like this:

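library(tau)  # install.packages("tau") if it is not installed yet
x <- c('asdf','wer','asdf')
# count character n-grams of up to 3 characters across all words
textcnt(x, n = 3)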

and seems to return the sort of information you're looking for.
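If you want the percentages from the question rather than raw counts, here is a rough base-R sketch that does not rely on tau at all. The helper names char_ngrams and ngram_share are made up for illustration, and the sketch assumes that n-grams are counted only inside each word (no word-boundary markers) and only at exactly length n, so its numbers need not match what textcnt reports.

```r
# Hypothetical helper: all character n-grams of exactly length n in each word.
char_ngrams <- function(words, n) {
  unlist(lapply(words, function(w) {
    if (nchar(w) < n) return(character(0))   # word too short for an n-gram
    starts <- seq_len(nchar(w) - n + 1)
    substring(w, starts, starts + n - 1)
  }))
}

# Hypothetical helper: share of each n-gram in percent, most frequent first.
ngram_share <- function(words, n) {
  counts <- table(char_ngrams(words, n))
  sort(100 * counts / sum(counts), decreasing = TRUE)
}

x <- c("asdf", "wer", "asdf")
bigrams  <- ngram_share(x, 2)
trigrams <- ngram_share(x, 3)

print(round(bigrams, 1))
print(round(trigrams, 1))
```

On the three sample words, "as", "sd" and "df" each make up 25% of the bigrams, and "wer" makes up 20% of the trigrams. For a real data frame you would pass the column itself, e.g. ngram_share(df$word, 2), where df is whatever your data frame is called.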
