R text mining package DocumentTermMatrix with a dictionary in the control list takes way too much memory [closed]

Developer https://www.devze.com 2023-03-19 10:40 Source: Internet
Closed. This question needs details or clarity. It is not currently accepting answers. Want to improve this question? Add details and clarify the problem by editing this post.

Closed 6 years ago.

I have noticed that DocumentTermMatrix(myCorpus, control=list(dictionary=myDict)) consumes far more memory than DocumentTermMatrix(myCorpus).

Why is this happening?

Any leads?

Here is the code snippet:

library(tm)
library(XML)
source("MyXMLReader.r")  # contains the myXMLReader code
# basepath is not shown here; the corpus lives in the "corpus" directory under it
myCorpus <- Corpus(DirSource(paste(basepath, "corpus", sep = "")),
                   readerControl = list(reader = myXMLReader))
myDict <- readLines("some-file-containing-a-fixed-vocab")  # fixed vocabulary as a character vector

Now here is my question:

dtm = DocumentTermMatrix(myCorpus)  # takes very little extra RAM to do this
dtm = DocumentTermMatrix(myCorpus, control = list(dictionary = myDict))  # takes a whole lot of RAM, which is not even released after dtm is formed

I suspect there is a memory leak, and possibly a bug.
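
A minimal sketch of how the difference could be measured, so the comparison comes with numbers. It does not use the corpus from the question: it assumes the small `crude` sample corpus that ships with tm as a stand-in, and both the helper `peak_mb` and the three-word dictionary are made up for illustration. The figures it prints are absolute peaks for the whole R session, so only the gap between the two matters.

```r
library(tm)

data("crude")  # 20 Reuters articles bundled with tm, used here as a stand-in corpus

# Hypothetical helper: peak memory (in Mb) of the R session while `expr` is evaluated.
peak_mb <- function(expr) {
  invisible(gc(reset = TRUE))   # collect garbage and reset the "max used" counters
  force(expr)                   # arguments are lazy, so the call is evaluated here
  g <- gc()
  mb_col <- which(colnames(g) == "max used") + 1  # the "(Mb)" column next to "max used"
  sum(g[, mb_col])              # Ncells + Vcells
}

my_dict <- c("oil", "prices", "opec")  # made-up fixed vocabulary

plain_mb <- peak_mb(DocumentTermMatrix(crude))
dict_mb  <- peak_mb(DocumentTermMatrix(crude, control = list(dictionary = my_dict)))

cat("peak without dictionary:", plain_mb, "Mb\n")
cat("peak with dictionary:   ", dict_mb, "Mb\n")
```

Calling gc() again after removing dtm with rm(dtm) and comparing the "used" column before and after would show whether the memory is actually held or only not yet returned to the operating system.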
