Regex issue while parsing a .pdf file using CAM::PDF


Developer https://www.devze.com 2023-03-06 18:36 Source: Internet


Related topics: pdf perl

Unmatched [ in regex; marked by <-- HERE in m/ <-- HERE / at ./pdf_parse.pl line 37.

I'm parsing a .pdf file word by word (in order to build a dictionary from it). Line 37 is:

if(grep(!/$word/,@line_rd)){
}

The word at which the parser script stops working is set in a different font inside the PDF I'm parsing. Is that the culprit here?

Does CAM::PDF recognize words in different fonts? What should I do to prevent this error?

You need to quote $word in the regular expression if it can contain regex metacharacters (such as [ or even .). Try:

if (grep(!/\Q$word\E/, @line_rd)) {
  ...
}
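
To see why the quoting matters, here is a minimal sketch that reproduces the failure and the fix without any PDF involved. The sample contents of @line_rd and $word are made up for illustration; only the variable names come from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for words extracted from the PDF.
my @line_rd = ('alpha', 'be[ta', 'gamma');
my $word    = 'be[ta';    # contains an unbalanced '[' metacharacter

# Unquoted interpolation: the pattern is compiled at run time and dies
# with "Unmatched [ in regex", so we trap the error with eval.
my $ok    = eval { my @m = grep { /$word/ } @line_rd; 1 };
my $error = $ok ? '' : $@;

# Quoted interpolation: \Q...\E escapes the '[' so it is matched literally.
my @matches = grep { /\Q$word\E/ } @line_rd;

print "unquoted pattern failed: ", ($error ? "yes" : "no"), "\n";
print "quoted matches: @matches\n";
```

If the goal is only to check whether an exact word is already in the list, comparing with eq (for example grep { $_ eq $word } @line_rd) avoids regular expressions altogether; whether that fits depends on what line 37 is meant to test.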

If you want to make a dictionary of all the words, use a hash:

my %allwords;
...
  # each time you have a new word incoming from the parser:
  $allwords{$word}++;

At the end, the %allwords hash will contain the distinct words as keys, and the word count as values. You could e.g. print it using:

for my $w (sort keys %allwords) {
  print "Word $w: count: ", $allwords{$w}, "\n";
}
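
Putting the pieces together, the counting approach can be sketched as follows. The @lines array is a hypothetical stand-in for whatever text the PDF parser returns, and splitting on whitespace is an assumption about what counts as a word.

```perl
use strict;
use warnings;

# Hypothetical lines of text, standing in for the parser's output.
my @lines = ('the cat [sat] on the mat', 'the end.');

my %allwords;
for my $line (@lines) {
    # split ' ' breaks on runs of whitespace and skips leading blanks.
    for my $word (split ' ', $line) {
        # Hash keys are plain strings, so '[' or '.' need no quoting here.
        $allwords{$word}++;
    }
}

# Print each distinct word with its count, in sorted order.
for my $word (sort keys %allwords) {
    print "Word $word: count: $allwords{$word}\n";
}
```

Because the words are used as hash keys rather than as patterns, special characters such as [ cannot trigger a regex error in this version.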
