
Obj-C / iOS: Look through a document for any one of several thousand words?

Developer https://www.devze.com 2023-03-31 07:04 Source: web

As part of a document reader I'm writing for iPhone/iPad, I need the following functionality:

Search through a document of approximately 500 to 10,000 words for words and phrases that appear in any of several lists. Each list contains between 100 and 5,000 words and phrases. When I find a word or phrase in the document that appears in one of those lists, I mark it and move on.

I will know the word lists ahead of time, but the documents will be unknown until the moment they need to be processed.

And this needs to be VERY FAST.

Any help would be greatly appreciated!


This presentation and paper present a fast multi-pattern string search algorithm. It also mentions some predecessors, should this one not fit your needs.

Multifast is an open source (LGPLed) C library that implements the Aho-Corasick algorithm.
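To make the idea concrete, here is a minimal Aho-Corasick sketch in plain C, which Objective-C can call directly. It is not Multifast's API: the names `ACAutomaton`, `ac_add`, `ac_build` and `ac_search` are hypothetical. It assumes case-sensitive, byte-wise matching, and it only reports a match when it is bounded by non-alphanumeric characters, so "cat" does not fire inside "catalogue". The automaton is built once from the word lists; each document is then scanned in a single pass no matter how many patterns there are.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define AC_ALPHABET 256  /* assumption: byte-oriented matching (ASCII/UTF-8 bytes) */

/* One automaton state. After ac_build(), goto_ is a complete transition
   table (no -1 entries), so the scan never backtracks. */
typedef struct {
    int goto_[AC_ALPHABET];
    int fail;       /* longest proper suffix that is also a trie prefix */
    int out;        /* id of the pattern ending exactly here, or -1 */
    int dict_link;  /* nearest node on the fail chain with out >= 0, or -1 */
    int depth;      /* length in bytes of the prefix spelled by this node */
} ACNode;

typedef struct { ACNode *nodes; int count, cap; } ACAutomaton;
typedef struct { size_t start, len; int id; } ACMatch;

static int ac_new_node(ACAutomaton *ac, int depth) {
    if (ac->count == ac->cap) {
        ac->cap = ac->cap ? ac->cap * 2 : 64;
        ac->nodes = realloc(ac->nodes, (size_t)ac->cap * sizeof(ACNode));
        if (!ac->nodes) { perror("realloc"); exit(EXIT_FAILURE); }
    }
    ACNode *n = &ac->nodes[ac->count];
    for (int c = 0; c < AC_ALPHABET; c++) n->goto_[c] = -1;
    n->fail = 0; n->out = -1; n->dict_link = -1; n->depth = depth;
    return ac->count++;
}

void ac_init(ACAutomaton *ac) {
    ac->nodes = NULL; ac->count = ac->cap = 0;
    ac_new_node(ac, 0);  /* node 0 is the root */
}

/* Insert one word or phrase into the trie (call before ac_build). */
void ac_add(ACAutomaton *ac, const char *pattern, int id) {
    if (!*pattern) return;  /* an empty pattern would match everywhere */
    int cur = 0;
    for (const unsigned char *p = (const unsigned char *)pattern; *p; p++) {
        int next = ac->nodes[cur].goto_[*p];
        if (next < 0) {
            next = ac_new_node(ac, ac->nodes[cur].depth + 1);  /* may realloc */
            ac->nodes[cur].goto_[*p] = next;
        }
        cur = next;
    }
    ac->nodes[cur].out = id;
}

/* Breadth-first pass that computes failure links and fills in missing
   transitions; run once, when the word lists are loaded. */
void ac_build(ACAutomaton *ac) {
    int *queue = malloc((size_t)ac->count * sizeof(int));
    int head = 0, tail = 0;
    if (!queue) { perror("malloc"); exit(EXIT_FAILURE); }
    for (int c = 0; c < AC_ALPHABET; c++) {
        int child = ac->nodes[0].goto_[c];
        if (child < 0) ac->nodes[0].goto_[c] = 0;  /* unknown byte: stay at root */
        else { ac->nodes[child].fail = 0; queue[tail++] = child; }
    }
    while (head < tail) {
        ACNode *n = &ac->nodes[queue[head++]];
        int f = n->fail;  /* shallower, so already complete */
        n->dict_link = ac->nodes[f].out >= 0 ? f : ac->nodes[f].dict_link;
        for (int c = 0; c < AC_ALPHABET; c++) {
            int v = n->goto_[c];
            if (v < 0) n->goto_[c] = ac->nodes[f].goto_[c];
            else { ac->nodes[v].fail = ac->nodes[f].goto_[c]; queue[tail++] = v; }
        }
    }
    free(queue);
}

/* Scan the document once and report every whole-word occurrence of any
   pattern. Returns the total match count; the first max_out go in out. */
size_t ac_search(const ACAutomaton *ac, const char *text, size_t text_len,
                 ACMatch *out, size_t max_out) {
    const unsigned char *t = (const unsigned char *)text;
    size_t found = 0;
    int state = 0;
    for (size_t i = 0; i < text_len; i++) {
        state = ac->nodes[state].goto_[t[i]];
        int s = ac->nodes[state].out >= 0 ? state : ac->nodes[state].dict_link;
        for (; s >= 0; s = ac->nodes[s].dict_link) {  /* every pattern ending at i */
            size_t len = (size_t)ac->nodes[s].depth, start = i + 1 - len;
            int left_ok = start == 0 || !isalnum(t[start - 1]);
            int right_ok = i + 1 == text_len || !isalnum(t[i + 1]);
            if (left_ok && right_ok) {
                if (found < max_out) out[found] = (ACMatch){start, len, ac->nodes[s].out};
                found++;
            }
        }
    }
    return found;
}

void ac_free(ACAutomaton *ac) { free(ac->nodes); ac->nodes = NULL; ac->count = ac->cap = 0; }
```

Typical use: call `ac_add` for every entry of every list (the `id` can encode which list it came from), `ac_build` once, then `ac_search` per document. The dense 256-entry table costs about 1 KB per trie node, so several thousand phrases can add up to tens of megabytes; on a phone you would want a more compact transition representation, and you would lowercase both the lists and the document first if matching should ignore case.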


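I would build one huge hash set of all the words and phrases to search for when the lists are loaded. A hash lookup takes constant time on average, so checking every word of the document (and, for phrases, every run of consecutive words up to the longest phrase length) is very fast at these sizes. It is obviously memory-hungry, but pretty trivial to implement.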
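A minimal sketch of the hash-set approach in C, assuming that words are runs of ASCII letters and digits and that matching ignores case. All names (`PhraseSet`, `ps_add`, `ps_scan`) are hypothetical, and the bucket count is an assumed size. Each list entry is normalised to lowercase words joined by single spaces and stored in a chained hash table keyed with FNV-1a; the scan then tries, at each word of the document, runs of 1 up to the longest phrase length, keeps the longest run that is in the set, and skips past it.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PS_BUCKETS 16384  /* power of two; assumed ample for a few thousand entries */
#define PS_MAX_KEY 256    /* longest normalised phrase in bytes, including NUL */

typedef struct PSEntry { char *key; size_t len; struct PSEntry *next; } PSEntry;
typedef struct { PSEntry *buckets[PS_BUCKETS]; size_t max_words; } PhraseSet;
typedef struct { size_t start, len; } Hit;  /* byte range in the document */

static uint64_t fnv1a(const char *s, size_t len) {  /* 64-bit FNV-1a hash */
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) { h ^= (unsigned char)s[i]; h *= 0x100000001b3ULL; }
    return h;
}

static int ps_contains(const PhraseSet *ps, const char *key, size_t len) {
    for (const PSEntry *e = ps->buckets[fnv1a(key, len) & (PS_BUCKETS - 1)]; e; e = e->next)
        if (e->len == len && memcmp(e->key, key, len) == 0) return 1;
    return 0;
}

/* Lowercase `in` and join its alphanumeric runs with single spaces, so that
   "New  York" and "new york" give the same key. Returns the key length
   (0 if empty or too long) and stores the word count in *words. */
static size_t normalize(const char *in, char *out, size_t *words) {
    size_t n = 0;
    *words = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p;) {
        while (*p && !isalnum(*p)) p++;
        if (!*p) break;
        if (*words > 0) { if (n + 1 >= PS_MAX_KEY) return 0; out[n++] = ' '; }
        while (isalnum(*p)) { if (n + 1 >= PS_MAX_KEY) return 0; out[n++] = (char)tolower(*p++); }
        (*words)++;
    }
    out[n] = '\0';
    return n;
}

PhraseSet *ps_new(void) { return calloc(1, sizeof(PhraseSet)); }

/* Add one list entry; returns 0 for an empty or oversized phrase, or on OOM. */
int ps_add(PhraseSet *ps, const char *phrase) {
    char key[PS_MAX_KEY];
    size_t words, len = normalize(phrase, key, &words);
    if (len == 0) return 0;
    if (ps_contains(ps, key, len)) return 1;
    PSEntry *e = malloc(sizeof *e);
    if (!e || !(e->key = malloc(len + 1))) { free(e); return 0; }
    memcpy(e->key, key, len + 1);
    e->len = len;
    size_t b = fnv1a(key, len) & (PS_BUCKETS - 1);
    e->next = ps->buckets[b];
    ps->buckets[b] = e;
    if (words > ps->max_words) ps->max_words = words;
    return 1;
}

/* Tokenise the document, then at each word try runs of 1..max_words words,
   keeping the longest run found in the set. Returns the hit count; the
   first max_hits hits are stored in hits. */
size_t ps_scan(const PhraseSet *ps, const char *doc, Hit *hits, size_t max_hits) {
    const unsigned char *d = (const unsigned char *)doc;
    size_t cap = strlen(doc) / 2 + 1, ntok = 0, found = 0;  /* max possible words */
    size_t *ts = malloc(cap * sizeof *ts), *tl = malloc(cap * sizeof *tl);
    if (!ts || !tl) { free(ts); free(tl); return 0; }
    for (size_t i = 0; d[i];) {  /* record start and length of every word */
        if (!isalnum(d[i])) { i++; continue; }
        ts[ntok] = i;
        while (isalnum(d[i])) i++;
        tl[ntok] = i - ts[ntok];
        ntok++;
    }
    char buf[PS_MAX_KEY];
    for (size_t i = 0; i < ntok;) {
        size_t n = 0, best = 0;
        for (size_t k = 0; k < ps->max_words && i + k < ntok; k++) {
            if (n + tl[i + k] + (k > 0) >= PS_MAX_KEY) break;
            if (k > 0) buf[n++] = ' ';
            for (size_t j = 0; j < tl[i + k]; j++) buf[n++] = (char)tolower(d[ts[i + k] + j]);
            if (ps_contains(ps, buf, n)) best = k + 1;  /* longer runs win */
        }
        if (best > 0) {
            size_t last = i + best - 1;
            if (found < max_hits) hits[found] = (Hit){ts[i], ts[last] + tl[last] - ts[i]};
            found++;
            i += best;  /* mark it and move on */
        } else {
            i++;
        }
    }
    free(ts); free(tl);
    return found;
}

void ps_free(PhraseSet *ps) {
    for (size_t b = 0; b < PS_BUCKETS; b++)
        for (PSEntry *e = ps->buckets[b], *next; e; e = next) { next = e->next; free(e->key); free(e); }
    free(ps);
}
```

The work per document is roughly (number of words) × (longest phrase length) hash lookups, so it stays linear in the document size. Compared with Aho-Corasick, the trade-off is simpler code in exchange for re-hashing a few words at each position.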

iOS 4 and above seems to have functionality for custom dictionaries; perhaps you could exploit that somehow?

