Arabic text files searching and indexing

https://www.devze.com 2023-03-19 15:01 Source: web
I am working on an electronic library project (for Arabic books): a program that lets the user import books into the system's library and search across that library. The system ships with a basic library (a set of books) that the user can update later.

To handle searching, I thought of giving the system an initial table in the DB for the basic search keywords. Every search keyword points to its locations in the books in the library.
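A table that maps each keyword to its locations is essentially an inverted index. Here is a minimal sketch of the idea, assuming an in-memory dictionary stands in for the DB table and that words are separated by whitespace; the names `keyword_locations`, `index_book`, and `search` are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the keyword table:
# keyword -> list of (book_id, word_position) pairs.
keyword_locations = defaultdict(list)

def index_book(book_id, text):
    """Record the position of every whitespace-separated word in the book."""
    for position, word in enumerate(text.split()):
        keyword_locations[word].append((book_id, position))

def search(keyword):
    """Return every (book_id, position) where the keyword occurs."""
    return keyword_locations.get(keyword, [])

index_book("book1", "العلم نور والجهل ظلام")
index_book("book2", "طلب العلم فريضة")
print(search("العلم"))  # [('book1', 0), ('book2', 1)]
```

In a real system the pairs would live in a DB table or a search library's index files rather than in memory, but the lookup has the same shape.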

The problem appears when the user imports a new book into the library. There are two steps. The first is to search the new book for the keywords already in the system, to find whether any of them appear in it, and to add their locations to the system. The second, which is the main stumbling block, is to identify NEW search keywords in the new book.

The idea I have, which I think is pretty bad and naive, is to break the new book into tokens and then search for each token in all the books already in the library.
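One possible alternative to re-searching the older books: if every book is tokenized and indexed when it is imported, then a single pass over the new book covers both steps. Tokens already in the index gain new locations, and tokens missing from the index are the new keywords. The sketch below assumes whitespace tokenization and an in-memory index; `LibraryIndex`, `import_book`, and `tokenize` are hypothetical names, not part of any library.

```python
from collections import defaultdict

def tokenize(text):
    """Naive tokenizer (an assumption): split on whitespace only."""
    return text.split()

class LibraryIndex:
    def __init__(self):
        # token -> list of (book_id, position)
        self.postings = defaultdict(list)

    def import_book(self, book_id, text):
        """Index one book in a single pass; return tokens new to the library."""
        new_keywords = []
        for position, token in enumerate(tokenize(text)):
            if token not in self.postings:
                new_keywords.append(token)  # first time this token is seen
            self.postings[token].append((book_id, position))
        return new_keywords

library = LibraryIndex()
library.import_book("basic", "العلم نور")
added = library.import_book("imported", "العلم في الصغر")
print(added)  # ['في', 'الصغر']
```

The cost of an import depends only on the size of the new book, not on the size of the existing library, because older books are never rescanned.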

To sum up, I would appreciate any help (tools, libraries, or DB options), any idea for solving the second problem, or a different idea for the whole system. I have tried reading and searching a lot for a solution, but in vain.

Thanks a lot,


You want Lucene.net. You will need to use the Arabic Analyzer.
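To give a sense of why a language-specific analyzer matters, here is a rough Python sketch of the kind of normalization an Arabic analyzer typically applies before indexing. This is not Lucene's API; the function name `normalize_arabic` and the exact rule set are assumptions made for illustration.

```python
import re

# Arabic diacritics (tashkeel, U+064B..U+0652) plus tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Assumed letter folding: map variant forms to one canonical letter.
_LETTER_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maksura          -> yeh
    "\u0629": "\u0647",  # teh marbuta           -> heh
})

def normalize_arabic(token):
    """Strip diacritics and tatweel, then fold letter variants."""
    return _DIACRITICS.sub("", token).translate(_LETTER_MAP)

# A vocalized word and its bare spelling normalize to the same index key.
print(normalize_arabic("الْعِلْمُ") == normalize_arabic("العلم"))  # True
```

Applying the same normalization to both the indexed text and the user's query lets a search for an unvocalized word match vocalized occurrences in the books.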


http://www.ibm.com/developerworks/java/library/os-apache-lucenesearch/index.html
