I want to build a search engine for Arabic. I already have code for searching in English, so I only had to change the Analyzer. But when I type Arabic in the console (I changed the encoding to UTF-8), I get 0 results. I think Eclipse passes the Arabic word to the query in some encoded form that the query does not recognize. How can I make the Arabic word readable to the query?
QueryParser parser = new QueryParser(Version.LUCENE_30,
"contents", new ArabicAnalyzer(Version.LUCENE_30));
Try looking in the project properties, in the "Resource" section. Set the text file encoding to UTF-8 and see if that fixes the problem. I am assuming you already have the right fonts installed.
I believe you are reading characters like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
try {
String token = reader.readLine();
System.out.println(token);
} catch (IOException e) {
e.printStackTrace();
}
In that case the character encoding is exactly the same as the current system code page (at least on Windows). The problem is that Eclipse lets you paste Arabic letters into its console window but loses information in the process. I am not sure whether setting the system code page (in the OS regional options) to windows-1256 would help, but it could. I tried passing Charset.forName("windows-1256") as the second parameter to InputStreamReader and then typing with an Arabic keyboard, but it did not work.
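To check whether the console really is losing the Arabic letters, it can help to print the code points of the string you read before handing it to the QueryParser. This is only a diagnostic sketch: the class and method names are made up for illustration, and it assumes Java 8 or later for String.codePoints(). If the dump shows U+003F (question marks) or Latin-range values instead of values in the Arabic block (U+0600 to U+06FF), the text was mangled before Lucene ever saw it.

```java
final class CodePointDump {
    // Render each code point of s as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.toString().trim();
    }

    // True if at least one code point belongs to the basic Arabic block.
    static boolean containsArabic(String s) {
        return s.codePoints().anyMatch(
                cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.ARABIC);
    }

    public static void main(String[] args) {
        // Stand-in for the line read from the console; written with
        // escapes so the source file encoding does not matter.
        String token = "\u0643\u062A\u0627\u0628";
        System.out.println(dump(token));
        System.out.println("contains Arabic: " + containsArabic(token));
    }
}
```

In the real program you would call dump(token) on the value returned by reader.readLine() instead of the hard-coded string.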
OK, but we are not so helpless after all. Since this is meant for testing (right?), you can work around the problem in one of two ways:
- Use some basic Swing UI (JFrame + JTextField + JLabel and maybe a button)
- Provide an unescaping mechanism and enter the characters as code points (e.g. \u0629)
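The second approach could look roughly like the sketch below. It is not the only way to do it: the class name UnicodeUnescaper and its unescape method are hypothetical, and it only handles the four-hex-digit backslash-u form, leaving everything else in the input untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescaper {
    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replace every escape in the input with the character it names.
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());                  // text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16)); // decoded character
            last = m.end();
        }
        out.append(input, last, input.length());                 // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        // What the user would type in the console: plain ASCII escapes.
        String typed = "\\u0643\\u062A\\u0627\\u0628";
        String query = unescape(typed);
        System.out.println("decoded length: " + query.length());
    }
}
```

You would then pass the result of unescape(reader.readLine()) to parser.parse(...) instead of the raw console line, so only ASCII ever travels through the Eclipse console.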
The best fix would be to fix Eclipse itself (its console is broken), for example by implementing Console support (System.console()), but I am not sure they would accept such a patch.
You can try entering Unicode escape sequences in the console instead of the Arabic characters. Use a converter like this one to convert your Arabic text to Unicode escapes.
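If you would rather not depend on an online converter, a few lines of Java can produce the same kind of output. This is just a sketch with a made-up class name (UnicodeEscaper). It escapes every non-ASCII UTF-16 unit as backslash-u plus four hex digits, which is enough for Arabic because those letters all sit in the Basic Multilingual Plane. Keep in mind that the program reading the console still has to decode these escapes before building the query.

```java
final class UnicodeEscaper {
    // Keep ASCII as-is; write every other UTF-16 unit as a four-digit escape.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Arabic sample is written with escapes here so that the
        // example does not depend on the source file encoding.
        String arabic = "\u0643\u062A\u0627\u0628";
        System.out.println(escape(arabic));
    }
}
```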