Lucene Search with Unicode Characters

Developer https://www.devze.com 2023-01-08 22:10 Source: Internet

I have indexed a database of texts, and the texts are Unicode-encoded. When I search for an English word with Lucene, everything works. But when I use a non-English query, I get the following exception:

Exception in thread "main" org.apache.lucene.queryParser.ParseException: Cannot parse '??': '*' or '?' not allowed as the first character in WildcardQuery
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:187)
        at Search.main(Search.java:151)
Caused by: org.apache.lucene.queryParser.ParseException: '*' or '?' not allowed as first character in WildcardQuery
        at org.apache.lucene.queryParser.QueryParser.getWildcardQuery(QueryParser.java:923)
        at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1347)
        at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1250)
        at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1178)
        at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1167)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:182)
        ... 1 more

What should I do?

Thank you.

Two points here:

1. What is the encoding of your source file (*.java)? Make sure it is UTF-8.
2. The default encoding of Java is likely to be something other than UTF-8. Make sure you specify the encoding explicitly when reading text, like:

new InputStreamReader(new FileInputStream(filename), "UTF-8");
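The '??' in the exception message suggests that the query characters were already replaced by '?' before they reached the QueryParser, which is what happens when text passes through a charset that cannot represent it. The following is a minimal sketch of that effect using only the JDK, not your actual search code. The class name EncodingDemo and its helper methods are made up for illustration, and the two Arabic letters are written as Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene or of the original question's code.
public class EncodingDemo {

    // Encode the text to bytes with the given charset, then decode it again.
    // Characters the charset cannot represent are replaced with '?'.
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    // Read the first line from UTF-8 bytes, naming the charset explicitly
    // instead of relying on the platform default.
    public static String readFirstLine(byte[] utf8Bytes) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String query = "\u0633\u0644"; // two Arabic letters, escaped

        // A charset that cannot represent the letters turns the query into "??".
        System.out.println(roundTrip(query, StandardCharsets.US_ASCII));

        // UTF-8 preserves the query unchanged.
        System.out.println(roundTrip(query, StandardCharsets.UTF_8).equals(query));

        // Reading UTF-8 bytes with an explicit charset also preserves it.
        System.out.println(readFirstLine(query.getBytes(StandardCharsets.UTF_8)).equals(query));
    }
}
```

If the first line printed is '??', the same kind of loss is probably happening wherever your query string is read (console, file, or string literal), so that is the place to set the encoding explicitly.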