There's a large set of entities of different kinds:
interface Entity {
}
interface Entity1 extends Entity {
    String field1();
    String field2();
}
interface Entity2 extends Entity {
    String field1();
    String field2();
    String field3();
}
interface Entity3 extends Entity {
    String field12();
    String field23();
    String field34();
}
Set<Entity> entities = ...
The task is to implement full text search for this set. By full text search I mean I just need to get the entities that contain the substring I'm looking for (I don't need to know the exact property, the exact offset of the substring, etc.). In the current implementation the Entity interface has a method matches(String):
interface Entity {
    boolean matches(String text);
}
Each entity class implements it depending on its internals:
class Entity1Impl implements Entity1 {
    public String field1() {...}
    public String field2() {...}

    // Case-insensitive substring match against every searchable field.
    public boolean matches(String text) {
        String needle = text.toLowerCase(); // lower-case the query only once
        return field1().toLowerCase().contains(needle) ||
               field2().toLowerCase().contains(needle);
    }
}
I believe this approach is really awful (though it works). I'm considering using Lucene to build an index every time I have a new set. By index I mean content -> id mappings. The content is just a trivial "sum" of all the fields I'm considering. So, for Entity1 the content would be the concatenation of field1() and field2(). I have some doubts about the performance: building the index is often quite an expensive operation, so I'm not really sure it helps.
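To make the content -> id idea concrete, here is a minimal sketch that uses plain Java collections instead of Lucene. The class name ContentIndexSketch, the string ids and the newline separator are assumptions made for illustration only; none of them come from the code above. The search is still a linear scan, so this only shows the "concatenate the fields once per set" part of the idea, not a real inverted index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the original code: maps an entity id to
// the lower-cased concatenation of that entity's searchable fields.
class ContentIndexSketch {
    private final Map<String, String> contentById = new LinkedHashMap<>();

    // Fields are joined with a newline so that a query without line breaks
    // cannot match across the boundary between two fields.
    void add(String id, String... fields) {
        contentById.put(id, String.join("\n", fields).toLowerCase(Locale.ROOT));
    }

    // Linear scan: returns the ids whose content contains the query,
    // ignoring case, in insertion order.
    List<String> search(String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : contentById.entrySet()) {
            if (entry.getValue().contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

For Entity1 this would be filled with something like index.add(id, e.field1(), e.field2()). The lower-casing cost is paid once per set rather than once per query, which only pays off if the same set is queried more than once.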
Do you have any other suggestions?
To clarify the details:
- Set<Entity> entities = ... holds ~10000 items.
- Set<Entity> entities = ... is not read from a DB, so I can't just add a where ... condition. The data source is quite non-trivial, so I can't solve the problem on its side.
- Entities should be thought of as short articles, so some fields may be up to 10KB, while others may be ~10 bytes.
- I need to perform this search quite often, but both the query string and the original set are different every time, so it looks like I can't just build the index once.
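Given these constraints (a fresh set and a fresh query every time), one baseline to measure any index against is a single index-free pass over the data. The sketch below is only an illustration under assumptions: OneShotScanSketch and matchingRows are hypothetical names, and each entity is represented simply as an array of its field values. It uses java.util.regex.Pattern with the LITERAL and CASE_INSENSITIVE flags, which avoids creating a lower-cased copy of every 10KB field; its case folding may differ from toLowerCase() in some Unicode corner cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: one-shot scan over a freshly received set, no index.
class OneShotScanSketch {
    // Each row holds the searchable field values of one entity.
    // Returns the positions of the rows in which at least one field
    // contains the query, ignoring case.
    static List<Integer> matchingRows(List<String[]> rows, String query) {
        // LITERAL: the query is plain text, not a regular expression.
        Pattern needle = Pattern.compile(
                query,
                Pattern.LITERAL | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            for (String field : rows.get(i)) {
                if (needle.matcher(field).find()) {
                    hits.add(i);
                    break; // one matching field is enough for this entity
                }
            }
        }
        return hits;
    }
}
```

Whether this is fast enough for ~10000 entities depends on the field sizes and the query rate, so it would need to be measured before deciding that an index is necessary.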
For such a complex object domain, you can use a Lucene wrapper tool like Compass, which lets you quickly map your object graph to a Lucene index using the same approach as an ORM (like Hibernate).
I would strongly consider using Lucene with Solr: http://lucene.apache.org/java/docs/index.html