My data is split across two places: some of it is stored in a database, and the rest is uploaded as PDF/Word/Excel documents on a file server. How should the Lucene index be structured if I want to index all of it? Should there be separate indexes for the tables and the documents, with the search string run against each index, or should everything be combined into a single index with a varied field structure (does Lucene support this)?
Thanks, V
If you don't need to distinguish between the documents, you can use a single index. You can walk the structure of a folder using FileSystemInfo: for each entry, check whether it is a folder or a document. If it is a document, index it; if it is a folder, call the function again on that folder.
    ' Walk one folder level; sub-folders are handled by recursion.
    Dim filesysteminfo As FileSystemInfo
    Dim FSIs As FileSystemInfo() = New DirectoryInfo(yourfolderroot).GetFileSystemInfos()
    For Each filesysteminfo In FSIs
        If TypeOf filesysteminfo Is DirectoryInfo Then
            ' Folder: call the function again for this sub-folder.
            function_create_document(filesysteminfo.FullName, indexwriter, id)
        Else
            ' File: create a Lucene document for it.
            Dim dynamic_doc As New Document()
            Dim sr As System.IO.StreamReader = New StreamReader(filesysteminfo.FullName)
            Dim filename As String = filesysteminfo.Name
            ...
If you do want to distinguish between them, record whether each document came from the database or from your file server, and store that information in a field.
Use a string variable (yourstring): if the document is from the database, the string is "database"; otherwise it is "fileserver".
Dim field_typ As Field = New Field("doc_typ", yourstring, Field.Store.YES, Field.Index.TOKENIZED)
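To make the single-index idea concrete, here is a minimal sketch of how database rows and file-server documents could share one index. It assumes the same Lucene.Net 2.x-style API as the snippet above (Field.Index.TOKENIZED and friends). The module name, the BuildDocument helper, the "id" and "content" field names, the sample values, and the "C:\index" path are all hypothetical and only there for illustration. It also assumes the text has already been extracted from the PDF/Word/Excel files elsewhere, since a plain StreamReader returns the raw file bytes rather than readable text for those formats. The sketch indexes doc_typ as UN_TOKENIZED so it matches as one exact term; that is a choice made here, not something the answer above prescribes.

```vbnet
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.Documents
Imports Lucene.Net.Index

Module SingleIndexSketch

    ' Hypothetical helper: builds one Lucene document.
    ' sourceType is expected to be "database" or "fileserver".
    Function BuildDocument(ByVal id As String, ByVal sourceType As String, ByVal text As String) As Document
        Dim doc As New Document()
        ' Exact-match values: stored, and indexed as a single untokenized term.
        doc.Add(New Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED))
        doc.Add(New Field("doc_typ", sourceType, Field.Store.YES, Field.Index.UN_TOKENIZED))
        ' Full-text body: analyzed for searching, not stored in the index.
        doc.Add(New Field("content", text, Field.Store.NO, Field.Index.TOKENIZED))
        Return doc
    End Function

    Sub Main()
        ' "C:\index" is a placeholder path; True creates a new index there.
        Dim writer As New IndexWriter("C:\index", New StandardAnalyzer(), True)
        ' Both kinds of source go into the same index with the same fields.
        writer.AddDocument(BuildDocument("row-1", "database", "text taken from a table row"))
        writer.AddDocument(BuildDocument("report.pdf", "fileserver", "text extracted from the file"))
        writer.Close()
    End Sub

End Module
```

With this layout, a search on "content" covers both sources at once. To restrict the results to one source, the content query can be combined with a TermQuery on doc_typ inside a BooleanQuery.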