I'm writing a web crawler and I want to use Lucene to index pages while they are being streamed, or as soon as the download completes.
I've seen the Lucene.Net HTML library example and it looks good. However, I don't want to keep downloading pages to disk. What I want is to index a page while it is downloading, or to index a string of HTML content.
Is there any example that makes the Lucene.Net HTML indexer work with a memory stream or a string?
Something like this?
// create writer to index
IndexWriter iw = new IndexWriter(new FileInfo("C:\\example\\"), new StandardAnalyzer());
// create a document to index
Document d = new Document();
// create a field that the document will contain
Field aField = new Field("test", "", Field.Store.YES, Field.Index.ANALYZED);
// add the field to the document
d.Add(aField);
// index some data (4 documents)
aField.SetValue("Example 1");
iw.AddDocument(d);
aField.SetValue("Example 2");
iw.AddDocument(d);
aField.SetValue("Example 3");
iw.AddDocument(d);
aField.SetValue("Example 4");
// a field with Store.NO can be set with a TextReader
Field notStored = new Field("test2", "", Field.Store.NO, Field.Index.ANALYZED);
notStored.SetValue(new StringReader("Example 4 - From TextReader"));
// add new field to a 4th document
d.Add(notStored);
iw.AddDocument(d);
// closing writer commits changes to disk
iw.Close();