My application needs to be able to indicate where in the original document the highlights returned by Solr actually come from. For the time being, my project deals only with .txt files.
I'm using the highlights returned by Solr as string inputs to the RichTextBox.Find function. Once I have the starting point of the hit, I highlight the string using the RichTextBox.Select function and set the back color, text color, and other properties.
PROBLEM: RichTextBox.Find never returns a valid index (it always returns -1), which means it is not finding my highlight text in the document.
I've tried removing the <em> and </em> tags, along with the \n characters that are in the highlight string but not in the actual text document, but it still doesn't work. Searching for the same string in MS Word or Notepad on the original file doesn't work either, even though the string appears identical to the text fragment in the file. Is there any other information I can get about the changes I need to make to the string to make it searchable?
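One way to get more information about a string that looks identical but does not match is to list every character outside printable ASCII together with its code point and Unicode category. The sketch below is in Python purely as a language-neutral illustration; the helper names strip_highlight_markup and describe_suspect_chars are hypothetical, and it assumes the highlight uses the literal <em> and </em> tags and real line-break characters, as described above.

```python
import re
import unicodedata

# Matches the opening and closing highlight tags described in the post.
EM_TAG = re.compile(r"</?em>")


def strip_highlight_markup(fragment: str) -> str:
    """Remove <em>/</em> tags and line breaks from a highlight fragment."""
    without_tags = EM_TAG.sub("", fragment)
    return without_tags.replace("\r", "").replace("\n", "")


def describe_suspect_chars(text: str):
    """Return (index, code point, category, name) for each character
    outside the printable ASCII range (space through tilde)."""
    report = []
    for index, ch in enumerate(text):
        if not (" " <= ch <= "~"):
            report.append((
                index,
                f"U+{ord(ch):04X}",
                unicodedata.category(ch),
                # Control characters have no name, so fall back to a marker.
                unicodedata.name(ch, "<unnamed>"),
            ))
    return report


if __name__ == "__main__":
    # Hypothetical fragment: a valid accented letter plus a stray NUL.
    fragment = "caf\u00e9 <em>price</em>\u0000 list\n"
    for entry in describe_suspect_chars(strip_highlight_markup(fragment)):
        print(entry)
```

Running the same report on the highlight and on the matching fragment of the source file, then comparing the two lists, should show which characters exist only in the highlight. Legitimate special characters appear in both lists; the extra ones are the candidates for cleaning.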
EDIT 1:
I've tracked down the problem. Apparently, in certain cases the highlight that Solr returns contains non-printable or junk characters that are not present in the original document. I need a way to reliably clean these out based on some criteria. My text contains a lot of valid special characters, so I cannot afford to have those removed by mistake!
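One possible criterion is the Unicode general category: drop only characters classified as control, format, surrogate, private-use, or unassigned, and keep letters, digits, punctuation, and symbols untouched. The following is a minimal sketch of that idea in Python, not a definitive fix; the name clean_highlight, the choice of categories, and the decision to also drop U+FFFD (the replacement character) are all assumptions to adjust for the actual data. In particular, the format category includes characters such as the soft hyphen and zero-width joiner, which some texts use legitimately. In .NET, char.GetUnicodeCategory should offer a comparable classification.

```python
import unicodedata

# Assumed "junk" categories: control (Cc), format (Cf), surrogate (Cs),
# private use (Co), and unassigned (Cn). This set is an assumption.
JUNK_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn"}

# U+FFFD usually marks a decoding failure; dropping it is also an assumption.
REPLACEMENT_CHAR = "\ufffd"


def clean_highlight(text: str, keep: str = "\t\n\r") -> str:
    """Remove junk characters while preserving ordinary special characters.

    Characters listed in `keep` survive even though they are control
    characters; everything else is filtered by Unicode category.
    """
    kept = []
    for ch in text:
        if ch in keep:
            kept.append(ch)
        elif ch == REPLACEMENT_CHAR:
            continue
        elif unicodedata.category(ch) in JUNK_CATEGORIES:
            continue
        else:
            kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    # Hypothetical highlight: valid symbols mixed with NUL, a zero-width
    # space, and a replacement character.
    dirty = "na\u00efve \u00a9 2010\u0000 \u200bcost: \u20ac5\ufffd"
    print(clean_highlight(dirty))
```

Because the filter works by category rather than by an ASCII whitelist, accented letters, currency signs, and similar symbols pass through unchanged. Line breaks are kept by default so that they can be handled separately; pass keep="\t" to strip them as well.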