I have stored a number of binary files in a SQL Server table. I created a full-text index on that table, which also indexes the binary column containing the documents. I installed the appropriate iFilters so that SQL Server can also read .doc, .docx and .pdf files.
Using the DATALENGTH function I can retrieve the size of the complete document, but this also includes layout and other information that is useless to me. I want to know the length of the text of the documents.
Using the iFilters, SQL Server is able to extract only the text of such "complicated" documents, but can they also be used to determine the length of just the text?
As far as I know (which isn't much), there is no way to query document properties via full-text search (FTS). I would get the word count before inserting the document into the database, then insert the count along with the document, into another column of the same table. For Word documents, you can use the Document.Words.Count property of the Word object model; I don't know what the equivalent mechanism is for PDF documents.
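The "count first, then store the count next to the document" idea from the answer above could look roughly like the sketch below. It is only an illustration under several assumptions: the text has already been extracted from the file by some other tool, the table and column names (`Documents`, `TextLength`, `WordCount`) and the helper names are made up for this example, and `sqlite3` stands in for SQL Server purely so the snippet runs on its own. Splitting on whitespace is also just one possible definition of a "word" and will not necessarily match what Word reports.

```python
import sqlite3


def text_stats(text: str) -> tuple[int, int]:
    """Return (character count, word count) for already-extracted plain text."""
    words = text.split()  # split on any run of whitespace
    return len(text), len(words)


def store_document(conn, name, blob, extracted_text):
    """Insert the binary document together with its precomputed text statistics."""
    char_count, word_count = text_stats(extracted_text)
    conn.execute(
        "INSERT INTO Documents (Name, Content, TextLength, WordCount) "
        "VALUES (?, ?, ?, ?)",
        (name, blob, char_count, word_count),
    )
    conn.commit()


# In-memory database used only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Documents ("
    "Name TEXT, Content BLOB, TextLength INTEGER, WordCount INTEGER)"
)

# Placeholder bytes and placeholder text; a real caller would pass the
# file contents and the text extracted from that file.
store_document(conn, "report.docx", b"PK\x03\x04", "Quarterly results were strong.")

row = conn.execute("SELECT TextLength, WordCount FROM Documents").fetchone()
print(row)
```

With the counts stored in their own columns, the text length can later be read with a plain SELECT, without touching the binary column or the full-text index.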