
How to find the position or location of a string in a given document

Developer https://www.devze.com 2022-12-21 14:29 Source: Internet

How do I find the position or location of a string in a given document? I have a Word document, and I want to store all of its words and their positions in a database, so I need to find the position of each word.

So please tell me: how can I find the position or location of a word or string in a given document?

I intend to use VB.NET or C#, and the documents are .doc files.
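Whatever reads the .doc file, once its text is available as a string the "position" part of the question comes down to tokenizing that string. Here is a minimal C# sketch. It assumes a "word" is any run of non-whitespace characters, so punctuation stays attached ("end." is one word), and that you want to store both the word's ordinal number and its character offset in the extracted text. `WordPosition` and `WordTokenizer` are hypothetical names, not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// A word found in the text: its 1-based ordinal and its 0-based character offset.
public sealed class WordPosition
{
    public string Word { get; }
    public int Ordinal { get; }
    public int Offset { get; }

    public WordPosition(string word, int ordinal, int offset)
    {
        Word = word;
        Ordinal = ordinal;
        Offset = offset;
    }
}

public static class WordTokenizer
{
    // Assumption: a word is a maximal run of non-whitespace characters.
    private static readonly Regex WordPattern = new Regex(@"\S+");

    public static List<WordPosition> Tokenize(string text)
    {
        var result = new List<WordPosition>();
        int ordinal = 1;
        foreach (Match m in WordPattern.Matches(text))
        {
            // Match.Index is the offset of the match within the input string
            result.Add(new WordPosition(m.Value, ordinal++, m.Index));
        }
        return result;
    }
}
```

For example, `WordTokenizer.Tokenize("Hello  world")` yields "Hello" at ordinal 1, offset 0 and "world" at ordinal 2, offset 7.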


Mmmm... I haven't found a smarter solution :-/ but maybe this helps you... I'll assume that you have some version of MS Office installed on your system.

First of all, you have to add a reference in your project to the Microsoft COM component called "Microsoft Word ?* Object Library".

* ? depends on the version of your MS Office (for example, 11.0 for Office 2003).

After you've added the reference, you can test this code:

using System;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;
// The Office primary interop assembly uses the Microsoft.Office.Interop.Word namespace;
// an older interop assembly generated directly from the type library may call it just "Word".
using Word = Microsoft.Office.Interop.Word;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Find the full path of our document (document.doc next to the executable)
            string exeDirectory = Path.GetDirectoryName(Assembly.GetEntryAssembly().Location);
            string docFileName = Path.Combine(exeDirectory, "document.doc");

            // Create the needed Word.Application and Word.Document objects.
            // Leaving out the optional ref arguments needs C# 4 or later; with an older
            // compiler pass "ref missing" (System.Reflection.Missing.Value) for each of them.
            Word.Application application = new Word.Application();
            Word.Document document = null;
            try
            {
                document = application.Documents.Open(docFileName, ReadOnly: true);

                // Split on any whitespace (spaces, tabs, the '\r' paragraph marks...) and
                // skip the empty entries produced by consecutive separators
                string wholeTextContent = document.Content.Text;
                string[] words = wholeTextContent.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

                int index = 1;
                foreach (string singleWord in words)
                {
                    Console.WriteLine("Word: " + singleWord + " (position: " + index + ")");
                    index++;
                }
            }
            finally
            {
                // Close the document without saving, quit Word and release the COM objects,
                // even if an exception was thrown above. The casts to _Document/_Application
                // avoid the ambiguity between the Close/Quit methods and the events of the same name.
                if (document != null)
                {
                    ((Word._Document)document).Close(false);
                    Marshal.ReleaseComObject(document);
                }
                ((Word._Application)application).Quit(false);
                Marshal.ReleaseComObject(application);
            }

            Console.ReadLine();
        }
    }
}
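The program above numbers the words but does not search for one particular string. Assuming the text has already been read with `document.Content.Text`, finding every occurrence of a search term can be sketched as a loop over `String.IndexOf`. `StringSearch.FindAll` is a hypothetical helper. The offsets it returns index into the extracted string, which is not guaranteed to line up with Word's own `Range` character positions:

```csharp
using System;
using System.Collections.Generic;

public static class StringSearch
{
    // Returns the 0-based character offset of every occurrence of `term` in `text`.
    // Matches may overlap: searching "aa" in "aaa" yields 0 and 1.
    public static List<int> FindAll(string text, string term,
        StringComparison comparison = StringComparison.OrdinalIgnoreCase)
    {
        if (string.IsNullOrEmpty(term))
            throw new ArgumentException("Search term must not be empty.", nameof(term));

        var offsets = new List<int>();
        int start = 0;
        while (start <= text.Length - term.Length)
        {
            int found = text.IndexOf(term, start, comparison);
            if (found < 0)
                break;
            offsets.Add(found);
            start = found + 1; // advance one character so overlapping matches are found too
        }
        return offsets;
    }
}
```

For example, searching "The cat saw the CAT" for "cat" returns offsets 4 and 16 with the default case-insensitive comparison.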

I've tested it, and it seems to work =)
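Since the goal is to store every word with its positions in a database, it may help to group the positions per word first, so that each word maps to a single key with a list of ordinals (an inverted index). Here is a minimal sketch. It assumes case-insensitive matching and takes as input the array of words obtained by splitting the document text. `WordIndex.Build` is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;

public static class WordIndex
{
    // Builds a word -> list of 1-based positions map from the words in document order.
    // Keys are compared case-insensitively (an assumption; drop the comparer to keep case).
    public static Dictionary<string, List<int>> Build(IEnumerable<string> words)
    {
        var index = new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);
        int position = 1;
        foreach (string word in words)
        {
            if (!index.TryGetValue(word, out List<int> positions))
            {
                positions = new List<int>();
                index.Add(word, positions);
            }
            positions.Add(position++);
        }
        return index;
    }
}
```

Each key and its position list could then become rows of a (word, position) table. The exact table layout is up to you.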
