How to extract paragraph text color from MS Word using Apache POI

Developer https://www.devze.com 2023-01-24 22:17 Source: Internet

I am using Apache POI. Is it possible to read the text background and foreground color from an MS Word paragraph?


I found a solution:

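    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.CharacterRun;
    import org.apache.poi.hwpf.usermodel.Paragraph;
    import org.apache.poi.hwpf.usermodel.Range;

    // fs is an already-open input stream for the .doc file
    HWPFDocument doc = new HWPFDocument(fs);
    Range range = doc.getRange();
    // Walk the paragraphs of the range itself, so the index always matches
    for (int i = 0; i < range.numParagraphs(); i++) {
        Paragraph pr = range.getParagraph(i);
        System.out.println(pr.getEndOffset());

        // A character run is a stretch of text that shares one set of formatting.
        // Bounding the loop by numCharacterRuns() avoids running past the last run.
        for (int j = 0; j < pr.numCharacterRuns(); j++) {
            CharacterRun run = pr.getCharacterRun(j);
            System.out.println("-------------------------------");
            // getColor() returns Word's color palette index, not an RGB value
            System.out.println("Color---" + run.getColor());
            System.out.println("getFontName---" + run.getFontName());
            // getFontSize() is expressed in half-points
            System.out.println("getFontSize---" + run.getFontSize());
        }
    }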


I found it in:

CharacterRun run = para.getCharacterRun(i);

Here i must be an integer that is incremented on each pass, so the code looks like this:

int c=0;
while (true) {
    CharacterRun run = para.getCharacterRun(c++);
    int x = run.getPicOffset();
    System.out.println("pic offset" + x);
    if (run.getEndOffset() == para.getEndOffset()) {
       break;
    }
}


if (paragraph != null)
{
    int numberOfRuns = paragraph.NumCharacterRuns;
    for (int runIndex = 0; runIndex < numberOfRuns; runIndex++)
    {
        CharacterRun run = paragraph.GetCharacterRun(runIndex);
        // GetIco24() gives the run's 24-bit color; getColor24 turns it into "#RRGGBB"
        string color = getColor24(run.GetIco24());
    }
}

The getColor24 function, which converts the color to hex format in C#:

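    // Needs "using System;" and "using System.Text;" (for StringBuilder)
    public static String getColor24(int argbValue)
    {
        // -1 is treated as "no color"
        if (argbValue == -1)
            return "";

        // Drop the top byte, then swap the red and blue bytes (BGR -> RGB)
        int bgrValue = argbValue & 0x00FFFFFF;
        int rgbValue = (bgrValue & 0x0000FF) << 16 | (bgrValue & 0x00FF00)
                | (bgrValue & 0xFF0000) >> 16;

        // Left-pad the hex digits with zeros up to six characters
        StringBuilder result = new StringBuilder("#");
        String hex = rgbValue.ToString("X");
        for (int i = hex.Length; i < 6; i++)
        {
            result.Append('0');
        }
        result.Append(hex);
        return result.ToString();
    }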
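
For anyone doing this from Java, the same conversion can be sketched as below. This is only a minimal sketch: the class name WordColorHex and the method name toHex are made up for illustration, and it assumes the input uses the 0x00BBGGRR layout implied by the C# code above, with -1 meaning no color.

```java
/** Hypothetical helper (not part of POI): converts a 0x00BBGGRR value to "#RRGGBB". */
final class WordColorHex {

    static String toHex(int ico24) {
        // Assumption carried over from the C# version: -1 means "no color"
        if (ico24 == -1) {
            return "";
        }
        int red = ico24 & 0xFF;           // lowest byte
        int green = (ico24 >> 8) & 0xFF;  // middle byte
        int blue = (ico24 >> 16) & 0xFF;  // third byte; the top byte is ignored
        // %02X pads each channel to two hex digits, so no manual padding loop is needed
        return String.format("#%02X%02X%02X", red, green, blue);
    }

    public static void main(String[] args) {
        // Red is stored in the low byte of the input value
        System.out.println(toHex(0x0000FF));
    }
}
```

With POI you would presumably call it as WordColorHex.toHex(run.getIco24()) inside the character run loop.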


If you are working with docx (OOXML) files, you may want to take a look at this:

import java.io.File
import java.io.FileInputStream
import org.apache.poi.xwpf.usermodel.XWPFDocument


fun test() {
    try {
        val file = File("file.docx")
        // use {} closes the stream even if reading the document fails
        FileInputStream(file.absolutePath).use { fis ->
            val document = XWPFDocument(fis)

            for (para in document.paragraphs) {
                println("-- (" + para.alignment + ") " + para.text)

                // Each run carries its own color and font settings
                para.runs.forEach { run ->
                    println(
                        "text:" + run.text() + " " +
                            "(color:" + run.color +
                            ",fontFamily:" + run.fontFamily +
                            ")"
                    )
                }
            }
        }
    } catch (e: Exception) {
        e.printStackTrace()
    }
}
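
The color printed for a docx run is a string rather than a number. Assuming it is either a six-digit RRGGBB hex string or something unusable such as null or "auto" (an assumption about typical values, worth checking against your POI version), it could be split into channels roughly like this. HexColorParser and parseRgb are hypothetical names, not POI classes.

```java
/** Hypothetical helper (not part of POI): splits an "RRGGBB" hex string into channels. */
final class HexColorParser {

    /** Returns {red, green, blue}, or null when the input is not six hex digits. */
    static int[] parseRgb(String hex) {
        // Reject null, "auto" and anything else that is not exactly six hex digits
        if (hex == null || !hex.matches("[0-9A-Fa-f]{6}")) {
            return null;
        }
        int rgb = Integer.parseInt(hex, 16);
        return new int[] {(rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF};
    }

    public static void main(String[] args) {
        int[] channels = parseRgb("FF8000");
        System.out.println(java.util.Arrays.toString(channels));
    }
}
```

Returning null for anything unparseable leaves it to the caller to decide what a missing or automatic color should mean.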