
Split document into pages with titles (Was: tricky Question In java )

开发者 https://www.devze.com 2023-01-26 03:41 Source: the web

I have a slightly tricky problem. I'm trying to save the pages of a document in SQLite. The thing is, I want a page break at certain points: whenever the code finds a <font tag, which marks the chapter titles of the document. In short, I want the chapter title to be the beginning of a page. So I wrote this code:

Integer i = 0;
int j = 0;
StringBuilder page = new StringBuilder();
String[] paragraphs = content.split("\n");
for (String paragraph : paragraphs) {
    i++;
    page.append(paragraph).append("\n");
    Integer length = paragraphs.length;
    String stringPage = page.toString();

    stringPage = stringPage.replaceAll("\n", "<br/>");
    String[] pageContents = stringPage.split(" ");
    boolean beginOfStory = false;
    for (String pageContent : pageContents) {
        if (pageContent.contains("<font")) {
            beginOfStory = true;
            break;
        }
    }
    if (pageContents.length > 180 || beginOfStory) {
        j++;
        prep.setLong(1, j);
        prep.setString(2, stringPage);
        prep.addBatch();
        page = new StringBuilder();
    }
}

Of course, I know this makes the title the last thing on the page, with the new page beginning after it, but I want the title to be on the new page. It's tricky for me and I can't find a clue. Any help is appreciated; I hope I've described it well.

Thanks in advance.


If I understand your design, you are doing this:

  1. Splitting a string into paragraphs.
  2. Building a page by adding paragraphs one at a time.
  3. After you add a paragraph to your page, checking for a title by breaking the entire page into words and looking for a word that contains the start of the HTML font tag.
  4. Treating a page as complete if you found a title or it has more than 180 words.

So if you want titles at the top of a page, check each paragraph for a title before you add it to the page. Of course, this also assumes that the way you detect titles is accurate in the first place (which I am less than certain of...).

Try this approach:

  1. Split the content into paragraphs.
  2. Check each paragraph for a title marker.
  3. If a title is found, store the current page and start another with the title paragraph as its first paragraph.
  4. If no title is found, add the paragraph and check the page length.
  5. If the page length boundary is reached, store the page and start another empty page.

That should work...
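The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the class name PageSplitter and the helpers isTitle, wordCount and paginate are made up for the example, a title is still detected by the "<font" substring from the question, and pages are collected in a list instead of being written through the prep statement. It also stores whatever is left over after the loop, which the code in the question does not appear to do.

```java
import java.util.ArrayList;
import java.util.List;

class PageSplitter {

    // Assumption taken from the question: a paragraph containing "<font" is a chapter title.
    static boolean isTitle(String paragraph) {
        return paragraph.contains("<font");
    }

    // Counts whitespace-separated words in the page built so far.
    static int wordCount(CharSequence text) {
        String trimmed = text.toString().trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Moves the page in progress into the result list, converting newlines to <br/>.
    static void store(List<String> pages, StringBuilder page) {
        if (page.length() > 0) {
            pages.add(page.toString().replace("\n", "<br/>"));
            page.setLength(0);
        }
    }

    static List<String> paginate(String content, int maxWords) {
        List<String> pages = new ArrayList<>();
        StringBuilder page = new StringBuilder();
        for (String paragraph : content.split("\n")) {
            // Check for a title BEFORE appending, so the title opens the next page.
            if (isTitle(paragraph)) {
                store(pages, page);
            }
            page.append(paragraph).append("\n");
            // Length boundary, as in the original code.
            if (wordCount(page) > maxWords) {
                store(pages, page);
            }
        }
        // Store the final, partially filled page.
        store(pages, page);
        return pages;
    }

    public static void main(String[] args) {
        String content = "intro text\n<font size=5>Chapter 1</font>\nfirst body\n"
                + "<font size=5>Chapter 2</font>\nsecond body";
        List<String> pages = paginate(content, 180);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println((i + 1) + ": " + pages.get(i));
        }
    }
}
```

Each returned page could then be bound to prep the same way as in the question (setLong, setString, addBatch), presumably followed by a call to executeBatch() once the loop is done.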
