Problem editing word file in PHP

Developer https://www.devze.com 2023-01-08 01:48 Source: Internet
So I need to edit some text in a Word document. I created a Word document and saved it as XML. It is saved correctly (I can open the XML file in MS Word and it looks exactly like the docx original).

I then use PHP DOM to edit some text in the file (just two lines). (EDIT: the code below is the already fixed, working version.)

<?php

$firstName = 'Richard';
$lastName = 'Knop';

$xml = file_get_contents('template.xml');

$doc = new DOMDocument();
$doc->loadXML($xml);
$doc->preserveWhiteSpace = false;

$wts = $doc->getElementsByTagNameNS('http://schemas.openxmlformats.org/wordprocessingml/2006/main', 't');

$c1 = 0; $c2 = 0;
foreach ($wts as $wt) {

    if (1 === $c1) {
        $wt->nodeValue .= ' ' . $firstName;
        $c1++;
    }

    if (1 === $c2) {
        $wt->nodeValue .= ' ' . $lastName;
        $c2++;
    }

    if ('First Name' === substr($wt->nodeValue, 0, 10)) {
        $c1++;
    }

    if ('Last Name' === substr($wt->nodeValue, 0, 9)) {
        $c2++;
    }

}

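// Serialize the modified DOM; otherwise the unmodified input string is written out
$xml = $doc->saveXML();

// Replace UNIX with DOS line endings
$xml = str_replace("\n", "\r\n", $xml);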

$fp = fopen('final-xml.xml', 'w');
fwrite($fp, $xml);
fclose($fp);

This gets executed properly (no errors). These two lines:

<w:t>First Name:</w:t>
<w:t>Last Name:</w:t>

Get replaced with these:

<w:t>First Name: Richard</w:t>
<w:t>Last Name: Knop</w:t>

However, when I try to open the final-xml.xml file in MS Word, it doesn't open (Word freezes). Any suggestions?

EDIT:

I tried using levenshtein():

$xml = file_get_contents('template.xml');
$xml2 = file_get_contents('final-xml.xml');

$str = str_split($xml, 255);
$str2 = str_split($xml2, 255);

$i = 0;
foreach ($str as $s) {
    $dist = levenshtein($s, $str2[$i]);
    if (0 <> $dist) {
        echo $dist, '<br />';
    }
    $i++;
}

This printed nothing, which is weird: when I open the final-xml.xml file in Notepad, I can clearly see that those two lines have changed.
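
Comparing fixed-size chunks with levenshtein() is a roundabout way to locate a difference, and the loop above ignores any extra data at the end of the second file. A more direct check might look like the following minimal sketch. The helper name firstDifference() is hypothetical (it is not a PHP built-in); it reports the first byte offset at which two strings differ:

```php
<?php

/**
 * Hypothetical helper (not a PHP built-in): returns the byte offset of the
 * first difference between two strings, or null if they are identical.
 */
function firstDifference(string $a, string $b): ?int
{
    $limit = min(strlen($a), strlen($b));

    // Compare byte by byte up to the length of the shorter string.
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }

    // No mismatch so far: either the strings are equal, or one is a prefix
    // of the other and they differ where the shorter one ends.
    return strlen($a) === strlen($b) ? null : $limit;
}
```

Calling it as firstDifference(file_get_contents('template.xml'), file_get_contents('final-xml.xml')) and printing a substr() of both strings around the returned offset should show where the two files diverge.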

EDIT2:

Here is the template.xml file: http://uploading.com/files/61b2922b/template.xml/


This problem is caused by DOS vs. UNIX line endings. Word 2007 does not tolerate \n line endings and requires \r\n, whereas Word 2010 is more tolerant and accepts both.

To fix the problem, replace all UNIX line breaks with DOS ones before saving the output file:

$xml = str_replace("\n", "\r\n", $xml); 
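
One caveat: if the string already contains \r\n sequences, the plain replacement above turns them into \r\r\n. A minimal sketch of a variant that is safe to apply repeatedly, assuming a hypothetical helper named toCrlf() (not part of PHP), first collapses all line endings to \n and then expands them:

```php
<?php

/**
 * Hypothetical helper: converts every line ending (\r\n, bare \r, or \n)
 * to \r\n. Safe to call on text that already uses DOS line endings.
 */
function toCrlf(string $text): string
{
    // Step 1: collapse existing \r\n pairs and bare \r characters to \n.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Step 2: expand every \n to \r\n.
    return str_replace("\n", "\r\n", $text);
}
```

With this helper, the line before saving would become $xml = toCrlf($xml);.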

Full sample:

<?php

$firstName = 'Richard';
$lastName = 'Knop';

$xml = file_get_contents('template.xml');

$doc = new DOMDocument();
$doc->loadXML($xml);
$doc->preserveWhiteSpace = false;

$wts = $doc->getElementsByTagNameNS('http://schemas.openxmlformats.org/wordprocessingml/2006/main', 't');

foreach ($wts as $wt) {
    echo $wt->nodeValue; // debug output: print the text of each <w:t> element

    if ('First Name:' === $wt->nodeValue) {
        $wt->nodeValue = 'First Name: ' . $firstName;
    }

    if ('Last Name:' === substr($wt->nodeValue, 0, 10)) {
        $wt->nodeValue = 'Last Name: ' . $lastName;
    }
}

$xml = $doc->saveXML();

// Replace UNIX with DOS line endings
$xml = str_replace("\n", "\r\n", $xml); 

$fp = fopen('final-xml.xml', 'w');
fwrite($fp, $xml);
fclose($fp);
?>
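
As an alternative to looping over every w:t element, the matching elements could be selected directly with DOMXPath. The following is only a sketch: the inline XML string and the helper name appendToLabel() are assumptions made for illustration, and a real Word XML file has far more structure around these elements.

```php
<?php

const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/**
 * Hypothetical helper: appends " $value" to every <w:t> element whose text
 * starts with $label. Returns the number of elements changed.
 */
function appendToLabel(DOMDocument $doc, string $label, string $value): int
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', W_NS);

    // Simple quoting: this assumes $label contains no double quotes.
    $nodes = $xpath->query('//w:t[starts-with(., "' . $label . '")]');

    foreach ($nodes as $node) {
        $node->nodeValue = $node->nodeValue . ' ' . $value;
    }

    return $nodes->length;
}

// Illustrative input only, not the asker's template.xml.
$doc = new DOMDocument();
$doc->loadXML(
    '<w:body xmlns:w="' . W_NS . '">'
    . '<w:p><w:r><w:t>First Name:</w:t></w:r></w:p>'
    . '<w:p><w:r><w:t>Last Name:</w:t></w:r></w:p>'
    . '</w:body>'
);

appendToLabel($doc, 'First Name:', 'Richard');
appendToLabel($doc, 'Last Name:', 'Knop');

// Serialize the modified document; line endings can then be converted as described above.
$result = $doc->saveXML();
```

The XPath expression restricts the match to the WordprocessingML namespace through the registered w prefix, so elements named t in other namespaces are left untouched.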


As far as I recall, Word XML files store certain checksums near the top of the DOM. You may have to update these, such as the size or the general checksum itself.

I ran into this when I was (dumb) enough to create an HTML file in Word and save it: it contained thousands of useless things that only made editing harder.
