HTML to CALS tables?

Developer https://www.devze.com 2023-02-03 14:29 Source: the web
I'm checking to see if anyone has an XSLT lying around that transforms HTML tables to CALS. I've found a lot of material on going the other way (CALS to HTML), but not from HTML. I thought somebody might have done this before, so I don't have to reinvent the wheel. I'm not looking for a complete solution, just a starting point.

If I get far enough on my own, I'll post it for future reference.


I've come up with a much simpler solution than what @Flack linked to:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="tbody">
    <xsl:variable name="maxColumns">
        <xsl:for-each select="tr">
            <xsl:sort select="sum(td/@colspan) + count(td[not(@colspan)])" data-type="number"/>
            <xsl:if test="position() = last()">
                <xsl:value-of select="sum(td/@colspan) + count(td[not(@colspan)])"/>
            </xsl:if>
        </xsl:for-each>
    </xsl:variable>
    <tgroup>
        <xsl:attribute name="cols">
            <xsl:value-of select="$maxColumns"/>
        </xsl:attribute>
        <xsl:apply-templates select="@*|node()"/>
    </tgroup>
</xsl:template>

<xsl:template match="td[@colspan > 1]">
    <entry>
        <xsl:attribute name="namest">
            <xsl:value-of select="sum(preceding-sibling::td/@colspan) + count(preceding-sibling::td[not(@colspan)]) + 1"/>
        </xsl:attribute>
        <xsl:attribute name="nameend">
            <xsl:value-of select="sum(preceding-sibling::td/@colspan) + count(preceding-sibling::td[not(@colspan)]) + @colspan"/>
        </xsl:attribute>
        <xsl:apply-templates select="@*[name() != 'colspan']|node()"/>
    </entry>
</xsl:template>

<xsl:template match="tr">
    <row>
        <xsl:apply-templates select="@*|node()"/>
    </row>
</xsl:template>

<xsl:template match="td">
    <entry>
        <xsl:apply-templates select="@*|node()"/>
    </entry>
</xsl:template>

<xsl:template match="td/@rowspan">
    <xsl:attribute name="morerows">
        <xsl:value-of select=". - 1"/>
    </xsl:attribute>
</xsl:template>

<!-- fallback rule -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>

There are two tricky points. First, a CALS table needs a tgroup/@cols attribute containing the number of columns. So we need to find the maximum number of cells in one row in the XHTML table - but we must heed colspan declarations so that a cell with colspan > 1 creates the right number of columns! The first template in my stylesheet does just that, based on @Tim C's answer to the max cells per row problem.
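
For readers who prefer to see the column count outside XSLT, here is a minimal Python sketch of the same idea. It is not part of the stylesheet; the function name `max_columns` and the sample markup are made up for illustration, and like the stylesheet it only looks at `td` cells and ignores columns taken up by a `rowspan` from an earlier row.

```python
import xml.etree.ElementTree as ET


def max_columns(tbody):
    """Return the column count of the widest row, honouring colspan."""
    widths = []
    for tr in tbody.findall("tr"):
        # A cell without a colspan attribute occupies exactly one column.
        widths.append(sum(int(td.get("colspan", 1)) for td in tr.findall("td")))
    return max(widths, default=0)


# Hypothetical sample: the first row spans 1 + 3 columns, the second 3.
sample = ET.fromstring(
    "<tbody>"
    "<tr><td>a</td><td colspan='3'>b</td></tr>"
    "<tr><td>c</td><td>d</td><td>e</td></tr>"
    "</tbody>"
)
print(max_columns(sample))
```

The result is what would go into tgroup/@cols for that table body.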

Another problem is that for multi-column cells XHTML says "this cell is 3 columns wide" (colspan="3") while CALS will say "this cell starts in column 2 and ends in column 4" (namest="2" nameend="4"). That transformation is done in the second template in the stylesheet.
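
The same arithmetic can be sketched in Python as a running column counter. This is only an illustration of the colspan to namest/nameend mapping, not the stylesheet itself; `span_positions` is a hypothetical helper name and the row is made-up sample data.

```python
import xml.etree.ElementTree as ET


def span_positions(tr):
    """Return a (namest, nameend) pair, 1-based, for each td in a row."""
    column = 1
    positions = []
    for td in tr.findall("td"):
        span = int(td.get("colspan", 1))
        # The cell starts at the current column and covers `span` columns.
        positions.append((column, column + span - 1))
        column += span
    return positions


# Hypothetical row: the middle cell is three columns wide.
row = ET.fromstring("<tr><td>a</td><td colspan='3'>b</td><td>c</td></tr>")
print(span_positions(row))  # [(1, 1), (2, 4), (5, 5)]
```

Note that in CALS, namest and nameend refer to column names declared on colspec elements, so a complete conversion would probably also need to emit matching colspec entries; neither the stylesheet nor this sketch does that.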

The rest is indeed fairly straightforward. The stylesheet doesn't deal with details like changing style="width: 50%" into width="50%" etc. but those are relatively common problems, I believe.
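
As a rough idea of what the style-to-attribute step could look like, here is a small Python sketch that pulls the width out of an inline style string. The helper name `width_from_style` is made up, and it assumes simple `name: value` declarations separated by semicolons; it is not a full CSS parser.

```python
def width_from_style(style):
    """Return the value of the CSS width declaration, or None if absent."""
    for declaration in style.split(";"):
        name, _, value = declaration.partition(":")
        # Match only the exact property name, so min-width is skipped.
        if name.strip().lower() == "width" and value.strip():
            return value.strip()
    return None


print(width_from_style("color: red; width: 50%"))  # 50%
```

The returned string could then be written to a width attribute on the output element, as in the width="50%" example above.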


I know it's 4 years later, but I'm posting this for anyone who comes across this question:

ISOSTS XHTML table to CALS conversion


I know that this is a late answer, but I'm currently developing a Python library to easily convert tables from one XML format to another.

To convert the tables of a .docx document to CALS format, you can proceed as follows:

import os
import zipfile

from benker.converters.ooxml2cals import convert_ooxml2cals

# - Unzip the ``.docx`` in a temporary directory
src_zip = "/path/to/demo.docx"
tmp_dir = "/path/to/tmp/dir/"
with zipfile.ZipFile(src_zip) as zf:
    zf.extractall(tmp_dir)

# - Source paths
src_xml = os.path.join(tmp_dir, "word/document.xml")
styles_xml = os.path.join(tmp_dir, "word/styles.xml")

# - Destination path
dst_xml = "/path/to/demo.xml"

# - Create some options and convert tables
options = {
    'encoding': 'utf-8',
    'styles_path': styles_xml,
    'width_unit': "mm",
    'table_in_tgroup': True,
}
convert_ooxml2cals(src_xml, dst_xml, **options)

See: https://benker.readthedocs.io

Note: support for the (X)HTML format is coming soon (contributions are welcome).


Though I don't understand the particular difficulty, I googled a bit:

  • Stylesheet for converting XHTML tables to CALS