开发者

How to find elements by 'id' field in SVG file using Python

开发者 https://www.devze.com 2022-12-22 01:46 出处:网络
Below is an excerpt from an .svg file (which is xml): <text xml:space=\"preserve\" style=\"font-size:14.19380379px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text

Below is an excerpt from an .svg file (which is xml):

   <text
       xml:space="preserve"
       style="font-size:14.19380379px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;font-family:DejaVu Sans Mono;-inkscape-font-specification:DejaVu Sans Mono"
       x="109.38555"
       y="407.02847"
       id="libcode-00"
       sodipodi:linespacing="125%"
       inkscape:label="#text4638"><tspan
         sodipodi:role="line"
         id="tspan4640"
         x="109.38555"
         y="407.02847">12345678</tspan></text>

I'm learning Python and have no clue how can I find all such text elements that have an id field equal to libcode-XX where XX is a number.

I've loaded this .svg file using minidom's parser and tried to find elements using getElementById. However I'm getting None result.

    svgTemplate = minidom.parse(svgFile)
    print svgTemplate
    print svgTemplate.getElementById('libcode-00')

Going after other SO question I've tried using setIdAttribute('id') on svgTemplate object with no luck.

Bottom line: please give a hint for a smart way to extract all of these text elements that have ids in form of libcode-XX. After that it should be no problem to get tspan text and substitute it with generated c开发者_开发问答ontent.


Sorry, I don't know my way around minidom. Also, I had to find the namespace declarations from a sample svg document so that your excerpt could load.

I personally use lxml.etree. I'd recommend that you use XPATH for addressing parts of your XML document. It's pretty powerful and there's help here on SO if you're struggling.

There are lots of answers on SO about XPATH and etree. I've written several.

from lxml import etree
data = """
 <svg
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:svg="http://www.w3.org/2000/svg"
    xmlns="http://www.w3.org/2000/svg"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
    xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
    width="50"
    height="25"
    id="svg2"
    sodipodi:version="0.32"
    inkscape:version="0.45.1"
    version="1.0"
    sodipodi:docbase="/home/tcooksey/Projects/qt-4.4/demos/embedded/embeddedsvgviewer/files"
    sodipodi:docname="v-slider-handle.svg"
    inkscape:output_extension="org.inkscape.output.svg.inkscape">
    <text
       xml:space="preserve"
       style="font-size:14.19380379px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;font-family:DejaVu Sans Mono;-inkscape-font-specification:DejaVu Sans Mono"
       x="109.38555"
       y="407.02847"
       id="libcode-00"
       sodipodi:linespacing="125%"
       inkscape:label="#text4638"><tspan
         sodipodi:role="line"
         id="tspan4640"
         x="109.38555"
         y="407.02847">12345678</tspan></text>
    </svg>
"""

nsmap = {
    'sodipodi': 'http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd',
    'cc': 'http://web.resource.org/cc/',
    'svg': 'http://www.w3.org/2000/svg',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'xlink': 'http://www.w3.org/1999/xlink',
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'inkscape': 'http://www.inkscape.org/namespaces/inkscape'
    }


data = etree.XML(data)

# All svg text elements
>>> data.xpath('//svg:text',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfc9dc>]
# All svg text elements with id="libcode-00"
>>> data.xpath('//svg:text[@id="libcode-00"]',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfc9dc>]
# TSPAN child elements of text elements with id="libcode-00"
>>> data.xpath('//svg:text[@id="libcode-00"]/svg:tspan',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}tspan at b7cfc964>]
# All text elements with id starting with "libcode"
>>> data.xpath('//svg:text[fn:startswith(@id,"libcode")]',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfcc34>]
# Iterate text elements, access tspan child
>>> for elem in data.xpath('//svg:text[fn:startswith(@id,"libcode")]',namespaces=nsmap):
...     tp = elem.xpath('./svg:tspan',namespaces=nsmap)[0]
...     tp.text = "new text"

open("newfile.svg","w").write(etree.tostring(data))


Does it work if you replace 'id' with 'xml:id'?

If minidom doesn't know svg it might treat the 'id' attribute as just any other attribute, instead of being of type ID. A conforming svg implementation would recognize the 'id' attribute in svg content as being of type ID, and an xml implementation that loads external DTDs should also recognize it correctly if the file is tagged appropriately. Loading external DTDs is optional in XML, so the proper way of fixing this would be to make the parser svg-aware.

Definition of 'id' in SVG 1.1 DTD: http://www.w3.org/TR/SVG11/svgdtd.html#DTD.1.4


Adding a little bit to MattH's great example when you use xpath and you know the namespace you can do things like

pub_name = data.xpath('//dc:publisher/cc:Agent/dc:title',
                            namespaces=nsmap)[0].text

This will give direct access to the element tag text that you want.

0

精彩评论

暂无评论...
验证码 换一张
取 消