I made a huge error in a gigantic XML file.
<item1>
<item2>
<item1>
//.. tons of stuff...
</item1>
</item2>
</item1>
I need to rename the outer item1 to something else, but find and replace isn't working because the inner item1 matches as well. I've tried searching on several pieces of context at once, but every find-and-replace tool I've found works on a single line, which makes that impossible. All of the data is indented with tabs.
Any ideas?
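Can you use the tabs to your advantage? If the file is as regularly indented as your example, you can probably search for the tag together with its leading tabs, for example \t\t\t<item1> (or whatever syntax your editor needs for tabs, with as many tabs as precede the tag you want to change), and replace it with whatever else you need.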
If you can match a regular expression then you could match:
<item1>\n[any whitespace]<item2>
and change it to:
<item3>\n[any whitespace]<item2>
and the same for the closing tag, which in your example comes after </item2>:
</item2>\n[any whitespace]</item1>
and change it to:
</item2>\n[any whitespace]</item3>
I haven't specified the [any whitespace] expression as I know it's different for different editors.
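If no editor is at hand, the same two replacements can be sketched in Python, where \s* plays the role of [any whitespace] and also matches newlines. This is only a sketch under assumptions: the function name rename_outer and the target name item3 are illustrative, the tags carry no attributes, and the outer closing tag directly follows </item2> as in the sample.

```python
import re

def rename_outer(xml_text, old="item1", new="item3", child="item2"):
    # Outer opening tag: <item1>, any whitespace (including newlines), then <item2>.
    xml_text = re.sub(rf"<{old}>(\s*<{child}>)", rf"<{new}>\1", xml_text)
    # Outer closing tag: </item2>, any whitespace, then </item1>.
    xml_text = re.sub(rf"(</{child}>\s*)</{old}>", rf"\1</{new}>", xml_text)
    return xml_text

# Tab-indented sample shaped like the one in the question.
sample = (
    "<item1>\n"
    "\t<item2>\n"
    "\t\t<item1>\n"
    "\t\t\tdata\n"
    "\t\t</item1>\n"
    "\t</item2>\n"
    "</item1>\n"
)
result = rename_outer(sample)
```

The inner item1 is left alone because it is neither followed by <item2> nor preceded by </item2>.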
xmlstarlet can help.
Regex may have worked in this instance, but regex is generally NOT the best means of modifying XML.
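XML is not a regular language: elements can nest to arbitrary depth, and a regular expression cannot keep track of that nesting. You should use XML tools to parse and manipulate XML data, or you will likely run into problems at some point.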
Transforming the XML with an XSLT identity transform, plus a template for the particular "item1" element, is one example of a safer, more robust solution:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:template match="/">
        <xsl:apply-templates />
    </xsl:template>
    <!-- Identity template: copies every node and attribute unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Matches only an "item1" that contains item2/item1, i.e. the outer one -->
    <xsl:template match="item1[item2/item1]" >
        <!--Replace this literal element "NEW_ITEM_ELEMENT" with whatever name you need to change "item1" elements to: -->
        <NEW_ITEM_ELEMENT>
            <!-- select="@*|node()" keeps any attributes of the renamed element as well as its children -->
            <xsl:apply-templates select="@*|node()"/>
        </NEW_ITEM_ELEMENT>
    </xsl:template>
</xsl:stylesheet>
Use a regular expression search or other advanced search/replace method to replace the inner <item1> tag with something else temporarily (by specifying the tab characters before it as well). Then replace the remaining item1 tags, which will now be the outer ones, before changing your temporary ones back again.
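The three steps could look roughly like this in Python. It is a sketch that assumes the inner tags sit alone on their line at a known tab depth (two tabs here), that tags have no attributes, and that the placeholder name TMP_INNER_ITEM (made up for this example) does not occur anywhere in the file; item3 is likewise just an illustrative new name.

```python
import re

def rename_outer_via_placeholder(text, old="item1", new="item3",
                                 inner_tabs=2, placeholder="TMP_INNER_ITEM"):
    indent = "\t" * inner_tabs
    # Step 1: hide the inner tags, identified by their exact leading tabs,
    # behind a temporary placeholder name.
    text = re.sub(rf"(?m)^({indent})<(/?){old}>", rf"\1<\2{placeholder}>", text)
    # Step 2: every tag still carrying the old name is now an outer one.
    text = re.sub(rf"<(/?){old}>", rf"<\1{new}>", text)
    # Step 3: change the placeholder back to the original name.
    text = re.sub(rf"<(/?){placeholder}>", rf"<\1{old}>", text)
    return text

sample = (
    "<item1>\n"
    "\t<item2>\n"
    "\t\t<item1>\n"
    "\t\t\tdata\n"
    "\t\t</item1>\n"
    "\t</item2>\n"
    "</item1>\n"
)
result = rename_outer_via_placeholder(sample)
```

Anchoring on the start of the line keeps a tag at a deeper indentation from being mistaken for the inner one.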
If the XML is all formatted like that, you should be able to use regular expressions. If it isn't, you might run it through an XML formatter first to get it into that layout.
Otherwise you could read the XML with an XML-parser in a language you know, change it there and write it back to disk.
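As one possible version of the parser route, here is a minimal sketch using Python's standard xml.etree.ElementTree. The function name rename_outer_elements, the new name item3 and the file names in the comments are assumptions for illustration. Note that ElementTree loads the whole document into memory and drops comments by default, which may matter for a very large file.

```python
import xml.etree.ElementTree as ET

def rename_outer_elements(root, old="item1", new="item3", child="item2"):
    # Collect matches first so that renaming does not interfere with iteration.
    for elem in list(root.iter(old)):
        # Only rename an item1 that contains item2/item1, i.e. an outer one.
        if elem.find(f"{child}/{old}") is not None:
            elem.tag = new
    return root

sample = "<item1><item2><item1>data</item1></item2></item1>"
root = ET.fromstring(sample)
rename_outer_elements(root)
output = ET.tostring(root, encoding="unicode")

# For a file on disk (hypothetical file names):
#   tree = ET.parse("input.xml")
#   rename_outer_elements(tree.getroot())
#   tree.write("output.xml", encoding="utf-8", xml_declaration=True)
```

Because the parser works on the element tree rather than on text, this does not depend on indentation or line breaks at all.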