Parsing an XML file in Java

Developer https://www.devze.com 2023-01-14 16:19 Source: web

In the example code below, I have a question about the List. My professor adds the Document object to the ArrayList. It seems like this would add just the one Document object to the list, and not each individual Node. But looking at the while loop, it seems he gets the item at index 0, processes it, then removes that item so he can look at the next one. So it seems there is more going on in the ArrayList than just the one Document object. Is that what is going on in the ArrayList/while loop portion? I'm confused about how this code works. Thanks in advance!

import java.io.*; 
import java.util.*; 
import javax.xml.parsers.*; 
import org.w3c.dom.*; 
import org.xml.sax.*; 


public class RSSReader {
    public static void main(String[] args) {
        File f = new File("testrss.xml");
        if (f.isFile()) {
            System.out.println("is File");
            RSSReader xml = new RSSReader(f);
        }
    }

    public RSSReader(File xmlFile) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(xmlFile);

            List<Node> nodeList = new ArrayList<Node>();
            nodeList.add(doc);

            while(nodeList.size() > 0)
            {
                Node node = nodeList.get(0);

                if (node instanceof Element) {
                    System.out.println("Element Node: " + ((Element)node).getTagName());
                    NamedNodeMap attrMap = node.getAttributes();
                    for(int i = 0; i < attrMap.getLength(); i++)
                    {
                        Attr attribute = (Attr) attrMap.item(i);
                        System.out.print("\tAttribute Key: " + attribute.getName()
                            + " Value: " + attribute.getValue());
                    }
                    if(node.hasAttributes())
                        System.out.println();
                }
                else if(node instanceof Text)
                    System.out.println("Text Node: " + node.getNodeValue());
                else
                    System.out.println("Other Type: " + node.getNodeValue());

                if(node.hasChildNodes())
                {
                    NodeList nl = node.getChildNodes();
                    for(int i = 0; i < nl.getLength(); i++)
                    {
                        nodeList.add(nl.item(i));
                    }
                }
                nodeList.remove(0);
            }
        }

        catch (IOException e) {
            e.printStackTrace();
        }
        catch (SAXException e) {
            e.printStackTrace();
        }
        catch (IllegalArgumentException e) {
            e.printStackTrace();
        }
        catch (ParserConfigurationException e) {
            e.printStackTrace();
        }
    }
}

What I think your professor is demonstrating here is called a breadth-first traversal (also known as breadth-first search). The key block of code in the loop is:

if(node.hasChildNodes()) 
{ 
    NodeList nl = node.getChildNodes(); 
    for(int i = 0; i < nl.getLength(); i++) 
    { 
        nodeList.add(nl.item(i)); 
    } 
}

After processing a node from the list, this code checks whether that node has child nodes. If it does, they are appended to the end of the list to be processed later.

By using this algorithm, the Document node is processed first, then the root element, then its children, then their children, and so on, level by level, until the list is empty and every node in the tree has been processed.
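To see the visiting order concretely, here is a minimal sketch of the same breadth-first idea. It is not your professor's code: the class name BreadthFirstDemo, the helper visitOrder, and the sample XML string are made up for illustration. It also swaps the ArrayList for an ArrayDeque used as a queue, since removing index 0 from an ArrayList shifts every remaining element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the code in the question.
class BreadthFirstDemo {

    // Returns the node names in the order a breadth-first traversal visits them.
    static List<String> visitOrder(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }

        List<String> order = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(doc); // the queue starts with the Document node only

        while (!queue.isEmpty()) {
            Node node = queue.remove(); // take the oldest entry (front of the queue)
            order.add(node.getNodeName());

            // Children go to the back, behind everything already waiting.
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                queue.add(children.item(i));
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Assumed sample input, not the testrss.xml from the question.
        System.out.println(visitOrder(
                "<rss><channel><title>t</title></channel><extra/></rss>"));
    }
}
```

For the sample string in main, this should print [#document, rss, channel, extra, title, #text]. Note that extra comes before title, because it sits one level higher in the tree.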

(On a side note: this seems to me to be the wrong approach for an XML document in general and an RSS feed specifically. I think you would want a depth-first traversal to make the output more understandable, because each element would then be printed together with its own children. In that case, you could use a Stack instead of a List.)
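A minimal sketch of that depth-first variant follows. DepthFirstDemo, visitOrder, and the sample XML are hypothetical names and data chosen for illustration. It uses an ArrayDeque through the Deque interface as the stack, which the java.util.Stack documentation recommends over Stack itself, and it pushes children in reverse so that they come off the stack in document order.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the code in the question.
class DepthFirstDemo {

    // Returns the node names in the order a depth-first traversal visits them.
    static List<String> visitOrder(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }

        List<String> order = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(doc); // the stack starts with the Document node only

        while (!stack.isEmpty()) {
            Node node = stack.pop(); // take the newest entry (top of the stack)
            order.add(node.getNodeName());

            // Push the last child first so the first child ends up on top.
            NodeList children = node.getChildNodes();
            for (int i = children.getLength() - 1; i >= 0; i--) {
                stack.push(children.item(i));
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Assumed sample input, not the testrss.xml from the question.
        System.out.println(visitOrder(
                "<rss><channel><title>t</title></channel><extra/></rss>"));
    }
}
```

For the sample input this should print [#document, rss, channel, title, #text, extra]. Here title appears right after its parent channel, which is what makes depth-first output easier to read for nested data.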

Every child of every node is added to the List<Node> by this code:

if(node.hasChildNodes()) 
{ 
    NodeList nl = node.getChildNodes(); 
    for(int i = 0; i < nl.getLength(); i++) 
    { 
        nodeList.add(nl.item(i)); 
    } 
}

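In other words, the list starts with only the Document node, but each pass through the loop appends the current node's children to it, so every node in the document will eventually be visited.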
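To make the growth of the list visible, the sketch below keeps the ArrayList and remove(0) pattern from the question and records what the list contains at the start of each pass. ListTraceDemo, trace, and the sample XML are hypothetical, introduced only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the code in the question.
class ListTraceDemo {

    // Returns one snapshot of the list's contents per loop iteration.
    static List<String> trace(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }

        List<String> snapshots = new ArrayList<>();
        List<Node> nodeList = new ArrayList<>();
        nodeList.add(doc); // only the Document node at first

        while (!nodeList.isEmpty()) {
            // Record the names currently waiting in the list.
            List<String> names = new ArrayList<>();
            for (Node n : nodeList) {
                names.add(n.getNodeName());
            }
            snapshots.add(names.toString());

            // Same steps as the question's loop: read index 0,
            // append its children, then drop index 0.
            Node node = nodeList.get(0);
            NodeList nl = node.getChildNodes();
            for (int i = 0; i < nl.getLength(); i++) {
                nodeList.add(nl.item(i));
            }
            nodeList.remove(0);
        }
        return snapshots;
    }

    public static void main(String[] args) {
        // Assumed sample input, not the testrss.xml from the question.
        for (String snapshot : trace("<a><b><c/></b><d/></a>")) {
            System.out.println(snapshot);
        }
    }
}
```

For the sample input this should print [#document], [a], [b, d], [d, c], and [c] on separate lines. So the list never holds the whole document at once, only the nodes that have been discovered but not yet processed.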