
Splitting up an XML file using Java

Developer https://www.devze.com 2023-02-22 16:55 Source: web

< ?xml version="1.0" encoding="utf-8"? > < rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:ynews="http://news.yahoo.com/rss/" version="2.0" > < channel >

< title>Cricket News Headlines | Cricket News - Yahoo! News India< /title>

< link>http://in.news.yahoo.com/cricket/< /link>

< description>Check out the latest Cricket news headlines from Yahoo! News India. Find top Cricket stories and in-depth coverage of Cricket news from India and around the world.< /description>

< language>en-IN< /language>

< copyright>Copyright (c) 2011 Yahoo! Inc. All rights reserved< /copyright>

< pubDate>2011-04-06T15:30:02+05:30< /pubDate>

< ttl>5< /ttl>

< image>

< title>Cricket News Headlines | Cricket News - Yahoo! News India< /title>

< link>http://in.news.yahoo.com/cricket/< /link>

< url>http://l.yimg.com/os/mit/media/m/index/img/Yahoo_logo_en- IN.gif< /url>

< /image> < item>< title>Hectic schedule will drain players, says Dhoni< /title>

< description>Chennai, Apr 6 (PTI) ...< /description>

< link>http://in.news.yahoo.com/hectic-schedule-drain-players-says-dhoni-20110406-023100-889.html< /link>

< pubDate>2011-04-06T09:31:00Z< /pubDate>

< source>PTI< /source>

< guid isPermaLink="false">/hectic-schedule-drain-players-says-dhoni-20110406-023100-889.html< /guid>

< /item>

< item>

< title>India, Pakistan trade secretaries to meet on April 27-28< /title>

< description>New Delhi, Apr 6 (PTI) ...< /description>

< link>http://in.news.yahoo.com/india-pakistan-trade-secretaries-meet-april-27-28-20110406-023100-140.html< /link>

I want only the headlines from this XML, that is, only the text between the < item>< title>MESSAGES< /title> tags. I also have to print the headlines one after another. How can I do this?


I would use the javax.xml.xpath APIs for this; they have been part of Java SE since version 5.

import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();

        // Read bytes rather than characters so the parser honours the encoding
        // declared in the XML prolog; try-with-resources (Java 7+) closes the stream.
        try (InputStream in = new FileInputStream("input.xml")) {
            InputSource xml = new InputSource(in);
            // Select every title element whose parent is an item element.
            NodeList titleNodes = (NodeList) xPath.evaluate("//item/title", xml, XPathConstants.NODESET);

            // Print the headlines one after another.
            for (int x = 0; x < titleNodes.getLength(); x++) {
                System.out.println(titleNodes.item(x).getTextContent());
            }
        }
    }

}
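
To see why //item/title picks up only the headlines, a minimal sketch along the same lines can run the expression against a small in-memory feed. The class name HeadlineXPathSketch, the headlines helper, and the shortened SAMPLE feed are assumptions made for illustration; they are not part of the answer above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HeadlineXPathSketch {

    // Hypothetical, shortened stand-in for the feed in the question.
    public static final String SAMPLE =
        "<rss version=\"2.0\"><channel>"
        + "<title>Cricket News Headlines</title>"
        + "<image><title>Cricket News Headlines</title></image>"
        + "<item><title>Hectic schedule will drain players, says Dhoni</title></item>"
        + "<item><title>India, Pakistan trade secretaries to meet on April 27-28</title></item>"
        + "</channel></rss>";

    // Returns the text of every title element whose parent is an item element.
    public static List<String> headlines(String xml) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(
            "//item/title", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The channel and image titles are skipped; only item titles are printed.
        for (String headline : headlines(SAMPLE)) {
            System.out.println(headline);
        }
    }
}
```

Because the expression requires title to be a direct child of item, the feed-level title and the title inside image do not match.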


Parse the file to create a DOM document. On this DOM, select all title elements that belong to an item; their text contents are the headlines you're looking for.

Quick example with dom4j:

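import java.io.File;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

File xml = new File("input.xml");     // replace with your document
SAXReader reader = new SAXReader();
Document doc = reader.read(xml);
List titles = doc.selectNodes("//item/title");  // all item titles (dom4j XPath needs jaxen on the classpath)
for (Object obj : titles)
    System.out.println(((Element) obj).getText());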

This should print all item titles to the console.
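
If adding dom4j is not an option, the same parse-then-select idea can be sketched with the DOM parser that ships with the JDK. This is only a rough sketch: the class name HeadlineDomSketch, the headlines helper, and the two-item sample feed are made up for illustration, and it walks the tree by hand instead of using XPath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class HeadlineDomSketch {

    // Parses the stream into a DOM and collects the title child of each item.
    public static List<String> headlines(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        List<String> result = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            // Walk direct children only, so titles outside an item are ignored.
            for (Node child = items.item(i).getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child instanceof Element && "title".equals(child.getNodeName())) {
                    result.add(child.getTextContent());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical two-item feed standing in for input.xml.
        String xml = "<rss><channel><title>Feed title</title>"
            + "<item><title>First headline</title></item>"
            + "<item><title>Second headline</title></item>"
            + "</channel></rss>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // Join the headlines so they print one after another on a single line.
        System.out.println(String.join(" | ", headlines(in)));
    }
}
```

Returning a list rather than printing inside the loop leaves the caller free to decide how the headlines are shown, for example one per line or joined into a single running line.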


This is something that comes up often. I have a Groovy script that does this; it is available here:

https://github.com/ramanathanrv/utils/blob/master/groovy/split_xml.groovy

Usage: groovy split_xml.groovy <input_file_name> <no_of_pieces>

PS: This is not my code. I got it from somewhere but have forgotten the source.
