I have a large (1 GB) XML file that I need to split into smaller files. I want each smaller file to contain 500 of the <OFFER> elements.
Here is a small snippet of the large XML file:
<?xml version="1.0"?><RESULT>
<header>
<site>http://www.thomascook.fr</site>
<marque>ThomasCook France</marque>
<logo>http://www.example.com/example.gif</logo>
</header>
<OFFER>
<IFF>5810</IFF>
<TO>TCF</TO>
<COUNTRY>Chypre</COUNTRY>
<REGION>Chypre du Sud</REGION>
<HOTELNAME>Elias Beach & Country Club</HOTELNAME>
<DESCRIPTION>....</DESCRIPTION>
<TYPE>Sejour</TYPE>
<STARS>5.0</STARS>
<THEMAS>Plage directe;Special enfant;Bien-Etre-Fitness</THEMAS>
<THUMBNAIL>http://example.com/example.jpg</THUMBNAIL>
<URL>http://example.com/example.html</URL>
<DATE>
<BROCHURE>TCFB</BROCHURE>
<DURATION>7</DURATION>
<DURATION_VAR>6_6-9</DURATION_VAR>
<BOARD>Demi-pension</BOARD>
<DEPARTURE>27.2.2011</DEPARTURE>
<RETURN>6.3.2011</RETURN>
<DEPARTURE_CITY>PAR</DEPARTURE_CITY>
<ARRIVAL_CITY>LCA</ARRIVAL_CITY>
<PRICE>790</PRICE>
<URL>http://example.com/other-example.html</URL>
</DATE>
</OFFER>
<OFFER>
(etc)
</OFFER>
How can I do this?
From your description, I understand that you want to split a big XML file into multiple smaller files. A good tool for this is VTD-XML: http://vtd-xml.sourceforge.net/
Sample code: the program below splits the big XML based on the XPath expression TopTag/ChildTag (for your file that would be RESULT/OFFER). Note that VTD-XML parses the entire input into memory, so give the JVM enough heap for a 1 GB file.
import java.io.File;
import java.io.FileOutputStream;

import com.ximpleware.AutoPilot;
import com.ximpleware.FastLongBuffer;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;

// Splits a big XML file into chunks of args[1] elements each.
// Usage: java Split <input.xml> <elementsPerFile> <outputFilePrefix>
// Adjust TopTag/ChildTag below to your document (RESULT/OFFER for the file above).
public class Split {
    public static void main(String[] args) {
        String prefix = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n<TopTag xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n";
        String suffix = "\n</TopTag>";
        try {
            VTDGen vg = new VTDGen();
            if (vg.parseFile(args[0], false)) {
                int splitBy = Integer.parseInt(args[1]);
                String filePrefix = args[2];
                VTDNav vn = vg.getNav();
                AutoPilot ap = new AutoPilot(vn);
                ap.selectXPath("/TopTag/ChildTag");
                // flb collects the offset and length of every matching element fragment
                FastLongBuffer flb = new FastLongBuffer(4);
                int i;
                byte[] xml = vn.getXML().getBytes();
                while ((i = ap.evalXPath()) != -1) {
                    flb.append(vn.getElementFragment());
                }
                int size = flb.size();
                if (size != 0) {
                    File fo = null;
                    FileOutputStream fos = null;
                    for (int k = 0; k < size; k++) {
                        // every splitBy elements, close the current output file
                        if (k % splitBy == 0 && fo != null) {
                            fos.write(suffix.getBytes());
                            fos.close();
                            fo = null;
                        }
                        // open a new output file and write the prolog and root start tag
                        if (fo == null) {
                            fo = new File(filePrefix + k + ".xml");
                            fos = new FileOutputStream(fo);
                            fos.write(prefix.getBytes());
                        }
                        // copy the element fragment verbatim: lower 32 bits of the
                        // packed long are the offset, upper 32 bits the length
                        fos.write(xml, flb.lower32At(k), flb.upper32At(k));
                    }
                    if (fo != null) {
                        fos.write(suffix.getBytes());
                        fos.close();
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
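The write call above relies on VTD-XML's convention that getElementFragment() returns the fragment's byte offset in the lower 32 bits of a long and its length in the upper 32 bits, which is what lower32At/upper32At unpack. A small standalone illustration of that packing (the pack, lower32, and upper32 helper names are my own, not part of VTD-XML):

```java
public class FragmentPacking {
    // Pack offset (lower 32 bits) and length (upper 32 bits) into one long,
    // mirroring the convention VTD-XML's getElementFragment() uses.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    static int lower32(long l) { return (int) l; }          // the offset
    static int upper32(long l) { return (int) (l >>> 32); } // the length

    public static void main(String[] args) {
        long frag = pack(1024, 300);
        System.out.println(lower32(frag)); // 1024 (offset)
        System.out.println(upper32(frag)); // 300 (length)
    }
}
```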
As a programming question, this is just a matter of StAX programming.
Every 500 elements, make the necessary calls to end the root element and the document, close the file, open a new file, write the new document's start, and continue along. If you have a program that can write one file with StAX, it is not very different to write many.
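That approach can be sketched as follows. This is a minimal sketch using the standard javax.xml.stream API, assuming the root is RESULT and the record tag is OFFER as in the question; the split method and the outPrefixN.xml naming scheme are invented for illustration:

```java
import java.io.FileOutputStream;
import java.io.InputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

public class StaxSplit {

    // Streams the input and writes every `perFile` <OFFER> elements into a new
    // file named outPrefix0.xml, outPrefix1.xml, ... Returns the file count.
    public static int split(InputStream in, String outPrefix, int perFile) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outf = XMLOutputFactory.newInstance();
        XMLEventFactory evf = XMLEventFactory.newInstance();
        XMLEventWriter writer = null;
        int offers = 0, fileIndex = 0;
        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (e.isStartElement()
                    && e.asStartElement().getName().getLocalPart().equals("OFFER")) {
                if (writer == null) { // start a new chunk file with its own root
                    writer = outf.createXMLEventWriter(
                            new FileOutputStream(outPrefix + fileIndex + ".xml"), "UTF-8");
                    writer.add(evf.createStartDocument());
                    writer.add(evf.createStartElement("", "", "RESULT"));
                }
                int depth = 0; // copy the whole <OFFER> subtree verbatim
                while (true) {
                    writer.add(e);
                    if (e.isStartElement()) depth++;
                    else if (e.isEndElement() && --depth == 0) break;
                    e = reader.nextEvent();
                }
                if (++offers % perFile == 0) { // chunk full: end document, close file
                    writer.add(evf.createEndElement("", "", "RESULT"));
                    writer.add(evf.createEndDocument());
                    writer.close();
                    writer = null;
                    fileIndex++;
                }
            }
        }
        if (writer != null) { // flush the final, possibly short, chunk
            writer.add(evf.createEndElement("", "", "RESULT"));
            writer.add(evf.createEndDocument());
            writer.close();
            fileIndex++;
        }
        reader.close();
        return fileIndex;
    }
}
```

Because it never builds a full document tree, this streams a 1 GB file in constant memory; for your case you would call split with perFile = 500.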