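1. Document (DOM)
The Document interface is official: it is part of the W3C DOM standard. The parser loads the whole HTML or XML document into memory as a tree of objects (the document object), and the application then walks that tree with loops to extract the data.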
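2. SAXParser
SAXParser uses an event-driven "push" model for processing XML. SAX is not a W3C standard, but it is a widely accepted API, and most SAX parser implementations follow it faithfully.
Unlike DOM, a SAX parser does not build a tree representation of the whole document. It reads the document as a stream and reports an event for each piece of markup it encounters. These events are pushed to event handlers, and the handlers are where the application gets access to the document content and wraps it into its own data structures.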
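There are three basic types of event handler:
- DTDHandler, for accessing the content of the XML DTD;
- ErrorHandler, for low-level access to parsing errors;
- ContentHandler, the most commonly used type, for accessing the document content.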
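To make the handler types concrete, here is a minimal sketch that overrides one ContentHandler callback and one ErrorHandler callback on a DefaultHandler. The class name SaxHandlerSketch, the method elementNames, and the inline XML are assumptions made up for illustration; only the tag and attribute names (root, province, city, name, postcode) follow the parser code in this article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the article's code.
class SaxHandlerSketch {

    // Collects element names in document order; malformed XML ends in an exception.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        // DefaultHandler provides default implementations of ContentHandler,
        // ErrorHandler and DTDHandler, so only the callbacks of interest are overridden.
        DefaultHandler handler = new DefaultHandler() {
            @Override // ContentHandler: called for every opening tag
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName);
            }

            @Override // ErrorHandler: called when the document is not well-formed
            public void fatalError(SAXParseException e) throws SAXException {
                throw new SAXException("line " + e.getLineNumber() + ": " + e.getMessage(), e);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped in an unchecked exception only to keep the sketch short.
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample data in the shape the article's parsers expect.
        String xml = "<root><province name=\"A\" postcode=\"1\">"
                + "<city name=\"B\" postcode=\"2\"/></province></root>";
        System.out.println(elementNames(xml));
    }
}
```

The parser drives the loop and calls into the handler, which is what "push" means here: the application never asks for the next event.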
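3. XMLStreamReader (StAX)
XMLStreamReader is also a streaming parser: it reads the file linearly from beginning to end and, like SAXParser, reports events as it goes. The difference is that XMLStreamReader uses a "pull" model instead of SAXParser's push model. It has no callback mechanism; it returns an event only when the application asks for one. StAX also offers a user-friendly API for both reading and writing.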
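Whereas SAXParser delivers its different event types to a ContentHandler, XMLStreamReader returns its events to the application itself, and StAX can even hand over events as objects.
When the application asks for an event, the parser reads as much of the XML document as it needs and returns that event. StAX provides factories for creating its readers and writers, so an application can work against the StAX interfaces without referring to the details of any particular implementation.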
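The article's own code only reads XML, so here is a minimal sketch of the writing side, using XMLOutputFactory and XMLStreamWriter from javax.xml.stream. The class name StaxWriterSketch, the method writeProvince, and the sample values are assumptions for illustration; the element and attribute names mirror the area data parsed in this article.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, not part of the article's code.
class StaxWriterSketch {

    // Writes <root> containing one <province> element and returns the XML as a string.
    static String writeProvince(String name, int postcode) {
        StringWriter out = new StringWriter();
        try {
            // The factory hides which StAX implementation is actually used.
            XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
            writer.writeStartElement("root");
            writer.writeStartElement("province");
            writer.writeAttribute("name", name);
            writer.writeAttribute("postcode", String.valueOf(postcode));
            writer.writeEndElement(); // closes province
            writer.writeEndElement(); // closes root
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            // Wrapped in an unchecked exception only to keep the sketch short.
            throw new IllegalStateException("StAX writing failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values.
        System.out.println(writeProvince("Beijing", 100000));
    }
}
```

The exact serialization (for example whether an empty element is collapsed) depends on the StAX implementation in use.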
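Unlike Document and SAXParser, StAX defines two parsing models: the cursor model (XMLStreamReader), which, like SAXParser, simply returns events; and the iterator model (XMLEventReader), which returns events as objects. (A personal aside: I prefer SAXParser's handler-based way of processing events, because the code reads more intuitively.) StAX can in fact be used in much the same way as SAXParser, but returning events as objects costs extra object creation.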
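The full listings in this article use the cursor model only, so here is a minimal sketch of the iterator model, where each event arrives as an XMLEvent object. The class name StaxIteratorSketch, the method namedElements, and the inline XML are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of the article's code.
class StaxIteratorSketch {

    // Returns "element:name" for every element that carries a name attribute.
    static List<String> namedElements(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newFactory()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Each call allocates an event object; this is the extra cost of the iterator model.
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    Attribute name = start.getAttributeByName(new QName("name"));
                    if (name != null) {
                        result.add(start.getName().getLocalPart() + ":" + name.getValue());
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Wrapped in an unchecked exception only to keep the sketch short.
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data in the shape the article's parsers expect.
        String xml = "<root><province name=\"A\" postcode=\"1\">"
                + "<city name=\"B\" postcode=\"2\"/></province></root>";
        System.out.println(namedElements(xml));
    }
}
```

Because an event is an ordinary object, it can be stored or passed to other methods, which the cursor model does not allow once the reader has moved on.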
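Now let's look at some sample code:
1. Basic code for parsing XML with Document:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
Element element = document.getDocumentElement();
These few lines are all it takes to get the root Element. From there you only need to traverse the Element to assemble the data.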
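The traversal step can be sketched as follows. This is a minimal example, assuming a small in-memory document instead of the real area.xml; the class name DomTraversalSketch and the method childNames are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the article's code.
class DomTraversalSketch {

    // Returns the "name" attribute of every element directly under the root.
    static List<String> childNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Whitespace between tags appears as text nodes, so keep elements only.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(((Element) child).getAttribute("name"));
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped in an unchecked exception only to keep the sketch short.
            throw new IllegalStateException("DOM parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample data in the shape the article's parsers expect.
        String xml = "<root>\n  <province name=\"A\" postcode=\"1\"/>\n"
                + "  <province name=\"B\" postcode=\"2\"/>\n</root>";
        System.out.println(childNames(xml));
    }
}
```

The node-type check matters in practice: an indented file yields a text node between every pair of elements.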
2. Basic code for parsing XML with SAXParser:
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
    SAXParser parser = factory.newSAXParser();
    parser.parse(path, handler);
} catch (ParserConfigurationException e) {
    e.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
Again only three lines do the real work. The important part is the handler's event callbacks; the handler used here is a DefaultHandler.
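3. XMLStreamReader (StAX):
long t = System.currentTimeMillis(); // start time for the elapsed-time message below
InputStream in = new FileInputStream(path);
XMLInputFactory factory = XMLInputFactory.newFactory();
XMLStreamReader reader = factory.createXMLStreamReader(in);
while (reader.hasNext()) {
    int event = reader.next();
    if (event == XMLStreamConstants.START_ELEMENT) {
        // handle an opening tag
    } else if (event == XMLStreamConstants.END_ELEMENT) {
        // handle a closing tag
    } else if (event == XMLStreamConstants.END_DOCUMENT) {
        // out(...) comes from the OutPut base class used by the full listings
        out("Use StAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
    }
}
Here the file is opened as an InputStream and the stream is handed to the XMLStreamReader. The code then loops over the document, and inside the loop it must call next() to advance and obtain the event type.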
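Here are the times I measured when reading the nationwide region data (down to county level):
Document took 103 ms. SAXParser was the fastest, consistently between 10 and 16 ms. The numbers depend on the machine, and mine is a rather poor laptop.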
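A single System.currentTimeMillis() measurement also includes class loading and JVM warm-up for whichever parser happens to run first. A minimal sketch of a slightly more careful measurement, assuming a few warm-up runs followed by the median of several timed runs, could look like this. ParseTimingSketch and medianMillis are hypothetical names, and the workload in main is only a stand-in for a parser call.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, not part of the article's code.
class ParseTimingSketch {

    // Runs the task `warmups` times untimed, then returns the median of `runs` timed runs (runs >= 1).
    static long medianMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return TimeUnit.NANOSECONDS.toMillis(samples[runs / 2]);
    }

    public static void main(String[] args) {
        // Stand-in workload; with the article's classes this could be
        // () -> new XmlParserBySAX(path).parser()
        long ms = medianMillis(() -> String.valueOf(Math.sqrt(12345.678)), 3, 9);
        System.out.println("median: " + ms + "ms");
    }
}
```

This is still a rough measurement, not a proper benchmark, but it makes the three parsers easier to compare on equal terms.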
Below is the Java code that reads the nationwide XML region data, in all three ways:
I. Document
import model.AreaModel;
import model.AreaNode;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Parsing with Document (DOM).
 * Created by alan on 2018/12/16.
 */
public class XmlParserByDocument extends OutPut {
    private String path;
    private List<AreaModel> areaModels = new ArrayList<>();

    public XmlParserByDocument() {
    }

    public XmlParserByDocument(String path) {
        this.path = path;
    }

    public List<AreaModel> getAreaModels() {
        return areaModels;
    }

    public void parser() {
        long t = System.currentTimeMillis();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(path);
            Element element = document.getDocumentElement();
            out("document v" + document.getXmlVersion() + " encode " + document.getInputEncoding());
            if (!"root".equals(element.getTagName())) {
                throw new SAXException("invalid xml file.");
            }
            // Every <province> under <root> becomes one AreaModel.
            NodeList nodeList = element.getChildNodes();
            for (int i = 0; i < nodeList.getLength(); i++) {
                Node node = nodeList.item(i);
                if ("province".equals(node.getNodeName())) {
                    areaModels.add(new AreaModel(parserNode(node), parserNodeList(node.getChildNodes())));
                }
            }
            out("Use Document object and use time is " + (System.currentTimeMillis() - t) + "ms.");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }

    public void test() {
        StringBuilder str = new StringBuilder();
        for (AreaModel a : areaModels) {
            str.append(a.getProvince()).append("\n");
            for (AreaNode n : a.getCitys()) {
                str.append("\t").append(n).append("\n");
                for (AreaNode j : n.getChild()) {
                    str.append("\t\t").append(j).append("\n");
                }
            }
        }
        out(str.toString());
    }

    // Recursively converts a node list into AreaNode objects.
    private List<AreaNode> parserNodeList(NodeList list) {
        List<AreaNode> nodes = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            Node item = list.item(i);
            AreaNode node = parserNode(item);
            if (node == null) {
                continue; // text nodes (whitespace) carry no attributes
            }
            if (item.hasChildNodes()) {
                node.setChild(parserNodeList(item.getChildNodes()));
            }
            nodes.add(node);
        }
        return nodes;
    }

    // Builds an AreaNode from the name and postcode attributes; returns null for non-element nodes.
    private AreaNode parserNode(Node node) {
        AreaNode areaNode = null;
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            areaNode = new AreaNode(attrs.getNamedItem("name").getTextContent(),
                    Integer.valueOf(attrs.getNamedItem("postcode").getTextContent()));
        }
        return areaNode;
    }
}
II. SAXParser
import model.AreaModel;
import model.AreaNode;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Stream parsing with SAX.
 * Created by alan on 2018/12/16.
 */
public class XmlParserBySAX extends OutPut {
    private String path = "d:/test/area.xml";
    private List<AreaModel> areaModels;
    private long t = 0;

    public XmlParserBySAX() {
    }

    public XmlParserBySAX(String path) {
        this.path = path;
    }

    public List<AreaModel> getAreaModels() {
        return areaModels;
    }

    public void parser() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(path, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }

    public void test() {
        StringBuilder str = new StringBuilder();
        for (AreaModel a : areaModels) {
            str.append(a.getProvince()).append("\n");
            for (AreaNode n : a.getCitys()) {
                str.append("\t").append(n).append("\n");
                for (AreaNode j : n.getChild()) {
                    str.append("\t\t").append(j).append("\n");
                }
            }
        }
        out(str.toString());
    }

    // The parser pushes events into these callbacks.
    private DefaultHandler handler = new DefaultHandler() {
        private AreaModel province;
        private List<AreaNode> citys;
        private List<AreaNode> areas;
        private AreaNode city;

        @Override
        public void startDocument() throws SAXException {
            areaModels = new ArrayList<>();
            t = System.currentTimeMillis();
        }

        @Override
        public void endDocument() throws SAXException {
            out("Use SAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            switch (qName) {
                case "province":
                    province = new AreaModel();
                    province.setProvince(new AreaNode(attributes.getValue("name"),
                            Integer.valueOf(attributes.getValue("postcode"))));
                    citys = new ArrayList<>();
                    break;
                case "city":
                    city = new AreaNode(attributes.getValue("name"),
                            Integer.valueOf(attributes.getValue("postcode")));
                    areas = new ArrayList<>();
                    break;
                case "area":
                    areas.add(new AreaNode(attributes.getValue("name"),
                            Integer.valueOf(attributes.getValue("postcode"))));
                    break;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            switch (qName) {
                case "province":
                    province.setCitys(citys);
                    areaModels.add(province);
                    break;
                case "city":
                    city.setChild(areas);
                    citys.add(city);
                    break;
                case "area":
                    break;
            }
        }
    };
}
III. XMLStreamReader (StAX)
import model.AreaModel;
import model.AreaNode;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Parsing with the pull parser.
 * Created by alan on 2018/12/16.
 */
public class XmlParserByStAX extends OutPut {
    private String path;
    private List<AreaModel> areaModels = new ArrayList<>();

    public XmlParserByStAX() {
    }

    public XmlParserByStAX(String path) {
        this.path = path;
    }

    public List<AreaModel> getAreaModels() {
        return areaModels;
    }

    public void parser() {
        // try-with-resources closes the file stream even if parsing fails
        try (InputStream in = new FileInputStream(path)) {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            AreaModel province = null;
            List<AreaNode> citys = null;
            List<AreaNode> areas = null;
            AreaNode city = null;
            long t = System.currentTimeMillis();
            areaModels = new ArrayList<>();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // getLocalName() returns the plain tag name, with or without a namespace
                    switch (reader.getLocalName()) {
                        case "province":
                            province = new AreaModel();
                            province.setProvince(new AreaNode(reader.getAttributeValue(null, "name"),
                                    Integer.valueOf(reader.getAttributeValue(null, "postcode"))));
                            citys = new ArrayList<>();
                            break;
                        case "city":
                            city = new AreaNode(reader.getAttributeValue(null, "name"),
                                    Integer.valueOf(reader.getAttributeValue(null, "postcode")));
                            areas = new ArrayList<>();
                            break;
                        case "area":
                            areas.add(new AreaNode(reader.getAttributeValue(null, "name"),
                                    Integer.valueOf(reader.getAttributeValue(null, "postcode"))));
                            break;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    switch (reader.getLocalName()) {
                        case "province":
                            province.setCitys(citys);
                            areaModels.add(province);
                            break;
                        case "city":
                            city.setChild(areas);
                            citys.add(city);
                            break;
                        case "area":
                            break;
                    }
                } else if (event == XMLStreamConstants.END_DOCUMENT) {
                    out("Use StAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
                }
            }
            reader.close();
        } catch (IOException | XMLStreamException e) {
            e.printStackTrace();
        }
    }

    public void test() {
        StringBuilder str = new StringBuilder();
        for (AreaModel a : areaModels) {
            str.append(a.getProvince()).append("\n");
            for (AreaNode n : a.getCitys()) {
                str.append("\t").append(n).append("\n");
                for (AreaNode j : n.getChild()) {
                    str.append("\t\t").append(j).append("\n");
                }
            }
        }
        out(str.toString());
    }
}
IV. Source of the AreaModel model class
package model;

import java.util.List;

/**
 * Created by alan on 2018/12/15.
 */
public class AreaModel {
    private AreaNode province;
    private List<AreaNode> citys;

    public AreaModel() {
    }

    public AreaModel(AreaNode province, List<AreaNode> citys) {
        this.province = province;
        this.citys = citys;
    }

    public AreaNode getProvince() {
        return province;
    }

    public void setProvince(AreaNode province) {
        this.province = province;
    }

    public List<AreaNode> getCitys() {
        return citys;
    }

    public void setCitys(List<AreaNode> citys) {
        this.citys = citys;
    }
}
V. Source of the AreaNode model class
package model;

import java.util.List;

/**
 * Created by alan on 2018/12/15.
 */
public class AreaNode {
    private String name;
    private Integer postCode;
    private List<AreaNode> child;

    public AreaNode() {
    }

    public AreaNode(String name, Integer postCode) {
        this.name = name;
        this.postCode = postCode;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Integer getPostCode() {
        return postCode;
    }

    public void setPostCode(Integer postCode) {
        this.postCode = postCode;
    }

    public List<AreaNode> getChild() {
        return child;
    }

    public void setChild(List<AreaNode> child) {
        this.child = child;
    }

    @Override
    public String toString() {
        return String.format("{name:\"%s\",postCode:\"%s\"}", this.getName(), this.getPostCode());
    }
}
That is all of the code. Now we need a main() method to test it:
// Fragment of the test class; it needs: import java.awt.EventQueue;
private static String path = "d:/test/area.xml";

public static void main(String[] args) {
    EventQueue.invokeLater(() -> {
        out("...");
        // 1. Document
        XmlParserByDocument document = new XmlParserByDocument(path);
        document.parser();
        // 2. SAXParser
        XmlParserBySAX sax = new XmlParserBySAX(path);
        sax.parser();
        // 3. StAX
        XmlParserByStAX stAX = new XmlParserByStAX(path);
        stAX.parser();
        out(document.getAreaModels().size());
        out(sax.getAreaModels().size());
        out(stAX.getAreaModels().size());
        // document.test();
        // stAX.test();
        // sax.test();
    });
}
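The parser classes extend OutPut and the test code calls out(...), but that class is not listed in this article. A minimal stand-in, assuming it does nothing more than print its argument to standard output, might look like this. The real class may well differ; the static out(Object) signature is a guess that happens to fit both the String and the int calls above.

```java
// Hypothetical stand-in for the OutPut base class, which the article does not list.
class OutPut {

    // Assumption: out(...) simply prints its argument followed by a newline.
    static void out(Object message) {
        System.out.println(message);
    }

    public static void main(String[] args) {
        out("...");
        out(34); // an int is boxed to Integer and printed the same way
    }
}
```

For the main() fragment to call out(...) without a qualifier, the test class would also have to extend OutPut or import the method statically.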
By the way, I am sharing the area.xml file as well: local download