Friday, February 15, 2013

XML document to DOM object using DocumentBuilderFactory

I am currently modifying a piece of code, and I am wondering whether the way the XML is formatted (tabs and spacing) affects how it is parsed by a parser obtained from the DocumentBuilderFactory class.
In essence, the question is: can I pass one long string with no spacing to the parser, or does it need to be formatted in some way?
Thanks in advance. Included below is the class definition from Oracle's website.
Class DocumentBuilderFactory
"Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents."
").append(newlineChar).append("tagB").append("").append("");
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputStream xmlInput = new ByteArrayInputStream(sb.toString().getBytes());
Element documentRoot = builder.parse(xmlInput).getDocumentElement();
// Direct children of the root, including any whitespace-only text nodes
NodeList nodes = documentRoot.getChildNodes();
System.out.println("How many children does the root have? => " + nodes.getLength());
for (int index = 0; index < nodes.getLength(); index++) {
    System.out.println(nodes.item(index).getLocalName());
}

Output:
How many children does the root have? => 4
null
A
null
B

But if newlineChar is removed from the StringBuilder, the output is:
How many children does the root have? => 2
A
B


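This demonstrates that the DOM trees generated by DocumentBuilder differ: each line break between elements becomes a whitespace-only text node, which is why getLocalName() prints null for those children.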
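A self-contained sketch of the comparison above might look like this. The XML literals (a root element with children A and B) and the class name WhitespaceDemo are assumptions, since the original string did not survive intact. The sketch counts element children separately from all child nodes, which is one way to get the same answer regardless of formatting.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceDemo {
    // Parses the XML string and returns the direct children of the root element.
    public static NodeList rootChildren(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        return root.getChildNodes();
    }

    // Counts only the children that are elements, skipping text nodes.
    public static int countElements(NodeList nodes) {
        int count = 0;
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical documents: same elements, with and without line breaks.
        String compact = "<root><A>tagA</A><B>tagB</B></root>";
        String spaced = "<root>\n<A>tagA</A>\n<B>tagB</B></root>";

        for (String xml : new String[] {compact, spaced}) {
            NodeList nodes = rootChildren(xml);
            // compact: 2 children, 2 elements; spaced: 4 children, 2 elements
            System.out.println(nodes.getLength() + " children, "
                    + countElements(nodes) + " elements");
        }
    }
}
```

Filtering on getNodeType() keeps the calling code independent of how the input was indented.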
 
 
There shouldn't be any effect regarding the format of the XML string, but I remember a strange problem when I passed a long string to an XML parser: the parser was unable to parse an XML file because it was written as one long line.
It may be better to insert line breaks so that no line is longer than, let's say, 1000 bytes.
Sadly, I remember neither why that error occurred nor which parser I used.
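One way to check the line-length concern against a given parser is to feed it a document on a single, very long line. The sketch below does this with the JDK's default DocumentBuilder. The class name LongLineDemo, the item element and the count of 2000 are illustrative assumptions, and a different parser implementation could behave differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LongLineDemo {
    // Builds an XML document with `count` <item> children and no line breaks at all.
    public static String oneLineXml(int count) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < count; i++) {
            sb.append("<item>").append(i).append("</item>");
        }
        return sb.append("</root>").toString();
    }

    // Parses the document and returns how many <item> elements the DOM contains.
    public static int parseAndCountItems(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        return doc.getElementsByTagName("item").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = oneLineXml(2000);
        System.out.println("Line length in chars: " + xml.length());
        System.out.println("Items parsed: " + parseAndCountItems(xml));
    }
}
```

If the second line reports 2000 items, the parser in use coped with a single line well beyond 1000 bytes.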
 
 
 
I think XML parsers ignore line feeds. It is DocumentBuilder that builds different DOM objects depending on whether the XML string contains line feeds.
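If the goal is to get the same DOM whether or not the input contains line feeds, one option is to remove whitespace-only text nodes after parsing. The following is a minimal sketch under that assumption; the class and method names are hypothetical. Note that DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) only takes effect in validating mode, so it is not used here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StripWhitespaceDemo {
    // Recursively removes text nodes that contain nothing but whitespace.
    public static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Parses the XML, strips whitespace-only text nodes, and returns
    // the number of direct children left under the root element.
    public static int rootChildCountAfterStrip(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        stripWhitespace(root);
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical inputs: identical elements, different formatting.
        System.out.println(rootChildCountAfterStrip("<root><A>tagA</A><B>tagB</B></root>"));
        System.out.println(rootChildCountAfterStrip("<root>\n<A>tagA</A>\n<B>tagB</B>\n</root>"));
    }
}
```

This approach is only safe when whitespace between elements carries no meaning; in mixed content it would also drop spaces that matter.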

You are right, but I remember a bug in an XML API or library that was unable to build the DOM because that particular implementation read only x bytes per line.
 
   
