I am currently modifying a piece of code, and I am wondering
whether the way the XML is formatted (tabs and spacing) affects how
it is parsed by a parser obtained from the DocumentBuilderFactory class.
In essence, the question is: can I pass one long string with no spacing to the parser, or does it need to be formatted in some way? Thanks in advance. Included below is the class definition from Oracle's website. Class DocumentBuilderFactory: "Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents."
It should not affect the ability of the parser as long as the string is valid XML. Tabs and newlines are stripped out or ignored by parsers and are really for the aesthetics of the human reader.
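Note that you will have to pass an InputSource (wrapping a StringReader, for example) or an input stream (such as a ByteArrayInputStream; StringBufferInputStream is deprecated) to the DocumentBuilder, because the String overload of parse treats its argument as a URI pointing to the XML.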
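As a minimal sketch of that point, the XML text can be wrapped in an InputSource or a byte stream before it is handed to the DocumentBuilder. The class name ParseFromString and its helper methods are made up for illustration; only the JDK calls are real.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class ParseFromString {

    // Parses XML held in a String by wrapping it in an InputSource.
    static Document parseWithReader(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Parses the same XML via an InputStream over its UTF-8 bytes.
    static Document parseWithStream(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // One long line with no tabs or line breaks parses without trouble.
        Document doc = parseWithReader("<root><a>A</a><b>B</b></root>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "root"
    }
}
```

Calling builder.parse(xml) directly with the XML text would instead try to resolve that text as a URI.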
The documents will be different. Tabs and new
lines between elements will be converted into text nodes. You can eliminate these by
calling setIgnoringElementContentWhitespace(true) on DocumentBuilderFactory.
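A minimal sketch of that factory option, assuming the method meant here is setIgnoringElementContentWhitespace. The parser needs a content model to know which whitespace is ignorable, so this example embeds a small DTD and turns validation on. The class name, the sample XML, and the DTD are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class IgnoreWhitespaceDemo {

    // Sample document with an internal DTD declaring element-only content for root.
    static final String XML =
          "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (a, b)>\n"
        + "  <!ELEMENT a (#PCDATA)>\n"
        + "  <!ELEMENT b (#PCDATA)>\n"
        + "]>\n"
        + "<root>\n\t<a>A</a>\n\t<b>B</b>\n</root>";

    // Returns the number of child nodes of the root element.
    static int countRootChildren(String xml, boolean ignoreWhitespace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The whitespace option relies on the content model, so enable validation with it.
        factory.setValidating(ignoreWhitespace);
        factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("default factory:     " + countRootChildren(XML, false));
        System.out.println("ignoring whitespace: " + countRootChildren(XML, true));
    }
}
```

With the default factory, the whitespace around a and b shows up as extra text-node children of root; with the option enabled and a DTD available, only the two elements should remain.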
But in order for it to work, you must set up your DOM parser to validate the content against a DTD or XML schema. Alternatively, you could programmatically remove the extra whitespace yourself, for example by walking the tree after parsing and removing the text nodes that contain only whitespace.
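One way to do the programmatic cleanup is sketched below: walk the tree after parsing and drop every text node that contains only whitespace. The class and method names are invented for illustration, and this is not necessarily the code the answer originally showed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class WhitespaceStripper {

    // Recursively removes text nodes that consist solely of whitespace.
    static void stripWhitespace(Node node) {
        NodeList children = node.getChildNodes();
        // Iterate backwards so that removals do not shift the indices still to be visited.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
        }
    }

    // Parses the XML with a default factory and returns the cleaned document.
    static Document parseAndStrip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripWhitespace(doc.getDocumentElement());
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAndStrip("<root>\n\t<a>A</a>\n\t<b>B</b>\n</root>");
        // Only the two element children are left under root.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // prints 2
    }
}
```

Text such as "A" inside an element is kept because it is not whitespace-only. Note that this approach would also remove whitespace-only text that is meaningful in mixed content.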
The DocumentBuilder builds different DOM
objects for an XML string with line feeds and an XML string without line
feeds. In my test, the XML was assembled in a StringBuilder with a newlineChar appended, and the output was:
How many children does the root have? => 4
null
A
null
B
But if the newlineChar is removed from the StringBuilder,
the output is:
How many children does the root have? => 2
A
B
This demonstrates that the DOM objects generated by DocumentBuilder are different.
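A small sketch that reproduces this kind of comparison is shown below. It is not the code from the test above; the class name and the sample XML are assumptions, and the exact child count depends on where the line breaks are placed, so it need not match the numbers reported above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class ChildCountDemo {

    // Parses the XML with a default (non-validating) factory and counts the root's children.
    static int countRootChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    // Builds the sample XML, placing the given separator after each tag line.
    static String buildXml(String separator) {
        StringBuilder sb = new StringBuilder();
        sb.append("<root>").append(separator);
        sb.append("<a>A</a>").append(separator);
        sb.append("<b>B</b>").append(separator);
        sb.append("</root>");
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String newlineChar = "\n";
        // With line feeds: three whitespace text nodes plus the two elements.
        System.out.println(countRootChildren(buildXml(newlineChar))); // prints 5
        // Without line feeds: only the two elements.
        System.out.println(countRootChildren(buildXml("")));          // prints 2
    }
}
```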
The format of the XML string shouldn't have any effect,
but I remember a strange problem when I
passed a long String to an XML parser: the parser was unable to parse an
XML file that was written as one long line.
It may be better to insert line breaks so that no line is longer than, let's say, 1000 bytes.
Sadly, I remember neither why that error occurred nor which parser I used.
I think XML parsers ignore line feeds. It is
the DocumentBuilder that builds different DOM objects depending on whether the XML string
has line feeds.
You are right, but I remember a bug in an
XML API or library that was unable to build the DOM because that
particular implementation read only x bytes per line.