You can use TagSoup. It is a SAX-compliant parser that cleans up malformed content, such as HTML from generic web pages, into well-formed XML. For example, This is <b>bold, <i>bold italic, </b>italic, </i>normal text gets correctly rewritten as: This is <b>bold, <i>bold italic, </i></b><i>italic, </i>normal text.
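Because TagSoup plugs into the standard SAX API, a handler written against org.xml.sax should not need to know which parser feeds it. The sketch below is only an illustration of that idea: the class and method names (SaxElementLister, elementNames, jdkReader) are made up for this example, and it uses the JDK's built-in parser on already well-formed input so that it runs without extra jars.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of TagSoup or the JDK.
class SaxElementLister {

    // Collects element names in document order, using only SAX interfaces.
    static List<String> elementNames(XMLReader reader, String markup) throws Exception {
        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }

    // The JDK's own SAX parser; it accepts well-formed XML only.
    static XMLReader jdkReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String wellFormed =
            "<p>This is <b>bold, <i>bold italic, </i></b><i>italic, </i>normal text</p>";
        System.out.println(elementNames(jdkReader(), wellFormed));
    }
}
```

Assuming the TagSoup jar is on the classpath, passing new org.ccil.cowan.tagsoup.Parser() as the reader instead of jdkReader() should let the same handler consume malformed HTML; that substitution is not exercised here.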
TagSoup is very good, especially if you have to parse crappy HTML – Pascal Thivent Sep 16 '09 at 14:59.
You can take a look at NekoHTML, a Java library that performs best-effort cleaning and tag balancing on your document. It is an easy way to parse a malformed HTML (or invalid XML) file, and it is distributed under the Apache 2.0 license.
JTidy should let you do what you want. Usage is fairly straightforward, and parsing is configurable. For example:

    InputStream in = ...;
    Tidy tidy = new Tidy();
    // configure the Tidy instance as required
    ...
    Document doc = tidy.parseDOM(in, null);
    Element root = doc.getDocumentElement();

The JavaDoc is hosted here.
HTML Parser seems to support conversion from HTML to XML. Then you can build a DOM tree using the usual Java toolchain.
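As a rough sketch of the second step, once a converter has produced well-formed XML, the JDK's own JAXP classes can build the DOM tree. The class name XmlToDom and the sample markup are invented for this example, and the conversion from HTML is assumed to have happened already.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class XmlToDom {

    // Parses a well-formed XML string into a W3C DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the output of an HTML-to-XML conversion step.
        Document doc = parse("<html><body><p>Hello</p></body></html>");
        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName());
        System.out.println(doc.getElementsByTagName("p").item(0).getTextContent());
    }
}
```

The DocumentBuilder rejects input that is not well-formed, which is why the cleaning step has to come first.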
There are several open source tools for parsing HTML from Java; see java-source.net/open-source/html-parsers. You can also check the answers to this question, which is almost the same: stackoverflow.com/questions/457684/readi...