Is it possible to parse an HTML document and build a DOM tree (Java)?


You can use TagSoup. It is a SAX-compliant parser that cleans malformed content, such as HTML from generic web pages, into well-formed XML. For example, "This is <B>bold, <I>bold italic, </b>italic, </i>normal text" gets correctly rewritten as "This is <b>bold, <i>bold italic, </i></b><i>italic, </i>normal text".

TagSoup is very good, especially if you have to parse crappy HTML – Pascal Thivent Sep 16 '09 at 14:59.
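Since TagSoup is a SAX parser, its events still have to be collected into a DOM tree. One common way is to run an identity transform from a SAXSource into a DOMResult. The sketch below shows that bridge using only JDK classes; the class name SaxToDom and the helper toDom are made up for illustration, and the JDK's own XMLReader stands in for TagSoup so the example runs without extra jars. It assumes that, with TagSoup on the classpath, you would pass new org.ccil.cowan.tagsoup.Parser() as the reader instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class, not part of TagSoup or the JDK.
final class SaxToDom {

    // Runs the given SAX reader over the input and collects its events into a DOM Document.
    static Document toDom(XMLReader reader, InputSource input) throws Exception {
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        identity.transform(new SAXSource(reader, input), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader from the JDK. Assumption: with TagSoup available, use
        // new org.ccil.cowan.tagsoup.Parser() here and feed it malformed HTML.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        String markup = "<html><body><p>This is <b>bold</b> text</p></body></html>";
        Document doc = toDom(reader, new InputSource(new StringReader(markup)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The JDK reader only accepts well-formed input, so the sample markup here is already clean; the point of swapping in TagSoup is that the same toDom call would then accept tag soup as well.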

You can take a look at NekoHTML, a Java library that performs best-effort cleaning and tag balancing on your document. It is an easy way to parse a malformed HTML (or invalid XML) file. It is distributed under the Apache 2.0 license.

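JTidy should let you do what you want. Usage is fairly straightforward, but parsing is configurable. E.g.:

    import java.io.InputStream;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.tidy.Tidy;

    InputStream in = ...;
    Tidy tidy = new Tidy();
    // configure the Tidy instance as required
    ...
    // parse the stream into a DOM tree; passing null means no tidied output is written
    Document doc = tidy.parseDOM(in, null);
    Element root = doc.getDocumentElement();

The JavaDoc is hosted here.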

HTML Parser seems to support conversion from HTML to XML. Then you can build a DOM tree using the usual Java toolchain.
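As a rough illustration of the second step, once some tool has produced well-formed XML, the standard JAXP DocumentBuilder can turn it into a DOM tree. This is only a sketch: the class name XmlToDom, the helper parse, and the sample markup are invented for the example, and the HTML-to-XML conversion itself is assumed to have happened already.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class XmlToDom {

    // Parses a well-formed XML string into a DOM Document using the JDK's JAXP parser.
    static Document parse(String wellFormedXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(wellFormedXml)));
    }

    public static void main(String[] args) throws Exception {
        // Sample input standing in for the output of an HTML-to-XML converter.
        String xhtml = "<html><body><a href=\"a.html\">A</a><a href=\"b.html\">B</a></body></html>";
        Document doc = parse(xhtml);

        // Walk the tree: print the href attribute of every <a> element.
        NodeList links = doc.getElementsByTagName("a");
        for (int i = 0; i < links.getLength(); i++) {
            System.out.println(links.item(i).getAttributes().getNamedItem("href").getNodeValue());
        }
    }
}
```

DocumentBuilder rejects input that is not well-formed, which is why a cleaning step has to come first for real-world HTML.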

There are several open-source tools for parsing HTML from Java. Check java-source.net/open-source/html-parsers. You can also check the answers to this question: stackoverflow.com/questions/457684/readi... It is almost the same.

