HTML to TXT library that mimics the output of “lynx -dump”?



The problem is really that specific. The title says it all: I need a library in Java that can take HTML content and generate text in the same format that the Linux lynx program generates. Thank you very much!

That's it. Reasons below if you care. I need to expose data provided by 3rd party servers to end users on Android.

The data format is ancient: badly formatted HTML, so bad that my attempts to read it in Java fail occasionally (unacceptable). It is also growing every month (preinstalling is ruled out), and I can't convince them to change to "modern" stuff (life would be great in XML etc.). Shortest route: I wrote a class that uses the W3 html2txt service online (Google it).

It worked fine in the app until I got complaints and noticed that the W3 service fails occasionally. It's not that big of a deal, but the black box logic expects the output to be in this "lynx like" text format. So I would like a library to do the conversion (HTML->TXT) in "lynx style" inside the app and avoid the outages in the W3 service.

And besides, the lynx output is probably the best I've seen, the most organized and neat. Are you guys aware of any?

Tags: java, html, android, html-parsing, lynx. Asked Nov 12 '10 at 2:15 by David.

Not sure what you mean by lynx style, so I might be completely off by submitting this (if so, please excuse me). I used this piece of code a while back to check HTML/XML files (at the time I was just printing the result in the logs):

    // Read a raw resource line by line into a single string.
    InputStream in = context.getResources().openRawResource(id);
    StringBuffer inLine = new StringBuffer();
    InputStreamReader isr = new InputStreamReader(in);
    BufferedReader inRd = new BufferedReader(isr);
    String text;
    while ((text = inRd.readLine()) != null) {
        inLine.append(text);
        inLine.append("\n");
    }
    in.close();
    return inLine.toString();

I hope it helps, but I get the feeling you need something more complex :P.

Thank you for your answer. Yeah, I don't need to check. Actually, I need to work with defective HTML files, so checking the validity of the HTML is not really the point. – David Nov 12 '10 at 3:05

I hate pressing ENTER here on Stack Overflow... anyway... since my app contacts the W3 service every time it needs to convert the data that it gets from the servers, I was wondering if I could do this work "in house", so as to not depend on the W3 service. The lynx format is just a requirement. Thank you again. – David Nov 12 '10 at 3:08
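
For anyone wondering what an "in house" conversion might involve, here is a rough sketch of just one part of the "lynx -dump" look: links numbered inline as [n] and listed under a trailing "References" heading. It is not a replacement for lynx or the W3 service. The class name LynxStyleDump and the method dump are made up for this illustration, the tag handling is regex-based (an assumption that will break on sufficiently defective HTML), and lynx's indentation, line wrapping, tables and lists are not reproduced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an existing library: approximates the link
// numbering and "References" footer seen in "lynx -dump" output.
public class LynxStyleDump {
    // Anchor tags with a quoted href; group 1 = URL, group 2 = link text.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Tags that should end a line in the text output.
    private static final Pattern BLOCK_BREAK = Pattern.compile(
            "<br\\s*/?>|</(p|div|h[1-6]|li|tr)\\s*>", Pattern.CASE_INSENSITIVE);
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    public static String dump(String html) {
        List<String> refs = new ArrayList<>();
        StringBuffer sb = new StringBuffer();
        Matcher m = LINK.matcher(html);
        // Replace each link with "[n]text" and remember its URL.
        while (m.find()) {
            refs.add(m.group(1));
            String label = "[" + refs.size() + "]" + m.group(2);
            m.appendReplacement(sb, Matcher.quoteReplacement(label));
        }
        m.appendTail(sb);

        // Turn block-level closes into newlines, then drop all other tags.
        String text = BLOCK_BREAK.matcher(sb).replaceAll("\n");
        text = TAG.matcher(text).replaceAll("");
        // Decode only a handful of common entities; "&amp;" goes last
        // so that "&amp;lt;" is not decoded twice.
        text = text.replace("&nbsp;", " ").replace("&lt;", "<")
                   .replace("&gt;", ">").replace("&amp;", "&");
        // Collapse runs of spaces and of blank lines.
        text = text.replaceAll("[ \\t]+", " ")
                   .replaceAll(" *\\n *", "\n")
                   .replaceAll("\\n{3,}", "\n\n")
                   .trim();

        StringBuilder out = new StringBuilder(text);
        if (!refs.isEmpty()) {
            out.append("\n\nReferences\n\n");
            for (int i = 0; i < refs.size(); i++) {
                out.append(String.format(Locale.ROOT, "%4d. %s\n", i + 1, refs.get(i)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("<p>See <a href=\"http://example.com/\">the site</a>.</p>"));
    }
}
```

Running main prints the body line with the link marked as [1], followed by a References list containing the one URL. If the black box logic depends on lynx's exact spacing and wrapping, a sketch like this would need a lot more work, and a tolerant HTML parser would be a safer base than regular expressions.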

After a year, I give up. Answer is: no way to handle that, no library in Java. At least for now.

I'm closing this. Thank you for your attention.

