Jsoup: Selecting HTML Between Different Classes?

I think one of the problems is that you're asking for Elements and not Nodes. Text nodes are Nodes, not Elements. Try this:

    package grimbo.test;

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.Node;
    import org.jsoup.select.Elements;

    public class StackOverflow {

        public static void main(String[] args) {
            // The HTML tags inside these two string literals are missing from the post.
            String html = "message-1\n \n \n quoted message\n \n I am a response\n \n \n";
            html += "message-2\n \n \n quoted message\n \n I am a response\n \n \n";
            Document doc = Jsoup.parse(html);
            handleQuotedMessages(doc.select(".quoted-message"));
        }

        private static void handleQuotedMessages(Elements quotedMessages) {
            Element firstQuotedMessage = quotedMessages.first();
            // siblingNodes() returns Nodes, so text nodes are included
            List<Node> siblings = firstQuotedMessage.siblingNodes();
            List<Node> elementsBetween = new ArrayList<Node>();
            Element currentQuotedMessage = firstQuotedMessage;
            for (int i = 1; i ...
            // (the rest of this loop is cut off in the post)
        }

        // Returns only the Element nodes in the list whose tag name matches.
        private static List<Element> filterElements(String tagName, List<Node> nodes) {
            List<Element> els = new ArrayList<Element>();
            for (Iterator<Node> it = nodes.iterator(); it.hasNext();) {
                Node n = it.next();
                if (n instanceof Element) {
                    Element el = (Element) n;
                    if (el.tagName().equals(tagName)) {
                        els.add(el);
                    }
                }
            }
            return els;
        }

        private static void createQuotePost(Element quote, List<Node> elementsBetween) {
            System.out.println("createQuotePost: " + quote);
            System.out.println("createQuotePost: " + elementsBetween);
            List<Element> imgs = filterElements("img", elementsBetween);
            // handle imgs
        }
    }
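As a rough sketch of the same idea (collecting Nodes rather than Elements, so the text nodes are kept), the following walks `nextSibling()` from each quoted message until the next quoted message or the end of the parent. This is not the loop from the answer, which indexes into `siblingNodes()`. The class name `NodesBetweenQuotes`, the helper names, and the sample markup are all assumptions made for illustration. It needs jsoup on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

public class NodesBetweenQuotes {

    // True when the node is an Element carrying the quoted-message class.
    static boolean isQuotedMessage(Node node) {
        return node instanceof Element && ((Element) node).hasClass("quoted-message");
    }

    // Returns the sibling nodes (text nodes included) that follow the given
    // quoted message, stopping at the next quoted message or the end of the parent.
    static List<Node> nodesAfter(Element quote) {
        List<Node> between = new ArrayList<>();
        for (Node n = quote.nextSibling(); n != null && !isQuotedMessage(n); n = n.nextSibling()) {
            between.add(n);
        }
        return between;
    }

    public static void main(String[] args) {
        // Hypothetical markup (assumption): the tags in the original post are not preserved.
        String html = "<div class=\"message\">"
                + "<div class=\"quoted-message\">quoted message</div>"
                + "I am a response <img src=\"a.png\">"
                + "<div class=\"quoted-message\">second quote</div>"
                + "another response"
                + "</div>";
        Document doc = Jsoup.parse(html);
        for (Element quote : doc.select(".quoted-message")) {
            System.out.println(quote.text() + " -> " + nodesAfter(quote));
        }
    }
}
```

Walking forward from each quote avoids having to work out where the quote sits inside `siblingNodes()`, which leaves the node itself out but includes the siblings that come before it.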

It will sometimes contain images/links, and I have a method already set up to parse those out and would hate to recode it all. – intelacer Sep 10 at 3:31

So you only want to retain the raw text nodes in your elementsBetween List? – Paul Grime Sep 10 at 9:31

Well, I have a method (we'll call it CreateMessage) that requires an Element. Inside the method I use a .select("img") to pull the URLs for all the images out into an array. Then, when displaying the message, it uses that list to put the images back. Same for URLs and such. If it's a big pain, I could just recode it to take a string and work off that, though. – intelacer Sep 10 at 12:55

I think I understand. If <img>s are special, then in the createQuotePost method you can iterate the elementsBetween List and handle <img>s specially. The problem with using .select("img") would be that the <img>s aren't descendants of the quoted message element, so you would have to do something else. Personally, I think a simple loop over elementsBetween would be more than adequate (which I've added to the answer). – Paul Grime Sep 10 at 13:06
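A minimal sketch of the "simple loop" suggested in the last comment, assuming the nodes between two quoted messages are already in a `List<Node>`. It goes slightly beyond the answer's `filterElements`, which only matches top-level tags: calling `select("img")` on each Element node also finds images nested inside, for example, a link. The class name `ImageUrlsBetween`, the method name, and the sample fragment are hypothetical. It needs jsoup on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

public class ImageUrlsBetween {

    // Collects the src attribute of every img found among the given nodes.
    // Element.select can match the element it is called on as well as its
    // descendants, so a top-level img and a nested img are both picked up.
    // Text nodes are skipped because they cannot contain elements.
    static List<String> imageUrls(List<Node> nodes) {
        List<String> urls = new ArrayList<>();
        for (Node n : nodes) {
            if (n instanceof Element) {
                for (Element img : ((Element) n).select("img")) {
                    urls.add(img.attr("src"));
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical fragment (assumption) standing in for the nodes after a quoted message.
        Element body = Jsoup.parse(
                "I am a response <img src=\"a.png\"> <a href=\"#\"><img src=\"b.png\"></a>").body();
        System.out.println(imageUrls(body.childNodes()));
    }
}
```

The same loop shape would work for links by selecting `a[href]` and reading the `href` attribute instead.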
