Java - Parsing HTML - get text?

That site is internationalized with German as the default. You need to tell the server which language you accept by specifying the desired ISO 639-1 language code in the Accept-Language request header:

    URLConnection connection = new URL(url).openConnection();
    connection.setRequestProperty("Accept-Language", "en");
    InputStream input = connection.getInputStream();
    // ...

Unrelated to the concrete problem, may I suggest you have a look at Jsoup as an HTML parser? It's much more convenient with its jQuery-like CSS selector syntax, and therefore much less bloated than your attempt so far:

    String url = "http://www.wippro.at/module/gallery/index.php?limitstart=0&picno=0&gallery_key=92";
    Document document = Jsoup.connect(url).header("Accept-Language", "en").get();
    String title = document.select("#redx_gallery_pic_title").text();
    System.out.println(title); // Beech, glazing V3

That's all.
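To see the effect of the Accept-Language header without depending on the real site, here is a minimal sketch that talks to a local stand-in server. The class name AcceptLanguageDemo, the helper methods startServer and fetch, and the "Hallo"/"Hello" response bodies are all made up for illustration; only the setRequestProperty("Accept-Language", ...) call mirrors the answer above. It assumes Java 9 or newer (for readAllBytes) and the JDK's built-in com.sun.net.httpserver package.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class AcceptLanguageDemo {

    // Hypothetical stand-in for the real site: answers in German by default,
    // and in English only when the request asks for it via Accept-Language.
    static HttpServer startServer() {
        try {
            // Port 0 lets the operating system pick any free port.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String language = exchange.getRequestHeaders().getFirst("Accept-Language");
                String body = (language != null && language.startsWith("en")) ? "Hello" : "Hallo";
                byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
                exchange.sendResponseHeaders(200, bytes.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(bytes);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fetches the URL, optionally sending an Accept-Language header,
    // and decodes the response body as UTF-8.
    static String fetch(String url, String language) {
        try {
            URLConnection connection = URI.create(url).toURL().openConnection();
            if (language != null) {
                connection.setRequestProperty("Accept-Language", language);
            }
            try (InputStream input = connection.getInputStream()) {
                return new String(input.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println(fetch(url, null)); // no header: server falls back to German
            System.out.println(fetch(url, "en")); // header set: server answers in English
        } finally {
            server.stop(0);
        }
    }
}
```

A real site may negotiate the language differently (for example by cookie or URL parameter), so the header is worth trying first but is not guaranteed to work everywhere.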

Thank you very much – Bogdan S Aug 3 at 19:46

You're welcome. – BalusC Aug 3 at 19:47

But what if I want to get the text for the Romanian language? If I put "ro" instead of "en" I don't get the special characters. – Bogdan S Aug 3 at 19:55

That's because you're relying on the platform default encoding to read the response body. You need to use the other constructor of InputStreamReader which takes the charset as second argument and specify it with "UTF-8". Jsoup takes this fully transparently into account by the way :) – BalusC Aug 3 at 19:56

The problem lies where you display or save the characters. Are you displaying them in an IDE like Eclipse using System.out.println()? If so, set the Eclipse console encoding by Window > Preferences > General > Workspace and then set Text file encoding to UTF-8. Otherwise it'll use the platform default one. For more hints, see balusc.blogspot.com/2009/05/… – BalusC Aug 3 at 20:30
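The charset point from the comments can be sketched as follows. This is only an illustration: the class name CharsetReadDemo, the helper readAll, and the sample text are hypothetical, and an in-memory byte array stands in for the response body that connection.getInputStream() would return. It uses the InputStreamReader constructor that takes a Charset as second argument, which is the type-safe sibling of the one taking the charset name "UTF-8".

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetReadDemo {

    // Reads the whole stream as text, decoding bytes with the given charset
    // instead of relying on the platform default encoding.
    static String readAll(InputStream input, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(input, charset))) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Sample Romanian text with diacritics, written as Unicode escapes so the
        // source file encoding does not matter.
        String original = "\u0219i \u021bar\u0103";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String decodedAsUtf8 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        String decodedAsLatin1 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println(decodedAsUtf8.equals(original));   // true
        System.out.println(decodedAsLatin1.equals(original)); // false: wrong charset garbles the text
    }
}
```

The example prints booleans rather than the text itself, because the console encoding is a separate concern, as the last comment above points out.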

Here is a list of Java HTML Parsers. Look around until you see an API that suits your fancy and use that instead.
