Error resolving a Wikipedia URL containing a Unicode character with Java's URL class?

The correct URI is URL1. Many browsers display literals instead of percent-encoded escape sequences. This is considered to be more user-friendly. However, correctly encoded URIs must use percent encoding for characters not permitted in the path part:

    path          = path-abempty    ; begins with "/" or is empty
                  / path-absolute   ; begins with "/" but not "//"
                  / path-noscheme   ; begins with a non-colon segment
                  / path-rootless   ; begins with a segment
                  / path-empty      ; zero characters

    path-abempty  = *( "/" segment )
    path-absolute = "/" [ segment-nz *( "/" segment ) ]
    path-noscheme = segment-nz-nc *( "/" segment )
    path-rootless = segment-nz *( "/" segment )
    path-empty    = 0<pchar>

    segment       = *pchar
    segment-nz    = 1*pchar
    segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                  ; non-zero-length segment without any colon ":"

    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
    pct-encoded   = "%" HEXDIG HEXDIG
    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

The URI class can help you with such sequences:

    Characters in the other category are permitted wherever RFC 2396 permits escaped octets, that is, in the user-information, path, query, and fragment components, as well as in the authority component if the authority is registry-based. This allows URIs to contain Unicode characters beyond those in the US-ASCII character set.

    String literal = "http://en.wikipedia.org/wiki/1992\u201393_UE_Lleida_seasonnow";
    URI uri = new URI(literal);
    // toASCIIString() percent-encodes the non-ASCII en dash as UTF-8 bytes
    System.out.println(uri.toASCIIString());

You can read more about URI encoding here.
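To make the pchar rule from the grammar more concrete, here is a rough sketch that percent-encodes a single path segment by hand. The class name PathSegmentEncoder and the helper encodeSegment are made up for this illustration; they are not part of the JDK, and in practice URI.toASCIIString() is usually all you need. Note that this sketch also encodes "%", so it assumes the input is raw text and not already percent-encoded.

```java
import java.nio.charset.StandardCharsets;

class PathSegmentEncoder {
    // Punctuation that pchar allows literally:
    // unreserved ("-._~"), sub-delims ("!$&'()*+,;="), plus ":" and "@".
    private static final String PCHAR_PUNCTUATION = "-._~!$&'()*+,;=:@";

    // Hypothetical helper: keeps pchar characters as they are and
    // percent-encodes every other byte of the UTF-8 representation.
    // Assumes the input is NOT already percent-encoded ("%" becomes "%25").
    static String encodeSegment(String segment) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alphanumeric || PCHAR_PUNCTUATION.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The en dash (U+2013) is three bytes in UTF-8: E2 80 93.
        // prints 1992%E2%80%9393_UE_Lleida_seasonnow
        System.out.println(encodeSegment("1992\u201393_UE_Lleida_seasonnow"));
    }
}
```

Because "/" is not a pchar, this helper encodes it as %2F, so it should be called once per segment and not on a whole path.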

It's not really strange, it's standard use of IRIs. The IRI: en.wikipedia.org/wiki/2009–10_UE_... which includes a Unicode en-dash, is equivalent to the URI: en.wikipedia.org/wiki/2009%E2%80%9310_UE... You can include the IRI form in links and it will work in modern browsers. But many network libraries—including Java's, along with older browsers—require ASCII-only URIs.

(Modern browsers will still show the pretty IRI version in the address bar, even if you linked to it with the encoded URI version.)

To convert an IRI to a URI in general, you use the IDN algorithm on the hostname, and URL-encode any other non-ASCII characters as UTF-8 bytes. In your case, it should be:

    String urlencoded = URLEncoder.encode(x, "utf-8").replace("+", "%20");
    URL url = new URL("http://en.wikipedia.org/wiki/" + urlencoded);

Note: replacing + with %20 is necessary to make values of x that contain spaces work. URLEncoder does application/x-www-form-urlencoded encoding, as used in query strings. But in a URL path segment like this, the "+ means space" rule does not apply.

Spaces in paths must be encoded with normal URL encoding, as %20. Then again... in the specific case of Wikipedia, for readability, they replace spaces with underscores instead, so you'd probably be better off replacing "+" with "_" directly. The %20 version will still work because they redirect from there to the underscore version.
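Putting the pieces of this answer together, a minimal sketch might look like the following. The class IriToUri and the helpers wikiUrl and wikipediaTitle are hypothetical names for this example only; IDN.toASCII and URLEncoder.encode are the real JDK calls. It assumes the title is a single path segment, since URLEncoder would also encode any "/" as %2F.

```java
import java.io.UnsupportedEncodingException;
import java.net.IDN;
import java.net.URLEncoder;

class IriToUri {
    // Hypothetical helper: Wikipedia-style titles use "_" where a space would be.
    static String wikipediaTitle(String title) {
        return title.replace(' ', '_');
    }

    // Hypothetical helper: builds an ASCII-only URL string for one page title.
    // The host goes through the IDN algorithm; the title is percent-encoded
    // as UTF-8, with form-style "+" turned back into "%20".
    static String wikiUrl(String host, String title) {
        try {
            String asciiHost = IDN.toASCII(host);
            String encoded = URLEncoder.encode(title, "UTF-8").replace("+", "%20");
            return "http://" + asciiHost + "/wiki/" + encoded;
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String title = "2009\u201310 season"; // made-up title with an en dash and a space

        // prints http://en.wikipedia.org/wiki/2009%E2%80%9310%20season
        System.out.println(wikiUrl("en.wikipedia.org", title));

        // prints http://en.wikipedia.org/wiki/2009%E2%80%9310_season
        System.out.println(wikiUrl("en.wikipedia.org", wikipediaTitle(title)));
    }
}
```

The resulting string can then be passed to new URL(...) as in the snippet above. For an all-ASCII host such as en.wikipedia.org the IDN step changes nothing; it only matters when the hostname itself contains non-ASCII characters.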

