Converting UTF-8 to ISO-8859-1 in Java?

I'm not sure if there is a normalization routine in the standard library that will do this. I do not think conversion of "smart" quotes is handled by the standard Unicode normalizer routines - but don't quote me.

The smart thing to do is to dump ISO-8859-1 and start using UTF-8.
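For what it's worth, the claim about the normalizer is easy to check. The sketch below runs `java.text.Normalizer` in its most aggressive form (NFKD) over a string with curly quotes and compares the result with the input. The class name `SmartQuoteCheck` and the `asciiQuotes` helper are made up for illustration; the helper is just one possible explicit mapping, not a standard routine.

```java
import java.text.Normalizer;

final class SmartQuoteCheck {
    // Compatibility decomposition (NFKD), the most aggressive standard form.
    static String nfkd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    // Hypothetical helper: explicit mapping of the four common curly quotes
    // to their ASCII counterparts.
    static String asciiQuotes(String s) {
        return s.replace('\u2018', '\'')
                .replace('\u2019', '\'')
                .replace('\u201C', '"')
                .replace('\u201D', '"');
    }

    public static void main(String[] args) {
        String quoted = "\u201Chello\u201D";
        // The curly quotes have no decomposition, so normalization leaves them alone.
        System.out.println(nfkd(quoted).equals(quoted));
        // An explicit replacement is needed to get plain ASCII quotes.
        System.out.println(asciiQuotes(quoted));
    }
}
```

So if plain quotes are all you need, a small hand-written mapping like this may be enough, at the cost of maintaining the list of characters yourself.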

That said, it is possible to encode any Unicode character into an HTML page encoded as ISO-8859-1. You can encode them using entity escape sequences as shown here:

    import java.io.IOException;

    public class HtmlEncoder {
        public static final HtmlEncoder INSTANCE = new HtmlEncoder();

        public void encode(CharSequence sequence, Appendable out) throws IOException {
            for (int i = 0; i < sequence.length(); i++) {
                char ch = sequence.charAt(i);
                if (Character.UnicodeBlock.of(ch) == Character.UnicodeBlock.BASIC_LATIN) {
                    // plain ASCII passes through unchanged
                    out.append(ch);
                } else {
                    int codepoint = Character.codePointAt(sequence, i);
                    // handle supplementary range chars
                    i += Character.charCount(codepoint) - 1;
                    // emit entity
                    out.append("&#x");
                    out.append(Integer.toHexString(codepoint));
                    out.append(";");
                }
            }
        }
    }

Usage:

    String foo = "This is Cyrillic Ya: \u044F\n"
               + "This is fraktur G: \uD835\uDD0A\n"
               + "This is a smart quote: \u201C";
    StringBuilder sb = new StringBuilder();
    HtmlEncoder.INSTANCE.encode(foo, sb);
    System.out.println(sb.toString());

Above, the character LEFT DOUBLE QUOTATION MARK (U+201C “) is encoded as &#x201c;.

A couple of other arbitrary characters are likewise encoded. Great care needs to be taken with this approach. If your text needs to be escaped for HTML, that must be done before running the above code; otherwise the ampersands of the generated entities end up being escaped too.
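To make the ordering concrete, here is a minimal, self-contained sketch that escapes markup characters first and only then emits numeric entities for everything outside ASCII. The class and method names (`EscapeThenEncode`, `escapeMarkup`, `encodeNonAscii`) are invented for this example, and the set of escaped markup characters is an assumption; a real application may need a different set.

```java
final class EscapeThenEncode {
    // Step 1: escape the characters that have meaning in HTML markup.
    static String escapeMarkup(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(ch);
            }
        }
        return out.toString();
    }

    // Step 2: replace every code point above U+007F with a hex entity.
    static String encodeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
            // advance by one or two chars, depending on the code point
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a < b & \u201Cc\u201D";
        // Correct order: markup escaping first, entity encoding second.
        System.out.println(encodeNonAscii(escapeMarkup(text)));
        // Wrong order: the ampersand of each generated entity gets escaped.
        System.out.println(escapeMarkup(encodeNonAscii(text)));
    }
}
```

Running the two steps in the wrong order turns an entity such as `&#x201c;` into `&amp;#x201c;`, which the browser then displays literally.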

Works beautifully. Thank you! – Marcus Aug 13 '09 at 23:05.

Depending on your default encoding, the following lines could cause a problem:

    byte[] latin1 = sb.toString().getBytes("ISO-8859-1");
    return new String(latin1);

In Java, a String/char is always UTF-16. A different encoding is only involved when you convert characters to bytes, or bytes back to characters.

Say your default encoding is UTF-8: the latin1 buffer is then decoded as UTF-8, some Latin-1 byte sequences form invalid UTF-8, and you will get "?" in their place.

When you instantiate your String object, you need to indicate which encoding to use. So replace:

    return new String(latin1);

with:

    return new String(latin1, "ISO-8859-1");
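A small sketch of the round trip may help here. It uses `StandardCharsets` (Java 7+) instead of charset names, which avoids the checked `UnsupportedEncodingException`; the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

final class Latin1RoundTrip {
    // Encode characters to ISO-8859-1 bytes; unmappable characters become '?'.
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decode with the same charset that produced the bytes.
    static String fromLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // What effectively happens with new String(bytes) when the default is UTF-8.
    static String misdecodedAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = toLatin1("caf\u00E9");
        // Same charset on both sides: the text survives.
        System.out.println(fromLatin1(latin1).equals("caf\u00E9"));
        // Wrong charset: the lone 0xE9 byte is not valid UTF-8 and is replaced.
        System.out.println(misdecodedAsUtf8(latin1).contains("\uFFFD"));
        // A smart quote has no ISO-8859-1 byte, so it is encoded as '?'.
        System.out.println((char) toLatin1("\u201C")[0]);
    }
}
```

Note the last case: characters that do not exist in ISO-8859-1, such as smart quotes, are silently replaced during encoding, so specifying the charset fixes the decoding problem but does not preserve those characters.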
