I have UTF-8 - but still get “Invalid byte 1 of 1-byte UTF-8 sequence”?

If your database contains only a single byte (with value 0xC4) then you aren't using UTF-8 encoding. The character "LATIN CAPITAL LETTER A WITH DIAERESIS" has the code point U+00C4, but UTF-8 can't encode that in a single byte. If you check the third column you'll see that UTF-8 encodes it as 0xC3 0x84 (two bytes).
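To see the difference for yourself, here is a minimal Java sketch that prints the bytes of U+00C4 under two encodings. The class name EncodeDemo and the hex helper are made up for illustration, and ISO-8859-1 is only an assumed example of a single-byte encoding that stores this character as 0xC4.

```java
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "C3 84".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00C4"; // LATIN CAPITAL LETTER A WITH DIAERESIS

        // UTF-8 needs two bytes for this code point.
        System.out.println("UTF-8:      " + hex(s.getBytes(StandardCharsets.UTF_8)));      // C3 84

        // A single-byte Latin encoding stores it as one byte.
        System.out.println("ISO-8859-1: " + hex(s.getBytes(StandardCharsets.ISO_8859_1))); // C4
    }
}
```

If your database holds the single byte C4 for this character, it matches the second line, not the first.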

Please read Joel's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" for more info.

EDIT: Christian found the answer himself; turned out it was a problem in the Cocoon 3 SAX component (I guess it's the alpha 3 version).

It turns out that if you pass XML as a String into the XMLGenerator class, something goes wrong during SAX parsing, causing this mess. I looked up the code to find the actual problem in Cocoon-stax:

    if (XMLGenerator.this.logger.isDebugEnabled()) {
        XMLGenerator.this.logger.debug("Using a string to produce SAX events.");
    }
    XMLUtils.toSax(new ByteArrayInputStream(this.xmlString.getBytes()), XMLGenerator.this.getSAXConsumer());

As you can see, this creates a byte array using the JRE's default encoding, which then fails to parse. This is because the XML declares itself to be UTF-8, whereas the data is now in bytes again, likely encoded in your Windows code page. As a workaround, one can use the following:

    new org.apache.cocoon.sax.component.XMLGenerator(xmlInput.getBytes("UTF-8"), "UTF-8");

This will trigger the right internal actions (as Christian found out by experimenting with the API). I've opened an issue in Apache's bug tracker.

EDIT 2: The issue is fixed and will be included in an upcoming release.
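The mismatch can be reproduced without Cocoon. The sketch below uses only the JDK's built-in XML parser; the class name XmlBytesDemo and the parses helper are hypothetical, and ISO-8859-1 stands in for whatever non-UTF-8 default encoding the platform might have. It is an illustration of the failure mode, not the Cocoon code path itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class XmlBytesDemo {
    // Returns true if the JDK's XML parser accepts the bytes, false if it rejects them.
    public static boolean parses(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (Exception e) {
            // Covers both SAXException and IOException from the parser.
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // The document declares UTF-8 and contains U+00C4.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C4</name>";

        // Bytes that match the declaration.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        // Bytes in a single-byte encoding, as a bare getBytes() may produce
        // on a machine whose default charset is not UTF-8.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes parse:   " + parses(utf8));
        System.out.println("Latin-1 bytes parse: " + parses(latin1));
    }
}
```

The point is that the bytes handed to the parser must be in the encoding the XML declaration names, which is why passing an explicit charset to getBytes avoids the problem.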

The C4 you see on that page refers to the Unicode code point, U+00C4. The byte sequence used to represent that code point in UTF-8 is NOT "\xC4". What you want is what's in the UTF-8 (hex.) column, namely "\xC3\x84".

Therefore, your data is not in UTF-8. You can read about how data is encoded in UTF-8 here.
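If you want to check whether a given byte sequence is well-formed UTF-8, a strict decoder will tell you. This is a minimal sketch using the JDK's CharsetDecoder; the class name Utf8Check and the method isValidUtf8 are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if the whole array decodes as well-formed UTF-8.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] single = { (byte) 0xC4 };            // what the database holds
        byte[] pair = { (byte) 0xC3, (byte) 0x84 }; // U+00C4 encoded as UTF-8

        System.out.println("C4    valid UTF-8? " + isValidUtf8(single)); // false
        System.out.println("C3 84 valid UTF-8? " + isValidUtf8(pair));   // true
    }
}
```

A lone 0xC4 is the lead byte of a two-byte sequence with no continuation byte after it, so the decoder reports it as malformed.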

