Double-decoding Unicode in Python?


ret.decode() implicitly tries to encode ret with the system encoding first, in your case ASCII. If you explicitly encode the unicode string, you should be fine. There is a built-in encoding that does what you need:

    >>> u'X\xc3\xbcY\xc3\x9f'.encode('raw_unicode_escape').decode('utf-8')
    u'X\xfcY\xdf'

I'd still try to get something sane out of the server, though.
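For readers on Python 3, where str is already Unicode and has no .decode() method, the same repair might look roughly like the sketch below. It is an illustration rather than part of the original answer: the variable names and the sample value are assumptions chosen to match the example above. It also performs the ASCII encode explicitly, to show why the implicit step fails.

```python
# Python 3 sketch. `mojibake` is a hypothetical sample: a text string whose
# code points are really the bytes of UTF-8 encoded text.
mojibake = 'X\xc3\xbcY\xc3\x9f'

# What Python 2 attempted implicitly: encoding with ASCII fails on \xc3.
try:
    mojibake.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# raw_unicode_escape maps code points 0-255 to the same byte values,
# so the original UTF-8 bytes come back unchanged.
raw = mojibake.encode('raw_unicode_escape')

# Decoding those bytes as UTF-8 recovers the intended text.
fixed = raw.decode('utf-8')
print(repr(raw), fixed)
```

Code points above 255 would be written as \uXXXX escapes by raw_unicode_escape, so this only round-trips cleanly when the input really is byte-valued.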

Whew - don't need to use my scary thing. – Chris Morgan Nov 24 '10 at 13:37.

What you want is the encoding in which Unicode code point X is encoded to the same byte value X. For code points in the range 0-255, the latin-1 encoding does exactly this:

    def double_decode(bstr):
        return bstr.decode("utf-8").encode("latin-1").decode("utf-8")
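A Python 3 sketch of the full round trip may make the latin-1 idea easier to check. The doubly encoded sample is built here by hand as an assumption about what the server sends (UTF-8 bytes misread as latin-1 and encoded to UTF-8 again); the type hints and variable names are illustrative only.

```python
def double_decode(bstr: bytes) -> str:
    """Undo one extra UTF-8 layer, assuming every intermediate code point is 0-255."""
    return bstr.decode('utf-8').encode('latin-1').decode('utf-8')

# Reproduce the damage: encode, misread the bytes as latin-1, encode again.
original = 'X\xfcY\xdf'
twice_encoded = original.encode('utf-8').decode('latin-1').encode('utf-8')

recovered = double_decode(twice_encoded)

# latin-1 only covers code points 0-255; anything higher raises.
try:
    '\u20ac'.encode('latin-1')
    euro_rejected = False
except UnicodeEncodeError:
    euro_rejected = True

print(repr(twice_encoded), recovered)
```

The UnicodeEncodeError in the last step is arguably useful: it signals input that was not doubly encoded in the way this function assumes.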

Don't use this! Use @hop's solution. My nasty hack (cringe! But quietly. It's not my fault, it's the server developers' fault):

    def double_decode_unicode(s, encoding='utf-8'):
        return ''.join(chr(ord(c)) for c in s.decode(encoding)).decode(encoding)

Then,

    >>> double_decode_unicode('X\xc3\x83\xc2\xbcY\xc3\x83\xc2\x9f')
    u'X\xfcY\xdf'
    >>> print _
    XüYß
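The character-by-character step in this hack is the same mapping that a codec performs in one call. A small Python 3 sketch, with a hypothetical helper name and sample value, shows the three approaches agreeing for byte-valued input:

```python
def to_bytes_per_char(s: str) -> bytes:
    """Python 3 analogue of ''.join(chr(ord(c)) for c in s) from Python 2."""
    # bytes() raises ValueError if any code point is above 255.
    return bytes(ord(c) for c in s)

sample = 'X\xc3\xbcY\xc3\x9f'  # hypothetical byte-valued text

per_char = to_bytes_per_char(sample)
via_latin1 = sample.encode('latin-1')
via_raw = sample.encode('raw_unicode_escape')

print(per_char == via_latin1 == via_raw)
```

The codec calls are shorter to write and avoid the Python-level loop over every character.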

Great question, by the way. A nasty situation. I hope someone else can come up with a neater solution than chr(ord(c)) to convert unicode to str, character by character... – Chris Morgan Nov 24 '10 at 13:30

f(char) for char in string cries for an encoding. – hop Nov 24 '10 at 13:33

@hop: does it? How so? – Chris Morgan Nov 24 '10 at 13:37

Transforming each character of a string in sequence via some function is the very definition of encoding and decoding, that's how. – hop Nov 24 '10 at 13:44

@hop: naturally, but as a solution this looks ghastly. Your .encode('raw_unicode_escape') is much cleaner (quite aside from the fact that the unicode->str step of your solution is over six times as fast as mine). – Chris Morgan Nov 24 '10 at 13:52

Here's a little script that might help you, doubledecode.py -- https://gist.github.com/1282752


