Double-decoding Unicode in Python?


Calling ret.decode() implicitly tries to encode ret with the system encoding first, which in your case is ASCII. If you explicitly encode the unicode string, you should be fine. There is a built-in encoding that does what you need:

    >>> u'X\xc3\xbcY\xc3\x9f'.encode('raw_unicode_escape').decode('utf-8')
    u'X\xfcY\xdf'

I'd still try to get something sane out of the server, though.
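On Python 3 the same trick still works, but string literals are already Unicode and `str` has no `.decode()` method, so the explicit encode step is the only way back to bytes. A minimal sketch, assuming the mangled value arrives as a `str` whose code points are really UTF-8 byte values; the helper name `fix_double_encoded` is hypothetical:

```python
def fix_double_encoded(text: str) -> str:
    """Undo one extra layer of UTF-8 decoding (hypothetical helper name)."""
    # raw_unicode_escape writes each code point below 256 as the byte with
    # the same value, which recovers the original UTF-8 bytes.
    raw = text.encode("raw_unicode_escape")
    # Decode those bytes correctly this time.
    return raw.decode("utf-8")


mangled = "X\xc3\xbcY\xc3\x9f"  # what the server sent, already decoded once
print(ascii(fix_double_encoded(mangled)))  # 'X\xfcY\xdf', i.e. "XüYß"
```

One caveat: code points above 255 are written out as `\uXXXX` escape sequences rather than raising an error, so a string that was not actually mangled this way may come back silently altered instead of failing loudly.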

Whew, I don't need to use my scary thing. – Chris Morgan Nov 24 '10 at 13:37

What you want is the encoding where Unicode code point X is encoded to the same byte value X. For code points inside 0-255 you have this in the latin-1 encoding:

    def double_decode(bstr):
        return bstr.decode("utf-8").encode("latin-1").decode("utf-8")
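A Python 3 sketch of the same latin-1 round trip, assuming the input is the raw `bytes` received from the server. To have something to repair, it first builds a doubly encoded sample by misreading UTF-8 bytes as latin-1 and encoding the result as UTF-8 again; that is one plausible way such data arises, an assumption rather than something stated in the question:

```python
def double_decode(bstr: bytes) -> str:
    # The first decode yields text whose code points are really UTF-8 byte values.
    once = bstr.decode("utf-8")
    # latin-1 maps code points 0-255 to the byte with the same value,
    # giving back the original UTF-8 bytes, which are then decoded properly.
    return once.encode("latin-1").decode("utf-8")


# Assumed source of the mangling: UTF-8 misread as latin-1, then re-encoded as UTF-8.
original = "X\u00fcY\u00df"  # "XüYß"
mangled = original.encode("utf-8").decode("latin-1").encode("utf-8")
print(mangled)                             # b'X\xc3\x83\xc2\xbcY\xc3\x83\xc2\x9f'
print(double_decode(mangled) == original)  # True
```

This is where the 0-255 restriction shows up: if the once-decoded text contains any code point above 255, `encode("latin-1")` raises `UnicodeEncodeError` instead of producing bytes.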

Don't use this! Use @hop's solution.

My nasty hack (cringe! But quietly. It's not my fault, it's the server developers' fault):

    def double_decode_unicode(s, encoding='utf-8'):
        return ''.join(chr(ord(c)) for c in s.decode(encoding)).decode(encoding)

Then,

    >>> double_decode_unicode('X\xc3\x83\xc2\xbcY\xc3\x83\xc2\x9f')
    u'X\xfcY\xdf'
    >>> print _
    XüYß
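On Python 3, `chr()` returns a one-character `str` rather than a byte string, so the character-by-character hack has to build a `bytes` object instead. A rough port, assuming bytes input; it only illustrates what the hack does, and the codec-based answers remain the better choice:

```python
def double_decode_unicode(s: bytes, encoding: str = "utf-8") -> str:
    once = s.decode(encoding)
    # bytes() accepts an iterable of ints in 0-255, so each code point becomes
    # the byte with the same value (ValueError if any code point exceeds 255).
    raw = bytes(ord(c) for c in once)
    return raw.decode(encoding)


print(ascii(double_decode_unicode(b"X\xc3\x83\xc2\xbcY\xc3\x83\xc2\x9f")))  # 'X\xfcY\xdf'
```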

Great question, by the way. A nasty situation. I hope someone else can come up with a neater solution than chr(ord(c)) to convert unicode to str, character by character... – Chris Morgan Nov 24 '10 at 13:30

f(char) for char in string cries for an encoding. – hop Nov 24 '10 at 13:33

@hop: does it? How so? – Chris Morgan Nov 24 '10 at 13:37

Transforming each character of a string in sequence via some function is the very definition of encoding and decoding, that's how. – hop Nov 24 '10 at 13:44

@hop: naturally, but as a solution this looks ghastly. Your .encode('raw_unicode_escape') is much cleaner (quite aside from the fact that the unicode->str step of your solution is over six times as fast as mine). – Chris Morgan Nov 24 '10 at 13:52
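hop's point can be checked directly: over the code points 0-255, mapping each character to the byte with the same value produces exactly the bytes that the `latin-1` and `raw_unicode_escape` codecs produce. A small Python 3 check (the variable names are illustrative only):

```python
# Every code point that fits in a single byte.
all_points = "".join(chr(i) for i in range(256))

# The per-character map from the hack: code point -> byte of equal value.
per_char = bytes(ord(c) for c in all_points)

print(per_char == all_points.encode("latin-1"))             # True
print(per_char == all_points.encode("raw_unicode_escape"))  # True

# Above 255 the approaches diverge: raw_unicode_escape emits an escape
# sequence, whereas latin-1 and bytes() raise an error instead.
print("\u0100".encode("raw_unicode_escape"))  # b'\\u0100'
```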

Here's a little script that might help you: doubledecode.py, https://gist.github.com/1282752


