ret.decode() implicitly tries to encode ret with the system encoding first, which in your case is ascii. If you explicitly encode the unicode string, you should be fine. There is a built-in encoding that does what you need:

>>> u'X\xc3\xbcY\xc3\x9f'.encode('raw_unicode_escape').decode('utf-8')
u'X\xfcY\xdf'

I'd still try to get something sane out of the server, though.
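The session above is Python 2. As a rough sketch of the same idea in Python 3 (where str is already Unicode), the function name undo_utf8_as_codepoints below is made up for illustration and is not part of the original answer:

```python
def undo_utf8_as_codepoints(text: str) -> str:
    """Recover text whose UTF-8 bytes were stored as individual code points."""
    # raw_unicode_escape writes each code point in 0-255 as the byte with the
    # same value, so this step rebuilds the original UTF-8 byte string.
    raw = text.encode("raw_unicode_escape")
    # Now decode those bytes the way they should have been decoded initially.
    return raw.decode("utf-8")


garbled = "X\xc3\xbcY\xc3\x9f"  # UTF-8 bytes of the real text, one per character
fixed = undo_utf8_as_codepoints(garbled)
print(fixed)
```

One caveat with this approach: a code point above 255 is written as a literal \uXXXX escape sequence, so it only round-trips cleanly when every character of the input is in the 0-255 range.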
Whew - don't need to use my scary thing. – Chris Morgan Nov 24 '10 at 13:37.
What you want is the encoding where Unicode code point X is encoded to the same byte value X. For code points in the range 0-255, the latin-1 encoding does exactly this:

def double_decode(bstr):
    return bstr.decode("utf-8").encode("latin-1").decode("utf-8")
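A Python 3 variant of the same latin-1 round trip might look like the sketch below. The fallback for input that was not double-encoded is an addition for illustration, not something the answer above proposes, and it is only a heuristic:

```python
def double_decode(data: bytes) -> str:
    """Decode UTF-8 bytes that may have been UTF-8 encoded twice."""
    # First pass: undo the outer (accidental) UTF-8 encoding.
    text = data.decode("utf-8")
    try:
        # latin-1 maps code point N to byte N for 0-255, which recovers the
        # inner UTF-8 bytes; the second decode then yields the real text.
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Assumption: if the round trip fails, the input was only encoded
        # once, so the first-pass result is already the intended text.
        return text


twice = b"X\xc3\x83\xc2\xbcY\xc3\x83\xc2\x9f"
print(double_decode(twice))
```

Because the fallback guesses from failure alone, correctly encoded text that happens to survive the second pass would still be altered; if the server is known to double-encode every response, dropping the try/except is the stricter choice.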
Don't use this! Use @hop's solution. My nasty hack (cringe! But quietly. It's not my fault, it's the server developers' fault):

def double_decode_unicode(s, encoding='utf-8'):
    return ''.join(chr(ord(c)) for c in s.decode(encoding)).decode(encoding)

Then,

>>> double_decode_unicode('X\xc3\x83\xc2\xbcY\xc3\x83\xc2\x9f')
u'X\xfcY\xdf'
>>> print _
XüYß
Great question, by the way. A nasty situation. I hope someone else can come up with a neater solution than chr(ord(c)) to convert unicode to str, character by character... – Chris Morgan Nov 24 '10 at 13:30
f(char) for char in string cries for an encoding. – hop Nov 24 '10 at 13:33
@hop: does it? How so? – Chris Morgan Nov 24 '10 at 13:37
Transforming each character of a string in sequence via some function is the very definition of encoding and decoding, that's how. – hop Nov 24 '10 at 13:44
@hop: naturally, but as a solution this looks ghastly. Your .encode('raw_unicode_escape') is much cleaner (quite aside from the fact that the unicode->str step of your solution is over six times as fast as mine). – Chris Morgan Nov 24 '10 at 13:52
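To make the point from this exchange concrete, here is a small Python 3 sketch showing that the character-by-character conversion and the latin-1 codec produce the same bytes for code points 0-255. The helper name per_char_bytes is hypothetical:

```python
def per_char_bytes(text: str) -> bytes:
    """Turn each character into the byte with the same numeric value."""
    # Python 3 counterpart of ''.join(chr(ord(c)) for c in s) from Python 2;
    # bytes() raises ValueError for any code point above 255.
    return bytes(ord(c) for c in text)


sample = "X\xc3\xbcY\xc3\x9f"
same_as_latin1 = per_char_bytes(sample) == sample.encode("latin-1")
same_as_raw_escape = per_char_bytes(sample) == sample.encode("raw_unicode_escape")

# The latin-1 equivalence holds for the whole 0-255 range, not just the sample.
all_256 = "".join(chr(i) for i in range(256))
full_range_matches = per_char_bytes(all_256) == all_256.encode("latin-1")

print(same_as_latin1, same_as_raw_escape, full_range_matches)
```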
Here's a little script that might help you, doubledecode.py -- https://gist.github.com/1282752