Dealing with wacky encodings in Python?


That's just UTF-8 data. Use .decode to convert it into unicode.

    >>> 'D\xc3\xa9cor'.decode('utf-8')
    u'D\xe9cor'

You can perform an additional string-escape decode for the 'D\\xc3\\xa9cor' case.

    >>> 'D\xc3\xa9cor'.decode('string-escape').decode('utf-8')
    u'D\xe9cor'
    >>> 'D\\xc3\\xa9cor'.decode('string-escape').decode('utf-8')
    u'D\xe9cor'
    >>> u'D\\xc3\\xa9cor'.decode('string-escape').decode('utf-8')
    u'D\xe9cor'

To handle the 2nd case as well, you need to detect if the input is unicode, and convert it into a str first.

    >>> def conv(s):
    ...     if isinstance(s, unicode):
    ...         s = s.encode('iso-8859-1')
    ...     return s.decode('string-escape').decode('utf-8')
    ...
    >>> map(conv, [u'D\\xc3\\xa9cor', u'D\xc3\xa9cor', 'D\\xc3\\xa9cor', 'D\xc3\xa9cor'])
    [u'D\xe9cor', u'D\xe9cor', u'D\xe9cor', u'D\xe9cor']
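The unicode type and the string-escape codec used in this answer exist only in Python 2. As a rough sketch of the same idea on Python 3, assuming the inputs arrive as either bytes or str and contain only \xNN-style escapes, the documented unicode_escape codec can stand in for string-escape. The name conv_py3 is made up for illustration and is not part of the original answer.

```python
def conv_py3(s):
    """Python 3 sketch of conv(): normalise four input shapes to text."""
    if isinstance(s, str):
        # Assumes every character is below U+0100, so Latin-1 maps each
        # one back to the byte it was (mis)decoded from.
        s = s.encode('iso-8859-1')
    # unicode_escape reads the bytes as Latin-1 and expands \xNN escapes,
    # producing one character per original byte ...
    one_char_per_byte = s.decode('unicode_escape')
    # ... so a Latin-1 round trip recovers the raw bytes, which are then
    # decoded as the UTF-8 they really are.
    return one_char_per_byte.encode('iso-8859-1').decode('utf-8')


samples = ['D\\xc3\\xa9cor', 'D\xc3\xa9cor', b'D\\xc3\\xa9cor', b'D\xc3\xa9cor']
results = [conv_py3(s) for s in samples]
```

This sketch is narrower than it looks: input containing \uNNNN escapes above U+00FF, or a stray trailing backslash, would raise an error rather than be repaired.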

It works for that particular case. However: u'D\\xc3\\xa9cor' --> u'D\\xc3\\xa9cor', u'D\xc3\xa9cor' --> UnicodeEncodeError, 'D\\xc3\\xa9cor' --> u'D\\xc3\\xa9cor'. – Tyson Jun 7 '10 at 6:06

@Tyson: It can't work for all cases. How can you make sure 'D:\\xc3\\xa9\\xc3xa9.png' is really a UTF-8 encoded string, not a Windows path name? – KennyTM Jun 7 '10 at 6:09

I can assume that none of the data I'm receiving are Windows pathnames. – Tyson Jun 7 '10 at 6:17

@Tyson: In the comment you say UnicodeEncodeError. Notice that it's **En**code, not **De**code. Out of curiosity: Are you printing it out inside a loop (in a console or window)? It's just a wild guess on a Monday morning... – exhuma Jun 7 '10 at 6:40

For debugging, yeah, I was tossing it out to stdout. – Tyson Jun 7 '10 at 6:42
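The comments point at encoding, not decoding, as the step that fails. One plausible way to hit that (an assumption, since the thread shows no traceback) is writing a correctly decoded non-ASCII value to an ASCII-only stdout. A small Python 3 sketch of that failure and a debugging-friendly workaround follows; debug_repr is a hypothetical helper name.

```python
text = 'D\xe9cor'  # the correctly decoded value

# Encoding to an ASCII-only target fails on the non-ASCII character.
try:
    text.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True


def debug_repr(value):
    """Return an ASCII-only rendering that is safe to write to any console."""
    return value.encode('ascii', 'backslashreplace').decode('ascii')


safe = debug_repr(text)
```

In Python 2, printing repr(value) instead of the value itself serves the same purpose when debugging.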

Write adapters that know which transformations should be applied to their sources.

    >>> 'D\xc3\xa9cor'.decode('utf-8')
    u'D\xe9cor'
    >>> 'D\\xc3\\xa9cor'.decode('string-escape').decode('utf-8')
    u'D\xe9cor'
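One way to read the adapter suggestion is as a lookup from each data source to the single transformation it needs, so nothing has to guess at the encoding. The sketch below is written for Python 3, where the unicode_escape codec plus a Latin-1 round trip takes the place of Python 2's string-escape. The source names and function names are invented for illustration.

```python
def from_utf8(raw: bytes) -> str:
    """Source delivers plain UTF-8 bytes."""
    return raw.decode('utf-8')


def from_escaped_utf8(raw: bytes) -> str:
    """Source delivers UTF-8 bytes written out as literal backslash-x escapes."""
    # Expand the escapes (one character per byte), recover the raw bytes
    # through Latin-1, then decode them as UTF-8.
    return raw.decode('unicode_escape').encode('iso-8859-1').decode('utf-8')


# Hypothetical source names; each maps to the one transformation it needs.
ADAPTERS = {
    'clean_feed': from_utf8,
    'escaped_feed': from_escaped_utf8,
}


def to_text(source: str, raw: bytes) -> str:
    """Decode raw using the adapter registered for source."""
    return ADAPTERS[source](raw)


plain = to_text('clean_feed', b'D\xc3\xa9cor')
escaped = to_text('escaped_feed', b'D\\xc3\\xa9cor')
```

Because each source is tied to one adapter, a string that merely looks escaped (such as a Windows path) is never unescaped by accident.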

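Here's the solution I came to before I saw KennyTM's proper, more concise solution:

    def ensure_unicode(string):
        try:
            # str (or ASCII-only unicode): undo up to two levels of backslash escaping
            string = string.decode('string-escape').decode('string-escape')
        except UnicodeEncodeError:
            # unicode with non-ASCII characters: the implicit ASCII encode failed,
            # so turn each code point below 256 back into the byte of the same value
            string = string.encode('raw_unicode_escape')
        return unicode(string, 'utf-8')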

The encoding situation in 1.9.x is not much better than in Python. It is also basically a "Use UTF-8 or suffer" attitude now. It was better in 1.8.x because people who did not NEED Unicode did not HAVE to deal with it.

