Dealing with wacky encodings in Python?

That's just UTF-8 data. Use .decode to convert it into unicode.

>>> 'D\xc3\xa9cor'.decode('utf-8')
u'D\xe9cor'

You can perform an additional string-escape decode for the 'D\\xc3\\xa9cor' case.

>>> 'D\xc3\xa9cor'.decode('string-escape').decode('utf-8')
u'D\xe9cor'
>>> 'D\\xc3\\xa9cor'.decode('string-escape').decode('utf-8')
u'D\xe9cor'
>>> u'D\\xc3\\xa9cor'.decode('string-escape').decode('utf-8')
u'D\xe9cor'

To handle the 2nd case as well, you need to detect if the input is unicode, and convert it into a str first.

>>> def conv(s):
...     if isinstance(s, unicode):
...         s = s.encode('iso-8859-1')
...     return s.decode('string-escape').decode('utf-8')
...
>>> map(conv, [u'D\\xc3\\xa9cor', u'D\xc3\xa9cor', 'D\\xc3\\xa9cor', 'D\xc3\xa9cor'])
[u'D\xe9cor', u'D\xe9cor', u'D\xe9cor', u'D\xe9cor']
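The transcript above is Python 2: the string-escape codec and the unicode type do not exist in Python 3. For readers on Python 3, here is a rough sketch of the same idea, not a drop-in port. It assumes unicode_escape is an acceptable stand-in for string-escape, which is only an approximation: unicode_escape also interprets \uXXXX sequences, and those would then fail the ISO-8859-1 round trip. The sample list mirrors the four inputs from the answer.

```python
def conv(s):
    """Return text for UTF-8 data that may be mojibake and/or backslash-escaped."""
    if isinstance(s, str):
        # Map each code point U+0000..U+00FF back to the byte of the same value.
        # Characters above U+00FF raise UnicodeEncodeError here.
        s = s.encode('iso-8859-1')
    # unicode_escape turns literal backslash-x sequences into code points;
    # the second ISO-8859-1 round trip converts those code points back into
    # bytes so that they can finally be decoded as UTF-8.
    return s.decode('unicode_escape').encode('iso-8859-1').decode('utf-8')


# The four inputs from the answer, written as Python 3 str and bytes literals.
samples = ['D\\xc3\\xa9cor', 'D\xc3\xa9cor', b'D\\xc3\\xa9cor', b'D\xc3\xa9cor']
results = [conv(s) for s in samples]
print(results)
```

All four inputs should come out as 'D\xe9cor'. As with the original, this only makes sense when every input really is UTF-8 underneath. A string that legitimately contains backslashes would be mangled.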

It works for that particular case. However: u'D\\xc3\\xa9cor' --> u'D\\xc3\\xa9cor', u'D\xc3\xa9cor' --> UnicodeEncodeError, 'D\\xc3\\xa9cor' --> u'D\\xc3\\xa9cor'. – Tyson Jun 7 '10 at 6:06

@Tyson: It can't work for all cases. How can you make sure 'D:\\xc3\\xa9\\xc3xa9.png' is really a UTF-8 encoded string, not a Windows path name? – KennyTM Jun 7 '10 at 6:09

I can assume that none of the data I'm receiving are Windows pathnames. – Tyson Jun 7 '10 at 6:17

@Tyson: In the comment you say UnicodeEncodeError. Notice that it's **En**code, not **De**code. Out of curiosity: Are you printing it out inside a loop (in a console or window)? It's just a wild guess on a Monday morning... – exhuma Jun 7 '10 at 6:40

For debugging, yeah, I was tossing it out to stdout. – Tyson Jun 7 '10 at 6:42
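The Encode/Decode distinction raised in the comments can be shown in a few lines. This is only a sketch of the general failure mode, written for Python 3 and assuming the target codec was ASCII. The thread never establishes which encode step actually failed for Tyson, whether printing to stdout or something else.

```python
text = 'D\xe9cor'  # already decoded text containing one non-ASCII character

error = None
try:
    # Encoding, not decoding: ASCII has no byte for U+00E9.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc

# ISO-8859-1 does have a byte for U+00E9, so this encode succeeds.
latin1_bytes = text.encode('iso-8859-1')

print(type(error).__name__, error.encoding, error.start)
print(latin1_bytes)
```

The exception reports the codec that failed (error.encoding) and the offset of the offending character (error.start), which is usually enough to tell an output-side problem from a bad decode.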

Write adapters that know which transformations should be applied to their sources.

>>> 'D\xc3\xa9cor'.decode('utf-8')
u'D\xe9cor'
>>> 'D\\xc3\\xa9cor'.decode('string-escape').decode('utf-8')
u'D\xe9cor'
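One way to read the adapter suggestion is as a lookup table, sketched here in Python 3. The source names 'feed_a' and 'feed_b' and every function name are hypothetical, invented for illustration. unicode_escape is again used as an approximate stand-in for Python 2's string-escape.

```python
def from_utf8_bytes(raw):
    """Adapter for a source that sends plain UTF-8 bytes."""
    return raw.decode('utf-8')


def from_escaped_utf8(raw):
    """Adapter for a source that sends UTF-8 bytes as literal backslash-x escapes."""
    # Undo the escaping, turn the resulting code points back into bytes,
    # then decode those bytes as UTF-8.
    return raw.decode('unicode_escape').encode('iso-8859-1').decode('utf-8')


# Hypothetical source names; each source is tied to exactly one transformation.
ADAPTERS = {
    'feed_a': from_utf8_bytes,
    'feed_b': from_escaped_utf8,
}


def to_text(source, raw):
    """Decode raw bytes using the adapter registered for the given source."""
    return ADAPTERS[source](raw)


print(to_text('feed_a', b'D\xc3\xa9cor'))
print(to_text('feed_b', b'D\\xc3\\xa9cor'))
```

Deciding the transformation per source, rather than guessing per string, avoids the ambiguity raised in the comments: a value that merely looks escaped is never unescaped unless its source is known to escape.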

Here's the solution I came to before I saw KennyTM's proper, more concise solution:

def ensure_unicode(string):
    try:
        # Undo backslash escaping (applied twice, as in my original code).
        string = string.decode('string-escape').decode('string-escape')
    except UnicodeEncodeError:
        # A unicode object with non-ASCII characters: turn it into a byte string.
        string = string.encode('raw_unicode_escape')
    return unicode(string, 'utf-8')

The encoding situation in 1.9.x is not much better than in Python. It is also basically a "Use UTF-8 or suffer" attitude now. It was better in 1.8.x because people who did not NEED Unicode did not HAVE to deal with it.

