Decoding if it's not unicode?


You could just try decoding it with the 'utf-8' codec, and if that does not work, return the object unchanged:

    def myfunction(text):
        try:
            text = unicode(text, 'utf-8')
        except TypeError:
            # text is already a unicode object, so hand it back as-is
            return text
        return text

    print(myfunction(u'cer\xf3n'))
    # cerón

When you take a unicode object and call its decode method with the 'utf-8' codec, Python first tries to convert the unicode object to a string object, and then it calls the string object's decode('utf-8') method. Sometimes the conversion from unicode object to string object fails, because Python2 uses the ascii codec by default for that step. So, in general, never try to decode unicode objects.
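The two-step behaviour described above can be imitated explicitly. This is only a rough sketch, written for Python 3 so that it runs on a current interpreter; `py2_style_decode` is a made-up name, and the explicit `encode("ascii")` stands in for the implicit conversion that Python 2 performed behind the scenes.

```python
def py2_style_decode(text, encoding="utf-8"):
    """Imitate Python 2's unicode.decode(): ASCII-encode first, then decode."""
    # Step 1: Python 2 silently converted the unicode object to a byte
    # string using the default 'ascii' codec.
    raw = text.encode("ascii")
    # Step 2: only then did it run the decode that was actually requested.
    return raw.decode(encoding)


# Pure ASCII text survives step 1, so the "decode" appears to work.
print(py2_style_decode("hello"))

try:
    py2_style_decode("cer\xf3n")
except UnicodeEncodeError as exc:
    # The failure is an *encode* error, even though a decode was requested.
    print("failed during the implicit encode:", exc.reason)
```

This is why the method seems to work for some inputs and not others: it depends only on whether the text happens to be pure ASCII.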

Or, if you must try, trap it in a try..except block. There may be a few codecs for which decoding unicode objects works in Python2 (see below), but they have been removed in Python3.

See this Python bug ticket for an interesting discussion of the issue, and also Guido van Rossum's blog:

"We are adopting a slightly different approach to codecs: while in Python 2, codecs can accept either Unicode or 8-bits as input and produce either as output, in Py3k, encoding is always a translation from a Unicode (text) string to an array of bytes, and decoding always goes the opposite direction. This means that we had to drop a few codecs that don't fit in this model, for example rot13, base64 and bz2 (those conversions are still supported, just not through the encode/decode API)."
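For readers on Python 3, here is a small sketch of what that model looks like in practice. It assumes a reasonably recent Python 3 (the `rot_13` codec is available through `codecs.encode` in current releases); the variable names are illustrative only.

```python
import base64
import bz2
import codecs

text = "hello"
data = text.encode("utf-8")           # encode: str -> bytes
assert data.decode("utf-8") == text   # decode: bytes -> str

# The "wrong direction" methods no longer exist at all.
print(hasattr(text, "decode"))  # False: str cannot be decoded
print(hasattr(data, "encode"))  # False: bytes cannot be encoded

# The dropped conversions are reached through other entry points.
print(codecs.encode(text, "rot_13"))               # uryyb (text -> text)
print(base64.b64encode(data))                      # b'aGVsbG8=' (bytes -> bytes)
print(bz2.decompress(bz2.compress(data)) == data)  # True (bytes -> bytes)
```

Because `str` has no `decode` method in Python 3, the accidental "decode something already decoded" mistake fails immediately with an AttributeError instead of working only for ASCII input.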

I'm not aware of any good way to avoid the isinstance check in your function, but maybe someone else will be. I can point out that the two weirdnesses you cite are because you're doing something that doesn't make sense: trying to decode into Unicode something that's already decoded into Unicode.

The first should instead look like this, which decodes the UTF-8 encoding of that string into the Unicode version:

    >>> 'cer\xc3\xb3n'.decode('utf-8')
    u'cer\xf3n'

And your second should look like this (not using a u'' Unicode string literal):

    >>> unicode('hello', 'utf-8')
    u'hello'
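For comparison, a version that keeps the isinstance check might look like the sketch below. It is written for Python 3, where the byte type is `bytes` and the text type is `str`; in Python 2 the same idea would test for `str` versus `unicode`. The name `to_text` is hypothetical and not taken from the question.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is still raw bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: decoding it a second time would make no sense.
    return value


print(to_text(b"cer\xc3\xb3n") == "cer\xf3n")  # True: bytes were decoded
print(to_text("cer\xf3n") == "cer\xf3n")       # True: text passed through
```

The explicit type check avoids relying on an exception to detect input that is already decoded, at the cost of one extra line.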

The weird thing is that unicode objects have a decode method. Even weirder is that the method works sometimes and sometimes doesn't. Same for unicode() calls. – Manuel Ceron Oct 4 '10 at 18:13

Well, there's definitely some strangeness to the API, since a call to unicode with a Unicode string and no encoding specified will always work while a call with any encoding specified will always fail. – Will McCutchen Oct 4 '10 at 19:47

