Likely, your problem is that you parsed it fine, and now you're trying to print the contents of the XML and can't because it contains some non-ASCII Unicode characters. Try encoding your unicode string as ASCII first: unicodeData.encode('ascii', 'ignore'). The 'ignore' part tells it to just skip those characters.
From the Python docs:
>>> u = unichr(40960) + u'abcd' + unichr(1972)
>>> u.encode('utf-8')
'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
'abcd'
>>> u.encode('ascii', 'replace')
'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
'&#40960;abcd&#1972;'
You might want to read this article: http://www.joelonsoftware.com/articles/Unicode.html, which I found very useful as a basic tutorial on what's going on. After reading it, you'll stop feeling like you're just guessing which commands to use (or at least that's what happened to me).
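For anyone on Python 3, where unichr() and the u'' prefix are gone and encode() returns bytes, the same error handlers can be sketched roughly like this. The to_ascii helper is a hypothetical name used only for illustration; it is not part of the standard library.

```python
# Python 3 sketch: str is already Unicode, so chr() replaces unichr().
text = chr(40960) + "abcd" + chr(1972)


def to_ascii(value, errors="ignore"):
    """Hypothetical helper: encode text as ASCII bytes with the given error handler."""
    return value.encode("ascii", errors)


ignored = to_ascii(text, "ignore")               # drops the non-ASCII characters
replaced = to_ascii(text, "replace")             # substitutes '?' for each one
xml_safe = to_ascii(text, "xmlcharrefreplace")   # writes numeric XML character references

# The default handler ("strict") raises instead of silently losing data.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Since the data came from XML, 'xmlcharrefreplace' may be the friendlier choice: nothing is lost, and the references can be turned back into the original characters later.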
Perfect, it removed the ’s, but at least it'll print. Thanks! – Alex B Jul 11 '10 at 19:14
I'm trying to make the following string safe: ' foo “bar bar” df' (note the curly quotes), but the above still fails for me. – Rosarch Jul 11 '10 at 19:26
@Rosarch: Fails how? Same error? And which error-handling rule did you use? – Scott Stafford Jul 11 '10 at 20:17
@Rosarch, your problem is probably earlier. Try this code:
# -*- coding: latin-1 -*-
u = u' foo “bar bar” df'
print u.encode('ascii', 'ignore')
In your case, it was probably the conversion of your string INTO unicode, using the encoding you specified for the Python script, that threw the error. – Scott Stafford Jul 11 '10 at 20:48
I went ahead and made my issue into its own question: stackoverflow.com/questions/3224427/… – Rosarch Jul 11 '10 at 21:12
You can use something of the form s.decode('utf-8'), which will convert a UTF-8 encoded bytestring into a Python Unicode string. But the exact procedure depends on how you load and parse the XML file; e.g. if you never access the XML string directly, you might have to use a decoder object from the codecs module.
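As a rough sketch of both routes in Python 3 syntax: a one-shot decode() when the whole bytestring is in hand, and an incremental decoder from the codecs module when the data arrives in pieces. The sample bytes and the decode_chunks helper are assumptions made up for illustration; how the chunks are actually obtained depends on your parser.

```python
import codecs

# Assumed sample data: UTF-8 bytes as they might come out of a file or socket.
raw = "Dorf and Svoboda\u2019s text".encode("utf-8")

# One-shot: the whole bytestring is available, so decode it directly.
text = raw.decode("utf-8")


def decode_chunks(chunks, encoding="utf-8"):
    """Hypothetical helper: decode an iterable of byte chunks into one string.

    The incremental decoder buffers a multi-byte sequence that is split
    across two chunks instead of raising an error on the partial bytes.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush; raises if bytes are left dangling
    return "".join(parts)


# Split in the middle of the 3-byte encoding of U+2019 to show the buffering.
streamed = decode_chunks([raw[:17], raw[17:]])
```

Calling raw[:17].decode('utf-8') on its own would fail, because the chunk ends partway through a character; that is the situation the decoder object is meant for.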
It's already encoded in UTF-8. The error is specifically: myStrings = deque(u'Dorf and Svoboda\u2019s text builds on the str... and Computer Engineering\u2019s subdisciplines. ') The string is in UTF-8, as you can see, but it gets mad about the internal '\u2019'. – Alex B Jul 11 '10 at 19:09
Oh, OK, I thought you were having a different problem. – David Zaslavsky Jul 11 '10 at 19:25
@Alex B: No, the string is Unicode, not UTF-8. To encode it as UTF-8 use '...'.encode('utf-8') – sth Jul 11 '10 at 19:33
I can't really give you an answer, but what I can give you is a way to a solution: you have to find the angle that you relate to or that piques your interest. A good paper is one that people get drawn into because it reaches them in some way. As for me, when I think of WWII, I think of the Holocaust and the effect it had on the survivors, their families, and those who stood by and did nothing until it was too late.