Using Python 2.7, why does my unicode filename raise an IOError when calling file() on it?

Just guessing - you're pulling the filename from a program which is using a multibyte character set in the current default encoding, which is cp1252 for English versions of Windows. ASCII doesn't include any extended characters, which is why you get the error when you try to convert the string to Unicode using the ASCII encoding. Edit: this answer has some information about encoding file names in the current Windows code page.

– Fantasizer Jan 17 at 3:50

Mark is correct: Winamp uses cp1252, which is the default in Windows. In fact, it probably doesn't use any encoding at all, but just returns the filename as it gets it from Windows. – Lennart Regebro Jan 17 at 20:34
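The failure described in this answer can be reproduced without touching the filesystem. The following is only a minimal sketch: the byte string is a shortened form of the filename quoted in this thread, and `try_decode` is a hypothetical helper, not part of any library.

```python
# Shortened form of the filename from this thread, as cp1252 bytes.
raw_name = b"Mon C\x9cur S'ouvre \xc0 Ta Voix.mp3"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# ASCII has no byte values above 0x7f, so decoding fails on \x9c.
ascii_result = try_decode(raw_name, "ascii")

# cp1252 maps \x9c to U+0153 and \xc0 to U+00C0, so decoding succeeds.
cp1252_result = try_decode(raw_name, "cp1252")

print(repr(ascii_result))
print(repr(cp1252_result))
```

The byte literal and `.decode()` call behave the same way on Python 2.7 and Python 3, which is why the sketch avoids the Python 2-only `unicode()` builtin.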

Use os.listdir() on the directory to see what the filename is, encoded. Then compare that to what you get when you do filename.encode('cp1252').

There should be a difference, and that should tell you what is wrong. The only real problem I can think of is that something gets decoded twice. You could have normalization problems too, but that seems unlikely in this case.
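The comparison suggested above could be sketched roughly as follows. `matches_listing` is a hypothetical helper, and the sample names are hard-coded stand-ins for what os.listdir() and the application would return; on a real system the raw name would come from os.listdir() called with a byte-string path.

```python
def matches_listing(raw_name, unicode_name, encoding="cp1252"):
    """Return True if encoding unicode_name reproduces the raw byte name."""
    try:
        return unicode_name.encode(encoding) == raw_name
    except UnicodeEncodeError:
        # The unicode string contains a character the code page cannot represent.
        return False


# Stand-in for an entry returned by os.listdir("somepath") on Python 2.
raw = b"Mon C\x9cur.mp3"

# Correctly decoded name: the ligature is U+0153.
good = u"Mon C\u0153ur.mp3"

# Incorrect name: the cp1252 byte value was copied straight into the string.
bad = u"Mon C\x9cur.mp3"

print(matches_listing(raw, good))
print(matches_listing(raw, bad))
```

A mismatch for a given name points at the step that produced the unicode string, not at the filesystem.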

Ok, so os.listdir("somepath") gives "08 - Muse - I Belong To You - Mon C\x9cur S'ouvre \xc0 Ta Voix.mp3", os.listdir(u"somepath") gives u"08 - Muse - I Belong To You - Mon C\u0153ur S'ouvre \xc0 Ta Voix.mp3", and filename.encode("cp1252") raises UnicodeEncodeError: 'charmap' codec can't encode character u'\x9c' in position 76: character maps to <undefined> – Fantasizer Jan 17 at 0:19

OK, \x9c is quite correctly an œ in cp1252.

When you use a unicode path, this gets correctly decoded to Unicode. Encoding the string u"08 - Muse - I Belong To You - Mon C\u0153ur S'ouvre \xc0 Ta Voix.mp3" with cp1252 works.

What is "filename" when you get the encode error? Is it a string or unicode? You shouldn't encode strings. – Lennart Regebro Jan 17 at 7:07

filename = u"C:\\Users\\Felix\\Music\\Muse\\The Resistance\\08 - Muse - I Belong To You - Mon C\x9cur S'ouvre \xc0 Ta Voix.mp3" when doing filename.encode("cp1252"), so not a string. – Fantasizer Jan 17 at 14:19

Aha. Now we are getting somewhere. Notice how your unicode string that fails has '\x9c' in it. As you see above, the correct Unicode character for that is \u0153.

So this unicode string is incorrect. How do you arrive at that filename? Is it before or after your example code's string manglings?

In other words, what does win32api.GetFullPathName(memoryBuffer.raw.split("\x00")[0]) return? – Lennart Regebro Jan 17 at 15:26

@Fantasizer: No, \u0153 is not an encoding. That's the Unicode character. \x9c is an encoding of that Unicode character, and it's the encoding of that character in CP-1252.

So you are, correctly, getting the string encoded in cp1252, as expected. So, next question: Why do you not return that string? Anyway, if you want to convert that string to Unicode, the correct code is: filename.decode('cp1252'), or unicode(filename, 'cp1252') would work too. That's it. – Lennart Regebro Jan 17 at 20:30
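The distinction drawn in these comments can be checked with a short sketch. It assumes the incorrect unicode string came from copying each byte straight to the code point with the same value (which is what a latin-1 decode does). That is a guess about the cause, not something the thread confirms. The variable names are illustrative.

```python
# The filename from this thread as Windows returns it: cp1252-encoded bytes.
raw_name = b"08 - Muse - I Belong To You - Mon C\x9cur S'ouvre \xc0 Ta Voix.mp3"

# The conversion recommended above: decode the bytes as cp1252.
text = raw_name.decode("cp1252")

# ASSUMPTION: the broken string arose from a byte-to-code-point copy,
# which is equivalent to decoding as latin-1. It contains u'\x9c'.
mangled = raw_name.decode("latin-1")

# U+009C has no cp1252 representation, so encoding the mangled string fails
# with the same kind of 'charmap' error quoted earlier in the thread.
try:
    mangled.encode("cp1252")
    mangled_encodes = True
except UnicodeEncodeError:
    mangled_encodes = False

# Under the same assumption, the damage can be undone by reversing the copy.
repaired = mangled.encode("latin-1").decode("cp1252")

print(repr(text))
print(mangled_encodes)
print(repaired == text)
```

On Python 2.7, `unicode(raw_name, 'cp1252')` is equivalent to the `decode` call; the sketch uses `decode` so that it also runs on Python 3.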

