What could go wrong if I convert ANSI encoded files to UTF-8?


It depends on how much of the text in your codebase is using characters outside the ASCII range of 0..127. You might want to scan for those first, to see how much impact it will have. If your codebase is primarily in English, then you probably don't have much to worry about.
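Here is a minimal sketch of such a scan in Python. It assumes all the files live under one root directory, and `find_non_ascii_files` is a hypothetical helper name. Windows-1252 is a single-byte encoding, so counting bytes above 127 is the same as counting characters outside ASCII. Binary files (images, DLLs) will also show up, so filter by extension if needed.

```python
import os


def find_non_ascii_files(root):
    """Yield (path, count) for every file under root that contains bytes above 127."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            # each byte > 127 is one non-ASCII character in a single-byte code page
            count = sum(1 for b in data if b > 127)
            if count:
                yield path, count
```

For example, `for path, count in find_non_ascii_files("."): print(path, count)` lists only the files the conversion would actually change.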

Just to point out that it's not just the codebase he needs to worry about; if there's any dynamic content in the database, that would need to be converted as well. – Simon Howard Nov 29 '08 at 17:31

That's not going to affect merging / comparing in TFS, though; however, you're quite correct wrt. the composition of pages using data from the DB, etc. – Barry Kelly Nov 29 '08 at 21:28

I would write a Python script:

    import os

    for fn in os.listdir(srcdir):
        path = os.path.join(srcdir, fn)
        # decode the raw bytes as Windows-1252 ("ANSI")
        data = open(path, "rb").read().decode("windows-1252")
        # update the charset declaration, then write the text back as UTF-8
        data = data.replace("charset=windows-1252", "charset=utf-8")
        open(path, "wb").write(data.encode("utf-8"))

The update of the charset assumes that this specific string won't occur elsewhere; you can make it more robust by checking for a longer string, checking whether the old text actually exists in the file, doing proper XML parsing, etc. You might need to put a UTF-8 signature in front of the UTF-8-encoded data; you can find one in codecs.BOM_UTF8.

I don't know what consequence this change has for TFS.
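A slightly more defensive variant of the script above might look like the sketch below. It assumes the same flat `srcdir` layout. `convert_file`, `convert_dir` and the `add_bom` flag are hypothetical names. It skips subdirectories and files that already start with `codecs.BOM_UTF8`, optionally prepends that BOM, and relies on Python's strict `windows-1252` decoder. That decoder raises `UnicodeDecodeError` on the few bytes (such as 0x81) that code page leaves undefined, instead of silently corrupting them.

```python
import codecs
import os


def convert_file(path, add_bom=False):
    """Re-encode one Windows-1252 file as UTF-8 in place; return True if it was rewritten."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        return False  # already marked as UTF-8, leave it alone
    # strict decode: fails loudly on bytes that Windows-1252 does not define
    text = raw.decode("windows-1252")
    text = text.replace("charset=windows-1252", "charset=utf-8")
    data = text.encode("utf-8")
    if add_bom:
        data = codecs.BOM_UTF8 + data
    with open(path, "wb") as f:
        f.write(data)
    return True


def convert_dir(srcdir, add_bom=False):
    """Convert every regular file directly inside srcdir; subdirectories are skipped."""
    converted = []
    for fn in os.listdir(srcdir):
        path = os.path.join(srcdir, fn)
        if os.path.isfile(path) and convert_file(path, add_bom):
            converted.append(fn)
    return converted
```

The BOM check makes a second run harmless when `add_bom=True`. Without a BOM there is no reliable marker, so run it only once per file.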

Something useful I just discovered is that you can right-click on a file on Source Control Explorer, then choose Properties. You can then see/modify the encoding as far as TFS is concerned.

Pick a file that has a character above the 0-127 ASCII range. Open it with Notepad, choose Save As and pick UTF-8 for the encoding. Then see if the character is successfully converted. To automate the procedure, you could write an application that converts all the files from ANSI to UTF-8, using 1252 as the code page.
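The "see if the character is successfully converted" step can also be scripted. This is a sketch that assumes you kept an unconverted copy of each file. `same_text` and `files_match` are hypothetical names. The comparison mirrors the `charset=` substitution used in the conversion script, and it decodes the new file with `utf-8-sig` so that a leading BOM is ignored.

```python
def same_text(original_bytes, converted_bytes):
    """Compare a Windows-1252 original with its UTF-8 conversion, character by character."""
    before = original_bytes.decode("cp1252")
    # mirror the charset substitution performed during conversion
    before = before.replace("charset=windows-1252", "charset=utf-8")
    # "utf-8-sig" drops a leading BOM if present and otherwise behaves like "utf-8"
    after = converted_bytes.decode("utf-8-sig")
    return before == after


def files_match(original_path, converted_path):
    """Hypothetical helper: run same_text on two files read as raw bytes."""
    with open(original_path, "rb") as f:
        original = f.read()
    with open(converted_path, "rb") as f:
        converted = f.read()
    return same_text(original, converted)
```

A typical failure this catches is double encoding, where "café" ends up stored as "cafÃ©".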

If you don't have characters above 127, you don't need to worry about any of this.
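The reason is that Windows-1252 and UTF-8 both encode the ASCII range with the same single bytes. Only characters above 127 come out differently. `encodings_agree` is a hypothetical name for this quick check:

```python
def encodings_agree(text):
    """True if text encodes to identical bytes in Windows-1252 and UTF-8."""
    return text.encode("cp1252") == text.encode("utf-8")


print(encodings_agree("plain ASCII text"))  # True
print(encodings_agree("café"))              # False: b'caf\xe9' vs b'caf\xc3\xa9'
```

This is also why ASCII-only source saved as "ANSI" can be served as UTF-8 without visible problems.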

This is not necessarily true. I don't know about ASP.NET, but we do all our PHP coding here in ANSI and serve the pages in UTF-8.

All our database information is stored in UTF-8 as well.

