Unicode, UTF, ASCII, ANSI format differences?

Going down your list: "Unicode encoding" is more properly known as UTF-16: 2 bytes per code unit. This is the native format of strings in .NET. Values outside the Basic Multilingual Plane (BMP) are encoded as surrogate pairs, i.e. two code units. (These are relatively rarely used - which is a good job, as very few developers get them right, I suspect. I very much doubt that I do.) "Unicode" is really the character set - it's unfortunate that the term is also used as a synonym for UTF-16 in .NET and various Windows applications.
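To make the surrogate-pair point concrete, here is a minimal sketch in Python (an assumption on my part; the answer is about .NET, but the encoding rules are the same everywhere). The helper name utf16_code_units is hypothetical. It splits the UTF-16 encoding of a string into 16-bit values, so a BMP character shows up as one value and a character outside the BMP shows up as a high/low surrogate pair.

```python
def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units for text (hypothetical helper)."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_char = "\u00e9"         # U+00E9, inside the BMP
astral_char = "\U0001F600"  # U+1F600, outside the BMP

print([hex(u) for u in utf16_code_units(bmp_char)])     # ['0xe9']
print([hex(u) for u in utf16_code_units(astral_char)])  # ['0xd83d', '0xde00']
```

Code that assumes one 16-bit value per character will see two "characters" for the second string, which is the usual way surrogate pairs get handled wrongly.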

UTF-8: Variable length encoding, 1-4 bytes covers every current character. ASCII values are encoded as ASCII.

UTF-7: Usually used for mail encoding. Chances are if you think you need it and you're not doing mail, you're wrong. (That's just my experience of people posting in newsgroups etc - outside mail, it's really not widely used at all.)

UTF-32: Fixed width encoding using 4 bytes per code point. This isn't very efficient, but makes life easier outside the BMP. I have a .NET Utf32String class as part of my MiscUtil library, should you ever want it. (It's not been very thoroughly tested, mind you.)

ASCII: Single byte encoding only using the bottom 7 bits. (Unicode 0-127.) No accents etc.

ANSI: There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default code page for my system", which is obtained via Encoding.Default, and is often Windows-1252.
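The size differences between these encodings can be sketched as follows, again in Python rather than .NET (an assumption for illustration only). encoded_length is a hypothetical helper, and "cp1252" stands in for "ANSI" only because Windows-1252 is a common default code page; the default on any given system may well be something else.

```python
import locale


def encoded_length(text, encoding):
    """Bytes needed for text in encoding, or None if it cannot be represented."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


# One ASCII letter, one accented letter, the euro sign, one non-BMP character.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
# The "-le" variants are used so that no byte order mark is counted.
encodings = ["ascii", "cp1252", "utf-8", "utf-16-le", "utf-32-le"]

for ch in samples:
    sizes = {enc: encoded_length(ch, enc) for enc in encodings}
    print("U+%04X" % ord(ch), sizes)

# Rough analogue of "the default code page for my system"; the value is
# system dependent, so no particular output should be expected here.
print(locale.getpreferredencoding(False))
```

With these samples, ASCII can only represent the first character, Windows-1252 the first three (one byte each), UTF-8 needs 1, 2, 3 and 4 bytes respectively, UTF-16 needs 2 bytes for each BMP character and 4 for the last one, and UTF-32 always needs 4.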

There's more on my Unicode page and tips for debugging Unicode problems. The other big resource is of course unicode.org, which contains more information than you'll ever be able to work your way through - possibly the most useful bit is the code charts.

Very informative. Thanks – web dunia Mar 31 '09 at 6:24

Unicode != UTF-16. Unicode is just the character set, representable as UTF7/8/16/32 – jalf Mar 31 '09 at 7:01

@jalf: But in the context of .NET or Windows in general, when someone talks about the Unicode encoding, they mean UTF-16. Hence the "more properly known as" bit. – Jon Skeet Mar 31 '09 at 7:12

@jalf: Edited answer to clarify that though. – Jon Skeet Mar 31 '09 at 7:48

Some reading to get you started on character encodings: Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). By the way - ASP.NET has nothing to do with it. Encodings are universal.

The best site to refer to would be: msdn.microsoft.com/en-us/library/dd37408...).aspx.

