Assigning a “const char*” to std::string is allowed, but assigning to std::wstring doesn't compile. Why?

The relevant part of the string API is this constructor: basic_string(const charT*); For std::string, charT is char. For std::wstring it's wchar_t. So the reason it doesn't compile is that wstring doesn't have a char* constructor.
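The constructor mismatch can be checked at compile time. The following is a small sketch, not part of the original answer, that assumes C++11 or later and uses std::is_constructible from <type_traits>:

```cpp
#include <string>
#include <type_traits>

// std::string is std::basic_string<char>, so basic_string(const charT*)
// becomes basic_string(const char*).
static_assert(std::is_constructible<std::string, const char*>::value,
              "std::string can be built from const char*");

// std::wstring is std::basic_string<wchar_t>; its C-string constructor
// takes const wchar_t*, so a const char* argument is rejected.
static_assert(!std::is_constructible<std::wstring, const char*>::value,
              "std::wstring cannot be built from const char*");
static_assert(std::is_constructible<std::wstring, const wchar_t*>::value,
              "std::wstring can be built from const wchar_t*");
```

If all three assertions hold, the file compiles silently, which matches the behaviour described in the question.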

Why doesn't wstring have a char* constructor? There is no one unique way to convert a string of char to a string of wchar_t. What encoding does the char string use? Is it just 7-bit ASCII? Is it UTF-8? Is it UTF-7? Is it Shift-JIS? So I don't think it would entirely make sense for std::wstring to have an automatic conversion from char*, even though you could cover most cases.

You can use: w = std::wstring(h, h + sizeof(h) - 1); which will convert each char in turn to wchar_t (except the NUL terminator), and in this example that's probably what you want. Note that sizeof(h) - 1 is only the string length when h is a char array; for a const char* pointer, use h + strlen(h) instead. As int3 says though, if that's what you mean it's most likely better to use a wide string literal in the first place.
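As a rough sketch of that iterator-range approach, the helper below (widen_ascii is a hypothetical name, not from the original answer) works for pointers as well as arrays because it uses strlen rather than sizeof:

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: widens a NUL-terminated narrow string one char at a
// time using the iterator-range constructor of std::wstring.
// Each char is converted to wchar_t by value with no decoding step, so the
// result is only meaningful for 7-bit ASCII input.
std::wstring widen_ascii(const char* s) {
    return std::wstring(s, s + std::strlen(s));
}
```

With this, std::wstring w = widen_ascii(h); compiles where std::wstring w = h; does not. Input in UTF-8, Shift-JIS, or any other multibyte encoding would need a real conversion instead.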

You should do:

    #include <string>

    int main() {
        const wchar_t h[] = L"hello";
        std::wstring w = h;
        return 0;
    }

std::string is a typedef of std::basic_string<char>, while std::wstring is a typedef of std::basic_string<wchar_t>. As such, the 'equivalent' C-string of a wstring is an array of wchar_ts. The 'L' in front of the string literal indicates that you are using a wide-char string constant.

A good way to handle this is to do what the Win32 API does and write a TEXT macro that either leaves the string as it is or prepends the L using the ## token-pasting operator. So you could write TEXT("hello") and the macro would expand to the correct form. – Mike Weller Dec 6 '09 at 16:14

Small suggestion... Do not use "Unicode" strings (a.k.a. wide strings) under Linux. std::string is perfectly fine and holds Unicode very well (as UTF-8). Most Linux APIs work with char* strings, and the most popular encoding is UTF-8. So... just don't bother yourself with wstring.

Not true. For example, string::size() gives you the wrong answer if your string contains UTF-8 characters that aren't ASCII. It is indeed possible to use std::string for this, but you need to be very careful! – Thomas Dec 6 '09 at 16:07

There is one advantage of UTF-32 (which is what wchar_t is on Linux), which is that it's easy to do stuff like reversing strings. To reverse a UTF-8 string, you have to parse it into distinct characters anyway. So if you're doing a lot of stuff that acts on Unicode characters (rather than their constituent UTF-8 bytes), then you want a wide representation. – Steve Jessop Dec 6 '09 at 16:10

Does std::wstring::size() give the correct number of characters? NO! sizeof(wchar_t) may be 2, and thus valid code points in 0x10000 - 0x10FFFF would be represented as surrogate pairs. If you assume that size() gives you the correct number for wstring, your code is WRONG. ;) – Artyom Dec 6 '09 at 16:13

@Steve, reversing a UTF-32 string char-by-char would give you wrong results, because Character != CodePoint. For example, the Hebrew word "?" reversed would give you incorrect diacritic points, because the character "?" consists of 3 code points: "?" and two vowels... – Artyom Dec 6 '09 at 16:17

Artyom: oh, yes, I forgot about Windows and Microsoft's half-baked Unicode... On most other systems, wchar_t is the full 32 bits. But even in that case (diacritics, etc.) you still won't get the right answer. I'm not saying that this is necessarily a problem -- but it will be, if you're not aware of it. – Thomas Dec 6 '09 at 16:24
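To make the size() pitfall from these comments concrete, here is a minimal sketch assuming C++11 and valid UTF-8 input. The helper names (utf8_code_points, sample_utf8, sample_utf16) are made up for illustration, and std::u16string stands in for a 2-byte wchar_t so the result does not depend on the platform:

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}

// "héllo" spelled with explicit UTF-8 bytes: é (U+00E9) is 0xC3 0xA9.
std::string sample_utf8() {
    return std::string("h") + "\xC3\xA9" + "llo";
}

// U+1F600 lies above 0xFFFF, so UTF-16 stores it as a surrogate pair.
std::u16string sample_utf16() {
    return std::u16string(u"\U0001F600");
}
```

Here sample_utf8().size() is 6 bytes while utf8_code_points(sample_utf8()) is 5, and sample_utf16().size() is 2 code units for a single code point. Neither count addresses combining marks, which is the Character != CodePoint issue Artyom raises.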

In addition to the other answers, you could use a trick from Microsoft's book (specifically, tchar.h), and write something like this:

    #ifdef APP_USE_UNICODE
    typedef std::wstring AppStringType;
    #define _T(s) (L##s)
    #else
    typedef std::string AppStringType;
    #define _T(s) (s)
    #endif

    AppStringType foo = _T("hello world!");

(Note: my macro-fu is weak, and this is untested, but you get the idea.)

Looks like you can do something like this: #include <sstream> // ... std::wstringstream tmp; tmp.
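The snippet above is cut off, so the following is only a guess at the general idea and not the answerer's actual code. It relies on the standard operator<< overload that lets a narrow const char* be inserted into a wide stream; widen_via_stream is a hypothetical name:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: streams a narrow string into a wide string stream.
// The stream widens each char using its own locale, so with the default
// "C" locale only ASCII input is safe.
std::wstring widen_via_stream(const char* s) {
    std::wstringstream tmp;
    tmp << s;          // each char is widened individually
    return tmp.str();
}
```

Like the iterator-range constructor, this performs no multibyte decoding, so it has the same ASCII-only limitation.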

To convert from a multibyte encoding to a wide character encoding, take a look at the header <locale> and the type std::codecvt. The Dinkumware library has a class Dinkum::wstring_convert that makes performing such multibyte-to-wide conversions easier. The class template std::codecvt_byname allows one to obtain a codecvt instance for a particular named encoding.

Unfortunately, discovering the names of the encodings (or locales) on your system is implementation-specific.
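As a hedged sketch of the std::codecvt route, the function below (widen_with_codecvt is a hypothetical name) pulls the codecvt<wchar_t, char, mbstate_t> facet out of a caller-supplied locale and runs its in() member. Which encodings it handles depends entirely on that locale; with std::locale::classic() only ASCII can be expected to work:

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a multibyte string to a wide string using
// the codecvt facet of the given locale.
std::wstring widen_with_codecvt(const std::string& narrow,
                                const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    if (narrow.empty()) {
        return std::wstring();
    }
    const Facet& facet = std::use_facet<Facet>(loc);
    std::mbstate_t state = std::mbstate_t();

    // A multibyte sequence never yields more wide chars than it has bytes.
    std::wstring wide(narrow.size(), L'\0');
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;

    const std::codecvt_base::result r = facet.in(
        state,
        narrow.data(), narrow.data() + narrow.size(), from_next,
        &wide[0], &wide[0] + wide.size(), to_next);
    if (r != std::codecvt_base::ok) {
        throw std::runtime_error("multibyte-to-wide conversion failed");
    }
    wide.resize(static_cast<std::wstring::size_type>(to_next - &wide[0]));
    return wide;
}
```

A call such as widen_with_codecvt(h, std::locale::classic()) covers the ASCII case from the question. Passing a locale constructed from a system-specific name would select another encoding, subject to the naming caveat above.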

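You should use #include <tchar.h>, tstring instead of wstring/string, TCHAR* instead of char*, and _T("hello") instead of "hello" or L"hello". This will use the appropriate form of string and char when _UNICODE is defined. (tstring is not provided by tchar.h itself; it is usually a user-defined typedef of std::basic_string<TCHAR>.)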

"(environment: gcc-4.4.1 on Ubuntu Karmic 32bit)" There is no tchar.h on my Karmic system. I'm pretty sure it's Windows-specific... – Thomas Dec 6 '09 at 16:02

-1 TCHAR is Windows-specific... Never use it in portable apps. – Artyom Dec 6 '09 at 16:04

I'd never use wchar in portable apps. Windows has much better support for it than Linux. – Yossarian Dec 6 '09 at 16:06

The problem is sizeof(Windows::wchar_t)=2, sizeof(AllOtherNonWindowsWorld::wchar_t)=4... Also, UTF-8 is generally much more preferred and less error prone. – Artyom Dec 6 '09 at 16:09

@Artyom: yes, especially because ASCII is a strict subset of UTF-8. It makes the transition quite a bit simpler. – Tom Dec 6 '09 at 18:05
