

    4/11/2007

    UNICODE/Wide Character Handling in C++

    I was bitten again.

    Life was never meant to be easy, and it gets even tougher when you have to deal with wide characters in C++ through wfstream, wcout, or any of the other wide versions of the standard I/O facilities.

    Two rules of thumb:

    #1 Unicode files must be opened in binary mode

    Example:

    std::wifstream xmlFile(m_FileName, std::ios::binary);   // for reading

    std::wofstream xmlFile(m_FileName, std::ios::binary);   // or, for writing
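    The reason behind rule #1: on Windows, a text-mode stream expands every 0x0A byte to 0x0D 0x0A on output (and collapses it back on input), and UTF-16 data contains such bytes inside ordinary characters; U+010A, for instance, is stored as 0A 01 in little-endian order. A minimal sketch, assuming the file is UTF-16LE with an optional FF FE byte order mark: read the raw bytes through a narrow std::ifstream opened in binary mode and assemble the code units yourself. The helper name read_utf16le is hypothetical, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a UTF-16LE file into a std::u16string.
// Binary mode keeps the runtime from translating or dropping any bytes.
std::u16string read_utf16le(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());

    std::size_t i = 0;
    // Skip the little-endian byte order mark (FF FE) if the file starts with one.
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        i = 2;
    }

    std::u16string text;
    // Two bytes per code unit, low byte first; a trailing odd byte is ignored.
    for (; i + 1 < bytes.size(); i += 2) {
        text.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return text;
}
```

    Reading into char16_t rather than wchar_t keeps the result the same size on every platform; converting to std::wstring afterwards is a separate step.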

    #2 When working with languages other than English, a wifstream/wofstream must be imbued with a non-default codecvt facet to read from or write to a real UNICODE file. Otherwise the stream converts every wide character to a narrow one through the default facet, and wofstream ends up writing an ANSI file.

    An explanation is available here.
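    To see why the facet matters: a wofstream that was never imbued uses the global locale, which is the classic "C" locale unless the program changes it, and that locale's std::codecvt<wchar_t, char, std::mbstate_t> facet cannot narrow characters outside its narrow character set. The sketch below asks that facet directly to convert a single character. The helper name narrow_in_classic_locale is hypothetical, and exactly which characters fail is implementation-specific, but a CJK character such as U+4E2D has no single-byte form in the "C" locale.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>

typedef std::codecvt<wchar_t, char, std::mbstate_t> WideToNarrow;

// Hypothetical helper: convert one wchar_t with the classic locale's codecvt
// facet, the conversion an un-imbued wofstream performs (assuming the global
// locale was never changed).
std::codecvt_base::result narrow_in_classic_locale(wchar_t wc) {
    const WideToNarrow& cvt = std::use_facet<WideToNarrow>(std::locale::classic());

    std::mbstate_t state = std::mbstate_t();
    const wchar_t from[1] = { wc };
    const wchar_t* from_next = from;
    char to[8];
    char* to_next = to;
    return cvt.out(state, from, from + 1, from_next, to, to + sizeof to, to_next);
}
```

    For example, narrow_in_classic_locale(L'A') should report std::codecvt_base::ok, while narrow_in_classic_locale(L'\u4e2d') reports std::codecvt_base::error. When the same failure happens inside a wofstream, the stream is typically left in a bad state, so it is worth checking the stream after flushing.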

    Example:

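    #include <fstream>
    #include <string>
    using namespace std;

    // IMBUE_NULL_CODECVT is not standard C++: it is the macro defined in the
    // CodeProject article "Upgrading an STL-based application to use Unicode".
    int main()
    {
        wstring ws(L"this is a wide string");

        wofstream of_imbued;
        IMBUE_NULL_CODECVT(of_imbued);                   // imbue BEFORE opening the file
        of_imbued.open(L"c:\\imbued.txt", ios::binary);  // wide file name: MSVC extension
        of_imbued << ws;                                 // null codecvt: no conversion

        wofstream of_not_imbued;
        of_not_imbued.open(L"c:\\not_imbued.txt", ios::binary);
        of_not_imbued << ws;                             // narrowed by the default facet
    }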

    Outputs of the above code: [screenshots of imbued.txt and not_imbued.txt not preserved]

    Two imbuing facilities are available:

    - the Boost libraries

    - IMBUE_NULL_CODECVT (the one used in the example above)
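    For comparison, C++11 later added standard codecvt facets that plug into the same imbue mechanism. This is not one of the two facilities listed above; it is a minimal sketch, assuming a C++11 or later compiler, that imbues a wofstream with std::codecvt_utf8<wchar_t> from <codecvt> so the file comes out as UTF-8. Note that <codecvt> has been deprecated since C++17, so newer compilers may warn. The helper name write_utf8_via_imbue is hypothetical.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8: C++11, deprecated since C++17
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write ws to path as UTF-8 by imbuing the stream with a
// UTF-8 codecvt facet instead of relying on the default "C" locale conversion.
bool write_utf8_via_imbue(const std::string& path, const std::wstring& ws) {
    std::wofstream out;
    // The new locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::binary);  // imbue before any I/O happens
    out << ws;
    out.flush();                       // conversion errors surface here at the latest
    return static_cast<bool>(out);
}
```

    With this facet, L"\u4e2d" should be written as the three UTF-8 bytes E4 B8 AD, with no byte order mark.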

    There is also the classic C way to write UNICODE files:

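    #include <stdio.h>
    #include <wchar.h>

    wchar_t myWString[] = L"Some strange characters.";
    /* myFile: a FILE* opened in binary mode, e.g. with fopen(name, "wb") */
    /* wcslen() leaves out the terminating L'\0' that sizeof(myWString) counts */
    fwrite(myWString, sizeof(wchar_t), wcslen(myWString), myFile);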

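    However, it is not portable: the bytes written depend on sizeof(wchar_t) (2 bytes, UTF-16, on Windows; 4 bytes, typically UTF-32, on most Unix-like systems) and on the machine's byte order.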
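    One portable alternative (not what the author uses) is to stop depending on wchar_t's in-memory layout altogether: encode the text as UTF-16LE explicitly, byte by byte, and write it through a narrow std::ofstream opened in binary mode. A minimal sketch, assuming UTF-16LE with a byte order mark is the desired file format and that the std::wstring holds UTF-16 (2-byte wchar_t) or UTF-32 (4-byte wchar_t). The names write_utf16le and put_unit are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <string>

// Emit one 16-bit code unit, low byte first, whatever the host byte order is.
static void put_unit(std::ostream& out, unsigned long unit) {
    out.put(static_cast<char>(unit & 0xFF));
    out.put(static_cast<char>((unit >> 8) & 0xFF));
}

// Hypothetical helper: write ws as UTF-16LE with a byte order mark.
// Returns false on an invalid code point or a failed write.
bool write_utf16le(const std::string& path, const std::wstring& ws) {
    std::ofstream out(path, std::ios::binary);  // binary: no newline translation
    put_unit(out, 0xFEFF);                       // BOM, stored as FF FE

    for (wchar_t wc : ws) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp > 0x10FFFF) {
            return false;                        // not a Unicode code point
        }
        if (cp > 0xFFFF) {
            // Only possible with a 4-byte wchar_t: split into a surrogate pair.
            cp -= 0x10000;
            put_unit(out, 0xD800 + (cp >> 10));
            put_unit(out, 0xDC00 + (cp & 0x3FF));
        } else {
            // With a 2-byte wchar_t, existing surrogate pairs simply pass through.
            put_unit(out, cp);
        }
    }
    return static_cast<bool>(out);
}
```

    Because the surrogate split only happens for 4-byte wchar_t, a string such as L"\U0001F600" should produce the same bytes (FF FE 3D D8 00 DE) on Windows and on Unix. Writing one byte at a time through put() is not the fastest approach, but it keeps the byte order explicit.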

    References:

    1. Unicode Implementation

    http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/ffe0912d1462d7a5/7601a62008fdd25a?lnk=st&q=wfstream+fstream+cout+wcout&rnum=6&hl=en#7601a62008fdd25a

    2. Unicode in C++

    http://groups.google.com/group/comp.lang.c++/browse_thread/thread/f4a6a434b0453187/1edc2bc1f4187597?lnk=st&q=wfstream+fstream+cout+wcout&rnum=3&hl=en#1edc2bc1f4187597

    3. How to read a Unicode file with fstream?

    http://groups.google.com/group/microsoft.public.vc.stl/browse_thread/thread/45d7520ec3ad3f51/d57b41e9abb20117?lnk=st&q=wfstream+fstream+cout+wcout&rnum=2&hl=en#

    4. A very puzzling problem: cout vs. wcout, fstream vs. wfstream

    http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/37c3e24861ca09e3/78fe0aeed7b728de?lnk=st&q=wfstream+fstream+cout+wcout&rnum=1&hl=en#78fe0aeed7b728de

    5. Upgrading an STL-based application to use Unicode

    http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp
