Migration Considerations - Unicode

For the most part, your older C++ Builder source code and forms will import, compile, link and execute in C++ Builder 10 Seattle. You may, however, need to make Unicode changes and rebuild third-party libraries in the following cases:

  1. Any libraries that were built with an older version of C++ Builder will need to be recompiled in order to be used by a C++ Builder 10 Seattle project.

    In general, you have to make sure any third party packages/components you are using are upgraded to C++ Builder 10 Seattle first, that your code is not making any assumptions about pointer/integer byte sizes, that any strings that interact with the VCL are Unicode-ready, etc.
  2. Starting with C++ Builder 2009 (and continuing through the current C++ Builder 10 Seattle), C++ Builder uses Unicode strings by default. You may therefore need to make some Unicode-related changes to your code, depending on how you use Strings and Chars, for example by replacing String with AnsiString where the code depends on narrow characters.  Some examples:

    //Line = Line + String().sprintf("0x%02x", *pData++) + ", ";
    Line = Line + AnsiString().sprintf("0x%02x", *pData++) + ", ";
    //String Cmd = sCommand.Trim();
    //String File;
    AnsiString Cmd = sCommand.Trim();
    AnsiString File;
  3. The newer C++ Builder adheres to the C++ Standard, so most of the required modifications have to do with ambiguities relating to Variants, TDateTime, and Cardinal, and with how conversions/casts occur between these types.  There are also cases where older C++ Builder compilers allowed temporaries to be constructed and used in a way that the C++ standard does not allow; these are now flagged as errors at compile time.
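
One typical case of the temporaries issue in point 3 is passing a temporary to a function that takes a non-const reference. The sketch below is only an illustration under assumptions: std::string stands in for a VCL string type, and AppendSuffix, MakeName and BackupName are hypothetical names. The rejected call is shown as a comment, followed by a conforming rewrite.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper that modifies its argument in place.
void AppendSuffix(std::string& s)
{
    s += ".bak";
}

// Hypothetical factory; the returned value is a temporary at the call site.
std::string MakeName()
{
    return "report";
}

// Not allowed by the C++ standard (a temporary cannot bind to a
// non-const reference), so a conforming compiler rejects it:
//     AppendSuffix(MakeName());

// Conforming version: give the temporary a name, then modify it.
std::string BackupName()
{
    std::string name = MakeName();
    AppendSuffix(name);
    return name;
}
```

If the function does not need to modify its argument, changing the parameter to a const reference is usually the smaller fix.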

 

Tips for migrating legacy C++ Builder to C++ Builder 10 Seattle

  1. Don't let C++ Builder 10 Seattle convert your older Builder project. Copy your files into a new folder, create a new project and add your source files to it. It's a little more effort at first but it saves you a lot later on.
  2. Unicode conversions may be needed (Ansi character to Unicode conversion).

    All VCL functions that used to accept arguments of type char* (for example Application->MessageBox) now require wchar_t*. VCL object properties that returned AnsiString now return UnicodeString (for example Label->Caption).

    If the argument is a string constant all you have to do is place the letter L in front of it. It is more difficult if you are passing a variable as an argument.

    If you are using the type String in your code it now maps to UnicodeString and no longer to AnsiString. UnicodeString.c_str() returns wchar_t* and AnsiString.c_str() still returns char*.

    Don't bother replacing all occurrences of String with AnsiString as suggested elsewhere. Instead define the following two functions:

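    #include <windows.h> // MultiByteToWideChar, WideCharToMultiByte
    #include <string.h>  // strlen, memset
    #include <wchar.h>   // wcslen
    
    #define STR_CONV_BUF_SIZE 12000 // the largest string (in characters) you may have to convert. depends on your project
    
    // ANSI to Unicode. The result lives in a static buffer that the next call overwrites.
    wchar_t* __fastcall UnicodeOf(const char* c)
    {
    	static wchar_t w[STR_CONV_BUF_SIZE];
    	memset(w, 0, sizeof(w));
    	// Pass SIZE-1 so the last element is never written and the result stays terminated.
    	MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, c, (int)strlen(c), w, STR_CONV_BUF_SIZE - 1);
    	return(w);
    }
    
    // Unicode to ANSI. Same static-buffer caveat as UnicodeOf.
    char* __fastcall AnsiOf(const wchar_t* w)
    {
    	static char c[STR_CONV_BUF_SIZE];
    	memset(c, 0, sizeof(c));
    	// Pass SIZE-1 so the last element is never written and the result stays terminated.
    	WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, w, (int)wcslen(w), c, STR_CONV_BUF_SIZE - 1, NULL, NULL);
    	return(c);
    }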
    and use them wherever one type is returned and the other type is required and vice versa. Careful: what makes these functions convenient to use, i.e. the static buffer, can also cause undesired behavior. Make sure AnsiOf/UnicodeOf is not called as an argument of a function that may itself call AnsiOf/UnicodeOf, or that AnsiOf/UnicodeOf is not used for two or more arguments of the same function.

    Or, you could just let the RTL handle conversions for you. You can assign an AnsiString to a UnicodeString and vice versa, and the RTL will convert as needed.
  3. Compile your project and resolve type errors by inserting calls to "AnsiOf" or "UnicodeOf" as applicable. Continue until your project compiles without errors.
  4. Search for all occurrences of sprintf, fwrite etc. These functions accept any variable type for some of their arguments, so the compiler may not complain. Manually check whether a call to AnsiOf needs to be inserted, or update the format specifiers to use %S instead of %s as needed.
  5. Optimize: Eliminate some redundant use of AnsiOf and UnicodeOf.

    For example fopen(AnsiOf(... can easily be replaced by _wfopen(... Also fnsplit(AnsiOf(... can relatively easily be replaced by _wfnsplit(...

    You will have to define the other arguments as wchar_t as well, which doesn't cause too much of a ripple effect. Furthermore sscanf(UnicodeOf(... can be replaced by swscanf(...

    Consider converting some of your variables from char to wchar_t wherever it makes the most sense.

    If you used string functions such as strcpy etc. all you have to do is replace str with wcs in most cases.

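    If you used strncpy(s1, s2, sizeof(s1)-1) as a safe version of strcpy, replace it with wcsncpy(s1, s2, sizeof(s1)/sizeof(s1[0])-1). sizeof returns the size in bytes, and each wchar_t occupies two bytes on Windows, so the byte size of a wchar_t array is twice the number of characters it can hold.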
  6. Be prepared that some VCL components such as TColorGrid may no longer be available and you will have to find a workaround.
  7. The compiler will tell you that some functions or object properties have been deprecated. You will have to work around them as well.
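
For the wcsncpy advice in tip 5, the element count can be taken from the array type instead of dividing a byte count by hand. The following is a minimal sketch in standard C++, not a VCL facility; CopyWide and DemoLength are hypothetical names introduced for illustration. It also writes the terminator explicitly, because wcsncpy does not add one when the source is truncated.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: copy src into a fixed-size wchar_t array.
// N is the number of elements (characters), not the number of bytes.
template <std::size_t N>
void CopyWide(wchar_t (&dst)[N], const wchar_t* src)
{
    assert(src);                    // a null source is a caller error
    std::wcsncpy(dst, src, N - 1);  // copies at most N-1 characters
    dst[N - 1] = L'\0';             // terminate even when src was truncated
}

// Copies a 12-character string into an 8-element buffer and
// returns the length of what actually fit.
std::size_t DemoLength()
{
    wchar_t buf[8];
    CopyWide(buf, L"Hello, world");
    return std::wcslen(buf);
}
```

Because the size comes from the array type, the call stays correct if the buffer is resized later or if the code is built where wchar_t is not two bytes wide.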

 

Tips for Porting an Embarcadero C++ Builder App to Unicode

  1. Change the application's "_TCHAR maps to" project setting to wchar_t. It is up to you if you want to make this a default setting for all new applications. By default it is set to char.
  2. Consider starting by doing some general search and replace. For example:
    Replace all instances of AnsiString with UnicodeString.
    Replace all instances of AnsiCompare, AnsiPos, and AnsiCompareIC with Compare, Pos, and CompareIC respectively.
  3. Don't change all char to wchar_t automatically. Look at each file first and make sure the changes are appropriate. If you didn't write the code to use AnsiString then chances are there is something there that needs special attention.
  4. Don't forget to prefix character string constants with L (e.g. L"Hello World") or, better yet, use the _T() macro (e.g. _T("Hello World")). This will make sure that the application doesn't have to convert the string constants to Unicode at runtime.
  5. Know your strings:
    AnsiString - A string based on the system default ANSI code page. Avoid using this if you are going to be fully Unicode. If some of your libraries are not Unicode then you may have to use this when calling those library functions.

    UnicodeString - Try and always use this string for internal storage. Especially if you plan to pass your string to the VCL or Windows API.

    UTF8String - Better than AnsiString. There is no chance of losing data when converting a UnicodeString to a UTF8String. If you need an 8-bit string for storing data then consider using this over AnsiString.

    ASCIIString (aka AnsiStringT<20127>) - A handy string when you want plain text data and are not sure whether the receiving end accepts international characters or what code page will be supported. Assigning a Unicode or ANSI string to this type of string will remove the decoration from many international letters and turn them into their English alphabet equivalent. Handy in a few cases, but beware that unmapped characters turn into question marks.
  6. A common question:  Is there any mechanism to get TStrings, TOpenDialog.FileName, etc. to be AnsiStrings instead of UnicodeStrings?   I have a large amount of code that I'd need to move from legacy C++ Builder to C++ Builder 10 Seattle and was looking for a quick way to make it all work, since I know that my applications will only be used in English.   Many of these applications use the old FILE functions (fopen, fread, fwrite).

    Answer: 

    You could let the RTL handle conversions for you. You can assign an AnsiString to a UnicodeString and vice versa, and the RTL will convert as needed.

    Search for all occurrences of sprintf, fopen, fread, fwrite etc. These functions accept any variable type for some of their arguments, so the compiler may not complain. Manually check whether a call to AnsiOf needs to be inserted, or update the format specifiers to use %S instead of %s as needed.
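
For FILE-based code like that in the question, one way to sidestep the shared static buffer of AnsiOf is a helper that returns the narrow string by value. The sketch below rests on several assumptions: NarrowOf is a hypothetical name, and it uses the standard std::wcstombs with the current C locale rather than WideCharToMultiByte with CP_ACP, so non-ASCII characters may be handled differently from the functions shown earlier.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical by-value counterpart of AnsiOf. Each call owns its own
// buffer, so two results can safely appear in the same expression.
// Returns an empty string if a character cannot be converted, which is
// indistinguishable from converting an empty input.
std::string NarrowOf(const std::wstring& w)
{
    // MB_CUR_MAX is the largest number of bytes one character can need
    // in the current locale; +1 leaves room for the terminator.
    std::vector<char> buf(w.size() * MB_CUR_MAX + 1, '\0');
    std::size_t n = std::wcstombs(buf.data(), w.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
    {
        return std::string();
    }
    return std::string(buf.data(), n);
}
```

A call such as fopen(NarrowOf(fileName).c_str(), "rb"), where fileName is a std::wstring, is safe because the temporary std::string lives until the end of the full expression. For an English-only application this may be enough; letting the RTL convert between AnsiString and UnicodeString, as described above, remains the simpler route inside VCL code.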

 

Additional Resources for migrating legacy C++ Builder

Unicode Migration Resources for Delphi, C++Builder and RAD Studio

C++ Builder and Migration to Unicode

C++ Specifics Index

Features that distinguish RAD Studio C++, such as descriptions of the C++0x features supported in C++Builder

Recommendations on working with UnicodeString in C++

Unicode for C++ Index

Enabling C++ Applications for Unicode

Unicode in RAD Studio

_TCHAR Mapping

Floating Functions

UTF-8 Conversion Routines

Directories and Conditionals

Cheat Sheet: Unicode-enabling Microsoft C/C++ Source Code

Hope this helps in your migrations.

Please let me know what specific C++ Builder Unicode issues you are having.

Embarcadero is here to help!
