Converting between NSString and C strings in an iOS project


I recently had to use a C library in an iPhone project that is otherwise written mostly in Objective-C. Things went smoothly until I had to call C functions that return C strings (wchar_t*, char*), which need to be converted before they can be used with Objective-C NSString* objects.

There are three ways to declare a string in an iOS project in Xcode:

NSString* objcStr = @"hello world";       // declare an Objective-C string
const wchar_t* wideStr = L"hello world";  // declare a wide-character (Unicode) string
const char* narrowStr = "hello world";    // declare a narrow ("ANSI") string; non-ASCII text is stored as UTF-8 bytes
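A quick way to see how the narrow and wide forms differ in memory is to compare strlen() and wcslen() on the same text. This is a minimal sketch, assuming clang or gcc defaults (UTF-8 execution character set, 32-bit wchar_t as on iOS); the word "café" and the helper names are illustrations, not from the original project.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* "café", written with the universal character name \u00e9 so the result does not
   depend on the encoding of the source file itself. */
static const char    narrow_cafe[] = "caf\u00e9";  /* UTF-8 bytes with clang/gcc defaults */
static const wchar_t wide_cafe[]   = L"caf\u00e9"; /* one wchar_t per code point */

/* Bytes before the terminator: 3 ASCII bytes + 2 bytes for U+00E9 = 5 */
size_t narrow_length(void) { return strlen(narrow_cafe); }

/* wchar_t elements before the terminator: 4, one per code point */
size_t wide_length(void) { return wcslen(wide_cafe); }
```

So strlen() counts bytes while wcslen() counts wchar_t elements. On iOS, sizeof(wide_cafe) is 20 bytes: four characters plus the terminator, 4 bytes each.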

If you declare a wide string via the L"" syntax, the compiler stores it as UTF-32, since wchar_t is 4 bytes on iOS. The function wcslen() returns the length of a wide string, i.e. the number of wchar_t elements before the terminating null, which is not always the same as the number of characters you see on screen. For example, try the following code:

#include <stdio.h>
#include <wchar.h>

const wchar_t* str1 = L"Giới thiệu về Google"; // "About Google" in Vietnamese
const wchar_t* str2 = L"Gioi thieu ve Google"; // simplified with ASCII characters only
printf("str1 length: %zu\n", wcslen(str1));    // wcslen returns size_t, so print with %zu
printf("str2 length: %zu\n", wcslen(str2));

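The code outputs a wrong length for str1 and the correct length for str2, even though both have the same number of visible characters. wcslen() is not actually confused by UTF-32; it only counts wchar_t elements. Possible causes include the compiler not reading the source file as UTF-8 (so each byte of a multi-byte Vietnamese letter becomes its own wchar_t), or the letters being stored in decomposed form (a base letter followed by combining accent marks) rather than as single precomposed characters. However, if I try the following code: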

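#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

const char* str3 = "Giới thiệu về Google";
setlocale(LC_ALL, "en_US.UTF-8");           // make mbstowcs decode str3 as UTF-8
size_t buflen = strlen(str3) + 1;           // enough: every wide char needs at least 1 input byte
wchar_t* buffer = malloc(buflen * sizeof(wchar_t));
if (buffer != NULL && mbstowcs(buffer, str3, buflen) != (size_t)-1)
    printf("str3 length: %zu\n", wcslen(buffer)); // measure the converted wide string, not str3
free(buffer);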

Here the string is declared as a narrow string and converted to a wide (UTF-32) string with mbstowcs(), after setlocale() has selected a UTF-8 locale so that its bytes are decoded as UTF-8, and wcslen() then returns the correct length. Not knowing the root cause, I decided to make sure that all C strings in my project are UTF-8 encoded.
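Once every C string is known to be UTF-8, counting its code points does not actually need setlocale() or mbstowcs(), because each code point starts with exactly one byte that is not a continuation byte (continuation bytes have the form 10xxxxxx). This is a minimal sketch, assuming valid UTF-8 input; utf8_codepoint_count is a hypothetical helper, not part of any library. Note that it counts code points, not visible letters: a Vietnamese letter such as "ớ" stored in decomposed form (o + U+031B + U+0301) counts as 3, which is one way a length can come out larger than expected.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated UTF-8 string.
   Every code point has exactly one lead byte (anything not of the form 10xxxxxx),
   so counting lead bytes counts code points. Malformed UTF-8 is NOT detected. */
size_t utf8_codepoint_count(const char* s)
{
    size_t count = 0;
    for (const unsigned char* p = (const unsigned char*)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)  /* skip continuation bytes 10xxxxxx */
            ++count;
    }
    return count;
}
```

For example, the precomposed "Giới thiệu về Google" gives 20, the same count wcslen() reports for its UTF-32 form.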

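Converting an NSString* to a C string (char*) is easy with the built-in cStringUsingEncoding: method and the NSUTF8StringEncoding constant. The returned C string is managed by Foundation, so you must not free it, but it is only guaranteed to be valid until the receiver is deallocated or the current autorelease pool is drained, whichever comes first; copy it (for example with strdup()) if you need it for longer. The following method (taken from my custom NSString category) shows how to achieve this: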

- (const char*)getMultiByteString
{
    return [self cStringUsingEncoding:NSUTF8StringEncoding];
}

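Converting an NSString* to a wide string (wchar_t*) is a bit more involved: we first get the UTF-8 bytes and then convert them with the C function mbstowcs(). Because mbstowcs() decodes according to the current locale, setlocale() must have selected a UTF-8 locale (such as "en_US.UTF-8") beforehand: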

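- (wchar_t*)getWideString
{
    // mbstowcs decodes using the current locale, so call
    // setlocale(LC_ALL, "en_US.UTF-8") (or similar) beforehand
    const char* temp = [self cStringUsingEncoding:NSUTF8StringEncoding];
    size_t buflen = strlen(temp) + 1; // upper bound: one wchar_t per input byte, plus the null terminator
    wchar_t* buffer = malloc(buflen * sizeof(wchar_t));
    if (buffer == NULL) return NULL;
    if (mbstowcs(buffer, temp, buflen) == (size_t)-1) {
        free(buffer); // invalid multibyte sequence for the current locale
        return NULL;
    }
    return buffer;
}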

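It is the responsibility of the caller to free() the returned buffer. Freeing it automatically in NSString's dealloc is not an option from a category: categories cannot add instance variables to remember the buffer, and overriding dealloc in a category is unsafe (it replaces the class's own implementation, and NSString is a class cluster whose concrete subclasses may never call it). If you want automatic cleanup, hand the buffer to an object that owns it, such as an NSData created with dataWithBytesNoCopy:length:freeWhenDone:YES, and keep a strong reference to that object for as long as you read its bytes (through a const wchar_t* to show they are read-only).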
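For comparison, the same conversion can be written in plain C without the Objective-C wrapper. This is a sketch under stated assumptions: utf8_to_wide is a hypothetical name, it relies on the POSIX behaviour of mbstowcs() with a NULL destination (returning the required length) to size the buffer exactly, and the input is only decoded as UTF-8 if setlocale() has selected a UTF-8 locale. As with getWideString, the caller owns the result and must free() it.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string (UTF-8 under a UTF-8 locale)
   into a newly allocated wide string. Returns NULL on an invalid sequence or
   allocation failure; the caller must free() the result. */
wchar_t* utf8_to_wide(const char* s)
{
    /* With a NULL destination, POSIX mbstowcs() returns the number of wide
       characters the conversion would produce, excluding the terminator. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t)-1)
        return NULL;                                   /* invalid sequence for this locale */

    wchar_t* buffer = malloc((n + 1) * sizeof(wchar_t)); /* +1 for the null terminator */
    if (buffer == NULL)
        return NULL;

    mbstowcs(buffer, s, n + 1);                        /* writes n wide chars plus L'\0' */
    return buffer;
}
```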

Take note that wchar_t is 2 bytes on Windows but 4 bytes on Unix/Linux (including iOS). The above function uses sizeof to determine the size of wchar_t for the sake of generality.
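The size of wchar_t also affects how many wchar_t elements one character needs: a 4-byte wchar_t holds UTF-32 (one element per code point), while the 2-byte wchar_t on Windows holds UTF-16, where code points above U+FFFF take a surrogate pair. A small sketch; both helper names are hypothetical, and the emoji U+1F600 is just a convenient code point outside the Basic Multilingual Plane.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: wchar_t elements needed for one Unicode code point.
   4-byte wchar_t (iOS, macOS, Linux): UTF-32, always 1.
   2-byte wchar_t (Windows): UTF-16, 2 (a surrogate pair) above U+FFFF. */
size_t wchar_units_for_codepoint(uint32_t cp)
{
    if (sizeof(wchar_t) >= 4 || cp <= 0xFFFF)
        return 1;
    return 2;
}

/* What the compiler actually produces for U+1F600 in a wide literal */
size_t emoji_wchar_count(void) { return wcslen(L"\U0001F600"); }
```

Multiplying by sizeof(wchar_t) keeps buffer sizes in bytes correct on both platforms, but code that assumes one wchar_t per character only holds where wchar_t is 4 bytes.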

Using wcstombs() and stringWithUTF8String: we can do the reverse and convert a wide C string (wchar_t*) into an NSString*:

+ (NSString*)stringWithWideString:(const wchar_t*)ws
{
    // The destination char array must be larger than wcslen(ws), since one Unicode
    // character may take several bytes; MB_CUR_MAX is the worst case per character
    // in the current locale (select a UTF-8 locale with setlocale() first).
    size_t bufflen = MB_CUR_MAX * wcslen(ws) + 1; // +1 for the null terminator
    char* temp = malloc(bufflen);
    if (temp == NULL) return nil;
    if (wcstombs(temp, ws, bufflen) == (size_t)-1) {
        free(temp); // ws contains a character the locale cannot encode
        return nil;
    }
    NSString* retVal = [self stringWithUTF8String:temp];
    free(temp);
    return retVal;
}
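The reverse direction can also be sized exactly instead of assuming a worst case. This is a minimal C sketch, assuming POSIX wcstombs() semantics (a NULL destination returns the required byte count) and a UTF-8 locale; wide_to_utf8 is a hypothetical name. Its result could be passed to stringWithUTF8String: and then freed.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string into a newly allocated multibyte
   string (UTF-8 under a UTF-8 locale), sized exactly. Returns NULL on failure;
   the caller must free() the result. */
char* wide_to_utf8(const wchar_t* ws)
{
    /* With a NULL destination, POSIX wcstombs() returns the number of bytes
       the conversion needs, excluding the terminator. */
    size_t n = wcstombs(NULL, ws, 0);
    if (n == (size_t)-1)
        return NULL;            /* a character cannot be represented in this locale */

    char* out = malloc(n + 1);  /* +1 for the null terminator */
    if (out == NULL)
        return NULL;

    wcstombs(out, ws, n + 1);   /* writes n bytes plus '\0' */
    return out;
}
```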

I hope this will help others with similar problems.

ToughDev


A tough developer who likes to work on just about anything, from software development to electronics, and share his knowledge with the rest of the world.
