
THE STRING

INTRODUCTION

A string can be defined as an array of characters that has '\0' at its end.
So, the building blocks of strings are characters, and the last character serves like the '.' in the
sentences we write.

Strings can be of different lengths, ranging from the empty string "" to any
positive length.
Different kinds of characters can be used for different types of strings; this is very important
when a program needs to run under different language settings.
In most applications, strings are used to communicate with a user. Also, some problems
use strings as a part of their solution.

Other programming languages may have different types of strings.

SOME BASIC FACTS ABOUT THE STRINGS

Characters are the building blocks of strings, which can later be used, for example, in text files or
in look-up tables.

There are different types of character sets. Some of the better known ones are ASCII, ANSI, Unicode,
EBCDIC, etc.

To find out which character set is used on your device, you can use this command:

echo $LANG

If you would like to see all the character encodings that are available for conversion, you
can do something like this:

iconv -l > presentCharacters.txt

Your particular settings define the translation environment, but when the program is executed on
other devices, it uses the execution settings of those devices. By the way, we can change the
settings on our device as well.
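To query these settings and change them from our program, we have a few C functions:
- setlocale() from locale.h, which queries or changes the current locale,
- localeconv() from locale.h, which returns the numeric and monetary formatting rules of the current locale,
- getenv() from stdlib.h, which reads environment variables such as LANG.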

Your C compiler uses the source character set, but the running program has its own execution
character set. If these two sets are different, characters from the source character set are translated
into the execution character set. This can become an issue when the two sets differ and the locale
settings differ as well.

ABOUT THE CHARACTERS

The particular character set is used as the building block of the string, and it tells your
computer how to interpret the zeros and ones that are stored in memory.
One of the first character sets was 7-bit ASCII, but nowadays 8-bit extensions of ASCII are more common.
To define a character of that type, we write something like this:

char cCharacter = 'C';

All characters are treated the same, except the backslash '\', which starts an escape sequence: it states
that the following character has a different interpretation. For example:
char cEndOfString = '\0', cSlash = '\\', cTab = '\t';

Besides this type of character set, one can encounter:

- wide characters,
- multibyte characters.

All characters of the first type have the same size, while the characters of the second type can take
a different number of bytes each.

Most programmers who need wide characters use wchar_t, which is C's type for wide characters. Most
of the functions used for char are very similar to the functions that work with wide characters.

In the next example we will see a few of the basic functions used for characters. Before we start
with the example, note that ctype.h is the header with the classification functions for char, and
wctype.h is the header for wide characters.
In case you have not noticed, 'w' stands for wide and 'c' for character.

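#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int
main( void )
{
    char cA = 'A', ca = 'a';
    char cNumber = '7',
         cDot = '.',
         cAp = '\'';   /* the escape sequence \' is an apostrophe */

    putchar( cA );
    ( isupper( cA ) ) ? printf(" upper ") : printf(" lower ");
    printf(" case \n");

    putchar( ca );
    ( isupper( ca ) ) ? printf(" upper ") : printf(" lower ");
    printf(" case \n");

    putchar( cDot );
    ( ispunct( cDot ) ) ? printf(" punctuation \n") : printf(" not punctuation \n");

    putchar( cAp );
    ( ispunct( cAp ) ) ? printf(" punctuation \n") : printf(" not punctuation \n");

    putchar( cNumber );
    ( isdigit( cNumber ) ) ? printf(" digit \n") : printf(" not digit \n");

    /* toupper returns the upper case letter; the variable ca itself stays 'a' */
    putchar( toupper( ca ) );
    ( isupper( toupper( ca ) ) ) ? printf(" is now the upper ") : printf(" is now the lower ");
    printf(" case \n");

    return EXIT_SUCCESS;
}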

EXPLANATION:

In this example, we first defined a few different characters.
Later in the program, we used the function isupper( character ) and the ternary operator to find out
whether our character is upper case; similar to this function, there is islower.
There are also isalpha, isdigit, isxdigit, isalnum, isprint, isgraph, etc.

One interesting function is toupper, which returns the upper case version of a lower case letter (the
variable itself is not changed). We also performed a few more tests, such as ispunct, and presented the results.

For an example that uses wide chars, this will do:

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

int
main( void )
{
    wchar_t cA = L'A';

    /* putwchar and wprintf are declared in wchar.h */
    putwchar( cA );
    ( iswupper( cA ) ) ? wprintf( L" upper " ) : wprintf( L" lower " );
    wprintf( L" case \n" );

    return EXIT_SUCCESS;
}

As an additional exercise, try to create a file that contains wide characters and check its type with: file -i test_file.txt
I hope this example is illustrative enough to show you how to use this type of character.

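In one of our earlier articles, we mentioned that one could try to create a function that uses only bit
operators to test whether a character is upper case, avoiding the operator '>'. I hope the next
part might be useful, in case you have not finished it completely. It assumes that 'A' to 'Z' are
contiguous, as in ASCII, and it needs limits.h for CHAR_BIT. The result is 1 only when both differences
are non-negative, that is, when cChar lies between 'A' and 'Z':

!( ( (unsigned)( cChar - 'A' ) | (unsigned)( 'Z' - cChar ) ) >> ( sizeof( unsigned ) * CHAR_BIT - 1 ) );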

And don't forget about inline functions or variadic macros.

Most of the functions mentioned in this section run in O(1) time, but the bitwise version might be
faster, depending on the actual implementation.
However, I have not done any tests to prove that this implementation would outperform the
standard function.

ABOUT THE STRINGS

Now that we have learned some basic facts about the building blocks, we will see how one can declare
strings in C programs.
There are a few usual ways that will be used in most of your programs:

char BUFFER[ MAX_BUFFER_SIZE ];

char cStyleString[] = { 't', 'h', 'i', 's', ' ', 'i', 's', ' ', 'O', 'K', '!', '\0'};
const char* ptrString = "This will work nice";

These three definitions all create a string, but they are slightly different.
The first definition reserves MAX_BUFFER_SIZE places for our string. This can waste memory;
however, it is useful in situations where we need a fixed amount of memory
reserved for our string.
The second definition lets the compiler calculate how many bytes we need for our string, and it is very
important to add '\0' at its end. This way, the compiler gives you more freedom, but that
freedom comes with more responsibility.
The third definition creates a pointer to char, which means that you can change the address it points
to, and one nice touch is that the character '\0' is added automatically at the end of the string
literal. The characters of a string literal must not be modified, so it is a good habit to declare
such a pointer as const char*.
If you would like to create a wchar_t string, replace char with wchar_t and don't forget to add L in
front of the string literal, for example L"wide".

There are a few interesting functions that are used in conjunction with strings. The next example
illustrates how to use some of them.
CODE:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BUFFER_SIZE 1024

int
main( void )
{
    char buffer [ MAX_BUFFER_SIZE ];
    char cString[] = "ABCD AAAA BBBB CCCC DDDD";
    const char* ptrString = "abcd aaaa bbbb cccc dddd";

    /* sizeof and strlen return size_t, which is printed with %zu */
    printf("The size of buffer =%zu \n", sizeof( buffer ) );

    printf("The length of cString =%zu \n", strlen( cString ) );
    printf("The length of ptrString =%zu \n", strlen( ptrString ) );

    /* buffer is large enough for both strings plus the final '\0' */
    strcpy( buffer, cString );
    puts( buffer ); putchar( '\n' );

    strcat( buffer, ptrString );
    puts( buffer ); putchar( '\n' );

    return EXIT_SUCCESS;
}

In our program we have used:

- the operator sizeof, to calculate the number of bytes the buffer occupies in memory,
- the function strlen, to calculate the length of a string (without the final '\0'),
- the function strcpy, which, as its longer name "string copy" says, copies cString into
the buffer,
- the function strcat, which concatenates two strings by appending ptrString to the end of buffer.

Besides this set of functions, one may need strncpy, strncat, strcmp, strcmpi (non-standard, not
available with every compiler), strncmp, strstr, strchr, etc.

But if one wants to use wide characters, the header wchar.h needs to be included, and you need to
replace the prefix str in the name of the function with wcs (for example, strlen becomes wcslen and
strcpy becomes wcscpy). In addition, we need to put a capital L in front of string literals.

Most of the functions already mentioned work in O(n) or O(n+m) time. When one writes one's own
functions, some of them may end up running in O(n^2) time. However, in some of those cases it is
possible to lower the execution time to O(n log n) at the expense of using more memory.
SOME USEFUL FUNCTIONS

Besides the functions already mentioned, there are some other useful functions that a programmer
needs to know.
The first group is used in situations where we need to take a string from the user and convert it
into some number type.
Some very common functions of this type are atoi, atol, strtol, atof, strtof, etc.
With this group of functions, we keep a buffer in our program that is filled with the user's input.
After we have read the string with fgets or scanf, we convert the string into some other data type
(the old gets function should not be used; it was removed in C11). Note that atoi and atol cannot
report errors, while strtol can. In this case, one could also use assert to protect the program
during the test phase.

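Then, if you would like to check or set whether a stream works with chars or with wchar_t characters,
there is the fwide function.
It is used like this:
fwide( stream, mode );

where:
mode > 0 tries to make the stream wide oriented (read and written as wide characters),
mode = 0 does not change the orientation, it only queries it,
mode < 0 tries to make the stream byte oriented (read and written as chars).
The return value is positive if the stream is wide oriented, negative if it is byte oriented, and zero
if it has no orientation yet. Once a stream has an orientation, fwide does not change it.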

Then we have functions like wctype, iswctype, wctrans and towctrans. This group of functions will not
be explained here, because something should be left for the reader.
As one additional function, we will mention mbtowc as well, which converts one multibyte character
into a wide character.

So, if you would like to learn more about this topic, www.unicode.org is a good starting point.

AN ADDITIONAL EXAMPLE

In our next example we will see how to get a string from the user and how to find out:
- how many letters are used in the string,
- how many digits the string contains,
- and how many vowels the input string has.

CODE:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAX_BUFFER_SIZE 1024

#define EOS '\0'

int
main( void )
{
    char cCurrentChar = ' ';
    char buffer [ MAX_BUFFER_SIZE ];
    char* ptrString = NULL;

    int nNumberOfLetters = 0, nNumberOfNumbers = 0,
        nNumberOfPerhaps = 0, nNumberOfVowels = 0;

    puts( "Input one sentence->" );

    /* fgets replaces gets, which was removed in C11 because it cannot
       limit how many characters it writes into buffer */
    ptrString = fgets( buffer, sizeof( buffer ), stdin );

    if( ptrString == NULL )
    {
        puts( "The input was not correct!!!" );
        return EXIT_FAILURE;
    }

    while( *ptrString != EOS )
    {
        cCurrentChar = *ptrString; ptrString++;

        /* the cast avoids undefined behavior for negative char values */
        if( isalpha( (unsigned char) cCurrentChar ) )
        {
            nNumberOfLetters++;

            cCurrentChar = toupper( (unsigned char) cCurrentChar );

            switch( cCurrentChar )
            {
            case 'A': case 'E': case 'O': case 'I': case 'U':
                nNumberOfVowels++; break;
            case 'Y':
                nNumberOfPerhaps++; break;
            default:
                break;
            }

            continue;
        }

        if( isdigit( (unsigned char) cCurrentChar ) ) { ++nNumberOfNumbers; }
    }

    printf( "The number of the letters = %d \n", nNumberOfLetters );
    printf( "The number of the digits = %d \n", nNumberOfNumbers );
    printf( "The number of the vowels = %d \n", nNumberOfVowels );
    printf( "The number of the perhaps(y) = %d \n", nNumberOfPerhaps );

    /* buffer is an array on the stack, so there is nothing to free */
    return EXIT_SUCCESS;
}

So, in this example, we reserved space for our string and some variables. After that, we read one
string from the user. Then, we tested whether the input of the string was correct. Afterwards, we took
each character from the string and tested it, using some ctype functions and a switch statement.
One more thing is left to be done: we need to present the results to the user.

SOME FINAL WORDS

In this article, we have mostly seen how we can use single characters and wide characters.
However, some of these problems could perhaps have been solved in a better manner.

The next fact that a programmer needs to keep in mind is that communication with the user is
the source of many errors: the user will input wrong data, either on purpose or accidentally. Be sure
that you protect your code from both of these cases.

The execution time of those functions is pretty decent, but some of them could be optimized even
further.

Strings are essential when we want to go global and produce professional applications. That fact
can sometimes be the source of a slow program as well. Be careful and try to handle strings properly.
