trim, ltrim, and rtrim in C
7
The modus operandi for this is similar to that taken by PHP's implementation of such functions. It's comparitively memory-intensive, but is much faster than running a whole bunch of tests.
Basically, you set a mask -- an array of 256 null bytes -- and set those that correspond to characters you wish to trim. Then, rather than having to test if a character is in the set of characters to trim(O(n), or linear time on *ws), you just test once (O(1), or unit time) to see if the byte in question is set.
And of course, to trim(), you just wrap trim() around both ltrim() and rtrim().
One point of caution: these functions trim in place, so copy strings before trimming them. (Of course, if you usually want access to both pre- and post-trimmed strings, you could always make these malloc() a new string and return a pointer to it . . . )
Basically, you set a mask -- an array of 256 null bytes -- and set those that correspond to characters you wish to trim. Then, rather than having to test if a character is in the set of characters to trim(O(n), or linear time on *ws), you just test once (O(1), or unit time) to see if the byte in question is set.
And of course, to trim(), you just wrap trim() around both ltrim() and rtrim().
One point of caution: these functions trim in place, so copy strings before trimming them. (Of course, if you usually want access to both pre- and post-trimmed strings, you could always make these malloc() a new string and return a pointer to it . . . )
int sgstr_ltrim (char *string, char *ws) {
char mask[256];
int i;
for(i=0; i<256; i++) {
mask[i] = '\0';
}
while(*ws) {
mask[(uchar)*ws] = *ws;
ws++;
}
char *ptr = string;
while(*ptr) {
if(mask[(uchar)*ptr]) {
ptr++;
}
else {
break;
}
}
while(*ptr && ptr != string) { // shift the string until we reach \0
*string = *ptr;
ptr++;
string++;
}
*string = *ptr; // now copy over our null terminus
return 0;
}
char mask[256];
int i;
for(i=0; i<256; i++) {
mask[i] = '\0';
}
while(*ws) {
mask[(uchar)*ws] = *ws;
ws++;
}
char *ptr = string;
while(*ptr) {
if(mask[(uchar)*ptr]) {
ptr++;
}
else {
break;
}
}
while(*ptr && ptr != string) { // shift the string until we reach \0
*string = *ptr;
ptr++;
string++;
}
*string = *ptr; // now copy over our null terminus
return 0;
}
int sgstr_rtrim (char *string, char *ws) {
char mask[256];
int i;
for(i=0; i<256; i++) {
mask[i] = '\0';
}
while(*ws) {
mask[(uchar)*ws] = *ws;
ws++;
}
int len = strlen(string);
while(len) {
len--;
if(mask[(uchar)string[len]]) {
string[len] = '\0';
}
else {
break;
}
}
return 0;
}
char mask[256];
int i;
for(i=0; i<256; i++) {
mask[i] = '\0';
}
while(*ws) {
mask[(uchar)*ws] = *ws;
ws++;
}
int len = strlen(string);
while(len) {
len--;
if(mask[(uchar)string[len]]) {
string[len] = '\0';
}
else {
break;
}
}
return 0;
}






Depending on the processor, the relative address of the array is loaded and then a bittst function calculates the offset to the appropriate bit and does a test on it. There is a memory load, an offset add, a memory access, and then the test. Then a decision jump is done.
The implementation of the array scheme you propose does something similar, except that there is an absolute memory add calculation involved.
So the question is whether, on an individual compiler implementation (depending on the processor as well), it takes less time for the internal processor to do the offset calculation or the individual processor to do the memory add calculation. The compiler creates the array in both cases on initialization, or you can create a blank array and fill it on the go.
When dealing with repeated uses of such arrays, speed is a consideration.
Thanks for the suggestion.
Tim.
Tim.