Re: commit: UT_String class


Subject: Re: commit: UT_String class
From: Joaquín Cuenca Abela (cuenca@celium.net)
Date: Sun Feb 11 2001 - 09:38:25 CST


Aaron Lehmann wrote:
>
> On Sat, Feb 10, 2001 at 07:18:10PM +0100, Mike Nordell wrote:
> > In src/af/util/xp we now have four new files
> > ut_string_class.cpp
> > ut_string_class.h
> > ut_stringbuf.cpp
> > ut_stringbuf.h
>
> I know I should have discussed this back before the commit, but better
> late than never.
>
> I'm very scared of string classes. Take a
> look at the College Board's apstring class
> (http://users.harker.org/hs/EnderB/APCS/classes/apstring.{cpp,h}). I'm
> forced to use it in my AP Computer Science class, and it's pure evil. In
> fact, they strongly discourage the use of ANY C-strings. What's wrong
> with apstring?

I will reply only to the points that seem to apply to Mike's string
class as well.

> 1) It's poorly designed
> It abstracts so much that the programmer does not realize what is
> going on behind the scenes. The memory is reallocated on very many
> operations and programmers are encouraged to use it in inefficient
> ways. This leads to slowdown. See below.

Abstracting away what is going on behind the scenes is usually good
programming practice. The rest of your point (how the memory is
reallocated, etc.) is specific to apstring.

> 2) It's slow
> apstring is useless when C provides standard arrays which do
> everything anyone needs in a string. Using an apstring requires more

If you think that standard arrays do everything anyone needs in a
string, you are in need of a good CS course. char* doesn't abstract a
string; char* + strcat + strdup + str-etc. together try to abstract a
string.
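
A minimal sketch of that difference, written for illustration and not
taken from the AbiWord sources. The function names join_c and join_cpp
are hypothetical, and std::string stands in for any string class:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// C style: the caller sizes the buffer, copies the pieces, and must
// remember to free() the result. The "string" is only a convention.
char* join_c(const char* a, const char* b) {
    assert(a && b);
    std::size_t len = std::strlen(a) + std::strlen(b) + 1;  // +1 for '\0'
    char* out = static_cast<char*>(std::malloc(len));
    if (!out) return nullptr;
    std::strcpy(out, a);
    std::strcat(out, b);
    return out;
}

// String-class style: sizing, copying and cleanup belong to the type.
std::string join_cpp(const std::string& a, const std::string& b) {
    return a + b;
}
```

Both functions produce the same characters; the difference is who has
to get the length arithmetic and the cleanup right.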

> memory space for container data, and more time for bounds checking and
> abstraction. In this case, the bounds checking is useless because it
> abort()s the program on error, and in most cases a segmentation fault
> would actually be more helpful. Users of apstring are encouraged to

Yes, but I hope you are aware that an overflow doesn't necessarily end
in a segfault (unless you are very, very lucky, or you're overflowing
by many bytes, or you're using a good memory checker). Usually
everything just seems to work. So when you overflow in C, you're
usually screwed.

Ah, and if the dude who hacked apstring was dumb enough to do bounds
checking in a non-debug build, then that's the AP CS people's fault,
not Mike's. (You can just place an assert(...) to protect against
overflows, and in a non-debug build you will have zero runtime
penalty.)
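
The assert(...) approach can be sketched like this. CheckedView is a
hypothetical, non-owning wrapper made up for this example; it is not
UT_String and says nothing about how Mike's class is implemented:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical read-only view over a character array (illustration only).
class CheckedView {
public:
    CheckedView(const char* data, std::size_t len)
        : m_data(data), m_len(len) {}

    // In a debug build an out-of-range index trips the assert, which
    // calls abort(). With NDEBUG defined the assert expands to nothing,
    // so no check is compiled in.
    char operator[](std::size_t i) const {
        assert(i < m_len && "index out of bounds");
        return m_data[i];
    }

    std::size_t size() const { return m_len; }

private:
    const char* m_data;
    std::size_t m_len;
};
```

In a debug build, `CheckedView v("abc", 3); v[5];` aborts at the bad
index instead of silently reading past the array.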

> 3) It's non-standard
> EVERY C or C++ environment provides char*. Just about no projects have
> apstring. This makes code less portable between codebases and harder
> for other programmers to understand.

EVERY C++ environment provides all the stuff that we need to build
UT_String (and to use UT_int32, etc.). Now, I agree that it would be
better to use std::string, which EVERY C++ environment provides anyway
(if it doesn't provide std::string, then it's not a C++ environment by
definition).

> Now, this rant was about a different string class. But many of the
> points apply to string classes in general. Yours could cause a mess if
> some strings are C strings and others are UT_Strings. But I haven't

Why?

> even looked at your code yet. My plea is:
>
> * Only use UT_String where it makes sense (i.e. something that's
> resized a lot for a good reason).

That would be a good example of where you should definitely use Mike's
string class. It would do a much better job in that situation than raw
char* (check the tests).
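
One reason a string class can do better on heavily resized strings is
that it can grow its buffer geometrically instead of reallocating on
every append. The sketch below shows that technique in general. GrowBuf
is a hypothetical class, and whether UT_String grows its buffer this way
is an assumption, not something stated in this thread:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical append-only buffer that doubles its capacity when full.
class GrowBuf {
public:
    GrowBuf() : m_data(nullptr), m_len(0), m_cap(0), m_reallocs(0) {}
    ~GrowBuf() { std::free(m_data); }
    GrowBuf(const GrowBuf&) = delete;
    GrowBuf& operator=(const GrowBuf&) = delete;

    void append(const char* s) {
        assert(s);
        std::size_t n = std::strlen(s);
        if (m_len + n + 1 > m_cap) {            // +1 for the trailing '\0'
            std::size_t cap = m_cap ? m_cap : 16;
            while (cap < m_len + n + 1) cap *= 2;
            char* p = static_cast<char*>(std::realloc(m_data, cap));
            if (!p) std::abort();               // out of memory
            m_data = p;
            m_cap = cap;
            ++m_reallocs;
        }
        std::memcpy(m_data + m_len, s, n + 1);  // copies the '\0' too
        m_len += n;
    }

    const char* c_str() const { return m_data ? m_data : ""; }
    std::size_t size() const { return m_len; }
    std::size_t reallocs() const { return m_reallocs; }

private:
    char* m_data;
    std::size_t m_len;
    std::size_t m_cap;
    std::size_t m_reallocs;
};
```

Appending one character a thousand times touches the allocator only a
handful of times here, where a naive strlen + realloc + strcat loop
would reallocate and rescan the string on every append.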

> * If UT_String does bounds checking, make it dump core or somehow
> invoke a stack trace on out-of-bounds error. Bounds checking should
> not be enabled if !DEBUG, since the code using the class has to check
> bounds anyway, otherwise it would cause a major bug. As you can see, I
> don't believe in strings doing automatic bound checking.

agreed, and don't worry.

> In general, strings are low-level data structures that can be
> implemented well as NULL-terminated arrays, and I am very cautious
> about abstracting something as simple as this.

as simple?
I really hope to see no more char tmp[4086]; lines in the abi sources,
and to see Mike's string class used everywhere.
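
To illustrate the problem with fixed temporary buffers, here is a small
sketch. The function names are hypothetical and std::string is used as a
stand-in for a string class; none of this is AbiWord code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Fixed-buffer style: the result is silently truncated unless the
// caller checks the return value. Returns false if it did not fit.
bool greet_fixed(char* tmp, std::size_t cap, const char* name) {
    assert(tmp && name && cap > 0);
    int n = std::snprintf(tmp, cap, "Hello, %s!", name);
    return n >= 0 && static_cast<std::size_t>(n) < cap;
}

// String-class style: the result is as long as it needs to be.
std::string greet_string(const std::string& name) {
    return "Hello, " + name + "!";
}
```

With an 8-byte buffer, greet_fixed keeps only "Hello, " and reports
failure, and that is the safe variant. A plain sprintf into the same
buffer would overflow it.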

Cheers,

--
Joaquin Cuenca Abela
cuenca@celium.net


