Re: commit: UT_String class


Subject: Re: commit: UT_String class
From: Aaron Lehmann (aaronl@vitelus.com)
Date: Sat Feb 10 2001 - 17:41:59 CST


On Sat, Feb 10, 2001 at 07:18:10PM +0100, Mike Nordell wrote:
> In src/af/util/xp we now have four new files
> ut_string_class.cpp
> ut_string_class.h
> ut_stringbuf.cpp
> ut_stringbuf.h

I know I should have discussed this back before the commit, but better
late than never.

I'm very scared of string classes. Take a look at the College Board's
apstring class
(http://users.harker.org/hs/EnderB/APCS/classes/apstring.{cpp,h}). I'm
forced to use it in my AP Computer Science class, and it's pure evil. In
fact, they strongly discourage the use of ANY C-strings. What's wrong
with apstring?

1) It's poorly designed
It abstracts so much that the programmer does not realize what is
going on behind the scenes. Memory is reallocated on a great many
operations, and programmers are encouraged to use it in inefficient
ways. This leads to slowdowns. See below.
2) It's slow
apstring is useless when C provides standard arrays which do
everything anyone needs in a string. Using an apstring requires more
memory space for container data, and more time for bounds checking and
abstraction. In this case, the bounds checking is useless because it
abort()s the program on error, and in most cases a segmentation fault
would actually be more helpful. Users of apstring are encouraged to
use overloaded operators like += which cause all of the memory to be
reallocated and copied (it doesn't even use realloc() iirc for some
reason).
3) It's non-standard
EVERY C or C++ environment provides char*. Just about no projects have
apstring. This makes code less portable between codebases and harder
for other programmers to understand.
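
To make the += complaint in point 2 concrete, here is a minimal sketch.
NaiveString is a hypothetical stand-in written for this illustration, not
the real apstring (whose source is not reproduced here). It assumes the
behaviour described above: every append allocates a fresh buffer and
copies the old contents. build_once() is the plain C-style alternative,
also hypothetical, which sizes the buffer a single time.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical naive string class (NOT the real apstring): every
// operator+= allocates a new buffer and copies the old contents.
class NaiveString {
public:
    NaiveString() : data_(new char[1]), len_(0), allocs_(1) { data_[0] = '\0'; }
    ~NaiveString() { delete[] data_; }
    NaiveString(const NaiveString&) = delete;
    NaiveString& operator=(const NaiveString&) = delete;

    NaiveString& operator+=(char c) {
        char* fresh = new char[len_ + 2];   // room for c and the terminator
        std::memcpy(fresh, data_, len_);    // copy everything, every time
        fresh[len_] = c;
        fresh[len_ + 1] = '\0';
        delete[] data_;
        data_ = fresh;
        ++len_;
        ++allocs_;
        return *this;
    }

    const char* c_str() const { return data_; }
    std::size_t length() const { return len_; }
    std::size_t allocations() const { return allocs_; }

private:
    char* data_;
    std::size_t len_;
    std::size_t allocs_;   // counts allocations, for illustration only
};

// Appends n characters one at a time and reports how many
// allocations the naive class performed along the way.
std::size_t naive_build_allocations(std::size_t n) {
    NaiveString s;
    for (std::size_t i = 0; i < n; ++i) s += 'x';
    return s.allocations();
}

// C-style alternative: one allocation, then fill the buffer.
// The caller owns the result and must delete[] it.
char* build_once(std::size_t n) {
    char* buf = new char[n + 1];
    std::memset(buf, 'x', n);
    buf[n] = '\0';
    return buf;
}
```

With this sketch, building an n-character string through += costs n + 1
allocations and copies on the order of n*n bytes in total, while
build_once() allocates once. A class that grows its buffer geometrically
would avoid most of that cost, so the problem lies in the naive design
rather than in string classes as such.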

Now, this rant was about a different string class. But many of the
points apply to string classes in general. Yours could cause a mess if
some strings are C strings and others are UT_Strings. But I haven't
even looked at your code yet. My plea is:

* Only use UT_String where it makes sense (i.e. something that's
resized a lot for a good reason).
* If UT_String does bounds checking, make it dump core or otherwise
produce a stack trace on an out-of-bounds error. Bounds checking should
not be enabled if !DEBUG, since the code using the class has to check
bounds anyway, otherwise it would cause a major bug. As you can see, I
don't believe in strings doing automatic bounds checking.
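
One possible way to get the debug-only bounds checking asked for above is
the standard assert() macro. The sketch below uses a hypothetical
CheckedView class invented for this illustration; it is not the UT_String
interface, which is not shown in this thread. A failed assert() calls
abort(), which on most Unix-like systems leaves a core file when core
dumps are enabled, and the whole check compiles away when NDEBUG is
defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical read-only wrapper around a C string (NOT UT_String).
// Indexing is checked only in debug builds.
class CheckedView {
public:
    explicit CheckedView(const char* s) : data_(s), len_(std::strlen(s)) {}

    char operator[](std::size_t i) const {
        // In a debug build an out-of-range index aborts right here, so
        // the core file or debugger shows the offending call stack.
        // With -DNDEBUG the assert expands to nothing and costs nothing.
        assert(i < len_ && "CheckedView index out of bounds");
        return data_[i];
    }

    std::size_t size() const { return len_; }

private:
    const char* data_;   // not owned; must outlive the view
    std::size_t len_;
};
```

Here NDEBUG plays the role of !DEBUG: release builds pay nothing for the
check, and callers remain responsible for staying in bounds.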

In general, strings are low-level data structures that can be
implemented well as NUL-terminated arrays, and I am very cautious
about abstracting something as simple as this.


