improving the string library


Subject: improving the string library
From: Tomas Frydrych (tomas@frydrych.uklinux.net)
Date: Sat Dec 23 2000 - 12:01:30 CST


First of all, I have to apologise, because some of the figures I
posted a couple of days back were badly wrong; I forgot to turn
optimization on in my testing programme :-). I have done some
more tests now, and the general conclusion is that it does not
make sense to provide replacements for any of the basic string
functions such as strlen, strcpy and strcat.

This is not so much because one could not improve on the raw
assembly at all (although the margins are tight), but because the
optimizing compiler inlines these functions. By this I do not mean
that it simply pastes the assembly in, but that it does so without
the standard C prologue and epilogue, which it cannot do with my
externs.

For the same reasons I agree with Mike that for the UT_UCS_
functions, too, we should use the wide-character (wcs*) functions
from the library if available.

There is one function from the standard library which I can speed
up significantly: strstr. My asm version takes only 60% of the time
of the library one. The algorithm we have in UT_UCS_strstr, though,
is faster than the one used by the library, and I can improve on it
by only about 20%. However, I suspect that neither of these
functions is speed critical for us.
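
For anyone wanting to reproduce such comparisons, a plain reference
search is handy as a correctness check and as a timing baseline. The
following is only a minimal sketch: the names SketchUCSChar and
sketch_ucs_strstr are hypothetical, the 16-bit character type is an
assumption, and this naive loop is not claimed to be the algorithm
used in UT_UCS_strstr or in any C library.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: a 16-bit unsigned type standing in for the UCS character
// type; the real typedef in the sources may differ.
typedef unsigned short SketchUCSChar;

// Naive reference search (hypothetical name). Both arguments are
// zero-terminated. Returns a pointer to the first occurrence of needle
// in haystack, or nullptr if there is none. An empty needle matches at
// the start, mirroring the behaviour of strstr.
const SketchUCSChar* sketch_ucs_strstr(const SketchUCSChar* haystack,
                                       const SketchUCSChar* needle)
{
    assert(haystack != nullptr && needle != nullptr);

    if (*needle == 0)
        return haystack;

    for (; *haystack != 0; ++haystack) {
        // Cheap filter: skip positions whose first character differs.
        if (*haystack != *needle)
            continue;

        // Compare the rest of the needle at this position.
        const SketchUCSChar* h = haystack;
        const SketchUCSChar* n = needle;
        while (*n != 0 && *h == *n) {
            ++h;
            ++n;
        }
        if (*n == 0)
            return haystack;  // whole needle matched
    }
    return nullptr;
}
```

Any faster version, in asm or otherwise, should return the same
pointer as this one for the same inputs.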

The other function which I can improve on a lot is unichar_to_utf8,
where I can get a 20-30% speed-up (my implementation is biased
toward the shorter chars, i.e., 30% for 1-byte UTF-8 and 20% for
6-byte UTF-8). So far this appears to be the only function that
might be worth replacing.
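
To illustrate what "biased toward the shorter chars" can mean, here
is a minimal C++ sketch of such a conversion, covering the 1- to
6-byte range mentioned above. It is not the asm implementation and
not the existing unichar_to_utf8; the name sketch_unichar_to_utf8 and
its signature are hypothetical. The 1-byte case is tested first and
returns immediately, so short characters pay the least.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: encode one UCS-4 value as UTF-8 using the
// original scheme that allows sequences of up to 6 bytes.
// 'out' must have room for 6 bytes. Returns the number of bytes
// written (1..6), or 0 if c does not fit in 31 bits.
int sketch_unichar_to_utf8(std::uint32_t c, unsigned char* out)
{
    assert(out != nullptr);

    // Fast path first: 7-bit values need no further work.
    if (c < 0x80) {
        out[0] = static_cast<unsigned char>(c);
        return 1;
    }

    // Pick the sequence length and the marker bits of the lead byte.
    int len;
    unsigned char lead;
    if (c < 0x800)            { len = 2; lead = 0xC0; }
    else if (c < 0x10000)     { len = 3; lead = 0xE0; }
    else if (c < 0x200000)    { len = 4; lead = 0xF0; }
    else if (c < 0x4000000)   { len = 5; lead = 0xF8; }
    else if (c < 0x80000000u) { len = 6; lead = 0xFC; }
    else return 0;

    // Fill the continuation bytes from the end, 6 payload bits each.
    for (int i = len - 1; i > 0; --i) {
        out[i] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        c >>= 6;
    }
    // The remaining high bits go into the lead byte.
    out[0] = static_cast<unsigned char>(lead | c);
    return len;
}
```

The order of the range tests is the design choice here: the common
short cases are decided after one or two comparisons, while the rare
6-byte case falls through all of them.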

Just to make sure I have made clear what I have in mind: I am not
talking about writing inline code into the C++ sources, but about
writing a library entirely in asm, independent of the C++ sources,
which would never come near the GNU tools until linkage.

The choice between the functions from the C++ sources and those from
the asm lib would be made at compile time using an
ABI_OPT_USE_NASM variable. This only requires some #define's and
#ifdef's in ut_string.h and ut_string.cpp, and including <string.h>
through ut_string.h rather than directly; there is only one file in
the Unix tree where a direct include happens (I do not know about the
other platforms, though).
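
The compile-time switch could look roughly like the sketch below.
Only ABI_OPT_USE_NASM comes from the proposal itself; asm_strstr,
UT_strstr and sketch_contains are hypothetical names used for
illustration, and the real ut_string.h may well be organised
differently.

```cpp
#include <cassert>
#include <cstring>

// What a header such as ut_string.h might contain (sketch only).
#ifdef ABI_OPT_USE_NASM
// Hypothetical symbol implemented in the separately assembled library
// and resolved only at link time.
extern "C" char* asm_strstr(const char* haystack, const char* needle);
#define UT_strstr(h, n) asm_strstr((h), (n))
#else
// Default build: forward to the C library, which the compiler remains
// free to optimise as usual.
#define UT_strstr(h, n) std::strstr((h), (n))
#endif

// Calling code uses only the macro, so it is identical in both builds.
inline bool sketch_contains(const char* text, const char* word)
{
    assert(text != nullptr && word != nullptr);
    return UT_strstr(text, word) != nullptr;
}
```

Because callers see only the macro, switching implementations is a
matter of defining or not defining ABI_OPT_USE_NASM for the build,
provided no source file bypasses the header by including <string.h>
on its own.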

> GAS's (gnu assembler) syntax is totally different from NASM's one (order of
> operands, etc). You have to use C preprocessor and macros for asm code to be
> compiled by gnu tools (i.e on Linux, BSD and possible Solaris for x86, and
> even may be QNX and BeOS for x86 since they use gnu toolchain AFAIR).

I am aware of this; however, GAS is a pain to programme for, and I
have some NASM code I wrote a while back that I can reuse. Since
NASM is freely available, I do not see this as a problem; however,
if the library were to contain only a few functions, I might
consider converting it to GAS syntax once it is debugged.

Tomas

*********************************************
tomas@frydrych.net / www.frydrych.net
PGP keys: http://www.frydrych.net/contact.html


