Subject: Re: Performance benchmarks for Tcl 8.1 (was: 8.1 slower then 8.0???) - DN [1]
derijkp@uia.ua.ac.be (Peter.DeRijk) - 12 May 1999 - comp.lang.tcl
Scott Stanton (scott.stanton@Scriptics.com) wrote:
: Unfortunately the trade-off is that string indexing becomes O(n) instead of
: O(1). The good news is that indexing by character isn't all that common in the
: code we've looked at (e.g. exmh). The bad news is that when it hurts, it tends
: to hurt a lot. I think there are a number of optimizations we may consider
: making to ameliorate the worst cases, but it's going to be an incremental
: process as we figure out where the hot spots really are.
As I work in molecular biology, string indexing is extremely common in
my code, and usually on very long strings. I am sure there are many other
real-world uses that rely on string processing. I have done a small
check, and the slowdown in Tcl 8.1 is so bad that it is completely
unacceptable. I am currently staying at 8.0, and if this is not expected
to improve, I will have to look at other options ...
tcl8.0:
% time {for {set i 1} {$i < 10000} {incr i} {append seq A}}
413689 microseconds per iteration
% time {string range $seq 9000 9010} 100
35 microseconds per iteration
tcl8.1:
% time {for {set i 1} {$i < 10000} {incr i} {append seq A}}
616221 microseconds per iteration
% time {string range $seq 9000 9010} 100
10475 microseconds per iteration
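For anyone who wants to repeat the measurement, the two commands above can be
wrapped in a small script and run unchanged under both interpreters. This is
only a sketch: the proc names build_seq and bench_range are made up here, and
the absolute timings will depend on the machine and the Tcl build.

```tcl
# Build a test sequence of n copies of "A" (hypothetical helper name).
proc build_seq {n} {
    set seq ""
    for {set i 0} {$i < $n} {incr i} {
        append seq A
    }
    return $seq
}

# Time "string range" on the given sequence (hypothetical helper name).
# Returns the string produced by [time], of the form
# "<N> microseconds per iteration".
proc bench_range {seq first last {count 100}} {
    return [time {string range $seq $first $last} $count]
}

# Same size and same range as in the interactive session above.
set seq [build_seq 9999]
puts "Tcl [info patchlevel]: [bench_range $seq 9000 9010]"
```

Running the same file with each interpreter gives directly comparable numbers
for the indexing step alone, separate from the cost of building the string.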
: There is already an expandable string object type that could be extended to
: include a character count in addition to the buffer size that it currently
: includes. Increasing the base size of a Tcl_Obj would be very costly in terms
: of the amount of storage used, so I'd be reluctant to make every object pay the
: cost. Keeping the size around will only help with "string length", but not
: "string index".
It would help with index and range as well: if the character count and the
byte count are the same, the string contains no multi-byte UTF-8 characters,
and plain, fast byte indexing can be used.
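The check itself can be illustrated at script level, although the real change
would have to be made in the C core. A minimal sketch, assuming a hypothetical
proc name is_single_byte and a Tcl version that has the encoding command
(8.1 or later): if the UTF-8 form has exactly as many bytes as the string has
characters, every character occupies one byte, so character index and byte
offset coincide.

```tcl
# Return 1 if every character of str occupies exactly one byte in UTF-8,
# i.e. the byte count equals the character count, and 0 otherwise.
# When this holds, character index i is also byte offset i.
# (is_single_byte is a hypothetical name used only for this illustration.)
proc is_single_byte {str} {
    set nchars [string length $str]
    # [encoding convertto] returns the encoded bytes; its length is the byte count.
    set nbytes [string length [encoding convertto utf-8 $str]]
    return [expr {$nchars == $nbytes}]
}

puts [is_single_byte "ACGTACGT"]   ;# prints 1: plain ASCII sequence
puts [is_single_byte "caf\u00e9"]  ;# prints 0: e-acute takes two bytes in UTF-8
```

A sequence string such as the one in the timings above would pass this test,
so it would always qualify for the fast path.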
: Another possible change I've considered is adding a UnicodeString object type
: that keeps the object in double-byte form similar to the way ByteArray keeps
: the data in a single-byte form. This would make both indexing and length
: computations fast, but would potentially double the storage cost.
Another possibility (that would take a lot of work though) would be to
keep the String object the way it was in Tcl 8.0, and
make a new UnicodeString object type that works the way strings work in
Tcl 8.1. If Unicode characters appear, the String object is converted
to a UnicodeString object; otherwise, the nice and fast old routines
can be used.
: Tcl's current string usage is already pretty ASCII-centric, and the UTF
: routines are actually pretty fast. I think our efforts are best spent looking
: for ways to change algorithms to avoid recomputing information.
I would not call a 300-fold slowdown on a common operation (35 versus
10475 microseconds for the string range above) pretty fast.
Especially since I do not need Unicode ...
--
Peter De Rijk derijkp@uia.ua.ac.be
http://rrna.uia.ac.be/~peter/
To achieve the impossible, one must think the absurd.
to look where everyone else has looked, but to see what no one else has seen.
Last modified 1999-09-27
