04 Sep, 2007, Justice wrote in the 21st comment:
Votes: 0
This code is interesting. Rather than using a manager, each string attempts to maintain its own references. It probably uses less CPU than the way David and I handle this, but likely uses more memory (without a central list, duplicate strings can exist). It looks useful where a single string would be referenced from multiple locations, like an index passing its strings to an instance.
04 Sep, 2007, kiasyn wrote in the 22nd comment:
Votes: 0
That about sums it up :P
04 Sep, 2007, David Haley wrote in the 23rd comment:
Votes: 0
Justice said:
Was there something I missed then? Yes, I noticed the hashing, but imho, that's part of maintaining the table.

Well, yes; notice that the hash table manager is subclassed off of a general manager. A ref-counting string manager is after all nothing more than an associative array mapping string to reference count. A hash table is one implementation; a BST is another. The StringManager class provides the high-level concepts; the HashTable implements those using a hash table. In other words, a std::map implementation would be an almost immediate subclassing of the general StringManager.
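For illustration, a std::map-backed ref-counting manager along the lines David describes might look like this sketch (class and method names here are hypothetical, not the library's actual API):

```cpp
#include <map>
#include <string>

// Hypothetical sketch: a ref-counting string table as nothing more
// than an associative array mapping string -> reference count.
class MapStringManager {
public:
    // Return a pointer to the shared copy, bumping its count.
    const std::string* addRef(const std::string& s) {
        std::map<std::string, unsigned>::iterator it =
            refs_.insert(std::make_pair(s, 0u)).first;
        ++it->second;
        return &it->first;
    }
    // Drop one reference; erase the entry when the count hits zero.
    void removeRef(const std::string& s) {
        std::map<std::string, unsigned>::iterator it = refs_.find(s);
        if (it != refs_.end() && --it->second == 0)
            refs_.erase(it);
    }
    unsigned countRefs(const std::string& s) const {
        std::map<std::string, unsigned>::const_iterator it = refs_.find(s);
        return it == refs_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, unsigned> refs_;
};
```

Swapping the std::map for a hash table (or a BST) changes only the private storage; the three public methods, which are the whole interface, stay the same.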

Justice said:
As for efficiency, that depends on many factors. With enough buckets, well balanced data, and a good hashing function, yes a hashtable would have better performance. Given the similarities of strings used in muds, I suspect that either algorithm will have a tendency to be imbalanced.

Do you have empirical data for stating that strings are similar, and if so, what is your similarity metric? I don't have any data, but you sound like you've studied the problem.

Justice said:
Having read your release notes and documentation, I believe this code will be easier to use and maintain.

Mind if I ask why? It sounds like you have found non-trivial problems with the ease of use and maintainability – I would be very happy to improve this stuff if anything for my own uses. If it's just that you prefer your own code, that's a perfectly valid reason… I sometimes feel that way myself. :wink: (Obviously, it's usually easier to deal with something you wrote yourself, because you know it inside out.)
04 Sep, 2007, Justice wrote in the 24th comment:
Votes: 0
DavidHaley said:
Well, yes; notice that the hash table manager is subclassed off of a general manager.

Perhaps it's just semantics… but generally speaking, I view the interface by which a table is accessed as part of maintaining the table, especially when said interface defines some of the functionality. Given that only one implementation exists, and that in a given application only one implementation is likely to be used, I consider the overhead of access through virtual functions wasteful and unnecessary.

DavidHaley said:
Do you have empirical data for stating that strings are similar, and if so, what is your similarity metric? I don't have any data, but you sound like you've studied the problem.

Well, I wouldn't say that I have empirical data on the similarity of strings. However in my experience, builders have preferences in terms of themes they are comfortable using. This results in strings that are quite similar textually. In a stock smaug I can see many references to "a glowing <blah> potion". Or "the dwarven <blah>". "A gnome <blah>".

So yeah, let's discuss the algorithms now, and what the term "unbalanced" really means. std::map does not specify an implementation, but you're right that it tends to use a binary tree; on the server I'm using, it is implemented as a red/black tree.

A hashtable attempts to reduce lookup time by distributing values among several "buckets"; it is considered "balanced" when each bucket holds about the same number of elements. The worst case is, of course, searching for the last element of the largest bucket. Assuming a well-balanced table, the hash function reduces the number of values to check to 1/nth of the total number of elements, where n is the number of buckets. Given an imbalanced hashtable, performance can become very inconsistent because some buckets may have 0-1 items while others have hundreds. The number of buckets, the hashing function, and how buckets are selected are the major considerations. A common theme is to use a modulus to select a bucket (as you do); this makes the number of buckets even more important, because it determines how the hash values map onto buckets.
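The modulus scheme described above can be sketched as follows (the hash function here is a djb2-style placeholder, purely for illustration):

```cpp
#include <cstddef>
#include <string>

// Illustrative string hash (djb2-style); any hash function could be
// substituted here.
unsigned long hashString(const std::string& s) {
    unsigned long h = 5381;
    for (std::size_t i = 0; i < s.size(); ++i)
        h = h * 33 + static_cast<unsigned char>(s[i]);
    return h;
}

// Modulus bucket selection: the bucket count directly shapes the
// distribution, since it decides how hash values fold onto buckets.
// A prime count helps avoid resonating with patterns in the hashes.
std::size_t bucketFor(const std::string& s, std::size_t numBuckets) {
    return hashString(s) % numBuckets;
}
```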

A binary tree attempts to reduce lookup time by branching the values at each comparison. In a well-balanced tree you reduce the number of values to check by roughly half at each comparison; the tree is considered imbalanced when one side of a branch contains more nodes than the other. Its worst case is to search the entire height of the tree. In a well-balanced tree this height is minimal, but in an imbalanced tree it can be rather large.

That being said, here's what wiki has to say about red/black trees.
Wikipedia
Quote
In computer science, a self-balancing binary search tree or height-balanced binary search tree is a binary search tree that attempts to keep its height, or the number of levels of nodes beneath the root, as small as possible at all times, automatically. It is one of the most efficient ways of implementing associative arrays, sets, and other data structures.


DavidHaley said:
Justice said:
Having read your release notes and documentation, I believe this code will be easier to use and maintain.

Mind if I ask why? It sounds like you have found non-trivial problems with the ease of use and maintainability – I would be very happy to improve this stuff if anything for my own uses. If it's just that you prefer your own code, that's a perfectly valid reason… I sometimes feel that way myself. :wink: (Obviously, it's usually easier to deal with something you wrote yourself, because you know it inside out.)


Well, I wouldn't say that they're non-trivial; mostly preferences and what I consider "good practices".

For the usability, as I've stated before, I view the use of an abstract parent as unnecessary overhead. Additionally, your documentation suggests you use a macro to define the table; this macro then generates a struct, introducing an additional unnecessary layer. Granted, you can use the object directly without its abstract parent or the macro, but without a greater understanding of your code, I'm not certain what the ramifications of that would be.

For the maintainability, by using a fast and stable associative array, the code deals specifically with the problem: the creation and deletion of strings, and maintaining the reference counts. As a result, it's much simpler and easier to read. Given the problem at hand, I can assume consistent performance. Of lesser note, since the entirety of the table is stored in a single standard container, it lends itself to more conventional coding practices that make it easier to read and work on.

On a final note, from your release documents, I see that over the course of 13 months you've made several changes to the code to improve performance and to fix a crash bug. By your own admission, due to a poor hashing function, you likely have worse performance than std::map. Given the limits of my time outside my other activities (work, recreation, etc.), I'd rather not waste the man-hours writing my own implementations with little to no actual benefit. I learned my lesson on that several years ago.
04 Sep, 2007, David Haley wrote in the 25th comment:
Votes: 0
Justice said:
Perhaps it's just semantics… but generally speaking, I view the interface by which a table is accessed as part of maintaining the table, especially when said interface defines some of the functionality. Given that only one implementation exists, and that in a given application only one implementation is likely to be used, I consider the overhead of access through virtual functions wasteful and unnecessary.

Well, a ref-counting string table always has the same interface, no matter how you implement it under the hood. In that sense, I could have just as easily stuck a std::map in instead of a hash table and it'd still be a string table, with the exact same interface. I'm not sure I understand what you're getting at.

The overhead of virtual functions is a good point, however, recall that in C++ if you access the sub-type directly, and that sub-type's methods are not defined to be virtual, you do not trigger a v-table lookup. Since only the destructor is declared virtual in the implementation, there is no v-table lookup for the normal functions, and thus no overhead.
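The static-dispatch point can be sketched like this (the class and method names are hypothetical, not the actual library code):

```cpp
#include <string>

// Sketch: only the destructor is virtual, as in David's description.
// Calling addString through the concrete subtype is an ordinary,
// statically bound call; no v-table lookup is involved, and the
// compiler is free to inline it.
class StringManagerBase {
public:
    virtual ~StringManagerBase() {}
};

class HashStringManager : public StringManagerBase {
public:
    HashStringManager() : count_(0) {}
    // Non-virtual: resolved at compile time when called on a
    // HashStringManager directly (returns a dummy counter here).
    int addString(const std::string&) { return ++count_; }
private:
    int count_;
};
```

Only code that calls through a `StringManagerBase*` would pay for dynamic dispatch, and here only the destructor is dispatched that way.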

Justice said:
Well, I wouldn't say that I have empirical data on the similarity of strings. However in my experience, builders have preferences in terms of themes they are comfortable using. This results in strings that are quite similar textually. In a stock smaug I can see many references to "a glowing <blah> potion". Or "the dwarven <blah>". "A gnome <blah>".

So by similarity, you mean prefix-similarity. It would seem, then, that a hash table that hashes by taking "random" (not truly random of course) characters from the string would do a lot better than a binary sort function that steps through the string linearly.

Indeed, if you have very many strings that start with "the dwarven …", then your sort function has to keep re-comparing all these "the dwarven" prefixes. But, if your hash function sampled characters from the string; say, the first two, the middle two, and the last two, you might get better performance.
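A sketch of that sampling idea, hashing a few characters spread across the string instead of walking the shared prefix (purely illustrative, not the "SampleHash" mentioned later):

```cpp
#include <cstddef>
#include <string>

// Illustrative sampling hash: mix the first two, middle two, and last
// two characters plus the length, so strings sharing a long prefix
// like "the dwarven ..." still spread across buckets.
unsigned long sampleHash(const std::string& s) {
    if (s.empty()) return 0;
    std::size_t n = s.size();
    std::size_t idx[6] = {
        0, 1 % n,                    // first two
        n / 2, (n / 2 + 1) % n,      // middle two
        n - 1, (n >= 2 ? n - 2 : 0)  // last two
    };
    unsigned long h = n;
    for (int i = 0; i < 6; ++i)
        h = h * 131 + static_cast<unsigned char>(s[idx[i]]);
    return h;
}
```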

I appreciate your rather in-depth analysis of the hash table for I feel it might help people a great deal in understanding these issues. For my part, I already am pretty aware of all this; my intentions have been to, at some point, get actual empirical data on all this. For instance, I wanted to get data on average bucket size, disparity of the buckets and so forth; I also wanted to try all this with a std::map implementation and compare lookup times. I think this question can be answered quite empirically, and since most MUDs have fairly constant strings, it is not inappropriate to examine your empirical data and choose the solution that maximizes average performance for your particular data set.

Justice said:
For the usability, as I've stated before, I view the use of an abstract parent to be unnecessary overhead,

Do you mean performance or conceptual? If the latter, personally I felt it helped separate quite crisply the interface from various implementation methods. Additionally, my system allows for several string table managers to coexist; for one batch of strings it might be more appropriate to use one method rather than the other. Thus it made sense, for me, to be able to implement the same interface several ways, since the string class itself doesn't care how its manager is implemented.

Justice said:
additionally your documentation suggests you use a macro to define the table. This macro then generates a struct, introducing an additional unnecessary layer. Granted, you can use the object directly without it's abstract parent, or the macro, but without a greater understanding of your code, I'm not certain what the ramifications of this would be.

The macro is just a convenience to avoid copy-pasting the manager skeleton. Also, the wrappers were used because you need a class type to pass as a template parameter to the string, so that the shared string specialized type knows how to get in touch with its manager.

But indeed, all you need to do is to specialize the SharedString template with any class that implements the interface it's expecting (the one from the manager).

Justice said:
For the maintainability, by using a fast and stable associative array, the code deals specifically with the problem. Specifically the creation and deletion of strings, and maintaining the reference counts. As a result, it's much simpler and is easier to read. Given the problem at hand, I can assume consistent performance. Of a lesser note, since the entirety of the table is stored in a single standard container it presents itself to more conventional coding practices that make it easier to read and work on.

I suppose so. As a matter of "good practices", I don't think it's appropriate for outside code to be able to touch the innards of the string table, which is why I kept it separate.

Justice said:
On a final note, from your release documents, I see that over a course of 13 months, you've made several changes to the code to improve performance and to fix a crash bug. By your own admission, due to a poor hashing function, you likely have worst performance than std::map. Given the limits of my time outside of my other activities (work, recreation, etc), I'd rather not waste the man-hours writing my own implementations with little to no actual benefits. Learned my lesson on that several years ago.

Well, it's true that I worked on the project for a while and am still working on it, but don't conclude that I spent a lot of time on it. :smile: Rather, those releases are mainly just "oops, I forgot that" type things that didn't take more than about 15 minutes each (except for maybe one or two). But sure, I grant that there might not be a lot of use in reinventing the wheel, except perhaps for my own personal edification. That is why I've been thinking of comparing with the std::map, in addition to finding a better hash function out there. I certainly don't really want to write my own… getting it right is hard work. Then again, it would be quite easy to come up with something better than the very silly SMAUG function and the still-very-silly-but-much-better "SampleHash" I provide.

I guess another thing is that while BSTs tend to be superb in the general case if you don't really know what kind of data to expect, this problem is something of a special case. We know almost exactly what to expect; in fact, the only variable (as far as the MUD is concerned) is the small amount of player-entered data, e.g. the player names. Therefore you could in principle perfect the hash function to optimize (or even approximately optimize) bucket disparity, which you cannot really do with a BST. Whether or not you'd want to do this depends to a large extent on how much time you are spending/wasting in string management.

It's also worth noting that e.g. the Lua and Java languages choose to use hash tables for many parts of the language implementation. In Lua's case, the tables – the core (only) data structure – are implemented using hashing, and all strings in the language are interned using a hash table as well. Apparently they studied hashing vs. BSTs and hashing came out much better for them. IOW, there's a reason for hash tables to exist too. :smile:

I appreciate that you took the time to give me your comments, though. They were pretty helpful, if at the very least to give me a good external perspective.
05 Sep, 2007, MaineCoon wrote in the 26th comment:
Votes: 0
Justice said:
This code is interesting. Rather than using a manager, each string attempts to maintain its own references. It probably uses less CPU than the way David and I handle this, but likely uses more memory (without a central list, duplicate strings can exist). It looks useful where a single string would be referenced from multiple locations, like an index passing its strings to an instance.


I'm the author of that code. As stated, LexiMUD does not use a hash table.

In LexiMUD, a string is usually only shared among instances of the same entity (for example, in LexiMUD, an object with a VirtualID of weapons:rifles:32 would share its namelist, short, long, and extended descs among all instances of weapons:rifles:32 that have not been given custom strings, but an object with a VID of weapons:rifles:33 that used the same strings would have its own pool). The sharing can occur automatically by default copy constructor and by default assignment operator, or manually in a constructor initialization lists (copy constructor or otherwise) and by assignment.

Memory-wise, a hash table is only efficient if you truly have many duplicate strings shared among non-related entities - say, 2 entirely different VNums. std::map is a very poor performer both CPU- and memory-wise. It is not too difficult to write a good hash table and algorithm that is more efficient (I recommend ELF hash for the algorithm; I use a case-insensitive version in my script engine's virtual machine for handling variables). I ran an experiment and found that memory usage actually increased in my MUD, due to the overhead of the hash table, and boot time increased. At the time I had about 1500 mob prototypes and 3000 mobs loaded, 5800 object prototypes with 11000 objects loaded, 3000 script prototypes with 6700 script instances, and 17500 rooms. I saw memory increase by about 2%, and boot time increase from 30 seconds to 50 seconds. The memory increase was not significant - about 1 meg - but the boot speed was definitely a concern. I'm no stranger to hash functions, hash tables, and writing efficient containers, having done so plenty before in my line of work.

My actual Shared String code takes only 30 lines instead of nearly 100, because the Implementation * is a SharedPtr&lt;Implementation&gt;, and the Implementation itself inherits from Shareable, which is my generic shared-pointer framework. In the code posted here I merged the necessary parts directly into the class and pared it down. I also do everything in namespace Lexi, and I use my own String class, which is a case-insensitive version of std::string, non-templatized to boot, which does not depend on std::string but implements its complete interface. You can achieve the first part fairly easily by specializing std::basic_string:

#include &lt;string&gt;
#include &lt;ctype.h&gt;
#include &lt;strings.h&gt;  // strncasecmp (POSIX)

struct char_traits : public std::char_traits&lt;char&gt;
{
    static bool eq(const char_type&amp; c1, const char_type&amp; c2)
        { return tolower(c1) == tolower(c2); }
    static bool lt(const char_type&amp; c1, const char_type&amp; c2)
        { return tolower(c1) &lt; tolower(c2); }

    static int compare(const char_type* s1, const char_type* s2, size_t n)
        { return strncasecmp(s1, s2, n); }
};

typedef std::basic_string&lt;char, char_traits, std::allocator&lt;char&gt; &gt; String;


Updated: edited the metrics after digging through some old logs
05 Sep, 2007, MaineCoon wrote in the 27th comment:
Votes: 0
Justice said:
For the usability, as I've stated before, I view the use of an abstract parent to be unnecessary overhead,

DavidHaley said:
The overhead of virtual functions is a good point, however, recall that in C++ if you access the sub-type directly, and that sub-type's methods are not defined to be virtual, you do not trigger a v-table lookup. Since only the destructor is declared virtual in the implementation, there is no v-table lookup for the normal functions, and thus no overhead.


The overhead of a virtual function call is relatively minimal, especially on modern hardware. The overhead of it being a function call rather than inline can sometimes be enough to make the extra overhead of it being virtual be minimal in comparison. The extra level of indirection is usually insignificant since, ideally, your most used classes will have chunks of their vtables in the cache. Either way, it is far less than the overhead of a pipeline flush on a pre-Core2 or pre-Athlon64 chip, such as any P4 chip, with a failed branch prediction. I would only be concerned about performance of virtuals if it was in tight loops of performance sensitive code. You should worry more about clean design, while keeping performance in mind, than the other way around. If performance is an issue, you can narrow down the problem areas with profiling, and a clean design will make performance tuning much easier. How much CPU are you really using, usually? Most of the time, you don't need to worry about it.

Want to avoid performance issues? Don't use STL excessively. That alone will save you a lot.
05 Sep, 2007, David Haley wrote in the 28th comment:
Votes: 0
MaineCoon said:
The overhead of it being a function call rather than inline can sometimes be enough to make the extra overhead of it being virtual be minimal in comparison.

Indeed – vtable overhead can be as simple as an assembly jump table, whereas a function call usually entails whole stack frames being pushed and popped.

Fear, if you have time, I would greatly appreciate your thoughts on my shared string library as well.
05 Sep, 2007, Justice wrote in the 29th comment:
Votes: 0
DavidHaley said:
I appreciate your rather in-depth analysis of the hash table for I feel it might help people a great deal in understanding these issues. For my part, I already am pretty aware of all this; my intentions have been to, at some point, get actual empirical data on all this.

Thanks, I felt a need to describe the algorithms in order to compare them. I took a look at wiki's hash table article. The vast majority of it was about the difficulty of writing a good hash table, including ways to evaluate the performance.

DavidHaley said:
It's also worth noting that e.g. the Lua and Java languages choose to use hash tables for many parts of the language implementation. In Lua's case, the tables – the core (only) data structure – are implemented using hashing, and all strings in the language are interned using a hash table as well. Apparently they studied hashing vs. BSTs and hashing came out much better for them. IOW, there's a reason for hash tables to exist too. :smile:


Yeah, well tuned hash tables do have better overall performance than trees do. That's where a cost benefit analysis comes into play. Are the benefits worth the additional development time to write and evaluate a hash function?

Given the life cycle of the string manager, it is primarily accessed during load time, and then infrequently when new strings need to be assigned or dereferenced. Java implementations, for example, often use a hashtable for member lookup. Since this is an integral part of the language, the additional performance potential is worth it. This is a special case in that performance is essential, but the tables are not updated during execution, which allows the algorithm to favor lookup performance over modification.
05 Sep, 2007, David Haley wrote in the 30th comment:
Votes: 0
Justice said:
Given the life cycle of the string manager, it is primarily accessed during load time, and then infrequently when new strings need to be assigned or dereferenced. Java implementations, for example, often use a hashtable for member lookup.

That's a good point. It's true that the string table isn't hit that often, except for dereferencing strings (but that doesn't involve the table anyhow, and furthermore isn't a "frequent" event by the measures we're talking about). Lua's usage of hash tables is not only for method lookup, but indeed for any member lookup of a table, which means that it happens all the time and is an absolutely critical component of the language. One of these days I'm going to look at their hash function and see how it works.

Still though, I think I'm going to try collecting empirical data at some point, if anything because I'm quite curious by now at just how often the table is touched, and what kind of strings are in there…
05 Sep, 2007, Justice wrote in the 31st comment:
Votes: 0
MaineCoon said:
std::map is a very poor performer both CPU and memory wise.
… snip …
I saw memory increase by about 2%, and boot time increase from 30 seconds to 50 seconds.


I'm having trouble believing that.

Given a scenario with unique 60-character strings that are nearly identical (worst case for the comparison function):
1000 strings : 0.003603 seconds
10000 strings : 0.049894 seconds
100000 strings : 0.632658 seconds
1000000 strings : 7.752863 seconds

Given a scenario with varied strings (10 patterns instead of 1): some 60 characters, some 10-20, some shorter.
1000 strings : 0.001998 seconds
10000 strings : 0.029620 seconds
100000 strings : 0.382988 seconds
1000000 strings : 3.838754 seconds


Both cases are a worst case for the tree; that is, the key does not exist in the table, forcing a search of the tree's entire height. The second case demonstrates the effect of the comparison function on the tree: with 10 patterns for the key generator, the performance roughly doubles.
06 Sep, 2007, Tyche wrote in the 32nd comment:
Votes: 0
Actually, a much better implementation for a shared string manager would be a TST (ternary search trie). I posted an implementation in both C++ and Ruby a couple of years back on TMC, specially customized to implement a shortest-string command parser. I believe I (and/or Eero/Muir) posted some timing data comparing it with both a hash table implementation and a red/black tree, and it kicked both their asses. ;-P

The Ruby version of the TST command parser wound up in TeensyMud.

Give me a couple days and I'll post my String class from TychoMud hooked up with the TST as a shared string manager. I believe I can get it to work with mutable strings as well, if I figure out how to write a delete method for the trie. :-)
06 Sep, 2007, David Haley wrote in the 33rd comment:
Votes: 0
Ah, a variation on lexical tries (aka Patricia trees), which were going to be what I suggested next, actually. (I have a Lua implementation lying around somewhere, as well as a C++ implementation.)

Lookup time for a word is exactly equal to that word's length, and you get massive compression for shared string prefixes.

Justice, this is one large reason why I chose to separate the interface from implementation in my class design: now I can go write a manager that uses a trie and immediately compare both implementations because they look identical to the rest of the world.
06 Sep, 2007, David Haley wrote in the 34th comment:
Votes: 0
I should probably mention, incidentally, that I used my Lua trie to implement a command table mapping strings to functions. It knocked the tail right off SMAUG's hash table method, and a BST as well.

I have it around somewhere – I'll post it soon somewhere.
06 Sep, 2007, Justice wrote in the 35th comment:
Votes: 0
DavidHaley said:
Justice, this is one large reason why I chose to separate the interface from implementation in my class design: now I can go write a manager that uses a trie and immediately compare both implementations because they look identical to the rest of the world.


David, it doesn't actually matter. I can do exactly the same thing using composition instead of inheritance.

Consider this… if I changed the names of my functions a little bit: get_ref is now addString, remove_ref is now deleteString, count_ref is now countRefs…
My code could probably be used by the majority of the code that relies on your string manager, without a class hierarchy. You can also use templates to achieve this same goal.

The point being that I can swap out implementations if I feel one works better without a formal interface. I could for example, implement a wrapper around Tyche's TST if I felt the need, and it wouldn't affect the code beneath it at all. Back to the KISS principle, is polymorphism really necessary for this?
06 Sep, 2007, David Haley wrote in the 36th comment:
Votes: 0
Well, sure, if you wanted to use the same string management interface – and thus the same shared string class – but with two different managers. For instance, different kinds of string tables might be better served by different underlying implementations. Having the interface allows this immediately; what you propose makes it more complicated.
06 Sep, 2007, Justice wrote in the 37th comment:
Votes: 0
We can play the "what if" game all day long. Sure, a formal interface has its uses, but in this case I don't think it does anything for you.

How likely is it that you will need more than one implementation of the string table in a single application?
Realistically, not very likely. Consider the following:

The std::map based code I wrote averaged a worst case performance of about .000007 seconds.

Now, let's assume you need better performance than that. The 10-pattern test clearly shows that the comparison function is responsible for a large % of the overhead. A TST like Tyche suggests could help with that. Assuming it's well implemented, you should get somewhat better performance than the 3-7 millionths of a second std::map provides us. Why would you need a second implementation?

So let's say you wanted to update your code to use the new StringManager_TST instead of the old StringManager_Map. With an implied interface like I suggested, you simply change:

StringManager_Map mngr;
// becomes:
StringManager_TST mngr;


In addition, you can prepare for this type of global change with a typedef, so only one line needs editing later:
typedef StringManager_Map StringManager; // later: typedef StringManager_TST StringManager;

Finally, assuming I want to use a single class with multiple types of StringManagers, a template should be able to handle it.
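A sketch of that template approach (the manager name and get_ref signature follow Justice's earlier declaration; SharedString here is a hypothetical stand-in, not David's actual class):

```cpp
#include <map>
#include <string>

// Manager with the implied interface Justice describes; only the
// member-function names matter, not a common base class.
class StringManager_Map {
public:
    const std::string* get_ref(const std::string& s) {
        return &refs_.insert(std::make_pair(s, 0u)).first->first;
    }
private:
    std::map<std::string, unsigned> refs_;
};

// The shared-string class takes its manager as a template parameter:
// swapping StringManager_Map for a StringManager_TST needs no abstract
// parent, only the same member-function names (compile-time duck typing).
template <typename Manager>
class SharedString {
public:
    SharedString(Manager& m, const std::string& s) : str_(m.get_ref(s)) {}
    const std::string& value() const { return *str_; }
private:
    const std::string* str_;
};
```

A mismatch in the expected interface shows up as a compile error at the point of use, which plays the same role the pure-virtual declarations do in the abstract-parent design.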

Heh, I find it amusing how you keep bringing up that interface of yours as if it's a major consideration. The only reason I mentioned it was because you asked for my opinion. That opinion is basically… what's the use? The only thing you've mentioned so far that it would actually be used for is testing, in which case making the test a template would handle it just as well.
06 Sep, 2007, David Haley wrote in the 38th comment:
Votes: 0
I was just mentioning it. I saw a case study where I thought the interface served its purpose; you clearly disagree. That said, your point about using templates is rather exactly what I'm talking about… :smile: (One advantage the interface serves in that case is to make it extremely clear to somebody subclassing the manager what needs to be implemented. And we already saw that there isn't a v-table penalty due to the methods not being virtual in the subclasses.) But this is not an important point to me; I was just pointing out something to explain why I chose to do things the way I did, to answer your question "what is the use?". I wasn't trying to start a debate or devalue your opinion.

One thing, though, is that CPU time isn't the only thing to consider. We've both said several times that the strings don't change all that much after boot, and boot time is more or less irrelevant in the global scheme of things (within reason, naturally); perhaps, then, it might be more important to consider memory usage instead of CPU – or, at least, add more weight to memory as you examine the trade-offs.
06 Sep, 2007, Justice wrote in the 39th comment:
Votes: 0
Memory usage is something we can't really examine without actually testing the various implementations.

A hashtable that is virtually full will likely use the least amount of memory, however being virtually full is horrible for performance as it increases the chances of clustering… making the table imbalanced. Once again it becomes a tuning problem, with the hashtable being able to exhibit extreme performance on one hand, and horrid performance on the other. It's this sensitivity that makes me believe a hash table is not the right solution.

The TST will vary quite a bit. In a TST, each character in a string corresponds to a branch node, with complete strings ending at leaf nodes. Given widely varying strings, it would have a low density and thus require more branch nodes to represent the strings. Given similar strings, the TST would excel, as most of the nodes would represent multiple strings. That being said, either of my worst case tests for the std::map implementation would be "best case" tests for a TST.
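For concreteness, a minimal ternary search trie might look like this sketch (not Tyche's implementation). Note how strings sharing a prefix share the chain of "equal" nodes, which is where the space savings on similar keys come from:

```cpp
#include <string>

// Each node holds one character and three children: lower, equal
// (advance to the next character), and higher.
struct TSTNode {
    char ch;
    bool isEnd;            // a stored string ends at this node
    TSTNode *lo, *eq, *hi;
    TSTNode(char c) : ch(c), isEnd(false), lo(0), eq(0), hi(0) {}
};

// Insert a non-empty C string, creating nodes as needed.
TSTNode* insert(TSTNode* node, const char* s) {
    if (!node) node = new TSTNode(*s);
    if (*s < node->ch)      node->lo = insert(node->lo, s);
    else if (*s > node->ch) node->hi = insert(node->hi, s);
    else if (s[1])          node->eq = insert(node->eq, s + 1);
    else                    node->isEnd = true;
    return node;
}

// Lookup walks at most one node per character plus lo/hi detours.
bool contains(const TSTNode* node, const char* s) {
    while (node) {
        if (*s < node->ch)      node = node->lo;
        else if (*s > node->ch) node = node->hi;
        else if (s[1])        { node = node->eq; ++s; }
        else                    return node->isEnd;
    }
    return false;
}
```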

The RB tree used by std::map would be the most consistent in regards to memory usage. It requires at most 1 node per string.

How do these compare against each other? It really depends on the situation. The trie will exhibit its best space efficiency with short strings that are similar. The hashtable will exhibit its best space efficiency when it is virtually full. The RB tree will exhibit similar space efficiency in any condition.

The trie will exhibit the same lookup performance regardless of its space efficiency. The hashtable will tend to degrade in lookup performance the closer it is to being full. The RB tree's lookup performance is proportional to the height of its tree (although there was no noticeable degradation between 100k and 1m strings).
06 Sep, 2007, Justice wrote in the 40th comment:
Votes: 0
Felt the need to separate this from the discussion on space efficiency.

DavidHaley said:
I was just mentioning it. I saw a case study where I thought the interface served its purpose; you clearly disagree. That said, your point about using templates is rather exactly what I'm talking about… :smile: (One advantage the interface serves in that case is to make it extremely clear to somebody subclassing the manager what needs to be implemented.


Well, I don't disagree that an interface is helpful in that circumstance. What I disagree with is that the circumstance is likely to ever occur. Given that in the event it does occur, there are other options available… what exactly does the abstract parent do for you?

As for the clarity, I don't see what you're talking about. My code has exactly 3 methods. I've pasted both my declaration and yours (stripped of comments). Which is clearer?

class StringManager
{
protected:
    typedef std::map&lt;const std::string*, unsigned int, string_compare&gt;::iterator ITERATOR;
    typedef std::pair&lt;const std::string*, unsigned int&gt; PAIR;

    std::map&lt;const std::string*, unsigned int, string_compare&gt; content;

public:
    StringManager(void);
    ~StringManager(void);

    const std::string *get_ref(const std::string &amp;str);
    unsigned int count_ref(const std::string &amp;str);
    unsigned int remove_ref(const std::string &amp;str);
};


Vs:
class StringManager
{
public:
    StringManager();
    virtual ~StringManager();

    const std::string * addString(const std::string &amp; str);
    void deleteString(const std::string &amp; str);
    size_t countRefs(const std::string &amp; str) const;
    size_t numStrings() const;
    virtual size_t bytesUsed() const = 0;
    virtual void dumpTable(std::ostream &amp; os) const = 0; // for debugging: dump whole table to os.

protected:
    size_t numStrings_;

    virtual SharedStringEntry * prepareEntry(const std::string &amp; str) = 0;
    virtual bool unrefEntry(const std::string &amp; str) = 0;
    virtual const SharedStringEntry * getEntry(const std::string &amp; str) const = 0;
};