On Mon, 2003-11-03 at 16:39, Tom Lane wrote:
> The point is that the elements of the lists could be large, complicated
> structures that will take noticeable amounts of time to compare.
Good point, I agree with your analysis. At the very least, I'll add a
comment to equal() explaining why the algorithm works as it does.
> I will grant that there are situations where your proposed change would
> be a win --- in particular, for long lists wherein the first few
> elements differ, it'd be faster to do it as you suggest.
Do you think it would be worth the trouble to use both algorithms, and
then test on the node tag of the first element to decide which one to
use? (The assumption being lists are homogeneous). We wouldn't need to
enumerate all the possible node tags -- just a few common,
known-to-be-small-and-easy-to-compare types would use the new algorithm,
whereas everything else would use the existing code.
One thing I've been wondering about is whether it would be worth ripping
out the existing List code wholesale, and replacing it with something
like the following:
struct ListNode
{
union
{
void *ptr_value;
int int_value;
Oid oid_value;
} elem;
struct ListNode *next;
};
struct List
{
struct ListNode *head;
struct ListNode *tail;
int length;
};
That would fix the O(n) behavior of both lappend() and length() in one
swell foop -- we've had performance problems from the former on a couple
of occasions before, and given the ~350 odd uses of lappend() in the
backend, I suspect there are more problems waiting to be found. For
example, try creating a 75,000 line pg_hba.conf file, and watch the
postmaster chew minutes (hours?) of CPU time trying to parse it. (That
particular example is unrealistic, of course. )I'd also expect that
keeping track of the length of most lists will end up saving cycles,
overall.
One downside would be that we wouldn't get the nice Lisp-like semantics
of the existing List, but I'm not sure how necessary those are.
Comments?
-Neil