Hashjoin status report - Mailing list pgsql-hackers

From Tom Lane
Subject Hashjoin status report
Date
Msg-id 9026.926007153@sss.pgh.pa.us
Responses Re: [HACKERS] Hashjoin status report  (The Hermit Hacker <scrappy@hub.org>)
List pgsql-hackers
I've committed fixes that deal with all of the coredump problems
I could find in nodeHash.c (there were several :-().

But the code still has a fundamental design flaw: it uses a fixed-size
overflow area to hold tuples that don't fit into the hashbuckets they
are assigned to.  This means you get "hashtable out of memory" errors
if the distribution of tuples is skewed enough, or if the number of
hashbuckets is too small because the system underestimated the number
of tuples in the relation.  Better than a coredump I suppose, but still
very bad, especially since the optimizer likes to use hashjoins more
than it used to.
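
To make the failure mode concrete, here is a toy sketch of a hash table with a fixed-size overflow area. It is not the nodeHash.c code: the names (ToyHashTable, toy_insert, and so on) and the tiny sizes are hypothetical, and tuples are reduced to plain ints. It only shows that once enough keys land in one bucket, the shared overflow area fills up and the insert has to fail, however much room the other buckets still have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* All sizes below are hypothetical, chosen small so the limit is easy to hit. */
#define NBUCKETS       4   /* number of hash buckets */
#define BUCKET_SLOTS   2   /* tuples that fit directly in one bucket */
#define OVERFLOW_SLOTS 3   /* size of the single, shared overflow area */

typedef struct ToyHashTable {
    int bucket[NBUCKETS][BUCKET_SLOTS];
    int bucket_used[NBUCKETS];
    int overflow[OVERFLOW_SLOTS];
    int overflow_used;
} ToyHashTable;

static void toy_init(ToyHashTable *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof *t);
}

/*
 * Insert one key.  Returns false when both the target bucket and the
 * overflow area are full: the toy analogue of "hashtable out of memory".
 */
static bool toy_insert(ToyHashTable *t, int key)
{
    unsigned b = (unsigned) key % NBUCKETS;

    if (t->bucket_used[b] < BUCKET_SLOTS) {
        t->bucket[b][t->bucket_used[b]++] = key;
        return true;
    }
    if (t->overflow_used < OVERFLOW_SLOTS) {
        t->overflow[t->overflow_used++] = key;
        return true;
    }
    return false;
}

/* Insert n keys that all hash to bucket 0; returns how many were accepted. */
static int toy_insert_skewed(ToyHashTable *t, int n)
{
    int accepted = 0;

    for (int i = 0; i < n; i++)
        if (toy_insert(t, i * NBUCKETS))
            accepted++;
    return accepted;
}
```

With these sizes, evenly spread keys 0..7 fill every bucket without touching the overflow area, while ten keys that all hash to bucket 0 are cut off after five (two bucket slots plus three overflow slots), even though three buckets are still empty.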

What I would like to do to fix this is to store the tuples in a Portal
instead of in a fixed-size palloc block.  While at it, I'd rip out the
"relative vs. absolute address" cruft that is in the code now.
(Apparently there was once some thought of using a shared memory block
so that multiple processes could share the work of a hashjoin.  All that
remains is ugly, bug-prone code ...)
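
As a rough illustration of the direction proposed above, the sketch below stores each tuple in its own allocation and links it into its bucket with an ordinary pointer. It does not use the Portal API: plain malloc stands in for whatever growable storage the Portal would provide, the names (ChainHashTable, chain_insert, and so on) are hypothetical, and tuples are again reduced to ints. The point is only that storage grows with the data, so a skewed distribution makes one chain long instead of exhausting a fixed area, and that absolute pointers suffice when the table lives in a single process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* One stored tuple, linked to the next tuple in the same bucket. */
typedef struct ChainTuple {
    struct ChainTuple *next;   /* absolute pointer, no base-relative offset */
    int key;
} ChainTuple;

typedef struct ChainHashTable {
    int nbuckets;
    ChainTuple **buckets;      /* array of chain heads */
} ChainHashTable;

static ChainHashTable *chain_create(int nbuckets)
{
    assert(nbuckets > 0);
    ChainHashTable *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->buckets = calloc((size_t) nbuckets, sizeof *t->buckets);
    if (t->buckets == NULL) {
        free(t);
        return NULL;
    }
    t->nbuckets = nbuckets;
    return t;
}

/* Fails only if the allocator itself runs out of memory. */
static bool chain_insert(ChainHashTable *t, int key)
{
    ChainTuple *tup = malloc(sizeof *tup);
    if (tup == NULL)
        return false;
    unsigned b = (unsigned) key % (unsigned) t->nbuckets;
    tup->key = key;
    tup->next = t->buckets[b];
    t->buckets[b] = tup;
    return true;
}

static int chain_length(const ChainHashTable *t, int bucket)
{
    int n = 0;
    for (const ChainTuple *p = t->buckets[bucket]; p != NULL; p = p->next)
        n++;
    return n;
}

static void chain_destroy(ChainHashTable *t)
{
    for (int b = 0; b < t->nbuckets; b++) {
        ChainTuple *p = t->buckets[b];
        while (p != NULL) {
            ChainTuple *next = p->next;
            free(p);
            p = next;
        }
    }
    free(t->buckets);
    free(t);
}
```

Inserting a thousand keys that all hash to bucket 0 of a four-bucket table succeeds here; lookups in that bucket get slower, but nothing overflows.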

The reason I bring all this up is that it'd be a nontrivial revision
to nodeHash.c, and I'm uncomfortable with the notion of making such a
change this late in a beta cycle.  On the other hand it *is* a bug fix,
and a fairly important one IMHO.

Opinions?  Should I plow ahead, or leave this to fix after the 6.5 release?
        regards, tom lane