Thread: Hashjoin status report

From: Tom Lane
I've committed fixes that deal with all of the coredump problems
I could find in nodeHash.c (there were several :-().

But the code still has a fundamental design flaw: it uses a fixed-size
overflow area to hold tuples that don't fit into the hashbuckets they
are assigned to.  This means you get "hashtable out of memory" errors
if the distribution of tuples is skewed enough, or if the number of
hashbuckets is too small because the system underestimated the number
of tuples in the relation.  Better than a coredump I suppose, but still
very bad, especially since the optimizer likes to use hashjoins more
than it used to.
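
Schematically, the failure mode is something like this (a simplified
sketch with made-up names, not the actual nodeHash.c code):

    #include <stddef.h>

    /* One overflow area, sized once at hashtable creation. */
    typedef struct
    {
        char   *overflow;       /* fixed-size overflow area */
        size_t  size;           /* never grown after creation */
        size_t  used;
    } OverflowArea;

    static char *
    overflow_alloc(OverflowArea *oa, size_t tupleSize)
    {
        char   *tuple;

        if (oa->used + tupleSize > oa->size)
            return NULL;        /* "hashtable out of memory", even if
                                 * the machine has plenty to spare */
        tuple = oa->overflow + oa->used;
        oa->used += tupleSize;
        return tuple;
    }

Once the area fills, there is simply nowhere to put the next tuple.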

What I would like to do to fix this is to store the tuples in a Portal
instead of in a fixed-size palloc block.  While at it, I'd rip out the
"relative vs. absolute address" cruft that is in the code now.
(Apparently there was once some thought of using a shared memory block
so that multiple processes could share the work of a hashjoin.  All that
remains is ugly, bug-prone code ...)
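
At the point where a tuple gets stored, the change would look roughly
like this (a sketch only; "hashPortal" is a made-up name, and the exact
portal-memory calls may differ):

    MemoryContext oldcxt;

    /* Copy the tuple into the portal's memory, which grows on
     * demand instead of hitting a fixed limit. */
    oldcxt = MemoryContextSwitchTo((MemoryContext)
                                   PortalGetHeapMemory(hashPortal));
    heapTuple = heap_copytuple(tuple);
    MemoryContextSwitchTo(oldcxt);

Cleanup also gets simpler: destroying the portal at the end of the join
releases everything at once.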

The reason I bring all this up is that it'd be a nontrivial revision
to nodeHash.c, and I'm uncomfortable with the notion of making such a
change this late in a beta cycle.  On the other hand it *is* a bug fix,
and a fairly important one IMHO.

Opinions?  Should I plow ahead, or leave this to fix after 6.5 release?
        regards, tom lane


Re: [HACKERS] Hashjoin status report

From: The Hermit Hacker
On Thu, 6 May 1999, Tom Lane wrote:

> I've committed fixes that deal with all of the coredump problems
> I could find in nodeHash.c (there were several :-().
> 
> But the code still has a fundamental design flaw: it uses a fixed-size
> overflow area to hold tuples that don't fit into the hashbuckets they
> are assigned to.  This means you get "hashtable out of memory" errors
> if the distribution of tuples is skewed enough, or if the number of
> hashbuckets is too small because the system underestimated the number
> of tuples in the relation.  Better than a coredump I suppose, but still
> very bad, especially since the optimizer likes to use hashjoins more
> than it used to.
> 
> What I would like to do to fix this is to store the tuples in a Portal
> instead of in a fixed-size palloc block.  While at it, I'd rip out the
> "relative vs. absolute address" cruft that is in the code now.
> (Apparently there was once some thought of using a shared memory block
> so that multiple processes could share the work of a hashjoin.  All that
> remains is ugly, bug-prone code ...)
> 
> The reason I bring all this up is that it'd be a nontrivial revision
> to nodeHash.c, and I'm uncomfortable with the notion of making such a
> change this late in a beta cycle.  On the other hand it *is* a bug fix,
> and a fairly important one IMHO.
> 
> Opinions?  Should I plow ahead, or leave this to fix after 6.5 release?

Estimate of time involved to fix this?  vs likelihood of someone
triggering the bug in production?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Hashjoin status report

From: Tom Lane
The Hermit Hacker <scrappy@hub.org> writes:
>> Opinions?  Should I plow ahead, or leave this to fix after 6.5 release?

> Estimate of time involved to fix this?  vs likelihood of someone
> triggering the bug in production?

I could probably get the coding done this weekend, unless something else
comes up to distract me.  It's the question of how much testing it'd
receive before release that worries me...

As for the likelihood, that's hard to say.  It's very easy to trigger
the bug as a test case.  (Arrange for a hashjoin where the inner table
has a lot of identical rows, or at least many sets of more-than-10-
rows-with-the-same-value-in-the-field-being-hashed-on.)  In real life
you'd like to think that that's pretty improbable.

What started this go-round was Contzen's report of seeing the
"hash table out of memory. Use -B parameter to increase buffers"
message in what was evidently a real-life scenario.  So it can happen.
Do you recall having seen many complaints about that error before?
        regards, tom lane


Re: [HACKERS] Hashjoin status report

From: Tom Lane
I said:
> What started this go-round was Contzen's report of seeing the
> "hash table out of memory. Use -B parameter to increase buffers"
> message in what was evidently a real-life scenario.  So it can happen.
> Do you recall having seen many complaints about that error before?

A little bit of poking through the mailing list archives found four
other complaints in the past six months (scattered through the sql,
admin, and novices lists).  There might have been more that my search
query missed.

What do you think, does that reach your threshold of pain or not?
        regards, tom lane


Re: [HACKERS] Hashjoin status report

From: The Hermit Hacker
On Thu, 6 May 1999, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> >> Opinions?  Should I plow ahead, or leave this to fix after 6.5 release?
> 
> > Estimate of time involved to fix this?  vs likelihood of someone
> > triggering the bug in production?
> 
> I could probably get the coding done this weekend, unless something else
> comes up to distract me.  It's the question of how much testing it'd
> receive before release that worries me...

We're looking at 3 weeks till release... I'll let you call it on this
one.  If you feel confident about getting the bug fixed before release,
with enough time for testing, go for it.  If it makes more sense to
release it as an 'untested patch', go for that instead.

I believe you can fix this bug; I'm just kinda nervous about adding a
different one (which I believe is your worry also), given the limited
amount of testing possible :(

My preference would be a "use at own risk" patch...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Hashjoin status report

From: Vadim Mikheev
Tom Lane wrote:
> 
> I've committed fixes that deal with all of the coredump problems
> I could find in nodeHash.c (there were several :-().
> 
> But the code still has a fundamental design flaw: it uses a fixed-size
> overflow area to hold tuples that don't fit into the hashbuckets they
> are assigned to.  This means you get "hashtable out of memory" errors
> if the distribution of tuples is skewed enough, or if the number of
> hashbuckets is too small because the system underestimated the number
> of tuples in the relation.  Better than a coredump I suppose, but still
> very bad, especially since the optimizer likes to use hashjoins more
> than it used to.
> 
> What I would like to do to fix this is to store the tuples in a Portal
> instead of in a fixed-size palloc block.  While at it, I'd rip out the
> "relative vs. absolute address" cruft that is in the code now.
> (Apparently there was once some thought of using a shared memory block
> so that multiple processes could share the work of a hashjoin.  All that
> remains is ugly, bug-prone code ...)

Fix it!  Testing is easy...
Though I would use a chain of big blocks for the overflow area, not a
Portal; a Portal is too big a thing to be used directly in a join
method.
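
Something like this (just a sketch, made-up names):

    #include <stdlib.h>
    #include <stddef.h>

    /* Chain of big blocks: grow by adding a block instead of
     * failing when one fixed-size area fills up. */
    typedef struct TupleBlock
    {
        struct TupleBlock *next;
        size_t  used;
        size_t  size;
        char    data[1];        /* really 'size' bytes */
    } TupleBlock;

    static char *
    block_alloc(TupleBlock **chain, size_t tupleSize)
    {
        TupleBlock *blk = *chain;

        if (blk == NULL || blk->used + tupleSize > blk->size)
        {
            size_t  blkSize = (tupleSize > 8192) ? tupleSize : 8192;

            blk = malloc(offsetof(TupleBlock, data) + blkSize);
            if (blk == NULL)
                return NULL;    /* really out of memory */
            blk->next = *chain;
            blk->used = 0;
            blk->size = blkSize;
            *chain = blk;
        }
        blk->used += tupleSize;
        return blk->data + blk->used - tupleSize;
    }

Freeing at the end of the join is one walk down the chain.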

Vadim


Re: [HACKERS] Hashjoin status report

From: The Hermit Hacker
On Thu, 6 May 1999, Tom Lane wrote:

> I said:
> > What started this go-round was Contzen's report of seeing the
> > "hash table out of memory. Use -B parameter to increase buffers"
> > message in what was evidently a real-life scenario.  So it can happen.
> > Do you recall having seen many complaints about that error before?
> 
> A little bit of poking through the mailing list archives found four
> other complaints in the past six months (scattered through the sql,
> admin, and novices lists).  There might have been more that my search
> query missed.
> 
> What do you think, does that reach your threshold of pain or not?

Considering the number of instances we have out there, it sounds like a
very rare 'tweak' to the bug... I'd say keep it as an 'untested patch'
for the release, for those that wish to try it if they have a problem,
and include it for v6.6 and v6.5.1 ...


Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Hashjoin status report

From: Tatsuo Ishii
> The Hermit Hacker <scrappy@hub.org> writes:
> >> Opinions?  Should I plow ahead, or leave this to fix after 6.5 release?
> 
> > Estimate of time involved to fix this?  vs likelihood of someone
> > triggering the bug in production?
> 
> I could probably get the coding done this weekend, unless something else
> comes up to distract me.  It's the question of how much testing it'd
> receive before release that worries me...
> 
> As for the likelihood, that's hard to say.  It's very easy to trigger
> the bug as a test case.  (Arrange for a hashjoin where the inner table
> has a lot of identical rows, or at least many sets of more-than-10-
> rows-with-the-same-value-in-the-field-being-hashed-on.)  In real life
> you'd like to think that that's pretty improbable.
> 
> What started this go-round was Contzen's report of seeing the
> "hash table out of memory. Use -B parameter to increase buffers"
> message in what was evidently a real-life scenario.  So it can happen.
> Do you recall having seen many complaints about that error before?

We already have a good example of this "hash table out of memory. Use
-B parameter to increase buffers" syndrome in our source tree.  Go to
src/test/bench, remove "-B 256" from the last line of runwisc.sh, then
run the test.  The "-B 256" wasn't originally in there; I added it
while fixing the test suite and elog() (see the included posting).  I
don't see the error message in 6.4.2; I guess this is due to the
optimizer changes.

IMHO, we should fix this before 6.5 is out, or else change the default
-B setting to 256 or so, though that may cause a shortage of shared
memory.

P.S. At the time, I misunderstood the problem, thinking I simply didn't
have enough sort memory :-<

>Message-Id: <199904160654.PAA00221@srapc451.sra.co.jp>
>From: Tatsuo Ishii <t-ishii@sra.co.jp>
>To: hackers@postgreSQL.org
>Subject: [HACKERS] elog() and wisconsin bench test fix
>Date: Fri, 16 Apr 1999 15:54:16 +0900
>
>I have modified elog() so that it uses its own pid (obtained with
>getpid()) as the first parameter for kill() in some cases.  It used to
>get its own pid from the MyProcPid global variable.  This was fine
>until I ran the Wisconsin benchmark test suite (test/bench/).  In the
>test, postgres is run as a standalone command and MyProcPid is set to
>0.  As a result, elog() called kill() with the first parameter set to
>0, and SIGQUIT was sent to the whole process group rather than to the
>postgres process itself!  This was why /bin/sh dumped core whenever I
>ran the bench test.
>
>Also, I fixed several bugs in the test queries.
>
>One thing that still remains: some queries fail due to insufficient
>sort memory.  I modified the test script to add the -B option.  But is
>this normal?  I think not.  I thought postgres should use disk files
>instead of memory if there's not enough sort buffer space.
>
>Comments?
>--
>Tatsuo Ishii
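
For illustration, the essence of that elog() fix, as a sketch (a
made-up wrapper, not the actual elog() code):

    #include <sys/types.h>
    #include <signal.h>
    #include <unistd.h>

    static int MyProcPid = 0;   /* stands in for the real global;
                                 * 0 when postgres runs as a plain
                                 * command rather than under the
                                 * postmaster */

    static void
    signal_self(int sig)
    {
        pid_t   pid = MyProcPid;

        if (pid <= 0)
            pid = getpid();     /* kill(0, sig) would signal the whole
                                 * process group, /bin/sh included */
        kill(pid, sig);
    }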


Re: [HACKERS] Hashjoin status report

From: Tom Lane
The Hermit Hacker <scrappy@hub.org> writes:
> On Thu, 6 May 1999, Tom Lane wrote:
>> What do you think, does that reach your threshold of pain or not?

> Considering the number of instances we have out there, it sounds like a
> very rare 'tweak' to the bug... I'd say keep it as an 'untested patch'
> for the release, for those that wish to try it if they have a problem,
> and include it for v6.6 and v6.5.1 ...

OK, we'll deal with it as a post-release patch then.  It's not like
I haven't got anything else to do before the end of the month :-(
        regards, tom lane


Re: [HACKERS] Hashjoin status report

From: Bruce Momjian
> The Hermit Hacker <scrappy@hub.org> writes:
> >> Opinions?  Should I plow ahead, or leave this to fix after 6.5 release?
> 
> > Estimate of time involved to fix this?  vs likelihood of someone
> > triggering the bug in production?
> 
> I could probably get the coding done this weekend, unless something else
> comes up to distract me.  It's the question of how much testing it'd
> receive before release that worries me...
> 
> As for the likelihood, that's hard to say.  It's very easy to trigger
> the bug as a test case.  (Arrange for a hashjoin where the inner table
> has a lot of identical rows, or at least many sets of more-than-10-
> rows-with-the-same-value-in-the-field-being-hashed-on.)  In real life
> you'd like to think that that's pretty improbable.
> 
> What started this go-round was Contzen's report of seeing the
> "hash table out of memory. Use -B parameter to increase buffers"
> message in what was evidently a real-life scenario.  So it can happen.
> Do you recall having seen many complaints about that error before?

The new optimizer does more hashjoins, so we will see it more often.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Hashjoin status report

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> We already have a good example of this "hash table out of memory. Use
> -B parameter to increase buffers" syndrome in our source tree.  Go to
> src/test/bench, remove "-B 256" from the last line of runwisc.sh, then
> run the test.  The "-B 256" wasn't originally in there; I added it
> while fixing the test suite and elog() (see the included posting).  I
> don't see the error message in 6.4.2; I guess this is due to the
> optimizer changes.

> IMHO, we should fix this before 6.5 is out, or else change the default
> -B setting to 256 or so, though that may cause a shortage of shared
> memory.

It's fixed --- do you want to remove -B 256 from the test suite?
        regards, tom lane