Thread: Hashjoin status report
I've committed fixes that deal with all of the coredump problems
I could find in nodeHash.c (there were several :-().

But the code still has a fundamental design flaw: it uses a fixed-size
overflow area to hold tuples that don't fit into the hashbuckets they
are assigned to. This means you get "hashtable out of memory" errors
if the distribution of tuples is skewed enough, or if the number of
hashbuckets is too small because the system underestimated the number
of tuples in the relation. Better than a coredump I suppose, but still
very bad, especially since the optimizer likes to use hashjoins more
than it used to.

What I would like to do to fix this is to store the tuples in a Portal
instead of in a fixed-size palloc block. While at it, I'd rip out the
"relative vs. absolute address" cruft that is in the code now.
(Apparently there was once some thought of using a shared memory block
so that multiple processes could share the work of a hashjoin. All that
remains is ugly, bug-prone code ...)

The reason I bring all this up is that it'd be a nontrivial revision
to nodeHash.c, and I'm uncomfortable with the notion of making such a
change this late in a beta cycle. On the other hand it *is* a bug fix,
and a fairly important one IMHO.

Opinions? Should I plow ahead, or leave this to fix after 6.5 release?

regards, tom lane
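To make the flaw concrete, here is a minimal standalone C sketch of the failure mode being described. All names and sizes here (NBUCKETS, TUPLES_PER_BUCKET, OVERFLOW_BYTES, store_tuple) are invented for illustration and are not the real nodeHash.c structures: each bucket holds a handful of tuples in place, everything else spills into one fixed-size arena, and a skewed key distribution exhausts that arena no matter how many buckets exist.

/*
 * Minimal standalone sketch of the failure mode (not the real nodeHash.c
 * structures): each bucket holds a few tuples in place and everything else
 * spills into ONE fixed-size overflow arena.  With a skewed key
 * distribution the arena fills up no matter how many buckets there are,
 * which is the "hash table out of memory" failure.
 */
#include <stdio.h>
#include <string.h>

#define NBUCKETS          64
#define TUPLES_PER_BUCKET 10          /* in-bucket slots before spilling  */
#define OVERFLOW_BYTES    (8 * 1024)  /* the fixed-size overflow area     */
#define TUPLE_BYTES       64          /* pretend every tuple is this big  */

typedef struct {
    int nstored;                      /* tuples held directly in the bucket */
} Bucket;

static Bucket buckets[NBUCKETS];
static char   overflow[OVERFLOW_BYTES];
static size_t overflow_used = 0;

/* Returns 0 on success, -1 when the fixed overflow area is exhausted. */
static int store_tuple(int hashkey)
{
    Bucket *b = &buckets[hashkey % NBUCKETS];

    if (b->nstored < TUPLES_PER_BUCKET) {
        b->nstored++;                            /* fits in the bucket itself */
        return 0;
    }
    if (overflow_used + TUPLE_BYTES > OVERFLOW_BYTES)
        return -1;                               /* "hash table out of memory" */
    memset(overflow + overflow_used, 0, TUPLE_BYTES);  /* pretend tuple copy */
    overflow_used += TUPLE_BYTES;
    return 0;
}

int main(void)
{
    /* Skewed inner relation: every row has the same join-key value. */
    for (int i = 0; i < 1000; i++) {
        if (store_tuple(42) < 0) {
            printf("overflow area exhausted after %d tuples\n", i);
            return 1;
        }
    }
    printf("all tuples stored\n");
    return 0;
}

With these made-up sizes the arena is gone after 138 duplicate rows, which is why growing the number of hashbuckets does not help once the input is skewed.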
On Thu, 6 May 1999, Tom Lane wrote:

> I've committed fixes that deal with all of the coredump problems
> I could find in nodeHash.c (there were several :-().
>
> But the code still has a fundamental design flaw: it uses a fixed-size
> overflow area to hold tuples that don't fit into the hashbuckets they
> are assigned to. This means you get "hashtable out of memory" errors
> if the distribution of tuples is skewed enough, or if the number of
> hashbuckets is too small because the system underestimated the number
> of tuples in the relation. Better than a coredump I suppose, but still
> very bad, especially since the optimizer likes to use hashjoins more
> than it used to.
>
> What I would like to do to fix this is to store the tuples in a Portal
> instead of in a fixed-size palloc block. While at it, I'd rip out the
> "relative vs. absolute address" cruft that is in the code now.
> (Apparently there was once some thought of using a shared memory block
> so that multiple processes could share the work of a hashjoin. All that
> remains is ugly, bug-prone code ...)
>
> The reason I bring all this up is that it'd be a nontrivial revision
> to nodeHash.c, and I'm uncomfortable with the notion of making such a
> change this late in a beta cycle. On the other hand it *is* a bug fix,
> and a fairly important one IMHO.
>
> Opinions? Should I plow ahead, or leave this to fix after 6.5 release?

Estimate of time involved to fix this? vs likelihood of someone
triggering the bug in production?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes:
>> Opinions? Should I plow ahead, or leave this to fix after 6.5 release?

> Estimate of time involved to fix this? vs likelihood of someone
> triggering the bug in production?

I could probably get the coding done this weekend, unless something else
comes up to distract me. It's the question of how much testing it'd
receive before release that worries me...

As for the likelihood, that's hard to say. It's very easy to trigger
the bug as a test case. (Arrange for a hashjoin where the inner table
has a lot of identical rows, or at least many sets of more-than-10-
rows-with-the-same-value-in-the-field-being-hashed-on.) In real life
you'd like to think that that's pretty improbable.

What started this go-round was Contzen's report of seeing the
"hash table out of memory. Use -B parameter to increase buffers"
message in what was evidently a real-life scenario. So it can happen.
Do you recall having seen many complaints about that error before?

regards, tom lane
I said:
> What started this go-round was Contzen's report of seeing the
> "hash table out of memory. Use -B parameter to increase buffers"
> message in what was evidently a real-life scenario. So it can happen.
> Do you recall having seen many complaints about that error before?

A little bit of poking through the mailing list archives found four
other complaints in the past six months (scattered through the sql,
admin, and novices lists). There might have been more that my search
query missed.

What do you think, does that reach your threshold of pain or not?

regards, tom lane
On Thu, 6 May 1999, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> >> Opinions? Should I plow ahead, or leave this to fix after 6.5 release?
>
> > Estimate of time involved to fix this? vs likelihood of someone
> > triggering the bug in production?
>
> I could probably get the coding done this weekend, unless something else
> comes up to distract me. It's the question of how much testing it'd
> receive before release that worries me...

We're looking at 3 weeks till release...I'll let you call it on this one.
If you feel confident about getting the bug fixed before release, with
enough time for testing, go for it. If it makes more sense to make it an
'untested patch', go for that one.

I believe you can fix this bug; I'm just kinda nervous about introducing a
different one (which I believe is your worry also), given the limited
amount of testing possible :(

My preference would be a "use at own risk" patch...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
Tom Lane wrote:
>
> I've committed fixes that deal with all of the coredump problems
> I could find in nodeHash.c (there were several :-().
>
> But the code still has a fundamental design flaw: it uses a fixed-size
> overflow area to hold tuples that don't fit into the hashbuckets they
> are assigned to. This means you get "hashtable out of memory" errors
> if the distribution of tuples is skewed enough, or if the number of
> hashbuckets is too small because the system underestimated the number
> of tuples in the relation. Better than a coredump I suppose, but still
> very bad, especially since the optimizer likes to use hashjoins more
> than it used to.
>
> What I would like to do to fix this is to store the tuples in a Portal
> instead of in a fixed-size palloc block. While at it, I'd rip out the
> "relative vs. absolute address" cruft that is in the code now.
> (Apparently there was once some thought of using a shared memory block
> so that multiple processes could share the work of a hashjoin. All that
> remains is ugly, bug-prone code ...)

Fix it! Testing is easy...

Though I would use a chain of big blocks for the overflow area, not a
Portal - a Portal is too big a thing to be used directly inside a join
method.

Vadim
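A rough sketch of the "chain of big blocks" idea Vadim suggests (again with invented names and sizes, not actual PostgreSQL code): carve tuple copies out of large blocks and chain a fresh block whenever the current one fills, so overflow space grows with the data instead of being fixed. Since each block has a stable malloc'd address, plain pointers suffice and none of the relative-vs-absolute address machinery is needed.

/*
 * Rough sketch of a "chain of big blocks" tuple store.  Nothing here is
 * actual PostgreSQL code; names and sizes are invented for illustration.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_BYTES (64 * 1024)

typedef struct Block {
    struct Block *next;
    size_t        used;
    char          data[BLOCK_BYTES];
} Block;

typedef struct {
    Block *head;                      /* most recently chained block */
} ChunkList;

/* Copy 'len' bytes into the chunk list, chaining a new block as needed. */
static void *chunk_store(ChunkList *cl, const void *src, size_t len)
{
    if (len > BLOCK_BYTES)
        return NULL;                  /* oversized tuple: not handled here */
    if (cl->head == NULL || cl->head->used + len > BLOCK_BYTES) {
        Block *b = malloc(sizeof(Block));
        if (b == NULL)
            return NULL;              /* genuinely out of memory */
        b->next = cl->head;
        b->used = 0;
        cl->head = b;
    }
    void *dst = cl->head->data + cl->head->used;
    memcpy(dst, src, len);
    cl->head->used += len;
    return dst;                       /* stable pointer: blocks never move */
}

static void chunk_free_all(ChunkList *cl)
{
    while (cl->head) {
        Block *next = cl->head->next;
        free(cl->head);
        cl->head = next;
    }
}

int main(void)
{
    ChunkList cl = { NULL };
    char tuple[64] = "pretend this is a copied inner tuple";
    int  nblocks = 0;

    /* Store far more data than any fixed-size overflow area would hold. */
    for (int i = 0; i < 100000; i++)
        if (chunk_store(&cl, tuple, sizeof(tuple)) == NULL)
            return 1;
    for (Block *b = cl.head; b; b = b->next)
        nblocks++;
    printf("stored 100000 tuples in %d chained blocks\n", nblocks);
    chunk_free_all(&cl);
    return 0;
}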
On Thu, 6 May 1999, Tom Lane wrote:

> I said:
> > What started this go-round was Contzen's report of seeing the
> > "hash table out of memory. Use -B parameter to increase buffers"
> > message in what was evidently a real-life scenario. So it can happen.
> > Do you recall having seen many complaints about that error before?
>
> A little bit of poking through the mailing list archives found four
> other complaints in the past six months (scattered through the sql,
> admin, and novices lists). There might have been more that my search
> query missed.
>
> What do you think, does that reach your threshold of pain or not?

Considering the number of instances we have out there, it sounds like a
very rarely triggered bug...I'd say keep it as an 'untested patch' for
the release, for those that wish to try it if they have a problem, and
include it for v6.6 and v6.5.1 ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
> The Hermit Hacker <scrappy@hub.org> writes:
> >> Opinions? Should I plow ahead, or leave this to fix after 6.5 release?
>
> > Estimate of time involved to fix this? vs likelihood of someone
> > triggering the bug in production?
>
> I could probably get the coding done this weekend, unless something else
> comes up to distract me. It's the question of how much testing it'd
> receive before release that worries me...
>
> As for the likelihood, that's hard to say. It's very easy to trigger
> the bug as a test case. (Arrange for a hashjoin where the inner table
> has a lot of identical rows, or at least many sets of more-than-10-
> rows-with-the-same-value-in-the-field-being-hashed-on.) In real life
> you'd like to think that that's pretty improbable.
>
> What started this go-round was Contzen's report of seeing the
> "hash table out of memory. Use -B parameter to increase buffers"
> message in what was evidently a real-life scenario. So it can happen.
> Do you recall having seen many complaints about that error before?

We already have a good example of this "hash table out of memory. Use -B
parameter to increase buffers" syndrome in our source tree. Go to
src/test/bench, remove "-B 256" from the last line of runwisc.sh, then run
the test. The "-B 256" wasn't there originally; I added it while fixing
the test suite and elog() (see the included posting). I don't see the
error message in 6.4.2; I guess this is due to the optimizer changes.

IMHO, we should fix this before 6.5 is out, or else change the default
setting of -B to 256 or so (though that may cause a shortage of shared
memory).

P.S. At the time I misunderstood the problem as not having enough sort
memory :-<

>Message-Id: <199904160654.PAA00221@srapc451.sra.co.jp>
>From: Tatsuo Ishii <t-ishii@sra.co.jp>
>To: hackers@postgreSQL.org
>Subject: [HACKERS] elog() and wisconsin bench test fix
>Date: Fri, 16 Apr 1999 15:54:16 +0900
>
>I have modified elog() so that it uses its own pid (using getpid()) as
>the first parameter for kill() in some cases. It used to get its own
>pid from the MyProcPid global variable. This was fine until I ran the
>wisconsin benchmark test suite (test/bench/). In the test, postgres is
>run as a command and MyProcPid is set to 0. As a result elog() calls
>kill() with the first parameter set to 0, and SIGQUIT was issued
>to the process group, not the postgres process itself! This was why
>/bin/sh got core dumped whenever I ran the bench test.
>
>Also, I fixed several bugs in the test queries.
>
>One thing that still remains is that some queries fail due to insufficient
>sort memory. I modified the test script, adding the -B option. But is this
>normal? I think not. I thought postgres should use disk files instead
>of memory if there isn't enough sort buffer.
>
>Comments?
>--
>Tatsuo Ishii
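The kill() pitfall described in the included posting is standard POSIX behavior: a pid argument of 0 sends the signal to every process in the caller's process group, which is why the invoking /bin/sh died along with the backend. A tiny illustration (not the actual elog() code; MyProcPid here is just a stand-in variable):

/*
 * Illustration of the kill() pitfall (not the actual elog() code).  Per
 * POSIX, kill(0, sig) sends the signal to every process in the caller's
 * process group, so a pid variable left at 0 takes out the invoking shell
 * as well as the backend.
 */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

static pid_t MyProcPid = 0;    /* stand-in for the global that stays 0 when
                                * postgres is run as a plain command       */

static void fatal_exit(void)
{
    /* Buggy version: with MyProcPid == 0 this would signal the whole
     * process group, including the shell that launched us:
     *
     *     kill(MyProcPid, SIGQUIT);
     */

    /* Fixed version: signal only ourselves, whatever MyProcPid holds. */
    kill(getpid(), SIGQUIT);
}

int main(void)
{
    printf("MyProcPid=%d, real pid=%d, process group=%d\n",
           (int) MyProcPid, (int) getpid(), (int) getpgrp());
    fflush(stdout);
    fatal_exit();               /* terminates only this process */
    return 0;
}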
The Hermit Hacker <scrappy@hub.org> writes:
> On Thu, 6 May 1999, Tom Lane wrote:
>> What do you think, does that reach your threshold of pain or not?

> Considering the number of instances we have out there, it sounds like a
> very rarely triggered bug...I'd say keep it as an 'untested patch' for
> the release, for those that wish to try it if they have a problem, and
> include it for v6.6 and v6.5.1 ...

OK, we'll deal with it as a post-release patch then. It's not like I
haven't got anything else to do before the end of the month :-(

regards, tom lane
> The Hermit Hacker <scrappy@hub.org> writes:
> >> Opinions? Should I plow ahead, or leave this to fix after 6.5 release?
>
> > Estimate of time involved to fix this? vs likelihood of someone
> > triggering the bug in production?
>
> I could probably get the coding done this weekend, unless something else
> comes up to distract me. It's the question of how much testing it'd
> receive before release that worries me...
>
> As for the likelihood, that's hard to say. It's very easy to trigger
> the bug as a test case. (Arrange for a hashjoin where the inner table
> has a lot of identical rows, or at least many sets of more-than-10-
> rows-with-the-same-value-in-the-field-being-hashed-on.) In real life
> you'd like to think that that's pretty improbable.
>
> What started this go-round was Contzen's report of seeing the
> "hash table out of memory. Use -B parameter to increase buffers"
> message in what was evidently a real-life scenario. So it can happen.
> Do you recall having seen many complaints about that error before?

The new optimizer does more hashjoins, so we will see it more often.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> We already have a good example of this "hash table out of memory. Use -B
> parameter to increase buffers" syndrome in our source tree. Go to
> src/test/bench, remove "-B 256" from the last line of runwisc.sh, then run
> the test. The "-B 256" wasn't there originally; I added it while fixing
> the test suite and elog() (see the included posting). I don't see the
> error message in 6.4.2; I guess this is due to the optimizer changes.

> IMHO, we should fix this before 6.5 is out, or else change the default
> setting of -B to 256 or so (though that may cause a shortage of shared
> memory).

It's fixed --- do you want to remove -B 256 from the test suite?

regards, tom lane