Thread: pgsql: Fix building of large (bigger than shared_buffers) hash indexes.

Fix building of large (bigger than shared_buffers) hash indexes.

When the index is predicted to need more than NBuffers buckets,
CREATE INDEX attempts to sort the index entries by hash key before
insertion, so as to reduce thrashing.  This code path got broken by
commit 9f03ca915196dfc8, which overlooked that _hash_form_tuple() is not
just an alias for index_form_tuple().  The index got built anyway, but
with garbage data, so that searches for pre-existing tuples always
failed.  Fix by refactoring to separate construction of the indexable
data from calling index_form_tuple().

Per bug #14210 from Daniel Newman.  Back-patch to 9.5 where the
bug was introduced.

Report: <20160623162507.17237.39471@wrigleys.postgresql.org>

Branch
------
REL9_5_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/07f69137b15e594edfaec29f73efa86aa442902c

Modified Files
--------------
src/backend/access/hash/hash.c     | 34 +++++++++++++++--------------
src/backend/access/hash/hashutil.c | 44 +++++++++++++++++++++++---------------
src/include/access/hash.h          |  5 +++--
3 files changed, 48 insertions(+), 35 deletions(-)


Re: pgsql: Fix building of large (bigger than shared_buffers) hash indexes.

From
Bruce Momjian
Date:
On Fri, Jun 24, 2016 at 08:57:47PM +0000, Tom Lane wrote:
> Fix building of large (bigger than shared_buffers) hash indexes.
>
> When the index is predicted to need more than NBuffers buckets,
> CREATE INDEX attempts to sort the index entries by hash key before
> insertion, so as to reduce thrashing.  This code path got broken by
> commit 9f03ca915196dfc8, which overlooked that _hash_form_tuple() is not
> just an alias for index_form_tuple().  The index got built anyway, but
> with garbage data, so that searches for pre-existing tuples always
> failed.  Fix by refactoring to separate construction of the indexable
> data from calling index_form_tuple().
>
> Per bug #14210 from Daniel Newman.  Back-patch to 9.5 where the
> bug was introduced.
>
> Report: <20160623162507.17237.39471@wrigleys.postgresql.org>

Do we have any way of helping people find out if they need to recreate
their hash indexes?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


Re: pgsql: Fix building of large (bigger than shared_buffers) hash indexes.

From
Peter Geoghegan
Date:
On Mon, Jun 27, 2016 at 2:27 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Do we have any way of helping people find out if they need to recreate
> their hash indexes?

No, but I don't think that it's especially needed. It ought to be
completely obvious when the problem arises, because the resulting
index is total garbage.

This tells us a lot about how many people use hash indexes in
production, of course. 9.5 has been out for months.

--
Peter Geoghegan


Bruce Momjian <bruce@momjian.us> writes:
> On Fri, Jun 24, 2016 at 08:57:47PM +0000, Tom Lane wrote:
>> Fix building of large (bigger than shared_buffers) hash indexes.

> Do we have any way of helping people find out if they need to recreate
> their hash indexes?

I do not think that's much of an issue.  The failure mode is hardly
non-obvious, because index searches would NEVER find any pre-existing
rows.  (Well, there'd be circa one chance in 2^32 of an accidental
hashcode match, but that certainly won't be enough to mask the fact
that the index is broken.)  Anyway, users of hash indexes are probably
accustomed to needing to reindex, on account of no WAL support :-(

            regards, tom lane