Re: Hash Indexes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Hash Indexes
Date
Msg-id CA+TgmoavAS_GhM6fLSb7duxxPcK-0o9T18jr_O+34gAUSyKc=A@mail.gmail.com
Whole thread Raw
In response to Re: Hash Indexes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Hash Indexes
List pgsql-hackers
On Tue, Nov 1, 2016 at 9:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> [ new patches ]
>
> I looked over parts of this today, mostly the hashinsert.c changes.

Some more review...

@@ -656,6 +678,10 @@ _hash_squeezebucket(Relation rel,
             IndexTuple  itup;
             Size        itemsz;

+            /* skip dead tuples */
+            if (ItemIdIsDead(PageGetItemId(rpage, roffnum)))
+                continue;

Is this an optimization independent of the rest of the patch, or is
there something in this patch that necessitates it?  i.e. Could this
small change be committed independently?  If not, then I think it
needs a better comment explaining why it is now mandatory.

- *  Caller must hold exclusive lock on the target bucket.  This allows
+ *  Caller must hold cleanup lock on the target bucket.  This allows
  *  us to safely lock multiple pages in the bucket.

The notion of a lock on a bucket no longer really exists; with this
patch, we'll now properly speak of a lock on a primary bucket page.
Also, I think the bit about safely locking multiple pages is bizarre
-- that's not the issue at all: the problem is that reorganizing a
bucket might confuse concurrent scans into returning wrong answers.

I've included a broader updating of that comment, and some other
comment changes, in the attached incremental patch, which also
refactors your changes to _hash_freeovflpage() a bit to avoid some
code duplication.  Please consider this for inclusion in your next
version.

In hashutil.c, I think that _hash_msb() is just a reimplementation of
fls(), which you can rely on being present because we have our own
implementation in src/port.  It's quite similar to yours but slightly
shorter.  :-)   Also, some systems have a builtin fls() function which
actually optimizes down to a single machine instruction, and which is
therefore much faster than either version.

I don't like the fact that _hash_get_newblk() and _hash_get_oldblk()
are working out the bucket number by using the HashOpaque structure
within the bucket page they're examining.  First, it seems weird to
pass the whole structure when you only need the bucket number out of
it.  More importantly, the caller really ought to know what bucket
they care about without having to discover it.

For example, in _hash_doinsert(), we figure out the bucket into which
we need to insert, and we store that in a variable called "bucket".
Then from there we work out the primary bucket page's block number,
which we store in "blkno".  We read the page into "buf" and put a
pointer to that buffer's contents into "page" from which we discover
the HashOpaque, a pointer to which we store into "pageopaque".  Then
we pass that to _hash_get_newblk() which will go look into that
structure to find the bucket number ... but couldn't we have just
passed "bucket" instead?  Similarly, _hash_expandtable() has the value
available in the variable "old_bucket".

The only caller of _hash_get_oldblk() is _hash_first(), which has the
bucket number available in a variable called "bucket".

So it seems to me that these functions could be simplified to take the
bucket number as an argument directly instead of the HashOpaque.

Generally, this pattern recurs throughout the patch.  Every time you
use the data in the page to figure something out which the caller
already knew, you're introducing a risk of bugs: what if the answers
don't match?   I think you should try to root out as much of that from
this code as you can.

As you may be able to tell, I'm working my way into this patch
gradually, starting with peripheral parts and working toward the core
of it.  Generally, I think it's in pretty good shape, but I still have
quite a bit left to study.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Gilles Darold
Date:
Subject: Re: Patch to implement pg_current_logfile() function
Next
From: Kevin Grittner
Date:
Subject: Re: delta relations in AFTER triggers