Thread: Doc chapter for Hash Indexes

Doc chapter for Hash Indexes

From
Simon Riggs
Date:
New chapter for Hash Indexes, designed to help users understand how
they work and when to use them.

Mostly newly written, but a few paras lifted from README when they were helpful.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Doc chapter for Hash Indexes

From
Justin Pryzby
Date:
On Mon, Jun 21, 2021 at 02:08:12PM +0100, Simon Riggs wrote:
> New chapter for Hash Indexes, designed to help users understand how
> they work and when to use them.
> 
> Mostly newly written, but a few paras lifted from README when they were helpful.

+ <para>
+  PostgreSQL includes an implementation of persistent on-disk hash indexes,
+  which are now fully crash recoverable. Any data type can be indexed by a

I don't see any need to mention that they're "now" crash safe.

+  Each hash index tuple stores just the 4-byte hash value, not the actual
+  column value. As a result, hash indexes may be much smaller than B-trees
+  when indexing longer data items such as UUIDs, URLs etc.. The absence of

comma:
URLs, etc.

+  the column value also makes all hash index scans lossy. Hash indexes may
+  take part in bitmap index scans and backward scans.

Isn't it more correct to say that it must use a bitmap scan?

+  through the tree until the leaf page is found. In tables with millions
+  of rows this descent can increase access time to data. The equivalent

rows comma

+  that hash value. When scanning a hash bucket during queries we need to

queries comma

+ <para>
+  As a result of the overflow cases, we can say that hash indexes are
+  most suitable for unique, nearly unique data or data with a low number
+  of rows per hash bucket will be suitable for hash indexes. One

The beginning and end of the sentence duplicate "suitable".

+  Each row in the table indexed is represented by a single index tuple in
+  the hash index. Hash index tuples are stored in the bucket pages, and if
+  they exist, the overflow pages. 

"the overflow pages" didn't sound right, but I was confused by the comma.  
I think it should say ".. in bucket pages and overflow pages, if any."

-- 
Justin



Re: Doc chapter for Hash Indexes

From
Amit Kapila
Date:
On Tue, Jun 22, 2021 at 4:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Jun 21, 2021 at 02:08:12PM +0100, Simon Riggs wrote:
> > New chapter for Hash Indexes, designed to help users understand how
> > they work and when to use them.
> >
> > Mostly newly written, but a few paras lifted from README when they were helpful.
>
>
..
> +  the column value also makes all hash index scans lossy. Hash indexes may
> +  take part in bitmap index scans and backward scans.
>
> Isn't it more correct to say that it must use a bitmap scan?
>

Why? Hash indexes do support regular index scan.

-- 
With Regards,
Amit Kapila.



Re: Doc chapter for Hash Indexes

From
Amit Kapila
Date:
On Mon, Jun 21, 2021 at 6:38 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> New chapter for Hash Indexes, designed to help users understand how
> they work and when to use them.
>
> Mostly newly written, but a few paras lifted from README when they were helpful.
>

Few comments
==============
1.
+  Hash indexes are best optimized for SELECTs and UPDATEs using equality
+  scans on larger tables.

Is there a reason to mention Selects and Updates but not Deletes?

2.
+  Like B-Trees, hash indexes perform simple index tuple deletion. This
+  is a deferred maintenance operation that deletes index tuples that are
+  known to be safe to delete (those whose item identifier's LP_DEAD bit
+  is already set). This is performed speculatively upon each insert,
+  though may not succeed if the page is pinned by another backend.

It is not very clear to me when we perform the simple index tuple
deletion from the above sentence. We perform it when there is no space
to accommodate a new tuple on the bucket page and as a result, we
might need to create an overflow page. Basically, I am not sure
saying: "This is performed speculatively upon each insert .." is
helpful.

3.
+  incrementally expanded.  When a new bucket is to be added to the index,
+  exactly one existing bucket will need to be "split", with some of its
+  tuples being transferred to the new bucket according to the updated
+  key-to-bucket-number mapping.  This is essentially the same hash table

In most places, the patch has used a single space after the full stop
but at some places like above, it has used two spaces after full stop.
I think it is better to be consistent.

4.
 This is essentially the same hash table
+  management technique embodied in src/backend/utils/hash/dynahash.c for
+  in-memory hash tables used within PostgreSQL internals.

I am not sure if there is a need to mention this in the user-facing
doc. I think README is a better place for this.

-- 
With Regards,
Amit Kapila.



Re: Doc chapter for Hash Indexes

From
Simon Riggs
Date:
On Tue, Jun 22, 2021 at 7:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jun 21, 2021 at 6:38 PM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> >
> > New chapter for Hash Indexes, designed to help users understand how
> > they work and when to use them.
> >
> > Mostly newly written, but a few paras lifted from README when they were helpful.
> >
>
> Few comments
> ==============
> 1.
> +  Hash indexes are best optimized for SELECTs and UPDATEs using equality
> +  scans on larger tables.
>
> Is there a reason to mention Selects and Updates but not Deletes?

Deletes decrease the number of rows, so must eventually be matched with inserts.
So deletes imply inserts.

Perhaps it should say "update-heavy"

> 2.
> +  Like B-Trees, hash indexes perform simple index tuple deletion. This
> +  is a deferred maintenance operation that deletes index tuples that are
> +  known to be safe to delete (those whose item identifier's LP_DEAD bit
> +  is already set). This is performed speculatively upon each insert,
> +  though may not succeed if the page is pinned by another backend.
>
> It is not very clear to me when we perform the simple index tuple
> deletion from the above sentence. We perform it when there is no space
> to accommodate a new tuple on the bucket page and as a result, we
> might need to create an overflow page. Basically, I am not sure
> saying: "This is performed speculatively upon each insert .." is
> helpful.

OK, thanks, will reword.

> 3.
> +  incrementally expanded.  When a new bucket is to be added to the index,
> +  exactly one existing bucket will need to be "split", with some of its
> +  tuples being transferred to the new bucket according to the updated
> +  key-to-bucket-number mapping.  This is essentially the same hash table
>
> In most places, the patch has used a single space after the full stop
> but at some places like above, it has used two spaces after full stop.
> I think it is better to be consistent.

OK

> 4.
>  This is essentially the same hash table
> +  management technique embodied in src/backend/utils/hash/dynahash.c for
> +  in-memory hash tables used within PostgreSQL internals.
>
> I am not sure if there is a need to mention this in the user-facing
> doc. I think README is a better place for this.

OK, will remove. Thanks


I've reworded most things from both Amit and Justin; thanks for your reviews.

I attach both clean and compare versions.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Doc chapter for Hash Indexes

From
Amit Kapila
Date:
On Tue, Jun 22, 2021 at 2:31 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> I attach both clean and compare versions.
>

Do we want to hold this work for PG15 or commit in PG14 and backpatch
it till v10 where we have made hash indexes crash-safe? I would vote
for committing in PG14 and backpatch it till v10, however, I am fine
if we want to commit just to PG14 or PG15.

-- 
With Regards,
Amit Kapila.



Re: Doc chapter for Hash Indexes

From
Simon Riggs
Date:
On Wed, Jun 23, 2021 at 5:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jun 22, 2021 at 2:31 PM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> >
> > I attach both clean and compare versions.
> >
>
> Do we want to hold this work for PG15 or commit in PG14 and backpatch
> it till v10 where we have made hash indexes crash-safe? I would vote
> for committing in PG14 and backpatch it till v10, however, I am fine
> if we want to commit just to PG14 or PG15.

Backpatch makes sense to me, but since not everyone will be reading
this thread, I would look towards PG15 only first. We may yet pick up
additional corrections or additions before a backpatch, if that is
agreed.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Doc chapter for Hash Indexes

From
Bruce Momjian
Date:
aOn Wed, Jun 23, 2021 at 12:56:51PM +0100, Simon Riggs wrote:
> On Wed, Jun 23, 2021 at 5:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jun 22, 2021 at 2:31 PM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > >
> > > I attach both clean and compare versions.
> > >
> >
> > Do we want to hold this work for PG15 or commit in PG14 and backpatch
> > it till v10 where we have made hash indexes crash-safe? I would vote
> > for committing in PG14 and backpatch it till v10, however, I am fine
> > if we want to commit just to PG14 or PG15.
> 
> Backpatch makes sense to me, but since not everyone will be reading
> this thread, I would look towards PG15 only first. We may yet pick up
> additional corrections or additions before a backpatch, if that is
> agreed.

Yeah, I think backpatching makes sense.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Doc chapter for Hash Indexes

From
Amit Kapila
Date:
On Fri, Jun 25, 2021 at 1:29 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> aOn Wed, Jun 23, 2021 at 12:56:51PM +0100, Simon Riggs wrote:
> > On Wed, Jun 23, 2021 at 5:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jun 22, 2021 at 2:31 PM Simon Riggs
> > > <simon.riggs@enterprisedb.com> wrote:
> > > >
> > > > I attach both clean and compare versions.
> > > >
> > >
> > > Do we want to hold this work for PG15 or commit in PG14 and backpatch
> > > it till v10 where we have made hash indexes crash-safe? I would vote
> > > for committing in PG14 and backpatch it till v10, however, I am fine
> > > if we want to commit just to PG14 or PG15.
> >
> > Backpatch makes sense to me, but since not everyone will be reading
> > this thread, I would look towards PG15 only first. We may yet pick up
> > additional corrections or additions before a backpatch, if that is
> > agreed.
>
> Yeah, I think backpatching makes sense.
>

I checked and found that there are two commits (7c75ef5715 and
22c5e73562) in the hash index code in PG-11 which might have impacted
what we write in the documentation. However, AFAICS, nothing proposed
in the patch would change due to those commits. Even, if we don't want
to back patch, is there any harm in committing this to PG-14?

-- 
With Regards,
Amit Kapila.



Re: Doc chapter for Hash Indexes

From
Simon Riggs
Date:
On Fri, Jun 25, 2021 at 4:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jun 25, 2021 at 1:29 AM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > aOn Wed, Jun 23, 2021 at 12:56:51PM +0100, Simon Riggs wrote:
> > > On Wed, Jun 23, 2021 at 5:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Tue, Jun 22, 2021 at 2:31 PM Simon Riggs
> > > > <simon.riggs@enterprisedb.com> wrote:
> > > > >
> > > > > I attach both clean and compare versions.
> > > > >
> > > >
> > > > Do we want to hold this work for PG15 or commit in PG14 and backpatch
> > > > it till v10 where we have made hash indexes crash-safe? I would vote
> > > > for committing in PG14 and backpatch it till v10, however, I am fine
> > > > if we want to commit just to PG14 or PG15.
> > >
> > > Backpatch makes sense to me, but since not everyone will be reading
> > > this thread, I would look towards PG15 only first. We may yet pick up
> > > additional corrections or additions before a backpatch, if that is
> > > agreed.
> >
> > Yeah, I think backpatching makes sense.
> >
>
> I checked and found that there are two commits (7c75ef5715 and
> 22c5e73562) in the hash index code in PG-11 which might have impacted
> what we write in the documentation. However, AFAICS, nothing proposed
> in the patch would change due to those commits. Even, if we don't want
> to back patch, is there any harm in committing this to PG-14?

I've reviewed those commits and the related code, so I agree.

As a result, I've tweaked the wording around VACUUM slightly.

Clean and compare patches attached.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Doc chapter for Hash Indexes

From
Amit Kapila
Date:
On Fri, Jun 25, 2021 at 3:11 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Fri, Jun 25, 2021 at 4:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jun 25, 2021 at 1:29 AM Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > > aOn Wed, Jun 23, 2021 at 12:56:51PM +0100, Simon Riggs wrote:
> > > > On Wed, Jun 23, 2021 at 5:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jun 22, 2021 at 2:31 PM Simon Riggs
> > > > > <simon.riggs@enterprisedb.com> wrote:
> > > > > >
> > > > > > I attach both clean and compare versions.
> > > > > >
> > > > >
> > > > > Do we want to hold this work for PG15 or commit in PG14 and backpatch
> > > > > it till v10 where we have made hash indexes crash-safe? I would vote
> > > > > for committing in PG14 and backpatch it till v10, however, I am fine
> > > > > if we want to commit just to PG14 or PG15.
> > > >
> > > > Backpatch makes sense to me, but since not everyone will be reading
> > > > this thread, I would look towards PG15 only first. We may yet pick up
> > > > additional corrections or additions before a backpatch, if that is
> > > > agreed.
> > >
> > > Yeah, I think backpatching makes sense.
> > >
> >
> > I checked and found that there are two commits (7c75ef5715 and
> > 22c5e73562) in the hash index code in PG-11 which might have impacted
> > what we write in the documentation. However, AFAICS, nothing proposed
> > in the patch would change due to those commits. Even, if we don't want
> > to back patch, is there any harm in committing this to PG-14?
>
> I've reviewed those commits and the related code, so I agree.
>

Do you agree to just commit this to PG-14 or to commit in PG-14 and
backpatch till PG-10?

> As a result, I've tweaked the wording around VACUUM slightly.
>

Thanks, the changes look good to me.

-- 
With Regards,
Amit Kapila.



Re: Doc chapter for Hash Indexes

From
Amit Kapila
Date:
On Sat, Jun 26, 2021 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jun 25, 2021 at 3:11 PM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> >
> > On Fri, Jun 25, 2021 at 4:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jun 25, 2021 at 1:29 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > >
> > > > aOn Wed, Jun 23, 2021 at 12:56:51PM +0100, Simon Riggs wrote:
> > > > > On Wed, Jun 23, 2021 at 5:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 22, 2021 at 2:31 PM Simon Riggs
> > > > > > <simon.riggs@enterprisedb.com> wrote:
> > > > > > >
> > > > > > > I attach both clean and compare versions.
> > > > > > >
> > > > > >
> > > > > > Do we want to hold this work for PG15 or commit in PG14 and backpatch
> > > > > > it till v10 where we have made hash indexes crash-safe? I would vote
> > > > > > for committing in PG14 and backpatch it till v10, however, I am fine
> > > > > > if we want to commit just to PG14 or PG15.
> > > > >
> > > > > Backpatch makes sense to me, but since not everyone will be reading
> > > > > this thread, I would look towards PG15 only first. We may yet pick up
> > > > > additional corrections or additions before a backpatch, if that is
> > > > > agreed.
> > > >
> > > > Yeah, I think backpatching makes sense.
> > > >
> > >
> > > I checked and found that there are two commits (7c75ef5715 and
> > > 22c5e73562) in the hash index code in PG-11 which might have impacted
> > > what we write in the documentation. However, AFAICS, nothing proposed
> > > in the patch would change due to those commits. Even, if we don't want
> > > to back patch, is there any harm in committing this to PG-14?
> >
> > I've reviewed those commits and the related code, so I agree.
> >
>
> Do you agree to just commit this to PG-14 or to commit in PG-14 and
> backpatch till PG-10?
>

I am planning to go through the patch once again and would like to
commit and backpatch till v10 in a day to two unless someone thinks
otherwise.

-- 
With Regards,
Amit Kapila.



Re: Doc chapter for Hash Indexes

From
Amit Kapila
Date:
On Tue, Jun 29, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Jun 26, 2021 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> I am planning to go through the patch once again and would like to
> commit and backpatch till v10 in a day to two unless someone thinks
> otherwise.
>

Pushed.

-- 
With Regards,
Amit Kapila.