Re: Next Steps with Hash Indexes - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: Next Steps with Hash Indexes
Msg-id: CANbhV-FB418MJ+1UC=sr7XhWvhz=CnVrvqMg7eAWGTGBkM6pFQ@mail.gmail.com
In response to: Re: Next Steps with Hash Indexes (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Next Steps with Hash Indexes (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Thu, 14 Oct 2021 at 16:09, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Oct 14, 2021 at 12:48 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > The hash index tuples are 20 bytes each. If that were rounded up to
> > 8-byte alignment, then that would be 24 bytes.
> >
> > Using pageinspect, the max(live_items) on any data page (bucket or
> > overflow) is 407 items, so they can't be 24 bytes long.
>
> That's the same as an nbtree page, which confirms my suspicion. The 20
> bytes consist of a 16-byte tuple plus a 4-byte line pointer. The
> tuple-level alignment overhead gets you from 12 bytes to 16 bytes with
> a single int4 column. So the padding is there for the taking.
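The arithmetic behind the 20-byte entry and the 407-item page can be checked with a small sketch. The constants are assumptions for a default 64-bit build (8 kB BLCKSZ, 8-byte MAXALIGN, 24-byte page header, 16-byte HashPageOpaqueData special space); they are not read from a server, and the helper names are hypothetical, not PostgreSQL functions.

```python
# Constants assume a default 64-bit build; they are not read from a server.
BLCKSZ = 8192            # default page size
MAXIMUM_ALIGNOF = 8      # 8-byte MAXALIGN, typical on 64-bit platforms
PAGE_HEADER = 24         # SizeOfPageHeaderData
HASH_SPECIAL = 16        # sizeof(HashPageOpaqueData), already MAXALIGNed
INDEX_TUPLE_HEADER = 8   # IndexTupleData: 6-byte ItemPointerData + 2-byte t_info
HASH_CODE = 4            # uint32 hash code stored in each hash index tuple
LINE_POINTER = 4         # ItemIdData


def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXIMUM_ALIGNOF (like MAXALIGN)."""
    return (n + MAXIMUM_ALIGNOF - 1) // MAXIMUM_ALIGNOF * MAXIMUM_ALIGNOF


def entry_bytes() -> int:
    """Space one entry costs on a page: aligned tuple plus its line pointer."""
    return maxalign(INDEX_TUPLE_HEADER + HASH_CODE) + LINE_POINTER


def max_items_per_page(per_entry: int) -> int:
    """How many entries of the given cost fit between header and special space."""
    return (BLCKSZ - PAGE_HEADER - HASH_SPECIAL) // per_entry


print(maxalign(INDEX_TUPLE_HEADER + HASH_CODE))  # 16: 12 bytes padded to 16
print(entry_bytes())                             # 20
print(max_items_per_page(entry_bytes()))         # 407
print(max_items_per_page(24))                    # 339
```

Under these assumptions 8152 usable bytes divided by 20 gives 407 entries, matching the pageinspect figure, while 24-byte entries would allow only 339. The 4 bytes between the 12-byte unaligned tuple and its 16-byte aligned size are the padding "there for the taking".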

Thank you for nudging me to review the tuple length.

Since hash indexes never store NULLs, and the hash is always fixed
length, ISTM that we can compress the hash index entries down to
ItemPointerData (6 bytes) plus the hash value(s).

That doesn't change any arguments about size differences between
approaches, but we can significantly reduce index size (by up to 50%).
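A minimal sketch of where an "up to 50%" figure could come from, assuming the compressed entries are stored back to back as fixed-width records holding a 6-byte ItemPointerData and a single 4-byte hash code, with no per-entry line pointer (the email does not specify the layout). COMPACT_ENTRY, pack_entry and unpack_entry are hypothetical names, and the little-endian byte order is chosen purely for illustration.

```python
import struct

# Hypothetical compact entry: ItemPointerData (BlockIdData stored as two
# uint16 halves, then an OffsetNumber) followed by one uint32 hash code.
# '<' selects little-endian with no alignment padding, so the size is 10 bytes.
COMPACT_ENTRY = struct.Struct("<HHHI")

CURRENT_ENTRY_BYTES = 20   # 16-byte MAXALIGNed tuple + 4-byte line pointer


def pack_entry(block: int, offset: int, hash_code: int) -> bytes:
    """Encode one (TID, hash) pair into the compact 10-byte form."""
    return COMPACT_ENTRY.pack(block >> 16, block & 0xFFFF, offset, hash_code)


def unpack_entry(raw: bytes) -> tuple[int, int, int]:
    """Decode a compact entry back into (block, offset, hash_code)."""
    bi_hi, bi_lo, offset, hash_code = COMPACT_ENTRY.unpack(raw)
    return (bi_hi << 16) | bi_lo, offset, hash_code


entry = pack_entry(70000, 12, 0xDEADBEEF)
print(len(entry))                                        # 10
print(unpack_entry(entry) == (70000, 12, 0xDEADBEEF))    # True
print(1 - COMPACT_ENTRY.size / CURRENT_ENTRY_BYTES)      # 0.5
```

Under these assumptions each entry shrinks from 20 bytes to 10. Storing more than one hash per entry, or keeping a line pointer per entry, would make the saving smaller, which fits the "up to" qualifier.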

-- 
Simon Riggs                http://www.EnterpriseDB.com/
