Re: Hash Index on Partitioned Table - Mailing list pgsql-general

From Tom Lane
Subject Re: Hash Index on Partitioned Table
Date
Msg-id 2657.1685542070@sss.pgh.pa.us
Whole thread Raw
In response to Hash Index on Partitioned Table  ("peter.borissow@kartographia.com" <peter.borissow@kartographia.com>)
Responses Re: Hash Index on Partitioned Table
List pgsql-general
"peter.borissow@kartographia.com" <peter.borissow@kartographia.com> writes:
> I have a rather large database with ~250 billion records in a partitioned table. The database has been running and
ingestingdata continuously for about 3 years. 

> I had a "regular" BTree index on one of the fields (a unique bigint column) but it was getting too big for the disk
itwas on. The index was consuming 6.4 TB of disk space. 

That's ... really about par for the course.  Each row requires an 8-byte
index entry, plus 12 bytes index overhead.  If I'm doing the math right
then the index is physically about 78% full which is typical to good for
a b-tree.  Reindexing would remove most of the extra space, but only
temporarily.

> After doing some research I decided to try to create a hash index instead of a BTree. For my purposes, the index is
onlyused to find specific numbers ("=" and "IN" queries). From what I read, the hash index should run a little faster
thanbtree for my use case and should use less disk space. 

I'm skeptical.  The thing to bear in mind is that btree is the mainstream
use-case and has been refined and optimized far more than the hash index
logic.

> (1) Why is the hash index consuming more disk space than the btree index? Is it because the hash of the bigint values
largerthan the storing the bigints in the btree? 

From memory, the index entries will be the same size in this case,
but hash might have more wasted space.

> (4) Is there any way to estimate when the index process will complete?

An index on a partitioned table isn't a single object, it's one index per
partition.  So you should be able to look at how many partitions have
indexes so far.  You might have to drill down to the point of counting how
many files in the database's directory, if the individual indexes aren't
showing up as committed catalog entries yet.

            regards, tom lane



pgsql-general by date:

Previous
From: Andres Freund
Date:
Subject: Re: Pg 16: will pg_dump & pg_restore be faster?
Next
From: Adrian Klaver
Date:
Subject: Re: Is there a bug in psql? (SELECT ''';)