Hash Index on Partitioned Table - Mailing list pgsql-general

From peter.borissow@kartographia.com
Subject Hash Index on Partitioned Table
Date
Msg-id 1685540675.65414951@apps.rackspace.com
Whole thread Raw
Responses Re: Hash Index on Partitioned Table  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general

Dear PostgreSQL Community,

 

I have a rather large database with ~250 billion records in a partitioned table. The database has been running and ingesting data continuously for about 3 years.

 

I had a "regular" BTree index on one of the fields (a unique bigint column) but it was getting too big for the disk it was on. The index was consuming 6.4 TB of disk space.

 

I created a new disk with double the size, dropped the original index, and started to generate a new index.

 

After doing some research I decided to try to create a hash index instead of a BTree. For my purposes, the index is only used to find specific numbers ("=" and "IN" queries). From what I read, the hash index should run a little faster than btree for my use case and should use less disk space.

 

After 115 hours, the hash index is still generating and is using significantly more disk space than the original BTree index (8.4 TB vs 6.4 TB). I don't know how to check the status of the index creation task so I can't really estimate how much longer it will take or how much disk space it will consume.

 

Questions:

 

(1) Why is the hash index consuming more disk space than the btree index? Is it because the hash of the bigint values larger than the storing the bigints in the btree?

(2) Are there any known issues having a hash index on this many records?

(3) Are there any known issues having a hash index on partitioned tables?

(4) Is there any way to estimate when the index process will complete?

 

Server info:

 - PostgreSQL 13

 - Ubuntu 20.04.6 LTS

 - 64 cores (only 1 is ever used during index creation)

 - Memory usage is steady at 58GB/188GB

 - All disks are high speed NVMe drives

 - 1,686 tables in the partition

 

Thanks in advance,

Peter

pgsql-general by date:

Previous
From: hubert depesz lubaczewski
Date:
Subject: Re: Is there a bug in psql? (SELECT ''';)
Next
From: Andres Freund
Date:
Subject: Re: Pg 16: will pg_dump & pg_restore be faster?