Re: Hash index initial size is too large given NULLs or partialindexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Hash index initial size is too large given NULLs or partialindexes
Date
Msg-id b0c1be5e-94ce-0496-fe96-b61be7a3067c@2ndquadrant.com
Whole thread Raw
In response to Hash index initial size is too large given NULLs or partial indexes  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Hash index initial size is too large given NULLs or partial indexes
List pgsql-hackers

On 3/8/19 7:14 PM, Jeff Janes wrote:
> Referring to this thread:
> 
> https://dba.stackexchange.com/questions/231647/why-are-partial-postgresql-hash-indices-not-smaller-than-full-indices
> 
> When a hash index is created on a populated table, it estimates the
> number of buckets to start out with based on the number of tuples
> returned by estimate_rel_size.  But this number ignores both the fact
> that NULLs are not stored in hash indexes, and that partial indexes
> exist.  This can lead to much too large hash indexes.  Doing a re-index
> just repeats the logic, so doesn't fix anything.  Fill factor also can't
> fix it, as you are not allowed to increase that beyond 100.
> 

Hmmm :-(

> This goes back to when the pre-sizing was implemented in 2008
> (c9a1cc694abef737548a2a).  It seems to be an oversight, rather than
> something that was considered.
> 
> Is this a bug that should be fixed?  Or if getting a more accurate
> estimate is not possible or not worthwhile, add a code comment about that?
> 

I'd agree this smells like a bug (or perhaps two). The sizing probably
should consider both null_frac and selectivity of the index predicate.
When those two are redundant (i.e. when there's IS NOT NULL condition on
indexed column), this will result in under-estimate. That means the
index build will do a an extra split, but that's probably better than
having permanently bloated index.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Hash index initial size is too large given NULLs or partial indexes
Next
From: Alvaro Herrera
Date:
Subject: Re: PostgreSQL vs SQL/XML Standards