Re: Questions about btree_gin vs btree_gist for low cardinality columns - Mailing list pgsql-general

From Jeremy Finzel
Subject Re: Questions about btree_gin vs btree_gist for low cardinality columns
Date
Msg-id CAMa1XUixPDxyYSy8BN4_spUc518H4WUO-URYYf6DV0=pLmV6Fg@mail.gmail.com
Whole thread Raw
In response to Questions about btree_gin vs btree_gist for low cardinality columns  (Jeremy Finzel <finzelj@gmail.com>)
Responses Re: Questions about btree_gin vs btree_gist for low cardinality columns
List pgsql-general


On Fri, May 24, 2019 at 10:25 AM Jeremy Finzel <finzelj@gmail.com> wrote:
I have been hoping for clearer direction from the community about specifically btree_gin indexes for low cardinality columns (as well as low cardinality multi-column indexes).  In general there is very little discussion about this both online and in the docs.  Rather, the emphasis for GIN indexes discussed is always on full text search of JSON indexing, not btree_gin indexes.

However, I have never been happy with the options open to me for indexing low cardinality columns and was hoping this could be a big win.  Often I use partial indexes as a solution, but I really want to know how many use cases btree_gin could solve better than either a normal btree or a partial index.

Here are my main questions:

1.

"The docs say regarding *index only scans*: The index type must support index-only scans. B-tree indexes always do. GiST and SP-GiST indexes support index-only scans for some operator classes but not others. Other index types have no support. The underlying requirement is that the index must physically store, or else be able to reconstruct, the original data value for each index entry. As a counterexample, GIN indexes cannot support index-only scans because each index entry typically holds only part of the original data value."

This is confusing to say "B-tree indexes always do" and "GIN indexes cannot support index-only scans", when we have a btree_gin index type.  Explanation please ???

Is it true that for a btree_gin index on a regular column, "each index entry typically holds only part of the original data value"?  Do these still not support index only scans?  Could they?  I can't see why they shouldn't be able to for a single indexed non-expression field?

2. 

Lack of index only scans is definitely a downside.  However, I see basically identical performance, but way less memory and space usage, for gin indexes.  In terms of read-only performance, if index only scans are not a factor, why not always recommend btree_gin indexes instead of regular btree for low cardinality fields, which will yield similar performance but use far, far less space and resources?

3.

This relates to 2.  I understand the write overhead can be much greater for GIN indexes, which is why the fastupdate feature exists.  But again, in those discussions in the docs, it appears to me they are emphasizing that penalty more for full text or json GIN indexes.  Does the same overhead apply to a btree_gin index on a regular column with no expressions?

Those are my questions.

FYI, I can see an earlier thread about this topic (https://www.postgresql.org/message-id/flat/E260AEE7-95B3-4142-9A4B-A4416F1701F0%40aol.com#5def5ce1864298a3c0ba2d4881a660c2), but a few questions were left unanswered and unclear there.

I first started seriously considering using btree_gin indexes for low cardinality columns, for example some text field with 30 unique values across 100 million rows, after reading a summary of index types from Bruce's article: https://momjian.us/main/writings/pgsql/indexing.pdf

This article was also helpful but yet again I wonder it's broader viability: http://hlinnaka.iki.fi/2014/03/28/gin-as-a-substitute-for-bitmap-indexes/


Thank you!
Jeremy

Could anyone shed any light on these questions?  I appreciate it.

Thanks,
Jeremy 

pgsql-general by date:

Previous
From: Julie Nishimura
Date:
Subject: Re: POSTGRES_FSM_RELATIONS CRITICAL: DB control fsm relations used:79569 of 80000 (99%)
Next
From: Adrian Klaver
Date:
Subject: Re: POSTGRES_FSM_RELATIONS CRITICAL: DB control fsm relations used:79569 of 80000 (99%)