Re: [PROPOSAL] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PROPOSAL] Effective storage of duplicates in B-tree index.
Date
Msg-id 55E5C737.7040603@2ndquadrant.com
Whole thread Raw
In response to Re: [PROPOSAL] Effective storage of duplicates in B-tree index.  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers

On 09/01/2015 11:31 AM, Alexander Korotkov wrote:
...
>
> Yes, In general GIN is a btree with effective duplicates handling +
> support of splitting single datums into multiple keys.
> This proposal is mostly porting duplicates handling from GIN to btree.
>
>     Sure, there are differences - GIN indexes don't handle UNIQUE indexes,
>
>
> The difference between btree_gin and btree is not only UNIQUE feature.
> 1) There is no gingettuple in GIN. GIN supports only bitmap scans. And
> it's not feasible to add gingettuple to GIN. At least with same
> semantics as it is in btree.
> 2) GIN doesn't support multicolumn indexes in the way btree does.
> Multicolumn GIN is more like set of separate singlecolumn GINs: it
> doesn't have composite keys.
> 3) btree_gin can't effectively handle range searches. "a < x < b" would
> be hangle as "a < x" intersect "x < b". That is extremely inefficient.
> It is possible to fix. However, there is no clear proposal how to fit
> this case into GIN interface, yet.
>
>     but the compression can only be effective when there are duplicate
>     rows. So either the index is not UNIQUE (so the b-tree feature is
>     not needed), or there are many updates.
>
>  From my observations users can use btree_gin only in some cases. They
> like compression, but can't use btree_gin mostly because of #1.

Thanks for the explanation! I'm not that familiar with GIN internals, 
but this mostly matches my understanding. I have only mentioned UNIQUE 
because the lack of gettuple() method seems obvious - and it works fine 
when GIN indexes are used as "bitmap indexes".

But you're right - we can't do index only scans on GIN indexes, which is 
a huge benefit of btree indexes.

>
>     Which brings me to the other benefit of btree indexes - they are
>     designed for high concurrency. How much is this going to be affected
>     by introducing the posting lists?
>
>
> I'd notice that current duplicates handling in PostgreSQL is hack over
> original btree. It is designed so in btree access method in PostgreSQL,
> not btree in general.
> Posting lists shouldn't change concurrency much. Currently, in btree you
> have to lock one page exclusively when you're inserting new value.
> When posting list is small and fits one page you have to do similar
> thing: exclusive lock of one page to insert new value.
> When you have posting tree, you have to do exclusive lock on one page of
> posting tree.

OK.

>
> One can say that concurrency would became worse because index would
> become smaller and number of pages would became smaller too. Since
> number of pages would be smaller, backends are more likely concur for
> the same page. But this argument can be user against any compression and
> for any bloat.

Which might be a problem for some use cases, but I assume we could add 
an option disabling this per-index. Probably having it "off" by default, 
and only enabling the compression explicitly.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: perlcritic
Next
From: Tom Lane
Date:
Subject: Re: WIP: About CMake v2