Re: [PROPOSAL] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PROPOSAL] Effective storage of duplicates in B-tree index. |
Date | |
Msg-id | 55E5C737.7040603@2ndquadrant.com Whole thread Raw |
In response to | Re: [PROPOSAL] Effective storage of duplicates in B-tree index. (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
List | pgsql-hackers |
On 09/01/2015 11:31 AM, Alexander Korotkov wrote: ... > > Yes, In general GIN is a btree with effective duplicates handling + > support of splitting single datums into multiple keys. > This proposal is mostly porting duplicates handling from GIN to btree. > > Sure, there are differences - GIN indexes don't handle UNIQUE indexes, > > > The difference between btree_gin and btree is not only UNIQUE feature. > 1) There is no gingettuple in GIN. GIN supports only bitmap scans. And > it's not feasible to add gingettuple to GIN. At least with same > semantics as it is in btree. > 2) GIN doesn't support multicolumn indexes in the way btree does. > Multicolumn GIN is more like set of separate singlecolumn GINs: it > doesn't have composite keys. > 3) btree_gin can't effectively handle range searches. "a < x < b" would > be hangle as "a < x" intersect "x < b". That is extremely inefficient. > It is possible to fix. However, there is no clear proposal how to fit > this case into GIN interface, yet. > > but the compression can only be effective when there are duplicate > rows. So either the index is not UNIQUE (so the b-tree feature is > not needed), or there are many updates. > > From my observations users can use btree_gin only in some cases. They > like compression, but can't use btree_gin mostly because of #1. Thanks for the explanation! I'm not that familiar with GIN internals, but this mostly matches my understanding. I have only mentioned UNIQUE because the lack of gettuple() method seems obvious - and it works fine when GIN indexes are used as "bitmap indexes". But you're right - we can't do index only scans on GIN indexes, which is a huge benefit of btree indexes. > > Which brings me to the other benefit of btree indexes - they are > designed for high concurrency. How much is this going to be affected > by introducing the posting lists? > > > I'd notice that current duplicates handling in PostgreSQL is hack over > original btree. It is designed so in btree access method in PostgreSQL, > not btree in general. > Posting lists shouldn't change concurrency much. Currently, in btree you > have to lock one page exclusively when you're inserting new value. > When posting list is small and fits one page you have to do similar > thing: exclusive lock of one page to insert new value. > When you have posting tree, you have to do exclusive lock on one page of > posting tree. OK. > > One can say that concurrency would became worse because index would > become smaller and number of pages would became smaller too. Since > number of pages would be smaller, backends are more likely concur for > the same page. But this argument can be user against any compression and > for any bloat. Which might be a problem for some use cases, but I assume we could add an option disabling this per-index. Probably having it "off" by default, and only enabling the compression explicitly. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: