Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: WIP: Covering + unique indexes. |
Date | |
Msg-id | CAH2-WzmNgNi8TtGqpgGQ2gku=Jwb6yXNcoQ1uy-jA2diyfTK6Q@mail.gmail.com Whole thread Raw |
In response to | Re: WIP: Covering + unique indexes. (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
Responses |
Re: WIP: Covering + unique indexes.
|
List | pgsql-hackers |
On Sat, Mar 24, 2018 at 12:39 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > +1, putting "nattributes" to argument of index_truncate_tuple() would > make this function way more universal. Great. > Originally that code was written by Teodor, but I also put my hands there. > In GIN entry tree, item pointers are stored in a posting list which is > located > after index tuple attributes. So, both t_tid block number and offset are > used for GIN needs. Well, you worked on the posting list compression stuff, at least. :-) > That makes sense. Let's store the number of tuple attributes to t_tid. > Assuming that our INDEX_MAX_KEYS is quite small number, we will have > higher bits of t_tid free for latter use. I was going to say that you could just treat the low bit in the t_tid offset as representing "see catalog entry". My first idea was that nothing would have to change about the existing format, since internal page items already have only the low bit set within their offset. However, I now see that that won't really work, because we don't change the offset in high keys when they're copied from a real item during a page split. Whatever we do, it has to work equally well for all "separator keys" -- that is, it must work for both downlinks in internal pages, and all high keys (including high keys at the leaf level). A good solution is to use the unused 13th t_bit. If hash can have a INDEX_MOVED_BY_SPLIT_MASK, then nbtree can have a INDEX_ALT_TID_MASK. This avoids a BTREE_VERSION bump, and allows us to deal with the highkey offset issue. Actually, it's even more flexible than that -- it can work with ordinary leaf tuples in the future, too. That is, we can eventually implement prefix truncation and deduplication at the leaf level using this representation, since there is nothing that limits INDEX_ALT_TID_MASK IndexTuples to "separator keys". The main difference between this approach to leaf prefix truncation/compression/deduplication, and the GIN entry tree's posting list representation would be that it wouldn't have to be super-optimized for duplicates, at the expense of more common case for regular nbtree indexes -- having few or no duplicates. A btree_gin index on pgbench_accounts(aid) looks very similar to an equivalent nbtree index if you just compare internal pages from each, but they look quite different at the leaf level, where GIN has 24 byte IndexTuples instead of 16 bytes IndexTuples. Of course, this is because the leaf pages have posting lists that can never be simple heap pointer TIDs. A secondary goal of this INDEX_ALT_TID_MASK representation should be that it won't even be necessary to know that an IndexTuple is contained within a leaf page rather than an index page (again, unlike GIN). I'm pretty confident that we can have a truly universal IndexTuple representation for nbtree, while supporting all of these standard optimizations. Sorry for going off in a tangent, but I think it's somewhat necessary to have a strategy here. Of course, we don't have to get everything right now, but we should be looking in this direction whenever we talk about on-disk nbtree changes. -- Peter Geoghegan
pgsql-hackers by date: