Re: Making all nbtree entries unique by having heap TIDs participatein comparisons - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Making all nbtree entries unique by having heap TIDs participatein comparisons
Date
Msg-id CAH2-WznhWAs_ti2_UW0LMeaC_Wdnd3as6Y8Yyh=4PbUtkVG-Lg@mail.gmail.com
Whole thread Raw
In response to Re: Making all nbtree entries unique by having heap TIDs participatein comparisons  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Mar 12, 2019 at 11:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Hey, I understood something today!

And I said something that could be understood!

> I think it's pretty clear that we have to view that as acceptable.  I
> mean, we could reduce contention even further by finding a way to make
> indexes 40% larger, but I think it's clear that nobody wants that.
> Now, maybe in the future we'll want to work on other techniques for
> reducing contention, but I don't think we should make that the problem
> of this patch, especially because the regressions are small and go
> away after a few hours of heavy use.  We should optimize for the case
> where the user intends to keep the database around for years, not
> hours.

I think so too. There is a feature in other database systems called
"reverse key indexes", which deals with this problem in a rather
extreme way. This situation is very similar to the situation with
rightmost page splits, where fillfactor is applied to pack leaf pages
full. The only difference is that there are multiple groupings, not
just one single implicit grouping (everything in the index). You could
probably make very similar observations about rightmost page splits
that apply leaf fillfactor.

Here is an example of how the largest index looks for master with the
100 warehouse case that was slightly regressed:

    table_name    |      index_name       | page_type | npages  |
avg_live_items | avg_dead_items | avg_item_size
------------------+-----------------------+-----------+---------+----------------+----------------+---------------
 bmsql_order_line | bmsql_order_line_pkey | R         |         1 |
54.000       |    0.000       |   23.000
 bmsql_order_line | bmsql_order_line_pkey | I         |    11,482 |
143.200       |    0.000       |   23.000
 bmsql_order_line | bmsql_order_line_pkey | L         | 1,621,316 |
139.458       |    0.003       |   24.000

Here is what we see with the patch:

    table_name    |      index_name       | page_type | npages  |
avg_live_items | avg_dead_items | avg_item_size
------------------+-----------------------+-----------+---------+----------------+----------------+---------------
 bmsql_order_line | bmsql_order_line_pkey | R         |       1 |
29.000       |    0.000       |   22.000
 bmsql_order_line | bmsql_order_line_pkey | I         |   5,957 |
159.149       |    0.000       |   23.000
 bmsql_order_line | bmsql_order_line_pkey | L         | 936,170 |
233.496       |    0.052       |   23.999

REINDEX would leave bmsql_order_line_pkey with 262 items, and we see
here that it has 233 after several hours, which is pretty good given
the amount of contention. The index actually looks very much like it
was just REINDEXED when initial bulk loading finishes, before we get
any updates, even though that happens using retail insertions.

Notice that the number of internal pages is reduced by almost a full
50% -- it's somewhat better than the reduction in the number of leaf
pages, because the benefits compound (items in the root are even a bit
smaller, because of this compounding effect, despite alignment
effects). Internal pages are the most important pages to have cached,
but also potentially the biggest points of contention.

-- 
Peter Geoghegan


pgsql-hackers by date:

Previous
From: Paul Ramsey
Date:
Subject: Re: Compressed TOAST Slicing
Next
From: Andres Freund
Date:
Subject: Re: Making all nbtree entries unique by having heap TIDs participatein comparisons