From: Peter Geoghegan
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date:
Msg-id: CAH2-Wz=WY-ghVTdjtp7RxQgrz3bTkEUxveOv-ZM8+RcY9m4wSw@mail.gmail.com
In response to: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Mon, Mar 18, 2019 at 5:12 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Smarter choices on page splits pay off with higher client counts
> because they reduce contention at likely hot points. It's kind of
> crazy that the code in _bt_check_unique() sometimes has to move right,
> while holding an exclusive buffer lock on the original page and a
> shared buffer lock on its sibling page at the same time. It then has
> to hold a third buffer lock concurrently, this time on any heap pages
> it is interested in.

Actually, by the time we get to 16 clients, the patch does make the
indexes and tables smaller with this workload. Here is pg_buffercache
output collected after the first 16-client case:

Master
======

                 relname                 │ relforknumber │ size_main_rel_fork_blocks │ buffer_count │     avg_buffer_usg
─────────────────────────────────────────┼───────────────┼───────────────────────────┼──────────────┼────────────────────────
 pgbench_history                         │             0 │                   123,484 │      123,484 │     4.9989715266755207
 pgbench_accounts                        │             0 │                    34,665 │       10,682 │     4.4948511514697622
 pgbench_accounts_pkey                   │             0 │                     5,708 │        1,561 │     4.8731582319026265
 pgbench_tellers                         │             0 │                       489 │          489 │     5.0000000000000000
 pgbench_branches                        │             0 │                       284 │          284 │     5.0000000000000000
 pgbench_tellers_pkey                    │             0 │                        56 │           56 │     5.0000000000000000
....

Patch
=====

                 relname                 │ relforknumber │ size_main_rel_fork_blocks │ buffer_count │     avg_buffer_usg
─────────────────────────────────────────┼───────────────┼───────────────────────────┼──────────────┼────────────────────────
 pgbench_history                         │             0 │                   127,864 │      127,864 │     4.9980447975974473
 pgbench_accounts                        │             0 │                    33,933 │        9,614 │     4.3517786561264822
 pgbench_accounts_pkey                   │             0 │                     5,487 │        1,322 │     4.8857791225416036
 pgbench_tellers                         │             0 │                       204 │          204 │     4.9803921568627451
 pgbench_branches                        │             0 │                       198 │          198 │     4.3535353535353535
 pgbench_tellers_pkey                    │             0 │                        14 │           14 │     5.0000000000000000
....
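
For reference, output in this shape can come from a query joining
pg_buffercache to pg_class, something along these lines (a sketch
only -- the exact query isn't shown above, and the derivations of
size_main_rel_fork_blocks and avg_buffer_usg are assumptions):

-- Requires the pg_buffercache extension (CREATE EXTENSION pg_buffercache).
SELECT c.relname,
       b.relforknumber,
       pg_relation_size(c.oid, 'main') / current_setting('block_size')::int
           AS size_main_rel_fork_blocks,
       count(*) AS buffer_count,
       avg(b.usagecount) AS avg_buffer_usg
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
  AND b.relforknumber = 0
GROUP BY c.oid, c.relname, b.relforknumber
ORDER BY buffer_count DESC;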

The main fork for pgbench_history is larger with the patch, but that's
obviously a good thing: pgbench_history is insert-only, so a bigger
history table just means that more transactions completed.
pgbench_accounts_pkey is about 4% smaller, which is probably the most
interesting observation that can be made here, but the tables are also
smaller. pgbench_accounts itself is ~2% smaller, pgbench_branches is
~30% smaller, and pgbench_tellers is ~60% smaller.
Of course, the smaller tables were already very small, so maybe that
isn't important. I think that this is due to more effective pruning,
possibly because we get better lock arbitration as a consequence of
better splits, and also because duplicates are in heap TID order. I
haven't observed this effect with larger databases, which have been my
focus.
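
Spelling out those percentages from the block counts in the two
listings (nothing here beyond arithmetic on the numbers already
quoted):

SELECT relname,
       master_blocks,
       patch_blocks,
       round(100.0 * (master_blocks - patch_blocks) / master_blocks, 1)
           AS pct_smaller
FROM (VALUES ('pgbench_accounts',      34665, 33933),
             ('pgbench_accounts_pkey',  5708,  5487),
             ('pgbench_branches',        284,   198),
             ('pgbench_tellers',         489,   204)
     ) AS t(relname, master_blocks, patch_blocks);
-- pct_smaller comes out to 2.1, 3.9, 30.3, and 58.3 respectively,
-- i.e. roughly the ~2%, ~4%, ~30%, and ~60% figures above.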

It isn't weird that shared_buffers doesn't hold all of the
pgbench_accounts blocks, since, of course, access is highly skewed by
design -- most of the table's blocks were never read at all.

This effect seems to be robust, at least with this workload. The
second round of benchmarks (which has its own pgbench -i
initialization) shows very similar amounts of bloat at the same point.
It may not be that significant, but it's also not a fluke.

-- 
Peter Geoghegan
