Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAH2-WzmzGO7j2wCesRvyqZNL4XTc8896W=W0wX8bX1vX3XGPag@mail.gmail.com
Whole thread Raw
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Victor Yegorov <vyegorov@gmail.com>)
Responses Re: Deleting older versions in unique indexes to avoid page splits  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Fri, Oct 16, 2020 at 1:00 PM Victor Yegorov <vyegorov@gmail.com> wrote:
> I really like these results, great work!

Thanks Victor!

> I'm also wondering how IO numbers changed due to these improvements, shouldn't be difficult to look into.

Here is the pg_statio_user_indexes for patch for the same run:

 schemaname |     relname      |       indexrelname        |
idx_blks_read | idx_blks_hit
------------+------------------+---------------------------+---------------+---------------
 public     | pgbench_accounts | aid_pkey_include_abalance |
12,828,736 |   534,819,826
 public     | pgbench_accounts | one                       |
12,750,275 |   534,486,742
 public     | pgbench_accounts | two                       |
2,474,893 | 2,216,047,568
(3 rows)

And for master:

 schemaname |     relname      |       indexrelname        |
idx_blks_read | idx_blks_hit
------------+------------------+---------------------------+---------------+---------------
 public     | pgbench_accounts | aid_pkey_include_abalance |
29,526,568 |   292,705,432
 public     | pgbench_accounts | one                       |
28,239,187 |   293,164,160
 public     | pgbench_accounts | two                       |
6,505,615 | 1,318,164,692
(3 rows)

Here is pg_statio_user_tables patch:

 schemaname |     relname      | heap_blks_read | heap_blks_hit |
idx_blks_read | idx_blks_hit  | toast_blks_read | toast_blks_hit |
tidx_blks_read | tidx_blks_hit

------------+------------------+----------------+---------------+---------------+---------------+-----------------+----------------+----------------+---------------
 public     | pgbench_accounts |    123,195,496 |   696,805,485 |
28,053,904 | 3,285,354,136 |                 |                |
        |
 public     | pgbench_branches |             11 |         1,553 |
         |               |                 |                |
      |
 public     | pgbench_history  |              0 |             0 |
         |               |                 |                |
      |
 public     | pgbench_tellers  |             86 |        15,416 |
         |               |                 |                |
      |
(4 rows)

And the pg_statio_user_tables for master:

 schemaname |     relname      | heap_blks_read | heap_blks_hit |
idx_blks_read | idx_blks_hit  | toast_blks_read | toast_blks_hit |
tidx_blks_read | tidx_blks_hit

------------+------------------+----------------+---------------+---------------+---------------+-----------------+----------------+----------------+---------------
 public     | pgbench_accounts |    106,502,089 |   334,875,058 |
64,271,370 | 1,904,034,284 |                 |                |
        |
 public     | pgbench_branches |             11 |         1,553 |
         |               |                 |                |
      |
 public     | pgbench_history  |              0 |             0 |
         |               |                 |                |
      |
 public     | pgbench_tellers  |             86 |        15,416 |
         |               |                 |                |
      |
(4 rows)

Of course, it isn't fair to make a direct comparison because we're
doing ~1.7x times more work with the patch. But even still, the
idx_blks_read is less than half with the patch.

BTW, the extra heap_blks_hit from the patch are not only due to the
fact that the system does more directly useful work. It's also due to
the extra garbage collection triggered in indexes. The same is
probably *not* true with heap_blks_read, though. I minimize the number
of heap pages accessed by the new cleamup mechanism each time, and
temporal locality will help a lot. I think that we delete index
entries pointing to garbage in the heap at pretty predictable
intervals. Heap pages full of LP_DEAD line pointer garbage only get
processed with a few times close together in time, after which they're
bound to either get VACUUM'd or get evicted from shared buffers.

> Peter, according to cfbot patch no longer compiles.
> Can you send and update, please?

Attached is v3, which is rebased against the master branch as of
today. No real changes, though.

-- 
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Internal key management system
Next
From: Tom Lane
Date:
Subject: Re: upcoming API changes for LLVM 12