Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Victor Yegorov
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAGnEbohYF_K6b0v=2uc289=v67qNhc3n01Ftic8X94zP7kKqtw@mail.gmail.com
Whole thread Raw
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Deleting older versions in unique indexes to avoid page splits  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
пт, 13 нояб. 2020 г. в 00:01, Peter Geoghegan <pg@bowt.ie>:
On Wed, Nov 11, 2020 at 12:58 PM Victor Yegorov <vyegorov@gmail.com> wrote:
> On the other hand, there's quite a big drop on the UPDATEs throughput. For sure, undersized shared_bufefrs
> contribute to this drop. Still, my experience tells me that under conditions at hand (disabled HOT due to index
> over update time column) tables will tend to accumulate bloat and produce unnecessary IO also from WAL.

I think that the big SELECT statement with an "ORDER BY mtime ... "
was a good way of demonstrating the advantages of the patch.

Attached is v8, which has the enhancements for low cardinality data
that I mentioned earlier today. It also simplifies the logic for
dealing with posting lists that we need to delete some TIDs from.
These posting list simplifications also make the code a bit more
efficient, which might be noticeable during benchmarking.

Perhaps your "we have 5,2% slowdown in UPDATE speed" issue will be at
least somewhat fixed by the enhancements to v8?

Yes, v8 looks very nice!

I've done two 8 hour long sessions with scale=2000 and shared_buffers=512MB
(previously sent postgresql.auto.conf used here with no changes).
The rest of the setup is the same:
- mtime column that is tracks update time
- index on (mtime, aid)
- tenner low cardinality index from Peter's earlier e-mail
- 3 pgbench scripts run in parallel on master and on v8 patchset (scripts from the previous e-mail used here).

Master
------
        relname        |   nrows   | blk_before | mb_before | blk_after | mb_after |  diff
-----------------------+-----------+------------+-----------+-----------+----------+--------
 pgbench_accounts      | 300000000 |    4918033 |   38422.1 |   5066589 |  39582.7 |  +3.0%
 accounts_mtime        | 300000000 |    1155119 |    9024.4 |   1422354 |  11112.1 | +23.1%
 pgbench_accounts_pkey | 300000000 |     822573 |    6426.4 |    822573 |   6426.4 |     0
 tenner                | 300000000 |     346050 |    2703.5 |    563101 |   4399.2 | +62.7%
(4 rows)

DB size: 59.3..64.5 (+5.2GB / +8.8%)

Patched
-------
        relname        |   nrows   | blk_before | mb_before | blk_after | mb_after |  diff
-----------------------+-----------+------------+-----------+-----------+----------+--------
 pgbench_accounts      | 300000000 |    4918033 |   38422.1 |   5068092 |  39594.5 |  +3.0%
 accounts_mtime        | 300000000 |    1155119 |    9024.4 |   1428972 |  11163.8 | +23.7%
 pgbench_accounts_pkey | 300000000 |     822573 |    6426.4 |    822573 |   6426.4 |     0
 tenner                | 300000000 |     346050 |    2703.5 |    346050 |   2703.5 |     0
(4 rows)

DB size: 59.3..62.8 (+3.5GB / +5.9%)

TPS
---
     query      | Master TPS | Patched TPS |  diff
----------------+------------+-------------+-------
UPDATE + SELECT |       2413 |        2473 | +2.5%
3 SELECT in txn |      19737 |       19545 | -0.9%
15min SELECT    |       0.74 |        1.03 |  +39%


Based on the figures and also on the graphs attached, I can tell v8 has no visible regression
in terms of TPS, IO pattern changes slightly, but the end result is worth it.
In my view, this patch can be applied from a performance POV.

I wanted to share these before I'll finish with the code review, I'm planning to send it tomorrow.


--
Victor Yegorov
Attachment

pgsql-hackers by date:

Previous
From: Jürgen Purtz
Date:
Subject: Re: Additional Chapter for Tutorial
Next
From: Jeff Janes
Date:
Subject: Re: Supporting = operator in gin/gist_trgm_ops