Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAH2-WznjQQwSPSxBiZ6wQyBiKqFmfdjOdeqp28otqw551T7jcQ@mail.gmail.com
Whole thread Raw
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Deleting older versions in unique indexes to avoid page splits
Re: Deleting older versions in unique indexes to avoid page splits
List pgsql-hackers
On Wed, Oct 14, 2020 at 7:40 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Right. The trick is to pay only a fixed low cost (maybe as low as one
> heap page access) when we start out, and ratchet it up only if the
> first heap page access looks promising.

Just as an example of how the patch can help, consider the following
pgbench variant script:

\set aid1 random_gaussian(1, 100000 * :scale, 2.0)
\set aid2 random_gaussian(1, 100000 * :scale, 2.5)
\set aid3 random_gaussian(1, 100000 * :scale, 2.2)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid1;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
END;

(These details you see here are a bit arbitrary; don't worry about the
specifics too much.)

Before running the script with pgbench, I initialized pgbench to scale
1500, and made some changes to the indexing (in order to actually test
the patch). There was no standard pgbench_accounts PK. Instead, I
created a unique index that had an include column, which is enough to
make every update a non-HOT update. I also added two more redundant
non-unique indexes to create more overhead from non-HOT updates. It
looked like this:

create unique index aid_pkey_include_abalance on pgbench_accounts
(aid) include (abalance);
create index one on pgbench_accounts (aid);
create index two on pgbench_accounts (aid);

So 3 indexes on the accounts table.

I ran the script for two hours and 16 clients with the patch, then for
another two hours with master. After that time, all 3 indexes were
exactly the same size with the patch, but had almost doubled in size
on master:

aid_pkey_include_abalance: 784,264 pages (or ~5.983 GiB)
one: 769,545 pages (or ~5.871 GiB)
two: 770,295 pages (or ~5.876 GiB)

(With the patch, all three indexes were 100% pristine -- they remained
precisely 411,289 pages in size by the end, which is ~3.137 GiB.)

Note that traditional deduplication is used by the indexes I've called
"one" and "two" here, but not the include index called
"aid_pkey_include_abalance". But it makes little difference, for
reasons that will be obvious if you think about what this looks like
at the leaf page level. Cases that Postgres 13 deduplication does
badly with are often the ones that this new mechanism does well with.
Deduplication by deleting and by merging are truly complementary -- I
haven't just structured the code that way because it was convenient to
use dedup infrastructure just to get the dups at the start. (Yes, it
*was* convenient, but there clearly are cases where each mechanism
competes initially, before nbtree converges on the best strategy at
the local level. So FWIW this patch is a natural and logical extension
of the deduplication work in my mind.)

The TPS/throughput is about what you'd expect for the two hour run:

18,988.762398 TPS for the patch
11,123.551707 TPS for the master branch.

This is a ~1.7x improvement, but I can get more than 3x by changing
the details at the start -- just add more indexes. I don't think that
the precise throughput difference you see here matters. The important
point is that we've more or less fixed a pathological set of behaviors
that have poorly understood cascading effects. Full disclosure: I rate
limited pgbench to 20k for this run, which probably wasn't significant
because neither patch nor master hit that limit for long.

Big latency improvements for that same run, too:

Patch:

statement latencies in milliseconds:
         0.001  \set aid1 random_gaussian(1, 100000 * :scale, 2.0)
         0.000  \set aid2 random_gaussian(1, 100000 * :scale, 2.5)
         0.000  \set aid3 random_gaussian(1, 100000 * :scale, 2.2)
         0.000  \set bid random(1, 1 * :scale)
         0.000  \set tid random(1, 10 * :scale)
         0.000  \set delta random(-5000, 5000)
         0.057  BEGIN;
         0.294  UPDATE pgbench_accounts SET abalance = abalance +
:delta WHERE aid = :aid1;
         0.204  SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
         0.195  SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
         0.090  END;

Master:

statement latencies in milliseconds:
         0.002  \set aid1 random_gaussian(1, 100000 * :scale, 2.0)
         0.001  \set aid2 random_gaussian(1, 100000 * :scale, 2.5)
         0.001  \set aid3 random_gaussian(1, 100000 * :scale, 2.2)
         0.001  \set bid random(1, 1 * :scale)
         0.000  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.084  BEGIN;
         0.604  UPDATE pgbench_accounts SET abalance = abalance +
:delta WHERE aid = :aid1;
         0.317  SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
         0.311  SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
         0.120  END;

Note that the mechanism added by the patch naturally limits the number
of versions that are in the index for each logical row, which seems
much more important than the total amount of garbage tuples cleaned
up. It's great that the index is half its original size, but even that
is less important than the effect of more or less bounding the worst
case number of heap pages accessed by point index scans. Even though
this patch shows big performance improvements (as well as very small
performance regressions for small indexes with skew), the patch is
mostly about stability. I believe that Postgres users want greater
stability and predictability in this area more than anything else.

The behavior of the system as a whole that we see for the master
branch here is not anywhere near linear. Page splits are of course
expensive, but they also occur in distinct waves [1] and have lasting
consequences. They're very often highly correlated, with clear tipping
points, so you see relatively sudden slow downs in the real world.
Worse still, with skew the number of hot pages that you have can
double in a short period of time. This very probably happens at the
worst possible time for the user, since the database was likely
already organically experiencing very high index writes at the point
of experiencing the first wave of splits (splits correlated both
within and across indexes on the same table). And, from that point on,
the number of FPIs for the same workload also doubles forever (or at
least until REINDEX).

[1] https://btw.informatik.uni-rostock.de/download/tagungsband/B2-2.pdf
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Commitfest manager 2020-11
Next
From: Peter Geoghegan
Date:
Subject: Re: More aggressive vacuuming of temporary tables