Re: PG 12 draft release notes - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: PG 12 draft release notes
Date
Msg-id CAH2-WzkPBoDL_HrT6gvG4uVS2H-zXP1pW0h5M=JYCibJGY_F5w@mail.gmail.com
Whole thread Raw
In response to Re: PG 12 draft release notes  (Bruce Momjian <bruce@momjian.us>)
Responses Re: PG 12 draft release notes
List pgsql-hackers
On Tue, May 21, 2019 at 1:51 PM Bruce Momjian <bruce@momjian.us> wrote:
> > My concern here (which I believe Alexander shares) is that it doesn't
> > make sense to group these two items together. They're two totally
> > unrelated pieces of work. Alexander's work does more or less help with
> > lock contention with writes, whereas the feature that that was merged
> > with is about preventing index bloat, which is mostly helpful for
> > reads (it helps writes to the extent that writes are also reads).
> >
> > The release notes go on to say that this item "gives better
> > performance for UPDATEs and DELETEs on indexes with many duplicates",
> > which is wrong. That is something that should have been listed below,
> > under the "duplicate index entries in heap-storage order" item.
>
> OK, I understand how the lock stuff improves things, but I have
> forgotten how indexes are made smaller.  Is it because of better page
> split logic?

That is clearly the main reason, though suffix truncation (which
represents that trailing/suffix columns in index tuples from branch
pages have "negative infinity" sentinel values) also contributes to
making indexes smaller.

The page split stuff was mostly added by commit fab250243 ("Consider
secondary factors during nbtree splits"), but commit f21668f32 ("Add
"split after new tuple" nbtree optimization") added to that in a way
that really helped the TPC-C indexes. The TPC-C indexes are about 40%
smaller now.

> > > Author: Peter Geoghegan <pg@bowt.ie>
> > > 2019-03-20 [dd299df81] Make heap TID a tiebreaker nbtree index column.

> As I remember the benefit currently is that you can find update and
> deleted rows faster, right?

Yes, that's true when writing to the index. But more importantly, it
really helps VACUUM when there are lots of duplicates, which is fairly
common in the real world (imagine an index where 20% of the rows are
NULL, for example). In effect, there are no duplicates anymore,
because all index tuples are unique internally.

Indexes with lots of duplicates group older rows together, and new
rows together, because treating heap TID as a tiebreaker naturally has
that effect. VACUUM will generally dirty far fewer pages, because bulk
deletions tend to be correlated with heap TID. And, VACUUM has a much
better chance of deleting entire leaf pages, because dead tuples end
up getting grouped together.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Michael Meskes
Date:
Subject: Re: SQL statement PREPARE does not work in ECPG
Next
From: Juanjo Santamaria Flecha
Date:
Subject: Re: MSVC Build support with visual studio 2019