Re: Batching page logging during B-tree build - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Batching page logging during B-tree build
Date
Msg-id CAH2-WznHqf9kGt80TdFeQRpX61dMo+SV3ejCAcPXO8J+U9B7Ew@mail.gmail.com
In response to Re: Batching page logging during B-tree build  (Andres Freund <andres@anarazel.de>)
Responses Re: Batching page logging during B-tree build
List pgsql-hackers
On Wed, Sep 23, 2020 at 11:29 AM Andres Freund <andres@anarazel.de> wrote:
> I wonder what the effect of logging WAL records as huge as this (~256kb)
> is on concurrent sessions. I think it's possible that logging 32 pages
> at once would cause latency increases for concurrent OLTP-ish
> writes. And that a smaller batch size would reduce that, while still
> providing most of the speedup.

Something to consider, but I cannot see any speedup, and so have no
way of evaluating the idea right now.
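
For reference, the ~256kb figure is just the batch size multiplied by the
block size. A rough sketch of that arithmetic, assuming the default 8kB
BLCKSZ and ignoring record headers and any compression
(batch_payload_bytes is a made-up helper for illustration, not something
from the patch):

```python
# Default PostgreSQL block size. This is an assumption: BLCKSZ is a
# compile-time setting and can differ on custom builds.
BLCKSZ = 8192


def batch_payload_bytes(pages_per_record: int, block_size: int = BLCKSZ) -> int:
    """Bytes of page images carried by one batched WAL record.

    Ignores WAL record headers and any compression, so this is only an
    approximation of the on-disk record size.
    """
    return pages_per_record * block_size


# Compare the 32-page batch with some smaller batch sizes.
for pages in (1, 4, 8, 16, 32):
    kb = batch_payload_bytes(pages) // 1024
    print(f"{pages:2d} pages per record -> {kb:3d} kB of page images")
```

A smaller batch only shrinks the record proportionally; whether that helps
concurrent sessions would still need to be measured.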

> > It doesn't seem to make any difference on my machine, which has an
> > NVME SSD (a Samsung 970 Pro). This is quite a fast SSD, though the
> > sync time isn't exceptional.
>
> Yea, they are surprisingly slow at syncing, somewhat disappointing for
> the upper end of the consumer oriented devices.

I hesitate to call anything about this SSD disappointing, since
overall it performs extremely well -- especially when you consider the
price tag.

Detachable storage is a trend that's here to stay, so the sync latency
of this SSD is still probably a lot lower than what you'll see on many
serious production systems. For better or worse.

> Really should replace WAL compression with lz4 (or possibly zstd).

Yeah. WAL compression is generally a good idea, and we should probably
find a way to enable it by default (in the absence of some better way
of dealing with the FPI bottleneck, at least). I had no idea that
compression could hurt this much with index builds until now, though.
To be clear: with compression enabled, the *entire* index build takes
3 times more wall clock time, start to finish -- if I drilled down to
the portion of the index build that actually writes WAL then the
slowdown would be even greater.

I know that we've tested different compression methods in the past,
but perhaps index build performance was overlooked. My guess is that
the compression algorithm matters a lot less with pure OLTP workloads.
Also, parallel CREATE INDEX may be a bit of an outlier here. Even
so, it's a very important outlier.

-- 
Peter Geoghegan


