Re: Batching page logging during B-tree build - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: Batching page logging during B-tree build
Date
Msg-id 540584F2-A554-40C1-8F59-87AF8D623BB7@yandex-team.ru
In response to Re: Batching page logging during B-tree build  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Batching page logging during B-tree build  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers

> On 23 Sep 2020, at 23:19, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Sep 18, 2020 at 8:39 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Here is PoC with porting that same routine to B-tree. It allows to build B-trees ~10% faster on my machine.
>
> It doesn't seem to make any difference on my machine, which has an
> NVME SSD (a Samsung 970 Pro). This is quite a fast SSD, though the
> sync time isn't exceptional. My test case is "reindex index
> pgbench_accounts_pkey", with pgbench scale 500. I thought that this
> would be a sympathetic case, since it's bottlenecked on writing the
> index, with relatively little time spent scanning and sorting in
> parallel workers.
> Can you provide a test case that is sympathetic towards the patch?
Thanks for looking into this!

I've tried this test on my machine (a 2019 MacBook) at scale 10 for 20 seconds.
With the patch I consistently get about tps = 2.403440; without the patch, about tps = 1.951975.
At scale 500, with the patch:
postgres=# reindex index pgbench_accounts_pkey;
REINDEX
Time: 21577,640 ms (00:21,578)
Without the patch:
postgres=# reindex index pgbench_accounts_pkey;
REINDEX
Time: 26139,175 ms (00:26,139)

I think it's hardware-dependent; I will try on servers.
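
For reference, the relative gains implied by the numbers above can be worked out with a short sketch like the one below. The helper names are purely illustrative and are not part of the patch; the inputs are the timings and tps values quoted in this message.

```python
def time_reduction(before_ms: float, after_ms: float) -> float:
    """Fraction of wall-clock time saved when going from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms


def throughput_gain(before_tps: float, after_tps: float) -> float:
    """Relative increase in transactions per second."""
    return after_tps / before_tps - 1.0


# Scale 500 REINDEX timings quoted above (ms): without patch, with patch.
reindex_saved = time_reduction(26139.175, 21577.640)

# Scale 10, 20-second run quoted above (tps): without patch, with patch.
tps_gain = throughput_gain(1.951975, 2.403440)

print(f"scale 500 REINDEX: {reindex_saved:.1%} less time with the patch")
print(f"scale 10 run: {tps_gain:.1%} more tps with the patch")
```

On this machine that comes to roughly 17% less time for the scale 500 REINDEX and roughly 23% more tps at scale 10, both above the ~10% reported for the original PoC.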
>
> BTW, I noticed that the index build is absurdly bottlenecked on
> compressing WAL with wal_compression=on. It's almost 3x slower with
> compression turned on!



> On 24 Sep 2020, at 00:33, Andres Freund <andres@anarazel.de> wrote:
>
>> I know that we've tested different compression methods in the past,
>> but perhaps index build performance was overlooked.
>
> I am pretty sure we have known that pglz for this was much much slower
> than alternatives. I seem to recall somebody posting convincing numbers,
> but can't find them just now.

There was a thread about different compression methods [0]. It was demonstrated there that lz4 is 10 times faster at
compression.
We have a patch that speeds up pglz compression by 1.43x [1], but I was hoping that we would go the lz4/zstd way. It
seems to me now that I should actually finish that speedup patch; it's a very focused, local refactoring.

Thanks!

Best regards, Andrey Borodin.


[0] https://www.postgresql.org/message-id/flat/ea57b49a-ecf0-481a-a77b-631833354f7d%40postgrespro.ru#dcac101f8a73dfce98924066f6a12a13
[1] https://www.postgresql.org/message-id/flat/169163A8-C96F-4DBE-A062-7D1CECBE9E5D%40yandex-team.ru#996a194c12bacd2d093be2cb7ac54ca6
