Re: MultiXact\SLRU buffers configuration - Mailing list pgsql-hackers

From Shawn Debnath
Subject Re: MultiXact\SLRU buffers configuration
Msg-id YemDdpMrsoJFQJnU@f01898859afd.ant.amazon.com
In response to Re: MultiXact\SLRU buffers configuration  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: MultiXact\SLRU buffers configuration
List pgsql-hackers
On Sat, Jan 15, 2022 at 12:16:59PM +0500, Andrey Borodin wrote:

> > I was planning on running a set of stress tests on these patches. Could
> > we confirm which ones we plan to include in the commitfest?
> 
> Many thanks for your interest. Here's the  latest version.

Here are the results of the multixact perf test I ran on the patch that splits
the linear SLRU caches into banks. With my test setup, the patched binaries
performed marginally slower than the unpatched binaries across the entire test
matrix:

+-------------------------------+---------------------+-----------------------+------------+
|           workload            | patched average tps | unpatched average tps | difference |
+-------------------------------+---------------------+-----------------------+------------+
| create only                   |         10250.54396 |           10349.67487 | -1.0%      |
| create and select             |         9677.711286 |           9991.065037 | -3.2%      |
| large cache create only       |         10310.96646 |           10337.16455 | -0.3%      |
| large cache create and select |          9654.24077 |           9924.270242 | -2.8%      |
+-------------------------------+---------------------+-----------------------+------------+
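For anyone re-checking the difference column, here is a minimal Python sketch that recomputes it from the averages above. The published percentages match (patched - unpatched) / patched. Normalizing by the unpatched baseline instead gives -3.1% rather than -3.2% for create and select, and -2.7% rather than -2.8% for large cache create and select. Either way, the gap stays within roughly 3%.

```python
# Average TPS figures copied from the table: (patched, unpatched).
RESULTS = {
    "create only": (10250.54396, 10349.67487),
    "create and select": (9677.711286, 9991.065037),
    "large cache create only": (10310.96646, 10337.16455),
    "large cache create and select": (9654.24077, 9924.270242),
}


def pct_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


def differences():
    """Map each workload to (change vs. unpatched, change vs. patched),
    both rounded to one decimal place."""
    out = {}
    for workload, (patched, unpatched) in RESULTS.items():
        vs_unpatched = pct_change(patched, unpatched)
        vs_patched = (patched - unpatched) / patched * 100.0
        out[workload] = (round(vs_unpatched, 1), round(vs_patched, 1))
    return out


if __name__ == "__main__":
    for workload, (vs_u, vs_p) in differences().items():
        print(f"{workload:30} {vs_u:+.1f}% (vs unpatched) {vs_p:+.1f}% (vs patched)")
```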

The test was configured as follows:
- AWS EC2 c5d.24xlarge instances, located in the same AZ, were used as
  the database host and the test driver. These systems have 96 vCPUs and
  184 GB of memory. The NVMe drives were configured as RAID 5.
- The following GUCs were changed from their defaults:
    max_connections = 5000
    shared_buffers = 96GB
    max_wal_size = 2GB
    min_wal_size = 192MB
- pgbench was run with -c 1000 -j 1000 against a database initialized
  with a scale factor of 10,000.
- Two multixact workloads were tested. The first [0] is a create-only
  script that selects 100 pgbench_accounts rows FOR SHARE. The second
  [1] adds a SELECT statement that revisits previously touched rows for
  which multixacts had been generated. The pgbench test script [2]
  wraps the function calls in an explicit transaction.
- The large cache tests used a multixact offsets cache size hard-coded
  to 128 buffers and a members cache size hard-coded to 256 buffers.
- Row selection uses a time-based approach that lets all client
  connections coordinate which rows to work on, based on the
  millisecond at which they start executing. To generate more
  multixacts and reduce contention, the workload picks an offset ahead
  of the start id based on a random number.
- The one bummer about these runs was that they lasted only 600
  seconds for create only and 400 seconds for create and select: past
  that point, the checkpointer was consistently OOM-killed on this
  instance. I will dig into this separately, but the TPS was
  consistent.
- Each test was repeated at least 3 times, and the average of those
  runs was used.
- I used the master branch, with the patches applied on top of commit
  f47ed79cc8a0cfa154dc7f01faaf59822552363f.
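The row-selection scheme can be modeled roughly as follows. This is a hypothetical Python sketch, not the code in the gists [0] and [1]: the start-id formula, MAX_OFFSET, and the helper name pick_rows are assumptions made for illustration. Only the 100 rows per transaction and the scale of 10,000 (pgbench_accounts holds 100,000 rows per scale unit) come from the setup above.

```python
import random
import time

SCALE = 10_000
NACCOUNTS = 100_000 * SCALE  # pgbench_accounts has 100,000 rows per scale unit
ROWS_PER_TXN = 100           # rows locked FOR SHARE per transaction
MAX_OFFSET = 1_000           # hypothetical bound on the random offset


def pick_rows(now_ms, rng):
    """Hypothetical model of the time-based row selection.

    Clients starting in the same millisecond derive the same start id;
    a random offset then shifts each client's window so that windows
    overlap only partially. Returns 1-based aid values."""
    start_id = (now_ms * ROWS_PER_TXN) % NACCOUNTS
    first = (start_id + rng.randrange(MAX_OFFSET)) % NACCOUNTS
    return [(first + i) % NACCOUNTS + 1 for i in range(ROWS_PER_TXN)]


if __name__ == "__main__":
    now_ms = time.time_ns() // 1_000_000
    a = set(pick_rows(now_ms, random.Random()))
    b = set(pick_rows(now_ms, random.Random()))
    print("rows shared by two clients in the same ms:", len(a & b))
```

Clients that start in the same millisecond share a start id, so their FOR SHARE locks land on overlapping rows, which is what forces multixacts to be created; the random offset keeps them from all contending on exactly the same 100 rows.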


I think patch 1 is a must-have. Regarding patch 2, I would propose we
avoid introducing more complexity into the SimpleLRU cache and instead
focus on making the SLRU-to-buffer-cache effort [3] a reality. I would
also add that a few customers in our fleet have been successfully
running the large cache configuration on the regular SLRU without any
issues. With cache sizes this small, the linear searches are still quite
efficient.
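To put "small" in perspective: assuming the default 8 kB BLCKSZ, 128 offset buffers occupy 1 MiB and 256 member buffers 2 MiB, and a miss costs at most one page-number comparison per slot. The toy Python model below contrasts a full linear scan with a banked scan. It is neither slru.c nor patch 2; the bank mapping (pageno % nbanks), the bank size of 16, and the class name SlruModel are assumptions for illustration, and eviction is left out.

```python
BLCKSZ = 8192  # default PostgreSQL block size; one SLRU buffer holds one page


class SlruModel:
    """Toy SLRU buffer pool. bank_size=None means one bank, i.e. a plain linear scan."""

    def __init__(self, nslots, bank_size=None):
        self.bank_size = bank_size or nslots
        if nslots % self.bank_size != 0:
            raise ValueError("nslots must be a multiple of bank_size")
        self.nbanks = nslots // self.bank_size
        self.pages = [None] * nslots  # page number cached in each slot

    def _bank_slots(self, pageno):
        # Hypothetical mapping: a page may only live in bank pageno % nbanks.
        start = (pageno % self.nbanks) * self.bank_size
        return range(start, start + self.bank_size)

    def lookup(self, pageno):
        """Return (slot or None, number of slots compared)."""
        compared = 0
        for slot in self._bank_slots(pageno):
            compared += 1
            if self.pages[slot] == pageno:
                return slot, compared
        return None, compared

    def load(self, pageno):
        """Put pageno in the first free slot of its bank (no eviction in this toy)."""
        for slot in self._bank_slots(pageno):
            if self.pages[slot] is None:
                self.pages[slot] = pageno
                return slot
        raise RuntimeError("bank full; a real SLRU would evict a victim page here")


if __name__ == "__main__":
    print("offsets cache:", 128 * BLCKSZ // 1024, "kB")  # 1024 kB
    print("members cache:", 256 * BLCKSZ // 1024, "kB")  # 2048 kB
    linear, banked = SlruModel(256), SlruModel(256, bank_size=16)
    for p in range(256):
        linear.load(p)
        banked.load(p)
    print("linear miss:", linear.lookup(10_000))  # (None, 256)
    print("banked miss:", banked.lookup(10_000))  # (None, 16)
```

Banking shrinks the scan, but at these sizes even the full scan is only a few hundred integer comparisons.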

If my test workload can be improved, please let me know. I'm happy to
re-run the tests as needed.


[0] https://gist.github.com/sdebnath/e015561811adf721dd40dd6638969c69
[1] https://gist.github.com/sdebnath/2f3802e1fe288594b6661a7a59a7ca07
[2] https://gist.github.com/sdebnath/6bbfd5f87945a7d819e30a9a1701bc97
[3] https://www.postgresql.org/message-id/CA%2BhUKGKAYze99B-jk9NoMp-2BDqAgiRC4oJv%2BbFxghNgdieq8Q%40mail.gmail.com



--
Shawn Debnath
Amazon Web Services (AWS)


