Re: MultiXact\SLRU buffers configuration - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: MultiXact\SLRU buffers configuration
Msg-id 9b4d17df-b811-8323-16be-3cab913216d1@enterprisedb.com
In response to Re: MultiXact\SLRU buffers configuration  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: MultiXact\SLRU buffers configuration  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
Hi,

After the issue reported in [1] got fixed, I've restarted the multi-xact
stress test, hoping to reproduce the SLRU locking issue. But so far no
luck :-(

I've started slightly different tests on two machines - on one machine
I've done this:

  a) init.sql

  create table t (a int);
  insert into t select i from generate_series(1,100000000) s(i);
  alter table t add primary key (a);

  b) select.sql

  SELECT * FROM t
   WHERE a = (1+mod(abs(hashint4(extract(epoch from now())::int)),
                    100000000)) FOR KEY SHARE;

  c) pgbench -n -c 32 -j 8 -f select.sql -T $((24*3600)) test

The idea is to have a large table and many clients hitting a small random
subset of the rows. A sample of wait events from a ~24h run looks like this:

      e_type  |        e_name        |   sum
    ----------+----------------------+----------
     LWLock   | BufferContent        | 13913863
              |                      |  7194679
     LWLock   | WALWrite             |  1710507
     Activity | LogicalLauncherMain  |   726599
     Activity | AutoVacuumMain       |   726127
     Activity | WalWriterMain        |   725183
     Activity | CheckpointerMain     |   604694
     Client   | ClientRead           |   599900
     IO       | WALSync              |   502904
     Activity | BgWriterMain         |   378110
     Activity | BgWriterHibernate    |   348464
     IO       | WALWrite             |   129412
     LWLock   | ProcArray            |     6633
     LWLock   | WALInsert            |     5714
     IO       | SLRUWrite            |     2580
     IPC      | ProcArrayGroupUpdate |     2216
     LWLock   | XactSLRU             |     2196
     Timeout  | VacuumDelay          |     1078
     IPC      | XactGroupUpdate      |      737
     LWLock   | LockManager          |      503
     LWLock   | WALBufMapping        |      295
     LWLock   | MultiXactMemberSLRU  |      267
     IO       | DataFileWrite        |       68
     LWLock   | BufferIO             |       59
     IO       | DataFileRead         |       27
     IO       | DataFileFlush        |       14
     LWLock   | MultiXactGen         |        7
     LWLock   | BufferMapping        |        1

So, nothing particularly interesting - there certainly are not many wait
events related to SLRU.

On the other machine I did this:

  a) init.sql
  create table t (a int primary key);
  insert into t select i from generate_series(1,1000) s(i);

  b) select.sql
  select * from t for key share;

  c) pgbench -n -c 32 -j 8 -f select.sql -T $((24*3600)) test

and the wait events (24h run too) look like this:

      e_type   |        e_name         |   sum
    -----------+-----------------------+----------
     LWLock    | BufferContent         | 20804925
               |                       |  2575369
     Activity  | LogicalLauncherMain   |   745780
     Activity  | AutoVacuumMain        |   745292
     Activity  | WalWriterMain         |   740507
     Activity  | CheckpointerMain      |   737691
     Activity  | BgWriterHibernate     |   731123
     LWLock    | WALWrite              |   570107
     IO        | WALSync               |   452603
     Client    | ClientRead            |   151438
     BufferPin | BufferPin             |    23466
     LWLock    | WALInsert             |    21631
     IO        | WALWrite              |    19050
     LWLock    | ProcArray             |    15082
     Activity  | BgWriterMain          |    14655
     IPC       | ProcArrayGroupUpdate  |     7772
     LWLock    | WALBufMapping         |     3555
     IO        | SLRUWrite             |     1951
     LWLock    | MultiXactGen          |     1661
     LWLock    | MultiXactMemberSLRU   |      359
     LWLock    | MultiXactOffsetSLRU   |      242
     LWLock    | XactSLRU              |      141
     IPC       | XactGroupUpdate       |      104
     LWLock    | LockManager           |       28
     IO        | DataFileRead          |        4
     IO        | ControlFileSyncUpdate |        1
     Timeout   | VacuumDelay           |        1
     IO        | WALInitWrite          |        1

Also nothing particularly interesting - few SLRU wait events.
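
To put "few SLRU wait events" into numbers, here is a rough sketch that
computes the share of SLRU/multixact-related samples in both runs. The
counts are copied from the two tables above (the e_type column is dropped);
the names RUN_LARGE_TABLE, RUN_SMALL_TABLE and slru_share are made up for
this illustration, and treating every event whose name contains "SLRU" or
"MultiXact" as SLRU-related is an assumption, not an official
classification.

```python
# (e_name, sum) pairs from the first run (100M-row table, single-row lookups).
# The empty name is the row with no wait event reported.
RUN_LARGE_TABLE = [
    ("BufferContent", 13913863), ("", 7194679), ("WALWrite", 1710507),
    ("LogicalLauncherMain", 726599), ("AutoVacuumMain", 726127),
    ("WalWriterMain", 725183), ("CheckpointerMain", 604694),
    ("ClientRead", 599900), ("WALSync", 502904), ("BgWriterMain", 378110),
    ("BgWriterHibernate", 348464), ("WALWrite", 129412), ("ProcArray", 6633),
    ("WALInsert", 5714), ("SLRUWrite", 2580), ("ProcArrayGroupUpdate", 2216),
    ("XactSLRU", 2196), ("VacuumDelay", 1078), ("XactGroupUpdate", 737),
    ("LockManager", 503), ("WALBufMapping", 295),
    ("MultiXactMemberSLRU", 267), ("DataFileWrite", 68), ("BufferIO", 59),
    ("DataFileRead", 27), ("DataFileFlush", 14), ("MultiXactGen", 7),
    ("BufferMapping", 1),
]

# (e_name, sum) pairs from the second run (1000-row table, full-table lock).
RUN_SMALL_TABLE = [
    ("BufferContent", 20804925), ("", 2575369),
    ("LogicalLauncherMain", 745780), ("AutoVacuumMain", 745292),
    ("WalWriterMain", 740507), ("CheckpointerMain", 737691),
    ("BgWriterHibernate", 731123), ("WALWrite", 570107), ("WALSync", 452603),
    ("ClientRead", 151438), ("BufferPin", 23466), ("WALInsert", 21631),
    ("WALWrite", 19050), ("ProcArray", 15082), ("BgWriterMain", 14655),
    ("ProcArrayGroupUpdate", 7772), ("WALBufMapping", 3555),
    ("SLRUWrite", 1951), ("MultiXactGen", 1661),
    ("MultiXactMemberSLRU", 359), ("MultiXactOffsetSLRU", 242),
    ("XactSLRU", 141), ("XactGroupUpdate", 104), ("LockManager", 28),
    ("DataFileRead", 4), ("ControlFileSyncUpdate", 1), ("VacuumDelay", 1),
    ("WALInitWrite", 1),
]


def slru_share(rows):
    """Return (slru_samples, total_samples, fraction) for one run.

    Assumption: an event is SLRU-related if its name contains
    "SLRU" or "MultiXact".
    """
    total = sum(count for _, count in rows)
    slru = sum(count for name, count in rows
               if "SLRU" in name or "MultiXact" in name)
    return slru, total, slru / total


if __name__ == "__main__":
    for label, rows in (("large table", RUN_LARGE_TABLE),
                        ("small table", RUN_SMALL_TABLE)):
        slru, total, frac = slru_share(rows)
        print(f"{label}: {slru} of {total} samples ({frac:.4%})")
```

With that filter, both runs come out at roughly 0.02% of all samples, far
below the BufferContent waits that dominate either table.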

So unfortunately this does not really reproduce the SLRU locking issues
you're observing - clearly, there has to be something else triggering
it. Perhaps this workload is too simplistic, or maybe we need to run
different queries. Or maybe the hardware needs to be somewhat different
(more CPUs? different storage?).


[1]
https://www.postgresql.org/message-id/20201104013205.icogbi773przyny5@development

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company