Re: Improve WALRead() to suck data directly from WAL buffers when possible - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date
Msg-id CALj2ACUpQGiwQTzmoSMOFk5=WiJc06FcYpxzBX0SEej4ProRzg@mail.gmail.com
In response to Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Wed, Mar 1, 2023 at 9:45 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 28, 2023 at 10:38:31AM +0530, Bharath Rupireddy wrote:
> > On Tue, Feb 28, 2023 at 6:14 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> Why do we only read a page at a time in XLogReadFromBuffersGuts()?  What is
> >> preventing us from copying all the data we need in one go?
> >
> > Note that most of the WALRead() callers request a single page of
> > XLOG_BLCKSZ bytes even if the server has less or more available WAL
> > pages. It's the streaming replication wal sender that can request less
> > than XLOG_BLCKSZ bytes and upto MAX_SEND_SIZE (16 * XLOG_BLCKSZ). And,
> > if we read, say, MAX_SEND_SIZE at once while holding
> > WALBufMappingLock, that might impact concurrent inserters (at least, I
> > can say it in theory) - one of the main intentions of this patch is
> > not to impact inserters much.
>
> Perhaps we should test both approaches to see if there is a noticeable
> difference.  It might not be great for concurrent inserts to repeatedly
> take the lock, either.  If there's no real difference, we might be able to
> simplify the code a bit.

I took a stab at this, comparing two strategies: acquiring
WALBufMappingLock separately for each requested WAL buffer page vs.
acquiring it once for all requested WAL buffer pages. I chose the
pgbench tpcb-like benchmark, whose transaction has 3 UPDATE statements
and 1 INSERT statement (plus 1 SELECT). I ran pgbench for 30 minutes
with scale factor 100 and 4096 clients against a primary with 1 async
standby, see [1]. I captured wait events to see the contention on
WALBufMappingLock. I didn't notice any contention on the lock
(WALBufMapping doesn't show up in any of the wait event lists below)
or any difference in TPS: see [2] for results on HEAD, [3] for results
on the v6 patch, which has the "acquire WALBufMappingLock separately
for each requested WAL buffer page" strategy, and [4] for results on
the v7 patch (attached herewith), which has the "acquire
WALBufMappingLock once for all requested WAL buffer pages" strategy.
Another thing to note from the test results is the reduction in
WALRead IO wait events from 136 on HEAD to 1 on both the v6 and v7
patches. So, reading from WAL buffers is really helping here.
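
To make the difference between the two strategies concrete, here is a
minimal sketch in Python. It is not the patch's C code: CountingLock,
read_per_page() and read_all_at_once() are hypothetical names, the
dict stands in for the WAL buffers, and a plain mutex stands in for
WALBufMappingLock. It only counts how often the lock is taken for a
16-page request (the MAX_SEND_SIZE case mentioned above).

```python
import threading


class CountingLock:
    """Stand-in for WALBufMappingLock that counts acquisitions (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def read_per_page(lock, buffers, first_page, npages):
    """v6-style: take and release the lock once for every requested page."""
    out = []
    for page in range(first_page, first_page + npages):
        with lock:
            out.append(buffers[page])
    return out


def read_all_at_once(lock, buffers, first_page, npages):
    """v7-style: take the lock once and copy every requested page under it."""
    with lock:
        return [buffers[page] for page in range(first_page, first_page + npages)]


# Toy "WAL buffers": page number -> page contents.
buffers = {n: f"page-{n}".encode() for n in range(32)}

per_page_lock = CountingLock()
once_lock = CountingLock()
a = read_per_page(per_page_lock, buffers, 4, 16)
b = read_all_at_once(once_lock, buffers, 4, 16)
print(per_page_lock.acquisitions, once_lock.acquisitions)
```

Both functions return the same pages; the trade-off is many short lock
holds versus one longer hold, which is what the benchmark below is
meant to compare.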

With these observations, I'd like to use the approach that acquires
WALBufMappingLock once for all requested WAL buffer pages unlike v6
and the previous patches.

I'm attaching the v7 patch set with this change for further review.

[1]
shared_buffers = '8GB'
wal_buffers = '1GB'
max_wal_size = '16GB'
max_connections = '5000'
archive_mode = 'on'
archive_command = 'cp %p /home/ubuntu/archived_wal/%f'
./pgbench --initialize --scale=100 postgres
./pgbench -n -M prepared -U ubuntu postgres -b tpcb-like -c4096 -j4096 -T1800
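
The script used to collect the wait events in [2], [3] and [4] is not
shown in this mail. One plausible way to get tallies of that shape is
to poll wait_event_type and wait_event from pg_stat_activity
repeatedly and count the pairs, much like `sort | uniq -c | sort -rn`
would. The sketch below assumes exactly that; tally() and
format_tally() are hypothetical helpers and the samples are made up.

```python
from collections import Counter


def tally(samples):
    """Count (wait_event_type, wait_event) pairs across all sampled rows."""
    counts = Counter()
    for rows in samples:
        counts.update(rows)
    return counts


def format_tally(counts):
    """Render counts most-frequent first, one 'count  type | event' per line."""
    lines = []
    for (etype, event), n in counts.most_common():
        lines.append(f"{n:>7}  {etype or '':<15} | {event or ''}")
    return lines


# Made-up samples for illustration; each inner list is one poll of
# pg_stat_activity. (None, None) stands for a backend that is not waiting.
samples = [
    [("Lock", "transactionid"), ("Lock", "transactionid"), ("IO", "WALRead")],
    [("Lock", "transactionid"), ("LWLock", "WALWrite"), (None, None)],
]
counts = tally(samples)
for line in format_tally(counts):
    print(line)
```

Under this assumption, rows with an empty type and event would
correspond to backends that were not waiting when sampled.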

[2]
HEAD:
done in 20.03 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 15.53 s, vacuum 0.19 s, primary keys 4.30 s).
tps = 11654.475345 (without initial connection time)

50950253  Lock            | transactionid
16472447  Lock            | tuple
3869523  LWLock          | LockManager
 739283  IPC             | ProcArrayGroupUpdate
 718549                  |
 439877  LWLock          | WALWrite
 130737  Client          | ClientRead
 121113  LWLock          | BufferContent
  70778  LWLock          | WALInsert
  43346  IPC             | XactGroupUpdate
  18547
  18546  Activity        | LogicalLauncherMain
  18545  Activity        | AutoVacuumMain
  18272  Activity        | ArchiverMain
  17627  Activity        | WalSenderMain
  17207  Activity        | WalWriterMain
  15455  IO              | WALSync
  14963  LWLock          | ProcArray
  14747  LWLock          | XactSLRU
  13943  Timeout         | CheckpointWriteDelay
  10519  Activity        | BgWriterHibernate
   8022  Activity        | BgWriterMain
   4486  Timeout         | SpinDelay
   4443  Activity        | CheckpointerMain
   1435  Lock            | extend
    670  LWLock          | XidGen
    373  IO              | WALWrite
    283  Timeout         | VacuumDelay
    268  IPC             | ArchiveCommand
    249  Timeout         | VacuumTruncate
    136  IO              | WALRead
    115  IO              | WALInitSync
     74  IO              | DataFileWrite
     67  IO              | WALInitWrite
     36  IO              | DataFileFlush
     35  IO              | DataFileExtend
     17  IO              | DataFileRead
      4  IO              | SLRUWrite
      3  IO              | BufFileWrite
      2  IO              | DataFileImmediateSync
      1 Tuples only is on.
      1  LWLock          | SInvalWrite
      1  LWLock          | LockFastPath
      1  IO              | ControlFileSyncUpdate

[3]
v6 patch:
done in 19.99 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 15.52 s, vacuum 0.18 s, primary keys 4.28 s).
tps = 11689.584538 (without initial connection time)

50678977  Lock            | transactionid
16252048  Lock            | tuple
4146827  LWLock          | LockManager
 768256                  |
 719923  IPC             | ProcArrayGroupUpdate
 432836  LWLock          | WALWrite
 140354  Client          | ClientRead
 124203  LWLock          | BufferContent
  74355  LWLock          | WALInsert
  39852  IPC             | XactGroupUpdate
  30728
  30727  Activity        | LogicalLauncherMain
  30726  Activity        | AutoVacuumMain
  30420  Activity        | ArchiverMain
  29881  Activity        | WalSenderMain
  29418  Activity        | WalWriterMain
  23428  Activity        | BgWriterHibernate
  15960  Timeout         | CheckpointWriteDelay
  15840  IO              | WALSync
  15066  LWLock          | ProcArray
  14577  Activity        | CheckpointerMain
  14377  LWLock          | XactSLRU
   7291  Activity        | BgWriterMain
   4336  Timeout         | SpinDelay
   1707  Lock            | extend
    720  LWLock          | XidGen
    362  Timeout         | VacuumTruncate
    360  IO              | WALWrite
    304  Timeout         | VacuumDelay
    301  IPC             | ArchiveCommand
    106  IO              | WALInitSync
     82  IO              | DataFileWrite
     66  IO              | WALInitWrite
     45  IO              | DataFileFlush
     25  IO              | DataFileExtend
     18  IO              | DataFileRead
      5  LWLock          | LockFastPath
      2  IO              | DataFileSync
      2  IO              | DataFileImmediateSync
      1 Tuples only is on.
      1  LWLock          | BufferMapping
      1  IO              | WALRead
      1  IO              | SLRUWrite
      1  IO              | SLRURead
      1  IO              | ReplicationSlotSync
      1  IO              | BufFileRead

[4]
v7 patch:
done in 19.92 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 15.53 s, vacuum 0.23 s, primary keys 4.16 s).
tps = 11671.869074 (without initial connection time)

50614021  Lock            | transactionid
16482561  Lock            | tuple
4086451  LWLock          | LockManager
 777507                  |
 714329  IPC             | ProcArrayGroupUpdate
 420593  LWLock          | WALWrite
 138142  Client          | ClientRead
 125381  LWLock          | BufferContent
  75283  LWLock          | WALInsert
  38759  IPC             | XactGroupUpdate
  20283
  20282  Activity        | LogicalLauncherMain
  20281  Activity        | AutoVacuumMain
  20002  Activity        | ArchiverMain
  19467  Activity        | WalSenderMain
  19036  Activity        | WalWriterMain
  15836  IO              | WALSync
  15708  Timeout         | CheckpointWriteDelay
  15346  LWLock          | ProcArray
  15095  LWLock          | XactSLRU
  11852  Activity        | BgWriterHibernate
   8424  Activity        | BgWriterMain
   4636  Timeout         | SpinDelay
   4415  Activity        | CheckpointerMain
   2048  Lock            | extend
   1457  Timeout         | VacuumTruncate
    646  LWLock          | XidGen
    402  IO              | WALWrite
    306  Timeout         | VacuumDelay
    278  IPC             | ArchiveCommand
    117  IO              | WALInitSync
     74  IO              | DataFileWrite
     66  IO              | WALInitWrite
     35  IO              | DataFileFlush
     29  IO              | DataFileExtend
     24  LWLock          | LockFastPath
     14  IO              | DataFileRead
      2  IO              | SLRUWrite
      2  IO              | DataFileImmediateSync
      2  IO              | BufFileWrite
      1 Tuples only is on.
      1  LWLock          | BufferMapping
      1  IO              | WALRead
      1  IO              | SLRURead
      1  IO              | BufFileRead
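
For anyone who wants to compare the runs mechanically, here is a small
Python sketch that parses tallies in the format above and puts the
WALRead counts and TPS numbers side by side. parse_tally() and
pct_change() are hypothetical helpers; the rows and TPS values are
copied from [2], [3] and [4].

```python
def parse_tally(text):
    """Parse 'count  type | event' lines into {(type, event): count}."""
    counts = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines and rows without a type/event pair
        left, event = line.split("|", 1)
        fields = left.split()
        if not fields or not fields[0].isdigit():
            continue
        count = int(fields[0])
        etype = fields[1] if len(fields) > 1 else ""
        counts[(etype, event.strip())] = count
    return counts


def pct_change(base, new):
    """Relative change of new versus base, in percent."""
    return (new - base) / base * 100.0


# A few rows copied from [2] (HEAD) and [4] (v7), plus the reported TPS.
head = parse_tally("""
  70778  LWLock          | WALInsert
    136  IO              | WALRead
""")
v7 = parse_tally("""
  75283  LWLock          | WALInsert
      1  IO              | WALRead
""")
tps = {"HEAD": 11654.475345, "v6": 11689.584538, "v7": 11671.869074}

print("WALRead:", head[("IO", "WALRead")], "->", v7[("IO", "WALRead")])
for name in ("v6", "v7"):
    print(name, f"{pct_change(tps['HEAD'], tps[name]):+.2f}% TPS vs HEAD")
```

Both patched runs land within half a percent of HEAD on TPS, which is
consistent with the "no difference in TPS" reading above.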

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
