Improve WALRead() to suck data directly from WAL buffers when possible - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Improve WALRead() to suck data directly from WAL buffers when possible |
Date | |
Msg-id | CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com |
Responses |
Re: Improve WALRead() to suck data directly from WAL buffers when possible
(Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
|
List | pgsql-hackers |
Hi,

WALRead() currently reads WAL from the WAL file on disk, which means that the walsenders serving streaming and logical replication (the callers of WALRead()) have to hit the disk or the OS page cache to read WAL. This may increase the total read IO across all walsenders, since production servers typically maintain many standbys/subscribers for high availability, disaster recovery, read replicas and so on. It may also increase replication lag if every WAL read hits the disk.

The WAL buffers may already contain the requested WAL. If so, WALRead() can try to read from the WAL buffers first, before reading from the file. If the read hits the WAL buffers, the read from the file on disk is avoided. This mainly reduces read IO and read system calls. It also enables other features specified elsewhere [1].

I'm attaching a patch that implements the idea, which is also noted elsewhere [2]. I've run some tests [3]. The WAL buffers hit ratio with the patch stood at 95%; in other words, the walsenders avoided reading from the file 95% of the time. Measured in terms of the amount of data, 79% (13.5GB out of a total of 17GB) of the requested WAL was read from the WAL buffers, as opposed to 21% from the file. Note that the WAL buffers hit ratio can be very low for write-heavy workloads, in which case file reads are inevitable.

The patch introduces concurrent readers for the WAL buffers; so far there have only been concurrent writers. In the patch, WALRead() takes just one lock (WALBufMappingLock) in shared mode to enable concurrent readers and does minimal work: it checks whether the requested WAL page is present in the WAL buffers and, if so, copies the page and releases the lock. I think taking just WALBufMappingLock is enough here, as the concurrent writers depend on it to initialize and replace a page in the WAL buffers.

I'll add this to the next commitfest.

Thoughts?
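The read path described above (try the WAL buffers first, fall back to the file on a miss) can be sketched as a small single-threaded C model. This is not the patch's code: every name here (ToyWalBuffers, toy_read_page, toy_fake_file_read and so on) is hypothetical, the 8192-byte page size and the four-slot ring are assumptions, and the places where WALBufMappingLock would be taken are marked only by comments because a single-threaded sketch cannot exercise them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192u   /* assumed page size for this sketch */
#define TOY_NBUFFERS  4u      /* toy ring of four buffer pages */

typedef uint64_t ToyWalPtr;   /* byte position in the WAL stream */

typedef struct ToyWalBuffers
{
    bool      valid[TOY_NBUFFERS];       /* does the slot hold a page? */
    ToyWalPtr page_start[TOY_NBUFFERS];  /* start position of that page */
    char      pages[TOY_NBUFFERS][TOY_PAGE_SIZE];
    uint64_t  buffer_hits;               /* reads served from the buffers */
    uint64_t  file_reads;                /* reads that fell back to the file */
} ToyWalBuffers;

/* Map a page-aligned WAL position to its slot in the ring. */
size_t toy_slot_for(ToyWalPtr page_start)
{
    return (size_t) ((page_start / TOY_PAGE_SIZE) % TOY_NBUFFERS);
}

/* Writer side: initialize or replace the page held in a slot. */
void toy_install_page(ToyWalBuffers *wb, ToyWalPtr page_start, const char *data)
{
    size_t slot;

    assert(page_start % TOY_PAGE_SIZE == 0);
    slot = toy_slot_for(page_start);
    /* a real writer would hold the mapping lock exclusively here */
    memcpy(wb->pages[slot], data, TOY_PAGE_SIZE);
    wb->page_start[slot] = page_start;
    wb->valid[slot] = true;
    /* ... and release it here */
}

/* Reader side: copy the page out only if the slot still holds it. */
bool toy_read_from_buffers(ToyWalBuffers *wb, ToyWalPtr page_start, char *dst)
{
    size_t slot = toy_slot_for(page_start);
    bool   hit = false;

    /* a real reader would take the mapping lock in shared mode here */
    if (wb->valid[slot] && wb->page_start[slot] == page_start)
    {
        memcpy(dst, wb->pages[slot], TOY_PAGE_SIZE);
        hit = true;
    }
    /* ... and release it here, before any file IO happens */
    return hit;
}

/* Stand-in for reading the page from the WAL file: fills it with 'F'. */
void toy_fake_file_read(ToyWalPtr page_start, char *dst)
{
    (void) page_start;
    memset(dst, 'F', TOY_PAGE_SIZE);
}

typedef void (*ToyFileReader)(ToyWalPtr page_start, char *dst);

/* Buffers first; the file only when the page is not (or no longer) cached. */
void toy_read_page(ToyWalBuffers *wb, ToyWalPtr page_start, char *dst,
                   ToyFileReader read_file)
{
    assert(page_start % TOY_PAGE_SIZE == 0);
    if (toy_read_from_buffers(wb, page_start, dst))
    {
        wb->buffer_hits++;
        return;
    }
    read_file(page_start, dst);
    wb->file_reads++;
}
```

In this model a reader misses as soon as a writer has reused the slot for a newer page, which is the same reason the hit ratio is expected to drop under write-heavy workloads.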
[1] https://www.postgresql.org/message-id/CALj2ACXCSM%2BsTR%3D5NNRtmSQr3g1Vnr-yR91azzkZCaCJ7u4d4w%40mail.gmail.com

[2]
 * XXX probably this should be improved to suck data directly from the
 * WAL buffers when possible.
 */
bool
WALRead(XLogReaderState *state,

[3]
1 primary, 1 sync standby, 1 async standby
./pgbench --initialize --scale=300 postgres
./pgbench --jobs=16 --progress=300 --client=32 --time=900 --username=ubuntu postgres

PATCHED:
-[ RECORD 1 ]----------+----------------
application_name       | assb1
wal_read               | 31005
wal_read_bytes         | 3800607104
wal_read_time          | 779.402
wal_read_buffers       | 610611
wal_read_bytes_buffers | 14493226440
wal_read_time_buffers  | 3033.309
sync_state             | async
-[ RECORD 2 ]----------+----------------
application_name       | ssb1
wal_read               | 31027
wal_read_bytes         | 3800932712
wal_read_time          | 696.365
wal_read_buffers       | 610580
wal_read_bytes_buffers | 14492900832
wal_read_time_buffers  | 2989.507
sync_state             | sync

HEAD:
-[ RECORD 1 ]----+----------------
application_name | assb1
wal_read         | 705627
wal_read_bytes   | 18343480640
wal_read_time    | 7607.783
sync_state       | async
-[ RECORD 2 ]----+------------
application_name | ssb1
wal_read         | 705625
wal_read_bytes   | 18343480640
wal_read_time    | 4539.058
sync_state       | sync

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
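For reference, the 95% and 79% figures quoted in the mail can be re-derived from the PATCHED counters in [3]. The sketch below assumes that wal_read and wal_read_bytes count reads served from the file while the *_buffers columns count reads served from the WAL buffers; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Percentage of (part + other) that "part" accounts for.
 * Returns 0.0 when both counters are zero.
 */
double toy_share_percent(uint64_t part, uint64_t other)
{
    uint64_t total = part + other;

    assert(total >= part);      /* guard against counter wraparound */
    if (total == 0)
        return 0.0;
    return 100.0 * (double) part / (double) total;
}
```

With the async standby's counters, toy_share_percent(610611, 31005) comes to roughly 95 (share of reads served from the buffers) and toy_share_percent(14493226440, 3800607104) to roughly 79 (share of bytes).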
Attachment