Re: WAL Performance Improvements - Mailing list pgsql-patches

From Janardhana Reddy
Subject Re: WAL Performance Improvements
Msg-id 3C79E3A4.77005625@mediaring.com.sg
In response to WAL Performance Improvements  (Janardhana Reddy <jana-reddy@mediaring.com.sg>)
List pgsql-patches
Tom Lane wrote:

> Janardhana Reddy <jana-reddy@mediaring.com.sg> writes:
> >   I've attached a patch which should improve the performance of WAL by
> > reducing the fsync time and write time by 50% (if the OS page size is 4k),
> > if the transaction generates less than 4k of WAL data. Instead of writing
> > 8k of data into the WAL file every time, it writes only the portion of
> > the data which has changed since the last write (example: if a
> > transaction generates 150 bytes of WAL data, then it writes only
> > 150 bytes instead of 8k).
>
> As near as I can tell, this breaks WAL by failing to ensure that the
> rest of the current page is zeroed.  After crash and recovery, you might
> read obsolete WAL records (written during the previous cycle of life
> of the WAL segment file) and think they are valid.
>
> I'd also be interested to see the measurements backing up the claim of 50%
> performance improvement.  That'd depend very largely on the filesystem block
> size, no?
>
>                         regards, tom lane
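The write strategy described in the quoted patch (write only the part of the current WAL page added since the last write, rather than the whole 8K page) can be sketched as below. This is a hypothetical, POSIX-only Python model, not the patch itself: `write_full_page`, `write_delta`, `flushed_upto` and `insert_upto` are illustrative names, and the 8K page size is taken from the thread. Note that `write_delta` leaves the rest of the on-disk page untouched, which is exactly what Tom's objection is about.

```python
import os
import tempfile

WAL_PAGE_SIZE = 8192  # assumption: 8K WAL page, as discussed in the thread


def write_full_page(fd: int, page_start: int, page: bytes) -> int:
    """Pre-patch behaviour (as described): rewrite the whole page on every write."""
    return os.pwrite(fd, page, page_start)


def write_delta(fd: int, page_start: int, page: bytes,
                flushed_upto: int, insert_upto: int) -> int:
    """Patch idea (as described): write only the bytes added since the last write."""
    if insert_upto <= flushed_upto:
        return 0  # nothing new on this page
    chunk = page[flushed_upto:insert_upto]
    return os.pwrite(fd, chunk, page_start + flushed_upto)


if __name__ == "__main__":
    page = bytes(b"R" * 150) + bytes(WAL_PAGE_SIZE - 150)  # one 150-byte record
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        print("full page:", write_full_page(fd, 0, page))    # 8192
        print("delta:    ", write_delta(fd, 0, page, 0, 150))  # 150
```

Both helpers return the byte count reported by `os.pwrite`, so the difference in I/O volume per commit is visible directly.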

 Correct, this breaks WAL by failing to ensure that the rest of the
current page is zeroed when the WAL file is reused. I am thinking of
fixing this by writing an extra WAL record (a few zeroed bytes) whenever
there is a write and the size of the data is less than BLKSIZE; this
should fix the problem.
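Tom's objection and the proposed fix can be illustrated with a toy model. The record layout here (a 2-byte length header, where length 0 means "no more records"), the 64-byte page, and all function names are hypothetical; real WAL records carry more header fields. The `bytearray` stands for the on-disk contents of a recycled WAL page: it still holds records from the file's previous life, and unless something after the newest record marks the end, a replay-style scan keeps reading them.

```python
import struct

# Hypothetical toy record layout: a 2-byte little-endian payload length,
# followed by the payload. A length of 0 means "no more records".
HDR = struct.Struct("<H")


def append_record(page: bytearray, offset: int, payload: bytes) -> int:
    """Write one record at `offset` and return the offset just past it."""
    end = offset + HDR.size + len(payload)
    if end > len(page):
        raise ValueError("record does not fit in the page")
    HDR.pack_into(page, offset, len(payload))
    page[offset + HDR.size:end] = payload
    return end


def write_zero_terminator(page: bytearray, offset: int) -> None:
    """Proposed fix: zero a header's worth of bytes after the newest record."""
    if offset + HDR.size <= len(page):
        page[offset:offset + HDR.size] = bytes(HDR.size)


def read_records(page: bytes) -> list:
    """Replay-style scan: follow headers until a zero length or the page end."""
    records, off = [], 0
    while off + HDR.size <= len(page):
        (length,) = HDR.unpack_from(page, off)
        if length == 0 or off + HDR.size + length > len(page):
            break
        records.append(bytes(page[off + HDR.size:off + HDR.size + length]))
        off += HDR.size + length
    return records


def demo(zero_tail: bool) -> list:
    page = bytearray(64)
    off = 0
    for payload in (b"old-1", b"old-2", b"old-3"):  # previous life of the file
        off = append_record(page, off, payload)
    off = append_record(page, 0, b"new-1")  # file recycled, one new record
    if zero_tail:
        write_zero_terminator(page, off)
    return read_records(page)


print(demo(False))  # [b'new-1', b'old-2', b'old-3']  (stale records replayed)
print(demo(True))   # [b'new-1']
```

The zeroed bytes cost only a header's worth of extra I/O per write, so they keep most of the saving from writing a partial page.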

  I think the performance improvement depends on the OS page size, since
the OS tracks which pages are dirty and writes each entire page at sync
time, even if only a few bytes of the page were modified. I think for
Linux it is 4k. The test measurements on Linux are as follows:
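The OS-page reasoning can be made concrete. On the author's assumption that the kernel writes back whole dirty pages at sync time, the work done by one `fdatasync` grows with the number of OS pages a write touches. The hypothetical helper below assumes 4K OS pages and an 8K WAL block, and shows where the expected 50% saving comes from: two OS pages per commit before the patch, one afterwards for a small record that stays within a single OS page.

```python
OS_PAGE = 4096    # assumption: 4K OS page, as stated for Linux 2.4
WAL_BLOCK = 8192  # assumption: 8K WAL block written before the patch


def os_pages_touched(offset: int, length: int, page_size: int = OS_PAGE) -> int:
    """Number of OS pages dirtied by a write of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


print(os_pages_touched(0, WAL_BLOCK))  # 2: full 8K block rewritten each time
print(os_pages_touched(0, 150))        # 1: 150-byte record within one page
print(os_pages_touched(4000, 150))     # 2: record straddling a page boundary
```

The last case is a reminder that a small record can still dirty two OS pages when it crosses a 4K boundary, so the saving per commit is not always exactly half.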

This is the output of "strace -c" on the backend before the patch is applied:
 % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.75    0.462903        2269       204           fdatasync
  6.13    0.032322         158       204           send
  2.91    0.015330          75       204           recv
  2.55    0.013477          63       214           write
  0.23    0.001226           6       210           lseek
  0.21    0.001089           5       204           time
  0.15    0.000765           4       204           gettimeofday
  0.07    0.000362          91         4           read
  0.01    0.000035          35         1           open
------ ----------- ----------- --------- --------- ----------------
100.00    0.527509                  1449           total

This is the output after the patch is applied:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.92    0.101630         498       204           fdatasync
 47.14    0.099969         490       204           recv
  2.30    0.004879          23       215           write
  1.57    0.003340          16       204           send
  0.51    0.001084           5       204           time
  0.38    0.000809           4       204           gettimeofday
  0.13    0.000269          67         4           read
  0.02    0.000046           7         7           lseek
  0.02    0.000041          41         1           open
------ ----------- ----------- --------- --------- ----------------
100.00    0.212067                  1247           total

     The main improvement comes from fdatasync, which drops from 2269 usec
to 498 usec per call. I expected the fdatasync time to drop by 50% (since
Linux 2.4 uses a 4K page size), but all the tests show a reduction of
about 75%. In all the tests each transaction generates/writes 150 bytes
into the WAL file.
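The percentages can be checked directly against the strace figures. The snippet below only does arithmetic on the numbers reported in the two "strace -c" tables (204 fdatasync calls in each run); the helper name `reduction` is illustrative.

```python
# Figures copied from the two "strace -c" tables (204 fdatasync calls each).
before = {"fdatasync_usecs_per_call": 2269, "total_seconds": 0.527509}
after = {"fdatasync_usecs_per_call": 498, "total_seconds": 0.212067}


def reduction(old: float, new: float) -> float:
    """Percentage reduction going from `old` to `new`."""
    return 100.0 * (old - new) / old


fsync_cut = reduction(before["fdatasync_usecs_per_call"],
                      after["fdatasync_usecs_per_call"])
total_cut = reduction(before["total_seconds"], after["total_seconds"])
print(f"fdatasync per call: {fsync_cut:.1f}% lower")  # 78.1% lower
print(f"total syscall time: {total_cut:.1f}% lower")  # 59.8% lower
```

So in this run the per-call fdatasync time fell by roughly 78%, well beyond the 50% that the 8K-to-4K page argument alone would predict.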

regards
jana
