Re: WAL Performance Improvements - Mailing list pgsql-patches
From | Janardhana Reddy |
---|---|
Subject | Re: WAL Performance Improvements |
Date | |
Msg-id | 3C79E3A4.77005625@mediaring.com.sg |
In response to | WAL Performance Improvements (Janardhana Reddy <jana-reddy@mediaring.com.sg>) |
List | pgsql-patches |
Tom Lane wrote:

> Janardhana Reddy <jana-reddy@mediaring.com.sg> writes:
> > I've attached a patch which should improve the performance of WAL by
> > reducing the fsync time
> > and write time by 50%(if OS page size is 4k) , if the transaction
> > generate the WAL data less then 4k. Instead of
> > writing every time 8k data in to the WAL file it will write only the
> > portion of the data which
> > as changed from the last time(Example : if transaction generates 150
> > bytes of WAL data ,then it writes
> > only 150 bytes instead of 8k).
>
> As near as I can tell, this breaks WAL by failing to ensure that the
> rest of the current page is zeroed. After crash and recovery, you might
> read obsolete WAL records (written during the previous cycle of life
> of the WAL segment file) and think they are valid.
>
> I'd also be interested to see the measurements backing up the claim of 50%
> performance improvement. That'd depend very largely on the filesystem block
> size, no?
>
> regards, tom lane

Correct, this breaks WAL by failing to ensure that the rest of the current page is zeroed when the WAL file is reused. I am thinking of fixing this by writing an extra WAL record (a few zeroed bytes) after the data whenever a write is smaller than BLKSIZE; this should fix the problem.

I think the performance improvement depends on the OS page size, since the OS checks which pages are dirty and writes each dirty page in full at sync time, even if only a few bytes of the page were modified. I think for Linux it is 4k.
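To make the stale-record problem and the proposed fix concrete, here is a minimal sketch in Python. It does not use PostgreSQL's actual WAL record layout: the 4-byte length header, the `append_record` and `read_records` helpers, and the 8192-byte page are all assumptions chosen for illustration. The reader treats a zero length as "end of valid data", which is the role the extra zeroed bytes would play.

```python
import struct

HEADER = struct.Struct("<I")   # hypothetical 4-byte record-length header
PAGE_SIZE = 8192               # assumed WAL page size for this toy model


def append_record(page: bytearray, offset: int, payload: bytes, terminate: bool) -> int:
    """Write one record at `offset`; optionally follow it with a zeroed header.

    Returns the offset at which the next record would start.
    """
    record = HEADER.pack(len(payload)) + payload
    if terminate:
        # The proposed fix: a few zeroed bytes right after the new data.
        record += b"\x00" * HEADER.size
    page[offset:offset + len(record)] = record
    return offset + HEADER.size + len(payload)


def read_records(page: bytearray, start: int = 0) -> list:
    """Scan records until a zero length (or an impossible length) is found."""
    records = []
    offset = start
    while offset + HEADER.size <= len(page):
        (length,) = HEADER.unpack_from(page, offset)
        if length == 0 or offset + HEADER.size + length > len(page):
            break
        body_start = offset + HEADER.size
        records.append(bytes(page[body_start:body_start + length]))
        offset = body_start + length
    return records


def make_recycled_page(payload_size: int = 150) -> bytearray:
    """Simulate a reused page that is still full of records from its previous life."""
    page = bytearray(PAGE_SIZE)
    offset = 0
    while offset + HEADER.size + payload_size <= PAGE_SIZE:
        offset = append_record(page, offset, b"o" * payload_size, terminate=False)
    return page


if __name__ == "__main__":
    unsafe = make_recycled_page()
    append_record(unsafe, 0, b"n" * 150, terminate=False)
    print("records seen without zeroed tail:", len(read_records(unsafe)))

    safe = make_recycled_page()
    append_record(safe, 0, b"n" * 150, terminate=True)
    print("records seen with zeroed tail:   ", len(read_records(safe)))
```

In this toy model the partial write without the zeroed tail lets the reader walk straight into the old records, while the terminated write stops it after the one new record. A real recovery routine would apply more validation than a length check, so this only illustrates the idea, not its sufficiency.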
The measurements of the test on Linux are as follows.

This is the output of "strace -c" of the backend before the patch is applied:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.75    0.462903        2269       204           fdatasync
  6.13    0.032322         158       204           send
  2.91    0.015330          75       204           recv
  2.55    0.013477          63       214           write
  0.23    0.001226           6       210           lseek
  0.21    0.001089           5       204           time
  0.15    0.000765           4       204           gettimeofday
  0.07    0.000362          91         4           read
  0.01    0.000035          35         1           open
------ ----------- ----------- --------- --------- ----------------
100.00    0.527509                  1449           total

This output is after the patch is applied:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.92    0.101630         498       204           fdatasync
 47.14    0.099969         490       204           recv
  2.30    0.004879          23       215           write
  1.57    0.003340          16       204           send
  0.51    0.001084           5       204           time
  0.38    0.000809           4       204           gettimeofday
  0.13    0.000269          67         4           read
  0.02    0.000046           7         7           lseek
  0.02    0.000041          41         1           open
------ ----------- ----------- --------- --------- ----------------
100.00    0.212067                  1247           total

The main improvement comes from fdatasync, which drops from 2269 usec to 498 usec per call. I expected the fdatasync time to drop by 50% (since Linux 2.4 uses a 4K page size), but all the tests show a reduction of about 75%. In all the tests, each transaction writes 150 bytes to the WAL file.

regards
jana
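For anyone wanting to check the arithmetic, the short Python sketch below recomputes the reductions from the strace figures quoted in this message and compares them with the simple page-count model behind the 50% expectation. The helper names `os_pages_dirtied` and `reduction_pct` are made up for this example, and the model assumes each write starts on an OS page boundary, which the message does not state.

```python
import math

OS_PAGE = 4096     # Linux 2.4 page size, as stated in the message
WAL_BLOCK = 8192   # the 8k unit written before the patch
WAL_BYTES = 150    # WAL bytes generated per transaction in the tests


def os_pages_dirtied(nbytes: int, os_page: int = OS_PAGE) -> int:
    """OS pages touched by one write, assuming it starts on a page boundary."""
    return math.ceil(nbytes / os_page)


def reduction_pct(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return 100.0 * (1.0 - after / before)


# Figures copied from the two "strace -c" runs (seconds).
FDATASYNC_BEFORE, FDATASYNC_AFTER = 0.462903, 0.101630
TOTAL_BEFORE, TOTAL_AFTER = 0.527509, 0.212067

expected = reduction_pct(os_pages_dirtied(WAL_BLOCK), os_pages_dirtied(WAL_BYTES))
measured = reduction_pct(FDATASYNC_BEFORE, FDATASYNC_AFTER)
overall = reduction_pct(TOTAL_BEFORE, TOTAL_AFTER)

print(f"page-count model, fdatasync reduction: {expected:.1f}%")
print(f"measured fdatasync reduction:          {measured:.1f}%")
print(f"measured total syscall-time reduction: {overall:.1f}%")
```

The measured fdatasync reduction comes out a little above the "about 75%" quoted in the message, and well above what the page-count model predicts, so the model alone does not account for the result. The total syscall-time figure also includes recv, whose time is higher in the second run.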