Re: Potential Large Performance Gain in WAL synching - Mailing list pgsql-hackers
From | Curtis Faith |
---|---|
Subject | Re: Potential Large Performance Gain in WAL synching |
Date | |
Msg-id | DMEEJMCDOJAKPPFACMPMMECCCEAA.curtis@galtair.com |
In response to | Re: Potential Large Performance Gain in WAL synching (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Potential Large Performance Gain in WAL synching
Re: Potential Large Performance Gain in WAL synching |
List | pgsql-hackers |
Tom Lane replies:

> "Curtis Faith" <curtis@galtair.com> writes:
> > So, why don't we use files opened with O_DSYNC | O_APPEND for the WAL log
> > and then use aio_write for all log writes?
>
> We already offer an O_DSYNC option. It's not obvious to me what
> aio_write brings to the table (aside from loss of portability).
> You still have to wait for the final write to complete, no?

Well, for starters, by the time the write that includes the commit log entry is issued, much of the log for the transaction will already be on disk, or in a controller on its way there.

I don't see any O_NONBLOCK or O_NDELAY references in the sources, so it looks like the log writes are blocking. If I read correctly, XLogInsert calls XLogWrite, which calls write, which blocks. If these assumptions are correct, there should be some significant gain here, but I won't know how much until I try to change it.

That issue only affects the speed of a given back-end's transaction processing. The REAL issue, and the one that will greatly affect total system throughput, is contention on the file locks. Since fsync needs to obtain a write lock on the file descriptor, as do the write calls that originate from XLogWrite as the writes go to disk, other back-ends will block while another transaction is committing if the log cache fills to the point where their XLogInsert results in an XLogWrite call to flush the log cache. I'd guess this means that one won't gain much by adding back-end processes past three or four if there are a lot of inserts or updates.

The method I propose does not result in any blocking on writes other than the final commit's write, and it has the very significant advantage of allowing other transactions (from other back-ends) to continue until they enter commit (and block waiting for their final commit write to complete).
> > 2) Allow transactions to complete and do work while other threads are
> > waiting on the completion of the log write.
>
> I'm missing something. There is no useful work that a transaction can
> do between writing its commit record and reporting completion, is there?
> It has to wait for that record to hit disk.

The key here is that a thread that has not committed, and therefore is not blocking, can do work while "other threads" (should have said back-ends or processes) are waiting on their commit writes.

- Curtis

P.S. If I am right in my assumptions about the way the current system works, I'll bet the change would speed up inserts in Shridhar's huge database test by at least a factor of two or three, perhaps even an order of magnitude. :-)

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, October 03, 2002 7:17 PM
> To: Curtis Faith
> Cc: Pgsql-Hackers
> Subject: Re: [HACKERS] Potential Large Performance Gain in WAL synching
>
>
> "Curtis Faith" <curtis@galtair.com> writes:
> > So, why don't we use files opened with O_DSYNC | O_APPEND for the WAL log
> > and then use aio_write for all log writes?
>
> We already offer an O_DSYNC option. It's not obvious to me what
> aio_write brings to the table (aside from loss of portability).
> You still have to wait for the final write to complete, no?
>
> > 2) Allow transactions to complete and do work while other threads are
> > waiting on the completion of the log write.
>
> I'm missing something. There is no useful work that a transaction can
> do between writing its commit record and reporting completion, is there?
> It has to wait for that record to hit disk.
>
> regards, tom lane
>