Re: Potential Large Performance Gain in WAL synching - Mailing list pgsql-hackers

From Curtis Faith
Subject Re: Potential Large Performance Gain in WAL synching
Msg-id DMEEJMCDOJAKPPFACMPMMECCCEAA.curtis@galtair.com
In response to Re: Potential Large Performance Gain in WAL synching  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Potential Large Performance Gain in WAL synching  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: Potential Large Performance Gain in WAL synching  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane replies:
> "Curtis Faith" <curtis@galtair.com> writes:
> > So, why don't we use files opened with O_DSYNC | O_APPEND for the
> > WAL log and then use aio_write for all log writes?
> 
> We already offer an O_DSYNC option.  It's not obvious to me what
> aio_write brings to the table (aside from loss of portability).
> You still have to wait for the final write to complete, no?

Well, for starters, by the time the write that includes the commit
log entry is issued, much of the transaction's log will already be
on disk, or in a controller on its way there.

I don't see any O_NONBLOCK or O_NDELAY references in the sources,
so it looks like the log writes are blocking. If I read correctly,
XLogInsert calls XLogWrite, which calls write, which blocks. If these
assumptions are correct, there should be some significant gain here, but
I won't know how much until I try to change it. This issue only affects
the speed of a given back-end's transaction processing.

The REAL issue, and the one that will greatly affect total system
throughput, is contention on the file locks. fsync needs to obtain a
write lock on the file descriptor, as do the write calls that
originate from XLogWrite as the data is written to disk. So other
back-ends will block while another transaction is committing if the
log cache fills to the point where their XLogInsert results in an
XLogWrite call to flush the log cache. I'd guess this means that one
won't gain much by adding other back-end processes past three or four
if there are a lot of inserts or updates.

The method I propose does not result in any blocking because of writes
other than the final commit's write, and it has the very significant
advantage of allowing other transactions (from other back-ends) to
continue until they enter commit (and block waiting for their final
commit write to complete).
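
To make the idea concrete, here is a minimal sketch of the write pattern
I have in mind. It is not PostgreSQL code: log_records_async and
MAX_RECORDS are names made up for this illustration, and the last record
simply stands in for the commit record. It assumes POSIX AIO is
available and relies on the POSIX statement that writes to an O_APPEND
descriptor are appended in the order the calls were made.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_RECORDS 16          /* hypothetical cap for this sketch */

/*
 * Queue each record with aio_write() without waiting, then block only
 * at the end, where the commit record has to reach disk.
 * Returns the total number of bytes written, or -1 on any failure.
 */
ssize_t log_records_async(const char *path, const char *records[], int n)
{
    struct aiocb cbs[MAX_RECORDS];
    ssize_t total = 0;
    int queued = 0;
    int failed = 0;

    assert(n > 0 && n <= MAX_RECORDS);

    /* O_DSYNC: a write is complete only once the data is on stable storage */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0600);
    if (fd < 0)
        return -1;

    memset(cbs, 0, sizeof cbs);
    for (int i = 0; i < n; i++) {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = (void *) records[i];
        cbs[i].aio_nbytes = strlen(records[i]);
        if (aio_write(&cbs[i]) != 0) {  /* returns as soon as it is queued */
            failed = 1;
            break;
        }
        queued++;
        /* a real back-end would keep generating log data here */
    }

    /* Commit point: the only place this sketch blocks. */
    for (int i = 0; i < queued; i++) {
        const struct aiocb *const list[1] = { &cbs[i] };
        int err;

        while ((err = aio_error(&cbs[i])) == EINPROGRESS)
            aio_suspend(list, 1, NULL);

        ssize_t r = aio_return(&cbs[i]);    /* reap each request exactly once */
        if (err != 0 || r < 0)
            failed = 1;
        else
            total += r;
    }

    close(fd);
    return failed ? -1 : total;
}
```

Calling it with an array of three strings queues three appends and
returns once all of them are durable. By the time the loop at the commit
point runs, the earlier requests have usually finished, so the wait is
mostly for the last write. On some systems this needs to be linked with
-lrt.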

> > 2) Allow transactions to complete and do work while other threads are
> > waiting on the completion of the log write.
> 
> I'm missing something.  There is no useful work that a transaction can
> do between writing its commit record and reporting completion, is there?
> It has to wait for that record to hit disk.

The key here is that a thread that has not committed, and therefore is
not blocking, can do work while "other threads" (I should have said
back-ends or processes) are waiting on their commit writes.

- Curtis

P.S. If I am right in my assumptions about the way the current system
works, I'll bet the change would speed up inserts in Shridhar's huge
database test by at least a factor of two or three, perhaps even an
order of magnitude. :-)

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, October 03, 2002 7:17 PM
> To: Curtis Faith
> Cc: Pgsql-Hackers
> Subject: Re: [HACKERS] Potential Large Performance Gain in WAL synching 

