Re: Proposed LogWriter Scheme, WAS: Potential Large Performance - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Proposed LogWriter Scheme, WAS: Potential Large Performance
Date
Msg-id 3718.1033831962@sss.pgh.pa.us
In response to Re: Proposed LogWriter Scheme, WAS: Potential Large  (Hannu Krosing <hannu@tm.ee>)
Responses Re: Proposed LogWriter Scheme, WAS: Potential Large Performance  ("Curtis Faith" <curtis@galtair.com>)
Re: Proposed LogWriter Scheme, WAS: Potential Large  (Hannu Krosing <hannu@tm.ee>)
List pgsql-hackers
Hannu Krosing <hannu@tm.ee> writes:
> The writer process should just issue a continuous stream of
> aio_write()'s while there are any waiters and keep track which waiters
> are safe to continue - thus no guessing of who's gonna commit.

This recipe sounds like "eat I/O bandwidth whether we need it or not".
It might be optimal in the case where activity is so heavy that we
do actually need a WAL write on every disk revolution, but in any
scenario where we're not maxing out the WAL disk's bandwidth, it will
hurt performance.  In particular, it would seriously degrade performance
if the WAL file isn't on its own spindle but has to share bandwidth with
data file access.

What we really want, of course, is "write on every revolution where
there's something worth writing" --- either we've filled a WAL block
or there is a commit pending.  But that just gets us back into the
same swamp of how-do-you-guess-whether-more-commits-will-arrive-soon.
I don't see how an extra process makes that problem any easier.
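
As a rough illustration of the "write only when there's something worth writing" rule, here is a minimal Python sketch. The names (WalState, worth_writing, count_writes), the 8192-byte block size, and the one-decision-per-revolution model are all assumptions made for the example; none of this is PostgreSQL code. It also assumes that a write flushes everything buffered so far.

```python
from dataclasses import dataclass

WAL_BLOCK_SIZE = 8192  # assumed block size in bytes, for illustration only


@dataclass
class WalState:
    unwritten_bytes: int = 0      # WAL data buffered but not yet written
    commit_pending: bool = False  # a backend is waiting for its commit record


def worth_writing(state: WalState) -> bool:
    """True if a write on this revolution would accomplish something."""
    return state.unwritten_bytes >= WAL_BLOCK_SIZE or state.commit_pending


def count_writes(revolutions):
    """Count writes issued over a sequence of disk revolutions.

    revolutions: iterable of (new_bytes, commit_arrived) pairs, one per
    revolution (a hypothetical input format for this sketch).
    """
    state = WalState()
    writes = 0
    for new_bytes, commit_arrived in revolutions:
        state.unwritten_bytes += new_bytes
        state.commit_pending = state.commit_pending or commit_arrived
        if worth_writing(state):
            writes += 1
            state = WalState()  # assume the write covered everything buffered
    return writes
```

The predicate itself is trivial. What it cannot tell you is whether waiting one more revolution would have let a second commit share the same write, which is the guessing problem described above.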

BTW, it would seem to me that aio_write() buys nothing over plain write()
in terms of ability to gang writes.  If we issue the write at time T
and it completes at T+X, we really know nothing about exactly when in
that interval the data was read out of our WAL buffers.  We cannot
assume that commit records that were stored into the WAL buffer during
that interval got written to disk.  The only safe assumption is that
only records that were in the buffer at time T are down to disk; and
that means that late arrivals lose.  You can't issue aio_write()
immediately after the previous one completes and expect that this
optimizes performance --- you have to delay it as long as you possibly
can in hopes that more commit records arrive.  So it comes down to being
the same problem.
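
The timing argument can be sketched as a small simulation. This is a toy model under stated assumptions, not PostgreSQL code: the function names, the time units, and the delay parameter are all hypothetical. The only rule it encodes is the one above: a write issued at time T is credited with just the commit records that were already in the buffer at T.

```python
def durable_after_write(arrival_times, issue_time):
    """Commit records that can be treated as on disk once a write issued at
    issue_time completes.

    Only records already buffered at issue time qualify; anything arriving
    while the write is in flight may or may not have been copied out.
    """
    return [t for t in arrival_times if t <= issue_time]


def writes_needed(arrival_times, write_duration, delay):
    """Count the writes needed to make every commit durable.

    Each write is issued `delay` time units after the writer is free and at
    least one commit is waiting (a hypothetical policy for this sketch).
    """
    pending = sorted(arrival_times)
    now = 0.0
    writes = 0
    while pending:
        issue_time = max(now, pending[0]) + delay
        covered = durable_after_write(pending, issue_time)
        pending = pending[len(covered):]  # covered is a prefix of the sorted list
        writes += 1
        now = issue_time + write_duration
    return writes
```

With commits arriving at times 0, 1, 2 and 3 and a write that takes 8 units, issuing immediately (delay 0) costs two writes, because the first one can only be credited with the commit at time 0. Holding off for 3 units gets all four into a single write, at the price of extra latency for the first committer. Choosing that delay is the same guessing problem as before.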
        regards, tom lane

