Re: Separate BLCKSZ for data and logging - Mailing list pgsql-hackers

From Mark Wong
Subject Re: Separate BLCKSZ for data and logging
Msg-id 200603162327.k2GNRRDZ014683@smtp.osdl.org
In response to Re: Separate BLCKSZ for data and logging  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Thu, 16 Mar 2006 20:51:54 +0000
Simon Riggs <simon@2ndquadrant.com> wrote:

> On Thu, 2006-03-16 at 12:22 -0800, Mark Wong wrote:
> 
> > I was hoping that in the case where 2 or more data blocks are written to
> > the log that they could be written once within a single larger log block.
> > The log block size must be larger than the data block size, of course.
> 
> I think Tom's right... the OS blocksize is smaller than BLCKSZ, so
> reducing the size might help with a very high transaction load when
> commits are required very frequently. At checkpoint it sounds like we
> might benefit from a large WAL blocksize because of all the additional
> blocks written, but we often write more than one block at a time anyway,
> and that still translates to multiple OS blocks whichever way you cut
> it, so I'm not convinced yet.
> 
> On Thu, 2006-03-16 at 15:21 -0500, Tom Lane wrote: 
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > Overall, the two things are fairly separate, apart from the fact that we
> > > do currently log whole data blocks straight to the log. Usually just
> > > one, but possibly two or three. So I have a feeling that things would
> > > become less efficient if you did this, not more.
> > 
> > > But it's a good line of thought and I'll have a look at that.
> > 
> > I too think reducing the size of WAL blocks might be a win, because
> > we currently always write whole blocks, and so a series of small
> > transactions will be rewriting the same 8K block multiple times.
> > If the filesystem's native block size is less than 8K, matching that
> > size should theoretically make things faster.
> 
> Might it be possible to do this: When committing, if the current WAL
> page is less than half-full wait for a single spin-lock cycle and then
> do the write? (With the spin-lock, I mean on a single CPU we wait zero,
> on a multi-CPU we wait a while). This is effectively a modification of
> the group commit idea, but not to wait every time - only when it is
> write-efficient to do so. (And we'd make that optional, too). We could
> then ditch the remnant of the group-commit code.
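
To make the proposed rule concrete, here is a rough sketch of the decision
Simon describes: delay the flush only when the current WAL page is less than
half full, and never on a single-CPU machine. The function name and its
parameters are hypothetical, made up for illustration; this is not code from
the PostgreSQL tree, and the length of the wait itself is left out.

```python
def should_delay_flush(page_used_bytes, wal_block_size, n_cpus):
    """Return True if a commit should wait briefly before flushing.

    Hypothetical helper illustrating the rule from the quoted mail:
    wait only when the current WAL page is less than half full, and
    only when another CPU could add more records in the meantime.
    """
    if n_cpus <= 1:
        # On a single CPU nobody else can fill the page while we wait.
        return False
    # "Less than half-full": strictly under half of the WAL page size.
    return 2 * page_used_bytes < wal_block_size
```

With an 8K WAL page on a 4-CPU box, a commit that leaves the page 1000 bytes
full would wait, while one that leaves it 5000 bytes full would flush right
away.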

Sounds like there is some agreement that this could be an interesting
exercise.  I'll see what I can do.
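
As a back-of-the-envelope check on Tom's point above about small transactions
rewriting the same 8K block, the sketch below counts the bytes flushed when
every commit writes out each whole WAL block it touches. It is a simplified
model with assumptions of my own: record and page headers are ignored, every
commit forces its own flush, and the function name is hypothetical.

```python
def bytes_written(commit_sizes, wal_block_size):
    """Total bytes flushed if each commit rewrites every whole WAL block it touches.

    commit_sizes: WAL bytes generated by each commit, in order (each > 0).
    wal_block_size: size of one WAL block in bytes.
    """
    total = 0
    offset = 0  # current end of the WAL stream, in bytes
    for size in commit_sizes:
        first_block = offset // wal_block_size
        offset += size
        last_block = (offset - 1) // wal_block_size
        # Whole blocks are written, even if only partly filled.
        total += (last_block - first_block + 1) * wal_block_size
    return total
```

Under this model, 16 commits of 512 bytes each flush 131072 bytes with 8K WAL
blocks, 65536 bytes with 4K blocks, and 8192 bytes with 512-byte blocks, for
the same 8192 bytes of actual log data.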

Thanks,
Mark

