Thread: WIP: splitting BLCKSZ

WIP: splitting BLCKSZ

From
Mark Wong
Date:
I proposed to explore splitting BLCKSZ into separate values for logging
and data to see if there might be anything to gain:
    http://archives.postgresql.org/pgsql-hackers/2006-03/msg00745.php

My first pass was to do more or less a search and replace (attached) and
I am already running into trouble with a 'make check' (below).  I'm
guessing that when initdb is run, I'm not properly saving the values
that I've defined for DATA_BLCKSZ and possibly LOG_BLCKSZ.

So I'm hoping someone could give me a pointer, and I thought it might be
a good idea to send something out.

Thanks,
Mark

-----

Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "markw".
This user must also own the server process.

The database cluster will be initialized with locale C.

creating directory /home/markw/shell/src/pgsql/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 3000/150000
creating configuration files ... ok
creating template1 database in /home/markw/shell/src/pgsql/src/test/regress/./tmp_check/data/base/1 ... PANIC:  database files are incompatible with server
DETAIL:  The database cluster was initialized with DATA_BLCKSZ 0, but the server was compiled with DATA_BLCKSZ 8192.
HINT:  It looks like you need to recompile or initdb.
child process was terminated by signal 6
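
The DETAIL line above ("initialized with DATA_BLCKSZ 0") is what one would
expect if the bootstrap path never stores the compiled-in value before the
startup check reads it back. A minimal sketch of that failure mode, assuming
a one-field control structure; DemoControlFile and the demo_* functions are
hypothetical names for illustration, not PostgreSQL's:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATA_BLCKSZ 8192    /* compiled-in value, as in the DETAIL line */

/* Hypothetical, heavily simplified stand-in for the control file contents. */
typedef struct DemoControlFile
{
    uint32_t data_blcksz;   /* block size the cluster was initialized with */
} DemoControlFile;

/*
 * Bootstrap side: build the control data.  If record_block_size is 0 the
 * field is left zeroed, mimicking a renamed field that is never assigned.
 */
DemoControlFile
demo_init_control(int record_block_size)
{
    DemoControlFile cf;

    memset(&cf, 0, sizeof(cf));
    if (record_block_size)
        cf.data_blcksz = DATA_BLCKSZ;
    return cf;
}

/* Startup side: returns 1 if the stored value matches the compiled one. */
int
demo_control_is_compatible(DemoControlFile cf)
{
    return cf.data_blcksz == DATA_BLCKSZ;
}
```

With record_block_size set to 0 the check fails with a stored value of 0,
which matches the symptom; it does not prove that this is the actual cause
in the attached patch.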

Attachment

Re: WIP: splitting BLCKSZ

From
Tom Lane
Date:
Mark Wong <markw@osdl.org> writes:
> I proposed to explore splitting BLCKSZ into separate values for logging
> and data to see if there might be anything to gain:
>     http://archives.postgresql.org/pgsql-hackers/2006-03/msg00745.php
> My first pass was to do more or less a search and replace (attached) and
> I am already running into trouble with a 'make check' (below).  I'm
> guessing that when initdb is run, I'm not properly saving the values
> that I've defined for DATA_BLCKSZ and possibly LOG_BLCKSZ.

I'd suggest leaving BLCKSZ as-is and inventing XLOG_BLCKSZ to be used
only within the WAL code; should make for a *far* smaller patch.
Offhand I don't think that anything except xlog.c knows the WAL block
size --- it should be fairly closely associated with dependencies on
XLOG_SEG_SIZE, if you are looking for something to grep for.

            regards, tom lane
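
A minimal sketch of what the suggested split could look like: BLCKSZ stays
untouched, and only WAL page arithmetic uses the new symbol. The value 4096
is an illustrative assumption, and the demo_* helpers are hypothetical, not
functions from xlog.c:

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ        8192                 /* data page size, left as-is */
#define XLOG_BLCKSZ   4096                 /* WAL page size; illustrative value */
#define XLOG_SEG_SIZE (16 * 1024 * 1024)   /* WAL segment file size */

/* Number of WAL pages in one segment file. */
uint32_t
demo_wal_pages_per_segment(void)
{
    return XLOG_SEG_SIZE / XLOG_BLCKSZ;
}

/* Index of the WAL page that contains a byte offset within a segment. */
uint32_t
demo_wal_page_of_offset(uint32_t seg_offset)
{
    assert(seg_offset < XLOG_SEG_SIZE);
    return seg_offset / XLOG_BLCKSZ;
}

/* Bytes left on the current WAL page, counted from seg_offset to page end. */
uint32_t
demo_wal_page_free_space(uint32_t seg_offset)
{
    assert(seg_offset < XLOG_SEG_SIZE);
    return XLOG_BLCKSZ - seg_offset % XLOG_BLCKSZ;
}
```

Nothing in these helpers refers to BLCKSZ, which is the point of the
suggestion: code that depends on XLOG_SEG_SIZE is the likely place to find
the uses that need the new symbol.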

Re: WIP: splitting BLCKSZ

From
Mark Wong
Date:
On Wed, 22 Mar 2006 14:19:48 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Mark Wong <markw@osdl.org> writes:
> > I proposed to explore splitting BLCKSZ into separate values for logging
> > and data to see if there might be anything to gain:
> >     http://archives.postgresql.org/pgsql-hackers/2006-03/msg00745.php
> > My first pass was to do more or less a search and replace (attached) and
> > I am already running into trouble with a 'make check' (below).  I'm
> > guessing that when initdb is run, I'm not properly saving the values
> > that I've defined for DATA_BLCKSZ and possibly LOG_BLCKSZ.
>
> I'd suggest leaving BLCKSZ as-is and inventing XLOG_BLCKSZ to be used
> only within the WAL code; should make for a *far* smaller patch.
> Offhand I don't think that anything except xlog.c knows the WAL block
> size --- it should be fairly closely associated with dependencies on
> XLOG_SEG_SIZE, if you are looking for something to grep for.

Ok, I have attached something much smaller.  Appears to pass a 'make
check' but I'll keep going to make sure it's really correct and works.

Thanks,
Mark

Attachment

Re: WIP: splitting BLCKSZ

From
Simon Riggs
Date:
On Thu, 2006-03-23 at 11:27 -0800, Mark Wong wrote:
> On Wed, 22 Mar 2006 14:19:48 -0500
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Mark Wong <markw@osdl.org> writes:
> > > I proposed to explore splitting BLCKSZ into separate values for logging
> > > and data to see if there might be anything to gain:
> > >     http://archives.postgresql.org/pgsql-hackers/2006-03/msg00745.php
> > > My first pass was to do more or less a search and replace (attached) and
> > > I am already running into trouble with a 'make check' (below).  I'm
> > > guessing that when initdb is run, I'm not properly saving the values
> > > that I've defined for DATA_BLCKSZ and possibly LOG_BLCKSZ.
> >
> > I'd suggest leaving BLCKSZ as-is and inventing XLOG_BLCKSZ to be used
> > only within the WAL code; should make for a *far* smaller patch.
> > Offhand I don't think that anything except xlog.c knows the WAL block
> > size --- it should be fairly closely associated with dependencies on
> > XLOG_SEG_SIZE, if you are looking for something to grep for.
>
> Ok, I have attached something much smaller.  Appears to pass a 'make
> check' but I'll keep going to make sure it's really correct and works.

AFAICS this patch won't work properly yet, even though the idea is cool.

Backup data blocks have the "hole" removed from them in xlog.c. This is
definitely BLCKSZ, not XLOG_BLCKSZ, at lines 675, 798, 799 and 813, and
maybe others. But then you need to put them into a block of size
XLOG_BLCKSZ, e.g. line 708. It might be worth looking at xlog.c for 8.0 to
see which places used BLCKSZ before hole-removal was introduced in 8.1.

I think we should set the default wal_buffers setting to 16 at least
now, if we are halving the default size of the blocks. 32 might be more
realistic for 8.2.

Best Regards, Simon Riggs
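
A minimal sketch of why both constants appear in the backup-block path: the
hole is cut out of a data page of BLCKSZ bytes, while the remaining bytes
are then spread over WAL pages of XLOG_BLCKSZ bytes. The demo_* functions
and the block size values are illustrative assumptions, and WAL page headers
are ignored; this is not the code in xlog.c:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLCKSZ      8192    /* data page size */
#define XLOG_BLCKSZ 4096    /* WAL page size; illustrative value */

/* Copy a data page into out, skipping the hole.  Returns bytes written. */
size_t
demo_strip_hole(const char *page, size_t hole_offset, size_t hole_length,
                char *out)
{
    size_t tail_start = hole_offset + hole_length;

    assert(tail_start <= BLCKSZ);       /* page geometry uses BLCKSZ */
    memcpy(out, page, hole_offset);
    memcpy(out + hole_offset, page + tail_start, BLCKSZ - tail_start);
    return BLCKSZ - hole_length;
}

/* Rebuild the full data page, zero-filling the hole. */
void
demo_restore_hole(const char *in, size_t hole_offset, size_t hole_length,
                  char *page)
{
    size_t tail_start = hole_offset + hole_length;

    assert(tail_start <= BLCKSZ);
    memcpy(page, in, hole_offset);
    memset(page + hole_offset, 0, hole_length);
    memcpy(page + tail_start, in + hole_offset, BLCKSZ - tail_start);
}

/* WAL pages touched when nbytes are appended with free_bytes left on the
 * current WAL page (buffer geometry uses XLOG_BLCKSZ). */
size_t
demo_wal_pages_touched(size_t nbytes, size_t free_bytes)
{
    assert(free_bytes > 0 && free_bytes <= XLOG_BLCKSZ);
    if (nbytes <= free_bytes)
        return 1;
    return 1 + (nbytes - free_bytes + XLOG_BLCKSZ - 1) / XLOG_BLCKSZ;
}

/* Strip and restore a test page; returns 1 if the round trip is lossless. */
int
demo_round_trip_ok(size_t hole_offset, size_t hole_length)
{
    static char page[BLCKSZ], packed[BLCKSZ], restored[BLCKSZ];
    size_t      i, n;

    for (i = 0; i < BLCKSZ; i++)
        page[i] = (char) (i % 251 + 1);         /* nonzero filler */
    memset(page + hole_offset, 0, hole_length); /* the hole is all zeroes */
    n = demo_strip_hole(page, hole_offset, hole_length, packed);
    demo_restore_hole(packed, hole_offset, hole_length, restored);
    return n == BLCKSZ - hole_length && memcmp(page, restored, BLCKSZ) == 0;
}
```

In this sketch a full 8192-byte page with no hole, appended when 100 bytes
remain on the current WAL page, touches three 4096-byte WAL pages, which
illustrates why smaller WAL blocks may call for more WAL buffers.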

Re: WIP: splitting BLCKSZ

From
Mark Wong
Date:
Here's an updated patch with help from Simon.  Once I get a test system
going again in the lab I'll start posting some data.  I'm planning a
combination of block sizes (BLCKSZ and XLOG_BLCKSZ) and number of WAL
buffers.

Thanks,
Mark

Attachment

Re: WIP: splitting BLCKSZ

From
"Jonah H. Harris"
Date:
On 4/3/06, Mark Wong <markw@osdl.org> wrote:
> Once I get a test system going again in the lab I'll start
> posting some data.  I'm planning a combination of
> block sizes (BLCKSZ and XLOG_BLCKSZ) and number
> of WAL buffers.

Cool.  I'm looking forward to the results.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: WIP: splitting BLCKSZ

From
Tom Lane
Date:
Mark Wong <markw@osdl.org> writes:
> Here's an updated patch with help from Simon.  Once I get a test system
> going again in the lab I'll start posting some data.  I'm planning a
> combination of block sizes (BLCKSZ and XLOG_BLCKSZ) and number of WAL
> buffers.

If there's no objection, I'll go ahead and apply the parts of this that
create a separate XLOG_BLCKSZ symbol, but not (yet) the parts that
actually change any parameter values.  I can't see any very good reason
why data block size and xlog block size were ever tied together, and I
think it'll make the code read better if they're separated.

            regards, tom lane

Re: WIP: splitting BLCKSZ

From
Tom Lane
Date:
Mark Wong <markw@osdl.org> writes:
> Here's an updated patch with help from Simon.  Once I get a test system
> going again in the lab I'll start posting some data.  I'm planning a
> combination of block sizes (BLCKSZ and XLOG_BLCKSZ) and number of WAL
> buffers.

Applied with minor corrections (you missed pg_resetxlog, for one).
Also, I did not apply the change in default wal_buffers setting,
as that seems it should wait for evidence of what XLOG_BLCKSZ ought to
be.

            regards, tom lane

Re: WIP: splitting BLCKSZ

From
Simon Riggs
Date:
On Mon, 2006-04-03 at 19:37 -0400, Tom Lane wrote:
> Mark Wong <markw@osdl.org> writes:
> > Here's an updated patch with help from Simon.  Once I get a test system
> > going again in the lab I'll start posting some data.  I'm planning a
> > combination of block sizes (BLCKSZ and XLOG_BLCKSZ) and number of WAL
> > buffers.
>
> Applied with minor corrections (you missed pg_resetxlog, for one).

Thanks. (That omission was mine, not Mark's.)


On Mon, 2006-04-03 at 18:08 -0400, Tom Lane wrote:
> I can't see any very good reason
> why data block size and xlog block size were ever tied together, and I
> think it'll make the code read better if they're separated.

I see you've changed the control file back from XLOG_BLCKSZ to BLCKSZ; I
wasn't sure which one of those to choose. Perhaps that should also be
changed to PGCONTROL_BLCKSZ to differentiate it more clearly (but not put
in pg_config_manual.h)?

Best Regards, Simon Riggs


Re: WIP: splitting BLCKSZ

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> I see you've changed the control file back from XLOG_BLCKSZ to BLCKSZ; I
> wasn't sure which one of those to choose.

Hm.  The entire point of having a BLCKSZ-sized control file is to have
it *not* change in size across format revisions (see the comments) ...
which I suppose means that we really ought to have a hard-wired separate
constant, rather than depending on something that someone might want to
twiddle.

            regards, tom lane
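
A minimal sketch of the hard-wired-constant idea: the on-disk image is
always the same number of bytes, whatever block sizes the build uses, and
the real contents are zero-padded up to that size. DEMO_CONTROL_FILE_SIZE,
DemoControlData and the demo_* functions are hypothetical names, and the
three-field struct is an assumption for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hard-wired; deliberately not derived from BLCKSZ or XLOG_BLCKSZ. */
#define DEMO_CONTROL_FILE_SIZE 8192

typedef struct DemoControlData
{
    uint32_t version;       /* format version of this structure */
    uint32_t blcksz;        /* data block size of the cluster */
    uint32_t xlog_blcksz;   /* WAL block size of the cluster */
} DemoControlData;

/* The structure may grow across revisions, but must always fit. */
_Static_assert(sizeof(DemoControlData) <= DEMO_CONTROL_FILE_SIZE,
               "control data must fit in the fixed-size file");

/* Build the on-disk image: contents first, zero padding after. */
void
demo_build_control_image(const DemoControlData *data,
                         char image[DEMO_CONTROL_FILE_SIZE])
{
    memset(image, 0, DEMO_CONTROL_FILE_SIZE);
    memcpy(image, data, sizeof(*data));
}

/* Returns 1 if the values read back intact and the padding is all zero. */
int
demo_control_round_trip(uint32_t blcksz, uint32_t xlog_blcksz)
{
    static char     image[DEMO_CONTROL_FILE_SIZE];
    DemoControlData in = {1, blcksz, xlog_blcksz};
    DemoControlData out;
    size_t          i;

    demo_build_control_image(&in, image);
    memcpy(&out, image, sizeof(out));
    for (i = sizeof(in); i < DEMO_CONTROL_FILE_SIZE; i++)
        if (image[i] != 0)
            return 0;
    return out.blcksz == blcksz && out.xlog_blcksz == xlog_blcksz;
}
```

Because the image size is a constant, a build with 512-byte blocks and a
build with 8192-byte blocks produce files of identical length, so a reader
can always fetch the whole file before interpreting its contents.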

Re: WIP: splitting BLCKSZ

From
Simon Riggs
Date:
On Tue, 2006-04-04 at 11:13 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I see you've changed the control file back from XLOG_BLCKSZ to BLCKSZ; I
> > wasn't sure which one of those to choose.
>
> Hm.  The entire point of having a BLCKSZ-sized control file is to have
> it *not* change in size across format revisions (see the comments) ...
> which I suppose means that we really ought to have a hard-wired separate
> constant, rather than depending on something that someone might want to
> twiddle.

Patch enclosed.

Tests with make check and this sequence of actions also work fine:
initdb b512
pg_controldata b512
pg_resetxlog b512
pg_controldata b512

Best Regards, Simon Riggs

Attachment

Re: WIP: splitting BLCKSZ

From
Simon Riggs
Date:
On Tue, 2006-04-04 at 17:33 +0100, Simon Riggs wrote:
> On Tue, 2006-04-04 at 11:13 -0400, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > I see you've changed the control file back from XLOG_BLCKSZ to BLCKSZ; I
> > > wasn't sure which one of those to choose.
> >
> > Hm.  The entire point of having a BLCKSZ-sized control file is to have
> > it *not* change in size across format revisions (see the comments) ...
> > which I suppose means that we really ought to have a hard-wired separate
> > constant, rather than depending on something that someone might want to
> > twiddle.

An additional patch enclosed that adds xlog blcksz onto the xlog long
header at the start of each xlog file, so we can cross-check between
file and system, as we do with xlog seg size.

(This is an *additional* patch, not a replacement one).

Best Regards, Simon Riggs

Attachment

Re: WIP: splitting BLCKSZ

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> An additional patch enclosed that adds xlog blcksz onto the xlog long
> header at the start of each xlog file, so we can cross-check between
> file and system, as we do with xlog seg size.

That would require an xlog format change (XLOG_PAGE_MAGIC bump).  Might
be worth it anyway to avoid confusion in PITR log-shipping situations.
I thought about it yesterday, and concluded it wasn't really worth the
trouble, but am willing to reconsider.

Note you forgot pg_resetxlog again .-)

            regards, tom lane
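
A minimal sketch of the cross-check under discussion: the long header at
the start of each WAL file carries both the segment size and the WAL block
size, and a reader compares them with its compiled-in values. The struct
layout, the magic value and the demo_* names are hypothetical; the real
header has more fields, and the magic number here only stands in for the
format-version bump mentioned above:

```c
#include <assert.h>
#include <stdint.h>

#define XLOG_BLCKSZ     4096                 /* illustrative value */
#define XLOG_SEG_SIZE   (16 * 1024 * 1024)
#define DEMO_PAGE_MAGIC 0xD05E               /* hypothetical format marker */

/* Hypothetical, simplified long page header. */
typedef struct DemoLongPageHeader
{
    uint16_t magic;         /* changes whenever the WAL format changes */
    uint32_t seg_size;      /* segment size the file was written with */
    uint32_t xlog_blcksz;   /* WAL block size the file was written with */
} DemoLongPageHeader;

typedef enum DemoHdrStatus
{
    DEMO_HDR_OK,
    DEMO_HDR_BAD_MAGIC,
    DEMO_HDR_BAD_SEG_SIZE,
    DEMO_HDR_BAD_BLCKSZ
} DemoHdrStatus;

/* Writer side: stamp the header with the values the file is written with. */
DemoLongPageHeader
demo_make_header(uint32_t seg_size, uint32_t xlog_blcksz)
{
    DemoLongPageHeader hdr;

    hdr.magic = DEMO_PAGE_MAGIC;
    hdr.seg_size = seg_size;
    hdr.xlog_blcksz = xlog_blcksz;
    return hdr;
}

/* Reader side: compare the file's values with the compiled-in ones. */
DemoHdrStatus
demo_validate_header(DemoLongPageHeader hdr)
{
    if (hdr.magic != DEMO_PAGE_MAGIC)
        return DEMO_HDR_BAD_MAGIC;
    if (hdr.seg_size != XLOG_SEG_SIZE)
        return DEMO_HDR_BAD_SEG_SIZE;
    if (hdr.xlog_blcksz != XLOG_BLCKSZ)
        return DEMO_HDR_BAD_BLCKSZ;
    return DEMO_HDR_OK;
}
```

In a log-shipping setup, a file written by a server built with a different
WAL block size would be rejected at the first page instead of being
misread, which is the confusion the extra field is meant to avoid.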

Re: WIP: splitting BLCKSZ

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2006-04-04 at 11:13 -0400, Tom Lane wrote:
>> Hm.  The entire point of having a BLCKSZ-sized control file is to have
>> it *not* change in size across format revisions (see the comments) ...
>> which I suppose means that we really ought to have a hard-wired separate
>> constant, rather than depending on something that someone might want to
>> twiddle.

> Patch enclosed.

Applied with minor editorialization (I thought the symbol ought to be
defined in pg_control.h, not some random xlog file).

            regards, tom lane

Re: WIP: splitting BLCKSZ

From
Simon Riggs
Date:
On Tue, 2006-04-04 at 18:41 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2006-04-04 at 11:13 -0400, Tom Lane wrote:
> >> Hm.  The entire point of having a BLCKSZ-sized control file is to have
> >> it *not* change in size across format revisions (see the comments) ...
> >> which I suppose means that we really ought to have a hard-wired separate
> >> constant, rather than depending on something that someone might want to
> >> twiddle.
>
> > Patch enclosed.
>
> Applied with minor editorialization

Thanks.

> (I thought the symbol ought to be
> defined in pg_control.h, not some random xlog file).

It wasn't a random xlog header file. The symbol was placed adjacent to
this line in xlog_internal.h

    #define XLOG_CONTROL_FILE    "global/pg_control"

Perhaps that should be moved to pg_control.h also? Or at least add a comment
in pg_control.h to indicate where pg_control's location is defined.

Best Regards, Simon Riggs


Re: WIP: splitting BLCKSZ

From
Simon Riggs
Date:
On Tue, 2006-04-04 at 15:26 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > An additional patch enclosed that adds xlog blcksz onto the xlog long
> > header at the start of each xlog file, so we can cross-check between
> > file and system, as we do with xlog seg size.
>
> That would require an xlog format change (XLOG_PAGE_MAGIC bump).  Might
> be worth it anyway to avoid confusion in PITR log-shipping situations.
> I thought about it yesterday, and concluded it wasn't really worth the
> trouble, but am willing to reconsider.

Thanks,

> Note you forgot pg_resetxlog again .-)

Eyeball polishing now in progress.

Best Regards, Simon Riggs