Thread: Testing the async-commit patch

Testing the async-commit patch

From

Tom Lane

Date:

13 August 2007, 14:08:59

So I was testing my fix for the problem noted here:
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00196.php
and promptly found *another* bug.  To wit, that repair_frag calls
HeapTupleSatisfiesVacuum without bothering to acquire any buffer
content lock.  This results in an Assert failure inside
SetBufferCommitInfoNeedsSave, if HeapTupleSatisfiesVacuum tries to
update any hint bits for the tuple.  I think that is impossible in
current releases, because the tuple's logical status was fully
determined by the prior call in scan_heap.  But it's possible as of
8.3 because the walwriter or other backends could have moved the WAL
flush point, allowing a previously unhintable XMAX to become hintable.

I think the best solution for this is to acquire the buffer content lock
before calling HeapTupleSatisfiesVacuum --- it's really a pretty ugly
shortcut that the code didn't do that already.  We could alternatively
refuse to do shrinking unless both XMIN and XMAX are correctly hinted
at scan_heap time; but there is not anything else in vacuum.c that seems
to require XMAX_COMMITTED to be set, so I'd rather not make that
restriction.

But to get to the point: the urgency of testing the patch more
extensively has just moved up a full order of magnitude, IMHO anyway.
I muttered something in the other thread about providing a buildfarm
option to run the regression tests with synchronous_commit off.  That
would still be a good idea in the long run, but I want to take some more
drastic measures now.  I propose that we actually set synchronous_commit
off by default for the next little while --- at least up to 8.3beta1,
maybe until we approach the RC point.  That will ensure that every
buildfarm machine is exercising the async-commit behavior, as well as
every developer who's testing HEAD.

Of course the risk is that we might forget to turn it back on before
release :-(

Comments?
        regards, tom lane

Re: Testing the async-commit patch

From

Andrew Dunstan

Date:

13 August 2007, 15:07:18

Tom Lane wrote:
> But to get to the point: the urgency of testing the patch more
> extensively has just moved up a full order of magnitude, IMHO anyway.
> I muttered something in the other thread about providing a buildfarm
> option to run the regression tests with synchronous_commit off.  That
> would still be a good idea in the long run, but I want to take some more
> drastic measures now.  I propose that we actually set synchronous_commit
> off by default for the next little while --- at least up to 8.3beta1,
> maybe until we approach the RC point.  That will ensure that every
> buildfarm machine is exercising the async-commit behavior, as well as
> every developer who's testing HEAD.
>
> Of course the risk is that we might forget to turn it back on before
> release :-(
>
>
>   

Turn it off, I doubt we'll forget.

I have some ideas about testing configuration items. Doing all our tests 
with the default config is not ideal, I think. Essentially we'd put up a 
server that would have sets of <branch, list-of-config-lines>. The 
client would connect to the server if it could and get the set(s) of 
lines for the branch on question, and for each set it would try another 
run of installcheck  (I'm also wondering if we should switch to doing 
installcheck-parallel). Anyway, this would be a config option on the 
buildfarm, so we wouldn't overburden hosts with limited run windows 
(e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g. 
some of the old and/or tiny hardware we have).

If this seems worth it I'll put it on my TODO.

cheers

andrew

Re: Testing the async-commit patch

From

Gregory Stark

Date:

13 August 2007, 15:36:22

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> But to get to the point: the urgency of testing the patch more
> extensively has just moved up a full order of magnitude, IMHO anyway.
> I muttered something in the other thread about providing a buildfarm
> option to run the regression tests with synchronous_commit off.  That
> would still be a good idea in the long run, but I want to take some more
> drastic measures now.  I propose that we actually set synchronous_commit
> off by default for the next little while --- at least up to 8.3beta1,
> maybe until we approach the RC point.  That will ensure that every
> buildfarm machine is exercising the async-commit behavior, as well as
> every developer who's testing HEAD.
>
> Of course the risk is that we might forget to turn it back on before
> release :-(

I'll set a cron job to remind us. What date should I set it for? :)

Seems like a fine plan to me. It's supposed to be 100% reliable and have
indistinguishable behaviour barring a system crash and nobody should be
running production data on a beta or pre-beta build, so they should never see
a difference.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: Testing the async-commit patch

From

Tom Lane

Date:

13 August 2007, 16:16:16

Andrew Dunstan <andrew@dunslane.net> writes:
> I have some ideas about testing configuration items. Doing all our tests 
> with the default config is not ideal, I think. Essentially we'd put up a 
> server that would have sets of <branch, list-of-config-lines>. The 
> client would connect to the server if it could and get the set(s) of 
> lines for the branch on question, and for each set it would try another 
> run of installcheck  (I'm also wondering if we should switch to doing 
> installcheck-parallel). Anyway, this would be a config option on the 
> buildfarm, so we wouldn't overburden hosts with limited run windows 
> (e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g. 
> some of the old and/or tiny hardware we have).

> If this seems worth it I'll put it on my TODO.

Sounds like a good plan, except that an extra server seems unnecessary
mechanism (and perhaps an unnecessary security risk).  We can just put
a file into CVS src/test/regress showing what we'd like tested.
        regards, tom lane

Re: Testing the async-commit patch

From

Tom Lane

Date:

13 August 2007, 16:17:23

Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I propose that we actually set synchronous_commit
>> off by default for the next little while --- at least up to 8.3beta1,
>> maybe until we approach the RC point.  That will ensure that every
>> buildfarm machine is exercising the async-commit behavior, as well as
>> every developer who's testing HEAD.
>> 
>> Of course the risk is that we might forget to turn it back on before
>> release :-(

> I'll set a cron job to remind us. What date should I set it for? :)

I thought of a low-tech solution for that: put a note in
src/tools/RELEASE_CHANGES about it.  We invariably look at that file
while preparing releases.
        regards, tom lane

Re: Testing the async-commit patch

From

Andrew Dunstan

Date:

13 August 2007, 17:28:24


Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I have some ideas about testing configuration items. Doing all our tests 
>> with the default config is not ideal, I think. Essentially we'd put up a 
>> server that would have sets of <branch, list-of-config-lines>. The 
>> client would connect to the server if it could and get the set(s) of 
>> lines for the branch on question, and for each set it would try another 
>> run of installcheck  (I'm also wondering if we should switch to doing 
>> installcheck-parallel). Anyway, this would be a config option on the 
>> buildfarm, so we wouldn't overburden hosts with limited run windows 
>> (e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g. 
>> some of the old and/or tiny hardware we have).
>>     
>
>   
>> If this seems worth it I'll put it on my TODO.
>>     
>
> Sounds like a good plan, except that an extra server seems unnecessary
> mechanism (and perhaps an unnecessary security risk).  We can just put
> a file into CVS src/test/regress showing what we'd like tested.
>
>             
>   

That could work.

Let's say that this file looks just like a postgresql.conf file, except 
that any line beginning with '[<identifier]>' is a config set name for 
the lines that follow. So we might have:
   [asynch_commit]   synchronous_commit = off
   [no_fsync]   fsync = off
   [csvlogs]   start_log_collector = true   log_destination = 'stderr, csvlog'

Then there would be an extra installcheck-parallel run for each set. If 
the file isn't there we do nothing.

cheers

andrew

Re: Testing the async-commit patch

From

"Simon Riggs"

Date:

14 August 2007, 05:18:20

On Mon, 2007-08-13 at 16:27 -0400, Andrew Dunstan wrote:

> Let's say that this file looks just like a postgresql.conf file, except 
> that any line beginning with '[<identifier]>' is a config set name for 
> the lines that follow. So we might have:
> 
>     [asynch_commit]
>     synchronous_commit = off
> 
>     [no_fsync]
>     fsync = off
> 
>     [csvlogs]
>     start_log_collector = true
>     log_destination = 'stderr, csvlog'
> 
> Then there would be an extra installcheck-parallel run for each set. If 
> the file isn't there we do nothing.

Sounds fine, though I'd prefer this in XML to allow further
extensibility in the future.

--  Simon Riggs EnterpriseDB  http://www.enterprisedb.com

Re: Testing the async-commit patch

From

Gregory Stark

Date:

14 August 2007, 07:02:03

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> But to get to the point: the urgency of testing the patch more
> extensively has just moved up a full order of magnitude, 

The problem testing this patch is that the window for a committed transaction
to not be synced is quite narrow, especially for the regression tests. For
testing purposes I wonder if there are ways we can widen this window. Some
ideas, some wackier than others, are:

. Raise the default wal_writer_delay to 5s or so -- also temporary until release

. Add an ifdef USE_ASSERT_CHECKING which randomly omits setting hint bits even when it could.

. add an ifdef USE_ASSERT_CHECKING which randomly fails to update the LSN when syncing WAL so that even after a buffer
flushwe still can't set hint bits. 

Only the first one isn't really wacky, but perhaps there's something there.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: Testing the async-commit patch

From

Alvaro Herrera

Date:

14 August 2007, 10:25:29

Simon Riggs wrote:
> On Mon, 2007-08-13 at 16:27 -0400, Andrew Dunstan wrote:
> 
> > Let's say that this file looks just like a postgresql.conf file, except 
> > that any line beginning with '[<identifier]>' is a config set name for 
> > the lines that follow. So we might have:
> > 
> >     [asynch_commit]
> >     synchronous_commit = off
> > 
> >     [no_fsync]
> >     fsync = off
> > 
> >     [csvlogs]
> >     start_log_collector = true
> >     log_destination = 'stderr, csvlog'
> > 
> > Then there would be an extra installcheck-parallel run for each set. If 
> > the file isn't there we do nothing.
> 
> Sounds fine, though I'd prefer this in XML to allow further
> extensibility in the future.

YAML?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Testing the async-commit patch

From

Tom Lane

Date:

14 August 2007, 10:56:10

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Simon Riggs wrote:
>> Sounds fine, though I'd prefer this in XML to allow further
>> extensibility in the future.

> YAML?

That would seem to require making pg_regress depend on some XML library
or other, which is a dependency I'd rather not add.
        regards, tom lane

Re: Testing the async-commit patch

From

Andrew Dunstan

Date:

14 August 2007, 11:13:58

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>   
>> Simon Riggs wrote:
>>     
>>> Sounds fine, though I'd prefer this in XML to allow further
>>> extensibility in the future.
>>>       
>
>   
>> YAML?
>>     
>
> That would seem to require making pg_regress depend on some XML library
> or other, which is a dependency I'd rather not add.
>
>     
>   

Yeah, I think the way I set it out would work just fine for the intended 
purpose. XML, YAML, JSON et al are all well suited to tree structured 
data. But what we're describing here isn't tree structured. It is simply 
some named sets of postgresql.conf directives. As such, it would be best 
if it were as close as possible to actual postgresql.conf syntax.

And I am also reluctant to add an additional dependency onto the 
buildfarm script. (I wasn't actually thinking of doing this throught 
pg_regress, but I'm open to persuasion).

cheers

andrew

Re: Testing the async-commit patch

From

Josh Berkus

Date:

14 August 2007, 11:37:36

Tom, 

We're getting some additional test infrastructre at Sun; I'll throw this on 
the pile of stuff to test.

However, the current tests we're doing are regression tests and benchmark 
runs.  If there's some other kind of testing we need to do, I'll need 
specifics.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: Testing the async-commit patch

From

Tom Lane

Date:

14 August 2007, 13:10:42

Gregory Stark <stark@enterprisedb.com> writes:
> The problem testing this patch is that the window for a committed transaction
> to not be synced is quite narrow, especially for the regression tests. For
> testing purposes I wonder if there are ways we can widen this window. Some
> ideas, some wackier than others, are:

> . Raise the default wal_writer_delay to 5s or so -- also temporary until
>   release

I ran 100+ cycles of the parallel regression tests with this setting,
and didn't see any failures.

> . Add an ifdef USE_ASSERT_CHECKING which randomly omits setting hint
>   bits even when it could.

I think this is better done by code inspection, ie, look for places that
assume HEAP_XMIN/XMAX_COMMITTED is or can be set.

I made a pass over CVS HEAD and found some apparent trouble spots:
heapam.c lines 1843-1852 presume previous xmax can be hinted
immediately, ditto lines 2167-2176, ditto lines 2716-2725.
I think probably we should just remove those lines --- they are only
trying to save work for future tqual.c calls.
        regards, tom lane

Re: Testing the async-commit patch

From

"Simon Riggs"

Date:

14 August 2007, 13:21:50

On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote:

> I think this is better done by code inspection, ie, look for places that
> assume HEAP_XMIN/XMAX_COMMITTED is or can be set.
> 
> I made a pass over CVS HEAD and found some apparent trouble spots:
> heapam.c lines 1843-1852 presume previous xmax can be hinted
> immediately, ditto lines 2167-2176, ditto lines 2716-2725.
> I think probably we should just remove those lines --- they are only
> trying to save work for future tqual.c calls.

I'll check those out later tonight.

--  Simon Riggs EnterpriseDB  http://www.enterprisedb.com

Re: Testing the async-commit patch

From

Tom Lane

Date:

14 August 2007, 13:29:39

"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote:
>> heapam.c lines 1843-1852 presume previous xmax can be hinted
>> immediately, ditto lines 2167-2176, ditto lines 2716-2725.
>> I think probably we should just remove those lines --- they are only
>> trying to save work for future tqual.c calls.

> I'll check those out later tonight.

[ looks closer ] Actually, we can't just dike out those code sections,
because the immediately following code assumes that XMAX_INVALID is
correct.  So I guess we need to export HeapTupleSetHintBits from tqual.c
and do the full pushup in these places.
        regards, tom lane

Re: Testing the async-commit patch

From

"Simon Riggs"

Date:

14 August 2007, 13:34:39

On Tue, 2007-08-14 at 12:29 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote:
> >> heapam.c lines 1843-1852 presume previous xmax can be hinted
> >> immediately, ditto lines 2167-2176, ditto lines 2716-2725.
> >> I think probably we should just remove those lines --- they are only
> >> trying to save work for future tqual.c calls.
> 
> > I'll check those out later tonight.
> 
> [ looks closer ] Actually, we can't just dike out those code sections,
> because the immediately following code assumes that XMAX_INVALID is
> correct.  So I guess we need to export HeapTupleSetHintBits from tqual.c
> and do the full pushup in these places.

[ without looking ] XMAX_INVALID is always set if we know it, so that
wouldn't be a reason to do that.

--  Simon Riggs EnterpriseDB  http://www.enterprisedb.com

Re: Testing the async-commit patch

From

Greg Smith

Date:

14 August 2007, 15:37:32

On Mon, 2007-08-13 at 16:27 -0400, Andrew Dunstan wrote:
> [asynch_commit]
> synchronous_commit = off
> [no_fsync]
>  fsync = off

This is the Windows INI file format.  As such, it's easy to find code 
samples in almost any language that parse this format for you.  For 
example, Python has a core library called ConfigParser that will read 
these in, and somewhere around here I even have some UNIX shell code that 
parses it.  There's already a PostgreSQL-related project using this 
format--even on UNIX systems the odbc.ini config files look like this.

I already have a program that generates multiple postgresql.conf files 
using this format around here for exactly this sort of test (compare 
benchmark results with two different configurations), just never had a 
reason to package it up for anybody else to use.  It's trivial code if you 
use the Python parser.

On Tue, 14 Aug 2007, Simon Riggs wrote:

> Sounds fine, though I'd prefer this in XML to allow further
> extensibility in the future.

Putting this in XML significantly raises the bar for how complicated tools 
that work on these files must be, with the implicit dependencies that go 
with that.  And as Andrew already pointed out, there is very little 
tree-structure to this data that justifies the extra complexity.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Testing the async-commit patch

From

"Simon Riggs"

Date:

15 August 2007, 07:46:38

On Tue, 2007-08-14 at 12:29 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote:
> >> heapam.c lines 1843-1852 presume previous xmax can be hinted
> >> immediately, ditto lines 2167-2176, ditto lines 2716-2725.
> >> I think probably we should just remove those lines --- they are only
> >> trying to save work for future tqual.c calls.
> 
> > I'll check those out later tonight.
> 
> [ looks closer ] Actually, we can't just dike out those code sections,
> because the immediately following code assumes that XMAX_INVALID is
> correct.  So I guess we need to export HeapTupleSetHintBits from tqual.c
> and do the full pushup in these places.

Looks good: the code is neater and better commented than before as well
as being fully accurate with async commit.

--  Simon Riggs EnterpriseDB  http://www.enterprisedb.com