Thread: Testing the async-commit patch
So I was testing my fix for the problem noted here: http://archives.postgresql.org/pgsql-hackers/2007-08/msg00196.php and promptly found *another* bug. To wit, that repair_frag calls HeapTupleSatisfiesVacuum without bothering to acquire any buffer content lock. This results in an Assert failure inside SetBufferCommitInfoNeedsSave, if HeapTupleSatisfiesVacuum tries to update any hint bits for the tuple. I think that is impossible in current releases, because the tuple's logical status was fully determined by the prior call in scan_heap. But it's possible as of 8.3 because the walwriter or other backends could have moved the WAL flush point, allowing a previously unhintable XMAX to become hintable. I think the best solution for this is to acquire the buffer content lock before calling HeapTupleSatisfiesVacuum --- it's really a pretty ugly shortcut that the code didn't do that already. We could alternatively refuse to do shrinking unless both XMIN and XMAX are correctly hinted at scan_heap time; but there is not anything else in vacuum.c that seems to require XMAX_COMMITTED to be set, so I'd rather not make that restriction. But to get to the point: the urgency of testing the patch more extensively has just moved up a full order of magnitude, IMHO anyway. I muttered something in the other thread about providing a buildfarm option to run the regression tests with synchronous_commit off. That would still be a good idea in the long run, but I want to take some more drastic measures now. I propose that we actually set synchronous_commit off by default for the next little while --- at least up to 8.3beta1, maybe until we approach the RC point. That will ensure that every buildfarm machine is exercising the async-commit behavior, as well as every developer who's testing HEAD. Of course the risk is that we might forget to turn it back on before release :-( Comments? regards, tom lane
Tom Lane wrote: > But to get to the point: the urgency of testing the patch more > extensively has just moved up a full order of magnitude, IMHO anyway. > I muttered something in the other thread about providing a buildfarm > option to run the regression tests with synchronous_commit off. That > would still be a good idea in the long run, but I want to take some more > drastic measures now. I propose that we actually set synchronous_commit > off by default for the next little while --- at least up to 8.3beta1, > maybe until we approach the RC point. That will ensure that every > buildfarm machine is exercising the async-commit behavior, as well as > every developer who's testing HEAD. > > Of course the risk is that we might forget to turn it back on before > release :-( > > > Turn it off, I doubt we'll forget. I have some ideas about testing configuration items. Doing all our tests with the default config is not ideal, I think. Essentially we'd put up a server that would have sets of <branch, list-of-config-lines>. The client would connect to the server if it could and get the set(s) of lines for the branch on question, and for each set it would try another run of installcheck (I'm also wondering if we should switch to doing installcheck-parallel). Anyway, this would be a config option on the buildfarm, so we wouldn't overburden hosts with limited run windows (e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g. some of the old and/or tiny hardware we have). If this seems worth it I'll put it on my TODO. cheers andrew
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > But to get to the point: the urgency of testing the patch more > extensively has just moved up a full order of magnitude, IMHO anyway. > I muttered something in the other thread about providing a buildfarm > option to run the regression tests with synchronous_commit off. That > would still be a good idea in the long run, but I want to take some more > drastic measures now. I propose that we actually set synchronous_commit > off by default for the next little while --- at least up to 8.3beta1, > maybe until we approach the RC point. That will ensure that every > buildfarm machine is exercising the async-commit behavior, as well as > every developer who's testing HEAD. > > Of course the risk is that we might forget to turn it back on before > release :-( I'll set a cron job to remind us. What date should I set it for? :) Seems like a fine plan to me. It's supposed to be 100% reliable and have indistinguishable behaviour barring a system crash and nobody should be running production data on a beta or pre-beta build, so they should never see a difference. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > I have some ideas about testing configuration items. Doing all our tests > with the default config is not ideal, I think. Essentially we'd put up a > server that would have sets of <branch, list-of-config-lines>. The > client would connect to the server if it could and get the set(s) of > lines for the branch on question, and for each set it would try another > run of installcheck (I'm also wondering if we should switch to doing > installcheck-parallel). Anyway, this would be a config option on the > buildfarm, so we wouldn't overburden hosts with limited run windows > (e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g. > some of the old and/or tiny hardware we have). > If this seems worth it I'll put it on my TODO. Sounds like a good plan, except that an extra server seems unnecessary mechanism (and perhaps an unnecessary security risk). We can just put a file into CVS src/test/regress showing what we'd like tested. regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> I propose that we actually set synchronous_commit >> off by default for the next little while --- at least up to 8.3beta1, >> maybe until we approach the RC point. That will ensure that every >> buildfarm machine is exercising the async-commit behavior, as well as >> every developer who's testing HEAD. >> >> Of course the risk is that we might forget to turn it back on before >> release :-( > I'll set a cron job to remind us. What date should I set it for? :) I thought of a low-tech solution for that: put a note in src/tools/RELEASE_CHANGES about it. We invariably look at that file while preparing releases. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> I have some ideas about testing configuration items. Doing all our tests >> with the default config is not ideal, I think. Essentially we'd put up a >> server that would have sets of <branch, list-of-config-lines>. The >> client would connect to the server if it could and get the set(s) of >> lines for the branch on question, and for each set it would try another >> run of installcheck (I'm also wondering if we should switch to doing >> installcheck-parallel). Anyway, this would be a config option on the >> buildfarm, so we wouldn't overburden hosts with limited run windows >> (e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g. >> some of the old and/or tiny hardware we have). >> > > >> If this seems worth it I'll put it on my TODO. >> > > Sounds like a good plan, except that an extra server seems unnecessary > mechanism (and perhaps an unnecessary security risk). We can just put > a file into CVS src/test/regress showing what we'd like tested. > > > That could work. Let's say that this file looks just like a postgresql.conf file, except that any line beginning with '[<identifier]>' is a config set name for the lines that follow. So we might have: [asynch_commit] synchronous_commit = off [no_fsync] fsync = off [csvlogs] start_log_collector = true log_destination = 'stderr, csvlog' Then there would be an extra installcheck-parallel run for each set. If the file isn't there we do nothing. cheers andrew
On Mon, 2007-08-13 at 16:27 -0400, Andrew Dunstan wrote: > Let's say that this file looks just like a postgresql.conf file, except > that any line beginning with '[<identifier]>' is a config set name for > the lines that follow. So we might have: > > [asynch_commit] > synchronous_commit = off > > [no_fsync] > fsync = off > > [csvlogs] > start_log_collector = true > log_destination = 'stderr, csvlog' > > Then there would be an extra installcheck-parallel run for each set. If > the file isn't there we do nothing. Sounds fine, though I'd prefer this in XML to allow further extensibility in the future. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > But to get to the point: the urgency of testing the patch more > extensively has just moved up a full order of magnitude, The problem testing this patch is that the window for a committed transaction to not be synced is quite narrow, especially for the regression tests. For testing purposes I wonder if there are ways we can widen this window. Some ideas, some wackier than others, are: . Raise the default wal_writer_delay to 5s or so -- also temporary until release . Add an ifdef USE_ASSERT_CHECKING which randomly omits setting hint bits even when it could. . add an ifdef USE_ASSERT_CHECKING which randomly fails to update the LSN when syncing WAL so that even after a buffer flushwe still can't set hint bits. Only the first one isn't really wacky, but perhaps there's something there. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > On Mon, 2007-08-13 at 16:27 -0400, Andrew Dunstan wrote: > > > Let's say that this file looks just like a postgresql.conf file, except > > that any line beginning with '[<identifier]>' is a config set name for > > the lines that follow. So we might have: > > > > [asynch_commit] > > synchronous_commit = off > > > > [no_fsync] > > fsync = off > > > > [csvlogs] > > start_log_collector = true > > log_destination = 'stderr, csvlog' > > > > Then there would be an extra installcheck-parallel run for each set. If > > the file isn't there we do nothing. > > Sounds fine, though I'd prefer this in XML to allow further > extensibility in the future. YAML? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Simon Riggs wrote: >> Sounds fine, though I'd prefer this in XML to allow further >> extensibility in the future. > YAML? That would seem to require making pg_regress depend on some XML library or other, which is a dependency I'd rather not add. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > >> Simon Riggs wrote: >> >>> Sounds fine, though I'd prefer this in XML to allow further >>> extensibility in the future. >>> > > >> YAML? >> > > That would seem to require making pg_regress depend on some XML library > or other, which is a dependency I'd rather not add. > > > Yeah, I think the way I set it out would work just fine for the intended purpose. XML, YAML, JSON et al are all well suited to tree structured data. But what we're describing here isn't tree structured. It is simply some named sets of postgresql.conf directives. As such, it would be best if it were as close as possible to actual postgresql.conf syntax. And I am also reluctant to add an additional dependency onto the buildfarm script. (I wasn't actually thinking of doing this throught pg_regress, but I'm open to persuasion). cheers andrew
Tom, We're getting some additional test infrastructre at Sun; I'll throw this on the pile of stuff to test. However, the current tests we're doing are regression tests and benchmark runs. If there's some other kind of testing we need to do, I'll need specifics. -- Josh Berkus PostgreSQL @ Sun San Francisco
Gregory Stark <stark@enterprisedb.com> writes: > The problem testing this patch is that the window for a committed transaction > to not be synced is quite narrow, especially for the regression tests. For > testing purposes I wonder if there are ways we can widen this window. Some > ideas, some wackier than others, are: > . Raise the default wal_writer_delay to 5s or so -- also temporary until > release I ran 100+ cycles of the parallel regression tests with this setting, and didn't see any failures. > . Add an ifdef USE_ASSERT_CHECKING which randomly omits setting hint > bits even when it could. I think this is better done by code inspection, ie, look for places that assume HEAP_XMIN/XMAX_COMMITTED is or can be set. I made a pass over CVS HEAD and found some apparent trouble spots: heapam.c lines 1843-1852 presume previous xmax can be hinted immediately, ditto lines 2167-2176, ditto lines 2716-2725. I think probably we should just remove those lines --- they are only trying to save work for future tqual.c calls. regards, tom lane
On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote: > I think this is better done by code inspection, ie, look for places that > assume HEAP_XMIN/XMAX_COMMITTED is or can be set. > > I made a pass over CVS HEAD and found some apparent trouble spots: > heapam.c lines 1843-1852 presume previous xmax can be hinted > immediately, ditto lines 2167-2176, ditto lines 2716-2725. > I think probably we should just remove those lines --- they are only > trying to save work for future tqual.c calls. I'll check those out later tonight. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote: >> heapam.c lines 1843-1852 presume previous xmax can be hinted >> immediately, ditto lines 2167-2176, ditto lines 2716-2725. >> I think probably we should just remove those lines --- they are only >> trying to save work for future tqual.c calls. > I'll check those out later tonight. [ looks closer ] Actually, we can't just dike out those code sections, because the immediately following code assumes that XMAX_INVALID is correct. So I guess we need to export HeapTupleSetHintBits from tqual.c and do the full pushup in these places. regards, tom lane
On Tue, 2007-08-14 at 12:29 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote: > >> heapam.c lines 1843-1852 presume previous xmax can be hinted > >> immediately, ditto lines 2167-2176, ditto lines 2716-2725. > >> I think probably we should just remove those lines --- they are only > >> trying to save work for future tqual.c calls. > > > I'll check those out later tonight. > > [ looks closer ] Actually, we can't just dike out those code sections, > because the immediately following code assumes that XMAX_INVALID is > correct. So I guess we need to export HeapTupleSetHintBits from tqual.c > and do the full pushup in these places. [ without looking ] XMAX_INVALID is always set if we know it, so that wouldn't be a reason to do that. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Mon, 2007-08-13 at 16:27 -0400, Andrew Dunstan wrote: > [asynch_commit] > synchronous_commit = off > [no_fsync] > fsync = off This is the Windows INI file format. As such, it's easy to find code samples in almost any language that parse this format for you. For example, Python has a core library called ConfigParser that will read these in, and somewhere around here I even have some UNIX shell code that parses it. There's already a PostgreSQL-related project using this format--even on UNIX systems the odbc.ini config files look like this. I already have a program that generates multiple postgresql.conf files using this format around here for exactly this sort of test (compare benchmark results with two different configurations), just never had a reason to package it up for anybody else to use. It's trivial code if you use the Python parser. On Tue, 14 Aug 2007, Simon Riggs wrote: > Sounds fine, though I'd prefer this in XML to allow further > extensibility in the future. Putting this in XML significantly raises the bar for how complicated tools that work on these files must be, with the implicit dependencies that go with that. And as Andrew already pointed out, there is very little tree-structure to this data that justifies the extra complexity. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Tue, 2007-08-14 at 12:29 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote: > >> heapam.c lines 1843-1852 presume previous xmax can be hinted > >> immediately, ditto lines 2167-2176, ditto lines 2716-2725. > >> I think probably we should just remove those lines --- they are only > >> trying to save work for future tqual.c calls. > > > I'll check those out later tonight. > > [ looks closer ] Actually, we can't just dike out those code sections, > because the immediately following code assumes that XMAX_INVALID is > correct. So I guess we need to export HeapTupleSetHintBits from tqual.c > and do the full pushup in these places. Looks good: the code is neater and better commented than before as well as being fully accurate with async commit. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com