Thread: Why we still see some reports of "could not access transaction status"
Having seen a couple recent reports of "could not access status of transaction" for old, not-obviously-corrupt transaction numbers, I went looking to see if I could find a way that the system could truncate CLOG before it's really marked all occurrences of old transaction numbers as known-dead or known-good.

I found one. The problem is that there are several places where a tqual.c routine is called without checking to see if it changed the tuple's commit hint bits, and without necessarily writing the page immediately after. One example is the code path in heap_update where we decide that we can't update the tuple because a concurrent transaction did so. If HeapTupleSatisfiesUpdate had set the XMIN_COMMITTED or XMAX_COMMITTED bits, those bits would remain set in the shared buffer, but *the buffer would not get marked dirty*.

Before PG 7.2 this was not a bug, because the hint bits could always be set again later. But now, consider this scenario: while the buffer remains in memory, VACUUM passes over the table. It doesn't find any changes needed in that page, so it doesn't write the page either. At completion of the vacuum, we check whether we can truncate CLOG, discover we can, and do so. At some later point, the in-memory buffer is discarded, still without having been written. When next read in, the page contains an un-hinted transaction status that could easily point to a transaction before the new CLOG boundary. Ooops.

The odds of such a problem seem exceedingly small ... in other words, just about right to explain the small numbers of reports we get.

I think what we ought to do to solve this problem permanently is to stop making the callers of the HeapTupleSatisfiesFoo() routines responsible for checking for hint bit updates. It would be a lot safer, and AFAICS not noticeably less efficient, for those routines to call SetBufferCommitInfoNeedsSave for themselves. This would require adding to their parameter lists, because they aren't currently told which buffer the tuple is in, but that's no big deal considering we get to simplify the calling logic in all the places that are faithfully doing the t_infomask update check.

Comments?

regards, tom lane
Re: Why we still see some reports of "could not access transaction status"
From: "Michael Paesold"
Tom Lane wrote:
> Having seen a couple recent reports of "could not access status of
> transaction" for old, not-obviously-corrupt transaction numbers, I went
> looking to see if I could find a way that the system could truncate CLOG
> before it's really marked all occurrences of old transaction numbers as
> known-dead or known-good.
>
> I found one.

I was starting to wonder about those reports, too. Actually I was thinking about bringing this up as soon as I would find time. So I am glad you picked that up yourself -- and found a problem already.

> I think what we ought to do to solve this problem permanently is to stop
...
>
> Comments?

Well, I am not able to comment here, but I can say I usually trust your judgement.

Best Regards,
Michael Paesold
Re: Why we still see some reports of "could not access transaction status"
From: Alvaro Herrera
On Wed, Oct 13, 2004 at 12:18:08PM -0400, Tom Lane wrote:
> I think what we ought to do to solve this problem permanently is to stop
> making the callers of the HeapTupleSatisfiesFoo() routines responsible
> for checking for hint bit updates. It would be a lot safer, and AFAICS
> not noticeably less efficient, for those routines to call
> SetBufferCommitInfoNeedsSave for themselves. This would require adding
> to their parameter lists, because they aren't currently told which
> buffer the tuple is in, but that's no big deal considering we get to
> simplify the calling logic in all the places that are faithfully doing
> the t_infomask update check.
>
> Comments?

I remember seeing this code when coding the phantom Xid idea and wondering why such an error-prone style was used. It never occurred to me to change it (or maybe have the guts to do it), but now that you mention it, it certainly seems a good idea.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Tulio: oh, what could this button be for, Juan Carlos?
Policarpo: No, stay back, don't touch the console!
Juan Carlos: I'll press it again and again.
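The change in calling convention discussed here can be illustrated with a small sketch. It is not the actual tqual.c or heapam.c code: the types, the TOY_XMIN_COMMITTED bit, and all function names are hypothetical, and the transaction-status lookup is a stub that reports every xid as committed. The point is only to contrast a routine that sees the tuple alone, leaving each caller to compare the infomask before and after, with a routine that is handed the buffer and flags it itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real PostgreSQL definitions. */
#define TOY_XMIN_COMMITTED 0x0100

typedef struct {
    uint32_t xmin;
    uint16_t infomask;
} ToyTuple;

typedef struct {
    bool commit_info_needs_save;  /* "write this buffer before dropping it" */
} ToyBuffer;

/* Stub status lookup: pretends every transaction committed. */
bool toy_xid_did_commit(uint32_t xid)
{
    (void) xid;
    return true;
}

/* Old convention: the routine only knows the tuple, not its buffer. */
bool toy_satisfies_old(ToyTuple *tup)
{
    if (tup->infomask & TOY_XMIN_COMMITTED)
        return true;
    if (toy_xid_did_commit(tup->xmin)) {
        tup->infomask |= TOY_XMIN_COMMITTED;  /* hint set, buffer untouched */
        return true;
    }
    return false;
}

/* A caller that faithfully does the infomask comparison. */
bool toy_careful_caller(ToyTuple *tup, ToyBuffer *buf)
{
    uint16_t saved = tup->infomask;
    bool ok = toy_satisfies_old(tup);

    if (saved != tup->infomask)
        buf->commit_info_needs_save = true;
    return ok;
}

/* A caller that forgets the comparison: the hint exists only in memory. */
bool toy_forgetful_caller(ToyTuple *tup, ToyBuffer *buf)
{
    (void) buf;
    return toy_satisfies_old(tup);
}

/* Proposed convention: pass the buffer in and let the routine flag it. */
bool toy_satisfies_new(ToyTuple *tup, ToyBuffer *buf)
{
    assert(tup != NULL && buf != NULL);

    if (tup->infomask & TOY_XMIN_COMMITTED)
        return true;
    if (toy_xid_did_commit(tup->xmin)) {
        tup->infomask |= TOY_XMIN_COMMITTED;
        buf->commit_info_needs_save = true;   /* no caller can forget this */
        return true;
    }
    return false;
}
```

With the second form there is one place that sets the flag instead of one per call site, which is the simplification the quoted proposal is after; the cost is the extra buffer parameter.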
Tom Lane wrote:
> Having seen a couple recent reports of "could not access status of
> transaction" for old, not-obviously-corrupt transaction numbers, I went
> looking to see if I could find a way that the system could truncate CLOG
> before it's really marked all occurrences of old transaction numbers as
> known-dead or known-good.
>
> I found one.

Are you going to fix it for 8.0 and/or back-patch it?

Regards
Gaetano Mendola
Gaetano Mendola wrote:
> Are you going to fix it for the 8.0 and/or back patch it ?

http://archives.postgresql.org/pgsql-committers/2004-10/msg00229.php
http://archives.postgresql.org/pgsql-committers/2004-10/msg00191.php

plus backpatches to older branches (REL7_3_STABLE, REL7_2_STABLE).

Has there been any thought about putting out another 7.4 release with this fix?

-Neil
Neil Conway <neilc@samurai.com> writes:
> Has there been any thought about putting out another 7.4 release with
> this fix?

There has, but there are some other open issues I'd like to deal with first.

If anyone has any pending 7.4 fixes, getting them in in the next few days would be a Good Plan.

regards, tom lane
Tom Lane wrote:
> If anyone has any pending 7.4 fixes, getting them in in the next
> few days would be a Good Plan.

Do we want to backport tighter security for plperl? In particular, insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?

cheers

andrew
Andrew Dunstan wrote:
> Tom Lane wrote:
>> If anyone has any pending 7.4 fixes, getting them in in the next
>> few days would be a Good Plan.
>
> Do we want to backport tighter security for plperl? In particular,
> insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?

And it would also be nice if we could add contrib/cube/expected/cube_1.out to the 7.4 branch, I think, so that more platforms could pass the contrib installcheck tests.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Do we want to backport tighter security for plperl? In particular,
> insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?

I'd vote not: 7.4.5 => 7.4.6 is not an update that people would expect to break their plperl code ...

regards, tom lane
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Do we want to backport tighter security for plperl? In particular,
>> insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?
>
> I'd vote not: 7.4.5 => 7.4.6 is not an update that people would expect
> to break their plperl code ...

*shrug*

OK. Then plperl should probably not be regarded as being as "trusted" as we would like. Note that old versions of Safe.pm have been the subject of security advisories such as this one http://www.securityfocus.com/bid/6111/info/ for some time.

cheers

andrew
On Tue, 2004-10-19 at 02:45, Andrew Dunstan wrote:
> *shrug* OK. Then plperl should probably not be regarded as being as
> "trusted" as we would like. Note that old versions of Safe.pm have been
> the subject of security advisories such as this one
> http://www.securityfocus.com/bid/6111/info/ for some time.

Perhaps a compromise would be to require the newer version of Safe.pm, but leave the other changes for 8.0. Upgrading Safe.pm can presumably be done without needing any changes to the rest of one's pl/perl code.

-Neil
Neil Conway wrote:
> On Tue, 2004-10-19 at 02:45, Andrew Dunstan wrote:
>> *shrug* OK. Then plperl should probably not be regarded as being as
>> "trusted" as we would like. Note that old versions of Safe.pm have been
>> the subject of security advisories such as this one
>> http://www.securityfocus.com/bid/6111/info/ for some time.
>
> Perhaps a compromise would be to require the newer version of Safe.pm,
> but leave the other changes for 8.0. Upgrading Safe.pm can presumably be
> done without needing any changes to the rest of one's pl/perl code.

s/the rest of/any of/

Indeed it can. The other thing I suggested was removing the :base_io set of ops - I would regard plperl functions that did things like printing to STDOUT as broken to start with.

But maybe we can just live with what we have and advertise that 8.0's plperl is more secure.

cheers

andrew
On Tue, Oct 19, 2004 at 08:47:20AM -0400, Andrew Dunstan wrote:
> But maybe we can just live with what we have and advertise that 8.0's
> plperl is more secure.

The release notes should point out that 7.4's plperl is insecure unless the correct version of Safe.pm is installed. Maybe it would work to make it croak if an unsafe version of Safe.pm is found?

I'm not sure about "living with" known security vulnerabilities. What about ISPs which give Pg hosting with plperl installed? They surely will want to know about this.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
One man's impedance mismatch is another man's layer of abstraction.
(Lincoln Yeoh)