Thread: Why we still see some reports of "could not access transaction status"

Why we still see some reports of "could not access transaction status"

From: Tom Lane
Date: 13 October 2004, 17:18:17

Having seen a couple recent reports of "could not access status of
transaction" for old, not-obviously-corrupt transaction numbers, I went
looking to see if I could find a way that the system could truncate CLOG
before it's really marked all occurrences of old transaction numbers as
known-dead or known-good.

I found one.

The problem is that there are several places where a tqual.c routine is
called without checking to see if it changed the tuple's commit hint
bits, and without necessarily writing the page immediately after.  One
example is the code path in heap_update where we decide that we can't
update the tuple because a concurrent transaction did so.  If
HeapTupleSatisfiesUpdate had set the XMIN_COMMITTED or XMAX_COMMITTED
bits, those bits would remain set in the shared buffer, but *the buffer
would not get marked dirty*.

Before PG 7.2 this was not a bug, because the hint bits could always be
set again later.  But now, consider this scenario: while the buffer
remains in memory, VACUUM passes over the table.  It doesn't find any
changes needed in that page, so it doesn't write the page either.  At
completion of the vacuum, we check whether we can truncate CLOG,
discover we can, and do so.  At some later point, the in-memory buffer
is discarded, still without having been written.  When next read in,
the page contains an un-hinted transaction status that could easily
point to a transaction before the new CLOG boundary.  Ooops.
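
A minimal sketch of the scenario above, as a toy model rather than actual PostgreSQL code. Every name here (ToyTuple, ToyBuffer, toy_satisfies_update, and so on) is hypothetical, the "CLOG boundary" is reduced to a single integer, and transaction-ID wraparound is ignored. It only illustrates why a hint bit set in memory without dirtying the buffer is lost once CLOG has been truncated:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_XMIN_COMMITTED 0x01   /* hypothetical hint-bit flag */

typedef struct {
    uint32_t xmin;      /* inserting transaction id */
    uint8_t  infomask;  /* hint bits */
} ToyTuple;

typedef struct {
    ToyTuple mem;       /* copy held in the shared buffer */
    ToyTuple disk;      /* copy stored on disk */
    bool     dirty;     /* must the buffer be written before eviction? */
} ToyBuffer;

/* Visibility check that sets the hint bit in memory.  mark_dirty models
 * whether the calling code path remembered to flag the buffer. */
void toy_satisfies_update(ToyBuffer *buf, bool mark_dirty)
{
    if (!(buf->mem.infomask & TOY_XMIN_COMMITTED)) {
        buf->mem.infomask |= TOY_XMIN_COMMITTED;
        if (mark_dirty)
            buf->dirty = true;
    }
}

/* Evict the buffer (writing it only if dirty), then read the page back. */
void toy_evict_and_reread(ToyBuffer *buf)
{
    if (buf->dirty) {
        buf->disk = buf->mem;
        buf->dirty = false;
    }
    buf->mem = buf->disk;
}

/* True if the tuple is un-hinted and its xmin lies before the CLOG
 * boundary, i.e. the status lookup would fail. */
bool toy_lookup_fails(const ToyBuffer *buf, uint32_t clog_oldest_xid)
{
    bool hinted = (buf->mem.infomask & TOY_XMIN_COMMITTED) != 0;
    return !hinted && buf->mem.xmin < clog_oldest_xid;
}

/* Run the whole sequence; returns true if the later read hits the error. */
bool toy_scenario(bool mark_dirty)
{
    ToyBuffer buf = { {100, 0}, {100, 0}, false };
    uint32_t  clog_oldest_xid = 50;

    assert(!toy_lookup_fails(&buf, clog_oldest_xid)); /* resolvable at first */

    toy_satisfies_update(&buf, mark_dirty);  /* hint bit set in memory */
    /* VACUUM sees the hinted in-memory tuple, changes and writes nothing,
     * and CLOG is then truncated past xid 100. */
    clog_oldest_xid = 200;
    toy_evict_and_reread(&buf);              /* buffer discarded, page re-read */

    return toy_lookup_fails(&buf, clog_oldest_xid);
}
```

In this model toy_scenario(false) reports a failed lookup, while toy_scenario(true) does not, because the hint bit reached disk before the buffer was discarded.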

The odds of such a problem seem exceedingly small ... in other words,
just about right to explain the small numbers of reports we get.

I think what we ought to do to solve this problem permanently is to stop
making the callers of the HeapTupleSatisfiesFoo() routines responsible
for checking for hint bit updates.  It would be a lot safer, and AFAICS
not noticeably less efficient, for those routines to call
SetBufferCommitInfoNeedsSave for themselves.  This would require adding
to their parameter lists, because they aren't currently told which
buffer the tuple is in, but that's no big deal considering we get to
simplify the calling logic in all the places that are faithfully doing
the t_infomask update check.
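
A rough sketch of the shape of that change, under stated assumptions: the types and functions below are simplified stand-ins (SketchTuple, SketchBuffer, sketch_mark_needs_save in place of SetBufferCommitInfoNeedsSave), not the real tqual.c signatures, and the xmin_committed argument stands in for the result of the CLOG lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_XMIN_COMMITTED 0x0100  /* hypothetical hint-bit value */

typedef struct {
    uint32_t xmin;      /* inserting transaction id */
    uint16_t infomask;  /* hint bits live here */
} SketchTuple;

typedef struct {
    bool needs_save;    /* stand-in for the buffer's "needs save" state */
} SketchBuffer;

/* Stand-in for SetBufferCommitInfoNeedsSave(). */
void sketch_mark_needs_save(SketchBuffer *buf)
{
    buf->needs_save = true;
}

/* Old style: may set a hint bit, but leaves the bookkeeping to the caller,
 * who has to compare t_infomask before and after the call. */
bool sketch_satisfies_old(SketchTuple *tup, bool xmin_committed)
{
    if (xmin_committed)
        tup->infomask |= SKETCH_XMIN_COMMITTED;
    return xmin_committed;
}

/* Proposed style: the routine is told which buffer holds the tuple and
 * flags it itself whenever the hint bits changed. */
bool sketch_satisfies_new(SketchTuple *tup, SketchBuffer *buf,
                          bool xmin_committed)
{
    uint16_t before = tup->infomask;
    bool     result = sketch_satisfies_old(tup, xmin_committed);

    if (tup->infomask != before)
        sketch_mark_needs_save(buf);
    return result;
}

/* A caller that never checks t_infomask; returns the buffer flag afterwards. */
bool sketch_forgetful_caller(bool use_new_style)
{
    SketchTuple  tup = { 100, 0 };
    SketchBuffer buf = { false };

    if (use_new_style)
        (void) sketch_satisfies_new(&tup, &buf, true);
    else
        (void) sketch_satisfies_old(&tup, true);

    assert(tup.infomask & SKETCH_XMIN_COMMITTED);  /* hint set either way */
    return buf.needs_save;
}
```

With the old style the forgetful caller leaves the buffer unflagged; with the new style the buffer is flagged even though the caller did nothing, which is the point of moving the responsibility into the routine.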

Comments?
        regards, tom lane

Re: Why we still see some reports of "could not access transaction status"

From: "Michael Paesold"
Date: 14 October 2004, 07:18:05

Tom Lane wrote:

> Having seen a couple recent reports of "could not access status of
> transaction" for old, not-obviously-corrupt transaction numbers, I went
> looking to see if I could find a way that the system could truncate CLOG
> before it's really marked all occurrences of old transaction numbers as
> known-dead or known-good.
>
> I found one.

I was starting to wonder about those reports, too. Actually, I was thinking 
about bringing this up as soon as I found time. So I am glad you picked 
that up yourself -- and found a problem already.

> I think what we ought to do to solve this problem permanently is to stop
...
>
> Comments?

Well, I am not able to comment here, but I can say I usually trust your 
judgement.

Best Regards,
Michael Paesold

Re: Why we still see some reports of "could not access transaction status"

From: Alvaro Herrera
Date: 14 October 2004, 14:01:46

On Wed, Oct 13, 2004 at 12:18:08PM -0400, Tom Lane wrote:

> I think what we ought to do to solve this problem permanently is to stop
> making the callers of the HeapTupleSatisfiesFoo() routines responsible
> for checking for hint bit updates.  It would be a lot safer, and AFAICS
> not noticeably less efficient, for those routines to call
> SetBufferCommitInfoNeedsSave for themselves.  This would require adding
> to their parameter lists, because they aren't currently told which
> buffer the tuple is in, but that's no big deal considering we get to
> simplify the calling logic in all the places that are faithfully doing
> the t_infomask update check.
> 
> Comments?

I remember seeing this code when coding the phantom Xid idea and
wondering why such an error-prone style was used.  It never occurred to
me to change it (or maybe I never had the guts to do it), but now that you
mention it, it certainly seems a good idea.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Tulio: oh, what could this button be for, Juan Carlos?
Policarpo: No, get away, don't touch the console!
Juan Carlos: I'll press it again and again.

Re: Why we still see some reports of "could not access transaction status"

From: Gaetano Mendola
Date: 15 October 2004, 17:49:39

Tom Lane wrote:
> Having seen a couple recent reports of "could not access status of
> transaction" for old, not-obviously-corrupt transaction numbers, I went
> looking to see if I could find a way that the system could truncate CLOG
> before it's really marked all occurrences of old transaction numbers as
> known-dead or known-good.
> 
> I found one.
> 

Are you going to fix it for 8.0 and/or back-patch it?

Regards
Gaetano Mendola

Re: Why we still see some reports of "could not access transaction status"

From: Neil Conway
Date: 16 October 2004, 14:45:27

Gaetano Mendola wrote:
> Are you going to fix it for 8.0 and/or back-patch it?

http://archives.postgresql.org/pgsql-committers/2004-10/msg00229.php
http://archives.postgresql.org/pgsql-committers/2004-10/msg00191.php

plus backpatches to older branches (REL7_3_STABLE, REL7_2_STABLE).

Has there been any thought about putting out another 7.4 release with 
this fix?

-Neil

Re: Why we still see some reports of "could not access transaction status"

From: Tom Lane
Date: 16 October 2004, 17:20:30

Neil Conway <neilc@samurai.com> writes:
> Has there been any thought about putting out another 7.4 release with 
> this fix?

There has, but there are some other open issues I'd like to deal with
first.

If anyone has any pending 7.4 fixes, getting them in in the next
few days would be a Good Plan.
        regards, tom lane

Re: 7.4 changes

From: Andrew Dunstan
Date: 17 October 2004, 13:26:24


Tom Lane wrote:

> If anyone has any pending 7.4 fixes, getting them in in the next
> few days would be a Good Plan.

Do we want to backport tighter security for plperl? In particular, 
insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?

cheers

andrew

Re: 7.4 changes

From: Andrew Dunstan
Date: 17 October 2004, 14:09:17

Andrew Dunstan wrote:

> Tom Lane wrote:
>
>> If anyone has any pending 7.4 fixes, getting them in in the next
>> few days would be a Good Plan.
>
> Do we want to backport tighter security for plperl? In particular,
> insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?

And it would also be nice if we could add 
contrib/cube/expected/cube_1.out to the 7.4 branch, I think, so that 
more platforms could pass the contrib installcheck tests.

cheers

andrew

Re: 7.4 changes

From: Tom Lane
Date: 17 October 2004, 18:52:58

Andrew Dunstan <andrew@dunslane.net> writes:
> Do we want to backport tighter security for plperl? In particular, 
> insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?

I'd vote not: 7.4.5 => 7.4.6 is not an update that people would expect
to break their plperl code ...
        regards, tom lane

Re: 7.4 changes

From: Andrew Dunstan
Date: 18 October 2004, 17:45:57

Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>> Do we want to backport tighter security for plperl? In particular,
>> insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?
>
> I'd vote not: 7.4.5 => 7.4.6 is not an update that people would expect
> to break their plperl code ...

*shrug* OK. Then plperl should probably not be regarded as being as 
"trusted" as we would like. Note that old versions of Safe.pm have been 
the subject of security advisories for some time, such as this one:
http://www.securityfocus.com/bid/6111/info/

cheers

andrew

Re: 7.4 changes

From: Neil Conway
Date: 19 October 2004, 08:01:17

On Tue, 2004-10-19 at 02:45, Andrew Dunstan wrote:
> *shrug* OK. Then plperl should probably not be regarded as being as 
> "trusted" as we would like. Note that old versions of Safe.pm  have been 
> the subject of security advisories such as this one 
> http://www.securityfocus.com/bid/6111/info/ for some time.

Perhaps a compromise would be to require the newer version of Safe.pm,
but leave the other changes for 8.0. Upgrading Safe.pm can presumably be
done without needing any changes to the rest of one's pl/perl code.

-Neil

Re: 7.4 changes

From: Andrew Dunstan
Date: 19 October 2004, 13:47:56

Neil Conway wrote:

> On Tue, 2004-10-19 at 02:45, Andrew Dunstan wrote:
>> *shrug* OK. Then plperl should probably not be regarded as being as
>> "trusted" as we would like. Note that old versions of Safe.pm have been
>> the subject of security advisories such as this one
>> http://www.securityfocus.com/bid/6111/info/ for some time.
>
> Perhaps a compromise would be to require the newer version of Safe.pm,
> but leave the other changes for 8.0. Upgrading Safe.pm can presumably be
> done without needing any changes to the rest of one's pl/perl code.

s/the rest of/any of/

Indeed it can.

The other thing I suggested was removing the :base_io set of ops - I 
would regard plperl functions that did things like printing to STDOUT as 
broken to start with.

But maybe we can just live with what we have and advertise that 8.0's 
plperl is more secure.

cheers

andrew

Re: 7.4 changes

From: Alvaro Herrera
Date: 19 October 2004, 14:02:23

On Tue, Oct 19, 2004 at 08:47:20AM -0400, Andrew Dunstan wrote:

> But maybe we can just live with what we have and advertise that 8.0's 
> plperl is more secure.

The release notes should point out that 7.4's plperl is insecure unless
the correct version of Safe.pm is installed.  Maybe it would work to make it
croak if an unsafe version of Safe.pm is found?

I'm not sure about "living with" known security vulnerabilities.  What
about ISPs which give Pg hosting with plperl installed?  They surely
will want to know about this.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
One man's impedance mismatch is another man's layer of abstraction.
(Lincoln Yeoh)