Thread: Cause of missing pg_clog files

Cause of missing pg_clog files

From
Tom Lane
Date:
Yesterday I reported a WAL problem that could lead to tuples not being
marked as committed-good or committed-dead after we'd already removed
the pg_clog segment that had their transaction's commit status.
I wasn't completely satisfied with that, though, because on further
reflection it seemed a very low-probability mechanism.  I kept digging,
and finally came to the kind of bug that qualifies as a big DOH :-(

If you run a database-wide VACUUM (one with no specific target table
mentioned) as a non-superuser, then the VACUUM doesn't process tables
that don't belong to you.  But it will advance pg_database.datvacuumxid
anyway, which means that pg_clog could be truncated while old transaction
references still remain unmarked in those other tables.

In words of one syllable: running VACUUM as a non-superuser can cause
irrecoverable data loss in any 7.2.* release.

I think this qualifies as a "must fix" bug.  I recommend we back-patch
a fix for this into the REL7_2 branch and put out a 7.2.3 release.
We should also fix the "can't wait without a PROC" bug that was solved
a few days ago.
        regards, tom lane
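
A minimal sketch of the failure mode described above, written in Python rather than the backend's C. Every name here (`Table`, `Database`, `vacuum_database`, `clog_lookup_would_fail`, the `buggy` flag) is hypothetical and only models the bookkeeping. The guard shown is one plausible fix, not necessarily the patch that was applied.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    owner: str
    # Oldest xid whose commit status still has to be looked up in pg_clog
    # (i.e. not yet recorded on the tuples themselves).
    oldest_unmarked_xid: int


@dataclass
class Database:
    tables: list
    datvacuumxid: int = 0  # pg_clog may be truncated up to this xid


def vacuum_database(db, user, is_superuser, cutoff_xid, buggy):
    """Model of a database-wide VACUUM run by `user`."""
    skipped_any = False
    for table in db.tables:
        if is_superuser or table.owner == user:
            # Vacuuming marks old tuples, so no xid below the cutoff
            # needs a pg_clog lookup in this table any more.
            table.oldest_unmarked_xid = max(table.oldest_unmarked_xid, cutoff_xid)
        else:
            skipped_any = True  # not our table: left untouched
    # Buggy behaviour: advance unconditionally.
    # Guarded behaviour (assumed fix): advance only if nothing was skipped.
    if buggy or not skipped_any:
        db.datvacuumxid = max(db.datvacuumxid, cutoff_xid)


def clog_lookup_would_fail(db):
    """True if some table still references xids older than datvacuumxid."""
    return any(t.oldest_unmarked_xid < db.datvacuumxid for t in db.tables)
```

With two tables owned by different users, a non-superuser run with `buggy=True` leaves the other user's table pointing at xids that pg_clog is now allowed to forget; with `buggy=False`, datvacuumxid stays where it was.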


Re: Cause of missing pg_clog files

From
Bruce Momjian
Date:
Tom Lane wrote:
> Yesterday I reported a WAL problem that could lead to tuples not being
> marked as committed-good or committed-dead after we'd already removed
> the pg_clog segment that had their transaction's commit status.
> I wasn't completely satisfied with that, though, because on further
> reflection it seemed a very low-probability mechanism.  I kept digging,
> and finally came to the kind of bug that qualifies as a big DOH :-(
> 
> If you run a database-wide VACUUM (one with no specific target table
> mentioned) as a non-superuser, then the VACUUM doesn't process tables
> that don't belong to you.  But it will advance pg_database.datvacuumxid
> anyway, which means that pg_clog could be truncated while old transaction
> references still remain unmarked in those other tables.
> 
> In words of one syllable: running VACUUM as a non-superuser can cause
> irrecoverable data loss in any 7.2.* release.
> 
> I think this qualifies as a "must fix" bug.  I recommend we back-patch
> a fix for this into the REL7_2 branch and put out a 7.2.3 release.
> We should also fix the "can't wait without a PROC" bug that was solved
> a few days ago.

Wow, you sure have found some good bugs in the past few days.  Nice job.

Yes, I agree we should push out a 7.2.3, and I think we are now ready
for beta3.  I will work on docs and packaging now.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Cause of missing pg_clog files

From
Bruce Momjian
Date:
OK, we need a decision on whether we are going to do a 7.2.3 or just
have it in beta3.  If it is in 7.2.3, I would not mention it in the
beta3 release notes.

---------------------------------------------------------------------------

Tom Lane wrote:
> Yesterday I reported a WAL problem that could lead to tuples not being
> marked as committed-good or committed-dead after we'd already removed
> the pg_clog segment that had their transaction's commit status.
> I wasn't completely satisfied with that, though, because on further
> reflection it seemed a very low-probability mechanism.  I kept digging,
> and finally came to the kind of bug that qualifies as a big DOH :-(
> 
> If you run a database-wide VACUUM (one with no specific target table
> mentioned) as a non-superuser, then the VACUUM doesn't process tables
> that don't belong to you.  But it will advance pg_database.datvacuumxid
> anyway, which means that pg_clog could be truncated while old transaction
> references still remain unmarked in those other tables.
> 
> In words of one syllable: running VACUUM as a non-superuser can cause
> irrecoverable data loss in any 7.2.* release.
> 
> I think this qualifies as a "must fix" bug.  I recommend we back-patch
> a fix for this into the REL7_2 branch and put out a 7.2.3 release.
> We should also fix the "can't wait without a PROC" bug that was solved
> a few days ago.

 


Re: Cause of missing pg_clog files

From
Andrew Sullivan
Date:
On Fri, Sep 27, 2002 at 08:30:38PM -0400, Bruce Momjian wrote:
> 
> OK, we need a decision on whether we are going to do a 7.2.3 or just
> have it in beta3.  If it is in 7.2.3, I would not mention it in the
> beta3 release notes.

If there won't be any 7.2.3, could a note be put up on the website at
least?  This is a pretty serious problem, and new users right now
will be using 7.2.2, which has this bug.  They should be warned.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110



Re: Cause of missing pg_clog files

From
Justin Clift
Date:
Bruce Momjian wrote:
> 
> OK, we need a decision on whether we are going to do a 7.2.3 or just
> have it in beta3.  If it is in 7.2.3, I would not mention it in the
> beta3 release notes.

We definitely should have a 7.2.3.  If we can release a 7.2.2 to fix
bugs and a security flaw, then we should definitely have a 7.2.3 to
ensure the usability of the 7.2.x series.

Some places will still be using 7.2.x for 2 years to come, just because
7.2.x was what their projects started developing against, etc.

:-)

Regards and best wishes,

Justin Clift


-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: Cause of missing pg_clog files

From
Bruce Momjian
Date:
Justin Clift wrote:
> Bruce Momjian wrote:
> > 
> > OK, we need a decision on whether we are going to do a 7.2.3 or just
> > have it in beta3.  If it is in 7.2.3, I would not mention it in the
> > beta3 release notes.
> 
> We definitely should have a 7.2.3.  If we can release a 7.2.2 to fix
> bugs and a security flaw, then we should definitely have a 7.2.3 to
> ensure the usability of the 7.2.x series.
> 
> Some places will still be using 7.2.x for 2 years to come, just because
> 7.2.x was what their projects started developing against, etc.

There will be a 7.2.3.  Tom is going to back-port his fixes, then I am
going to brand it, and then Marc will release it;  it is in-process.

 


7.2.3 fixes (was Re: Cause of missing pg_clog files)

From
Tom Lane
Date:
Andrew Sullivan <andrew@libertyrms.info> writes:
> On Fri, Sep 27, 2002 at 08:30:38PM -0400, Bruce Momjian wrote:
>> OK, we need a decision on whether we are going to do a 7.2.3 or just
>> have it in beta3.  If it is in 7.2.3, I would not mention it in the
>> beta3 release notes.

> If there won't be any 7.2.3,

There will be; I will backport the fixes today, and Marc promised to
roll the tarball tonight.

One thing I am undecided about: I am more than half tempted to put in
the fix that makes us able to cope with mktime's broken-before-1970
behavior in recent glibc versions (e.g., Red Hat 7.3).  This seems like
a good idea considering that other Linux distros will surely be updating
glibc soon too.  On the other hand, it's hard to call it a critical bug
fix --- it ain't on a par with the vacuum/clog problem, for sure.  And
the patch has received only limited testing (basically just whatever
use 7.3beta1 has had).  On the third hand, the patch only does something
if mktime() has already failed, so it's hard to see how it could make
life worse even if it's buggy.

Any votes on whether to fix that or leave it alone in 7.2.3?  I need
some input in the next few hours ...
        regards, tom lane
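
The "only does something if mktime() has already failed" argument can be sketched as follows, in Python rather than C. The function name `to_epoch` and the use of `calendar.timegm` as the fallback are assumptions for illustration; the real patch is not shown in this thread, and `timegm` ignores the local time zone, which a real fallback would have to handle.

```python
import calendar
import time


def to_epoch(tm, primary=time.mktime, fallback=calendar.timegm):
    """Convert a broken-down time tuple to seconds since the epoch.

    The fallback is consulted only when the primary conversion reports
    failure, so inputs the primary already handles are unaffected even
    if the fallback itself is wrong.
    """
    try:
        return primary(tm)
    except (OverflowError, ValueError):
        # Python's time.mktime raises instead of returning (time_t) -1.
        return fallback(tm)
```

Passing a stand-in `primary` that rejects years before 1970 imitates the glibc behavior being discussed; any date the stand-in accepts never reaches the fallback.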


Re: 7.2.3 fixes (was Re: Cause of missing pg_clog files)

From
Bruce Momjian
Date:
Tom Lane wrote:
> Andrew Sullivan <andrew@libertyrms.info> writes:
> > On Fri, Sep 27, 2002 at 08:30:38PM -0400, Bruce Momjian wrote:
> >> OK, we need a decision on whether we are going to do a 7.2.3 or just
> >> have it in beta3.  If it is in 7.2.3, I would not mention it in the
> >> beta3 release notes.
> 
> > If there won't be any 7.2.3,
> 
> There will be; I will backport the fixes today, and Marc promised to
> roll the tarball tonight.
> 
> One thing I am undecided about: I am more than half tempted to put in
> the fix that makes us able to cope with mktime's broken-before-1970
> behavior in recent glibc versions (e.g., Red Hat 7.3).  This seems like
> a good idea considering that other Linux distros will surely be updating
> glibc soon too.  On the other hand, it's hard to call it a critical bug
> fix --- it ain't on a par with the vacuum/clog problem, for sure.  And
> the patch has received only limited testing (basically just whatever
> use 7.3beta1 has had).  On the third hand, the patch only does something
> if mktime() has already failed, so it's hard to see how it could make
> life worse even if it's buggy.
> 
> Any votes on whether to fix that or leave it alone in 7.2.3?  I need
> some input in the next few hours ...

I think it should be put in.  You work for Red Hat, and that's the least
we can do for them.

 


Re: 7.2.3 fixes (was Re: Cause of missing pg_clog files)

From
Joe Conway
Date:
Tom Lane wrote:
> One thing I am undecided about: I am more than half tempted to put in
> the fix that makes us able to cope with mktime's broken-before-1970
> behavior in recent glibc versions (e.g., Red Hat 7.3).  This seems like
> a good idea considering that other Linux distros will surely be updating
> glibc soon too.  On the other hand, it's hard to call it a critical bug
> fix --- it ain't on a par with the vacuum/clog problem, for sure.  And
> the patch has received only limited testing (basically just whatever
> use 7.3beta1 has had).  On the third hand, the patch only does something
> if mktime() has already failed, so it's hard to see how it could make
> life worse even if it's buggy.
> 
> Any votes on whether to fix that or leave it alone in 7.2.3?  I need
> some input in the next few hours ...
> 

+1 for fixing it

Joe



Re: 7.2.3 fixes (was Re: Cause of missing pg_clog files)

From
Justin Clift
Date:
Tom Lane wrote:
<snip>
> Any votes on whether to fix that or leave it alone in 7.2.3?  I need
> some input in the next few hours ...

Including it sounds like a good idea.

'Yes' from me.

:)

Regards and best wishes,

Justin Clift



Re: 7.2.3 fixes (was Re: Cause of missing pg_clog files)

From
"Marc G. Fournier"
Date:
Nothing against including it from me ...


On Mon, 30 Sep 2002, Tom Lane wrote:

> Andrew Sullivan <andrew@libertyrms.info> writes:
> > On Fri, Sep 27, 2002 at 08:30:38PM -0400, Bruce Momjian wrote:
> >> OK, we need a decision on whether we are going to do a 7.2.3 or just
> >> have it in beta3.  If it is in 7.2.3, I would not mention it in the
> >> beta3 release notes.
>
> > If there won't be any 7.2.3,
>
> There will be; I will backport the fixes today, and Marc promised to
> roll the tarball tonight.
>
> One thing I am undecided about: I am more than half tempted to put in
> the fix that makes us able to cope with mktime's broken-before-1970
> behavior in recent glibc versions (e.g., Red Hat 7.3).  This seems like
> a good idea considering that other Linux distros will surely be updating
> glibc soon too.  On the other hand, it's hard to call it a critical bug
> fix --- it ain't on a par with the vacuum/clog problem, for sure.  And
> the patch has received only limited testing (basically just whatever
> use 7.3beta1 has had).  On the third hand, the patch only does something
> if mktime() has already failed, so it's hard to see how it could make
> life worse even if it's buggy.
>
> Any votes on whether to fix that or leave it alone in 7.2.3?  I need
> some input in the next few hours ...



Re: 7.2.3 fixes (was Re: Cause of missing pg_clog files)

From
Andrew Sullivan
Date:
On Mon, Sep 30, 2002 at 11:18:27AM -0400, Tom Lane wrote:
> use 7.3beta1 has had).  On the third hand, the patch only does something
> if mktime() has already failed, so it's hard to see how it could make
> life worse even if it's buggy.

On those grounds alone, it seems worth putting in.  As you say, it's
hard to see how it can be worse than not putting it in, even if it
turns out to be buggy.  It's probably worth noting prominently in the
release notes that it has received minimal testing, though.

A
