Thread: Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
"Devrim GUNDUZ"
Date:
Hi,

Does this mean we need to wrap new tarballs soon, because of the data loss
bug?

Regards,
Devrim

On Friday, 14 September 2012, at 4:42 pm, Robert Haas wrote:
> Properly set relpersistence for fake relcache entries.
>
> This can result in buffers failing to be properly flushed at
> checkpoint time, leading to data loss.
>
> Report, diagnosis, and patch by Jeff Davis.
>
> Branch
> ------
> REL9_1_STABLE
>
> Details
> -------
> http://git.postgresql.org/pg/commitdiff/fef2c17807e095a04441e4f1fe05f75d5578ead2
>
> Modified Files
> --------------
> src/backend/access/transam/xlogutils.c |    5 +++++
> src/backend/storage/buffer/bufmgr.c    |    2 ++
> 2 files changed, 7 insertions(+), 0 deletions(-)
>
>
> --
> Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-committers
>


--
Temporarily using a webmail program.



"Devrim GUNDUZ" <devrim@gunduz.org> writes:
> Does this mean we need to wrap new tarballs soon, because of the data loss
> bug?

I'll defer to Robert on whether this bug is serious enough to merit a
near-term release on its own.  But historically, we've wanted to push
out a .1 update two or three weeks after a .0 release, to mop up
whatever new bugs are nasty enough for people to trip over right away.
Maybe we should be thinking in terms of a set of releases around the
end of September.
        regards, tom lane



On 14 September 2012 17:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Devrim GUNDUZ" <devrim@gunduz.org> writes:
>> Does this mean we need to wrap new tarballs soon, because of the data loss
>> bug?
>
> I'll defer to Robert on whether this bug is serious enough to merit a
> near-term release on its own.  But historically, we've wanted to push
> out a .1 update two or three weeks after a .0 release, to mop up
> whatever new bugs are nasty enough for people to trip over right away.
> Maybe we should be thinking in terms of a set of releases around the
> end of September.

The bug itself is not major, but the extent and user impact is serious.

In my opinion we should issue a new release of 9.1 immediately.

9.2 might get different treatment, but it does suffer from the same problem.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



On Sep 14, 2012, at 12:17 PM, Simon Riggs <simon@2ndQuadrant.com> wrote:
> The bug itself is not major, but the extent and user impact is serious.

I don't think I understand how you're using the word major there.  I seem to recall some previous disputation between
you and I about the use of that term, so maybe it would be good to get that cleared up.  To me major and serious mean
about the same thing, so it can't for me be one but not the other.

Definitions aside, I think it's a pretty scary issue. It basically means that if you have a recovery (crash or archive)
during which you read a buffer into memory, the buffer won't be checkpointed.  So if, before the buffer is next evicted,
you have a crash, and if at least one checkpoint has intervened between the most recent WAL-logged operation on the
buffer and the crash, you're hosed.  That's not a terribly unlikely scenario.

While I can't claim to understand exactly what our standards for forcing an immediate minor release are, I think this
is pretty darn bad. I certainly don't want my customers running with this for a minute longer than necessary, and I
feel really bad for letting it get into a release, let alone go undetected for this long. :-(

...Robert


Robert Haas <robertmhaas@gmail.com> writes:
> Definitions aside, I think it's a pretty scary issue. It basically means that if you have a recovery (crash or
> archive) during which you read a buffer into memory, the buffer won't be checkpointed.  So if, before the buffer is
> next evicted, you have a crash, and if at least one checkpoint has intervened between the most recent WAL-logged
> operation on the buffer and the crash, you're hosed.  That's not a terribly unlikely scenario.

This is only an issue on standby slaves or when doing a PITR recovery, no?
As far as I can tell from the discussion, it would *not* affect crash
recovery, because we don't do restartpoints during crash recovery.

> While I can't claim to understand exactly what our standards for forcing an immediate minor release are, I think
> this is pretty darn bad. I certainly don't want my customers running with this for a minute longer than necessary,
> and I feel really bad for letting it get into a release, let alone go undetected for this long. :-(

There's been some discussion about it among -core already.  The earliest
we could possibly do anything would be a release this coming week (that
is, wrap Thursday for release Monday 9/24).  However, considering that
a lot of key people will be attending PG Open between now and Thursday,
I'm not sure how practical that really would be.  Waiting a week might
be better, and it would give more time for initial bug reports against
9.2.0 to filter in.
        regards, tom lane



Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Andres Freund
Date:
On Saturday, September 15, 2012 06:29:25 PM Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Definitions aside, I think it's a pretty scary issue. It basically means
> > that if you have a recovery (crash or archive) during which you read a
> > buffer into memory, the buffer won't be checkpointed.  So if, before the
> > buffer is next evicted, you have a crash, and if at least one checkpoint
> > has intervened between the most recent WAL-logged operation on the
> > buffer and the crash, you're hosed.  That's not a terribly unlikely
> > scenario.
> 
> This is only an issue on standby slaves or when doing a PITR recovery, no?
> As far as I can tell from the discussion, it would not affect crash
> recovery, because we don't do restartpoints during crash recovery.
I think unfortunately it does. At the end of recovery we perform an
END_OF_RECOVERY checkpoint that seems to suffer from these issues. While
CreateCheckPoint() itself treats that kind of checkpoint similarly to a
shutdown checkpoint, it doesn't pass that similarity on to BufferSync (via
CheckPointGuts->CheckPointBuffers).

I hope I missed something ...

Greetings,

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



On Sep 15, 2012, at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Definitions aside, I think it's a pretty scary issue. It basically means that if you have a recovery (crash or
>> archive) during which you read a buffer into memory, the buffer won't be checkpointed.  So if, before the buffer is
>> next evicted, you have a crash, and if at least one checkpoint has intervened between the most recent WAL-logged
>> operation on the buffer and the crash, you're hosed.  That's not a terribly unlikely scenario.
>
> This is only an issue on standby slaves or when doing a PITR recovery, no?
> As far as I can tell from the discussion, it would *not* affect crash
> recovery, because we don't do restartpoints during crash recovery.

No, I think it does affect crash recovery. Whether or not restartpoints happen during recovery doesn't matter; what
does matter is that after recovery there may be shared buffers that are erroneously not marked as permanent. Such
buffers won't be checkpointed except at shutdown time, which is wrong.

...Robert




Robert Haas <robertmhaas@gmail.com> writes:
> On Sep 15, 2012, at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This is only an issue on standby slaves or when doing a PITR recovery, no?
>> As far as I can tell from the discussion, it would *not* affect crash
>> recovery, because we don't do restartpoints during crash recovery.

> No, I think it does affect crash recovery. Whether or not restartpoints happen during recovery doesn't matter; what
> does matter is that after recovery there may be shared buffers that are erroneously not marked as permanent. Such
> buffers won't be checkpointed except at shutdown time, which is wrong.

Right, but we do a shutdown checkpoint at the end of crash recovery.

I could believe that this case gets missed, but it's not clear from
what's been said --- and if that case *is* broken, there should have
been a whole lot more corruption reports recently than what we've seen.
        regards, tom lane



On Sep 15, 2012, at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Right, but we do a shutdown checkpoint at the end of crash recovery.

Yes, but that only writes the buffers that are dirty. It doesn't fix the lack of a BM_PERMANENT flag on a buffer that
ought to have had one. So that page can now get modified AGAIN, after recovery, and not checkpointed.

> I could believe that this case gets missed, but it's not clear from
> what's been said --- and if that case *is* broken, there should have
> been a whole lot more corruption reports recently than what we've seen.

It's pretty clear from Jeff's example at the top of the thread, which involves two crashes and no archive recovery.

...Robert


Robert Haas <robertmhaas@gmail.com> writes:
> On Sep 15, 2012, at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Right, but we do a shutdown checkpoint at the end of crash recovery.

> Yes, but that only writes the buffers that are dirty. It doesn't fix the lack of a BM_PERMANENT flag on a buffer
> that ought to have had one. So that page can now get modified AGAIN, after recovery, and not checkpointed.

Ugh.  Yeah, we need to fix this ASAP.  I've notified pgsql-packagers to
expect a release this week.
        regards, tom lane



Re: Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Andres Freund
Date:
On Monday, September 17, 2012 07:35:06 AM Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sep 15, 2012, at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Right, but we do a shutdown checkpoint at the end of crash recovery.
(as noted somewhere else and tackled by Simon, an END_OF_RECOVERY checkpoint didn't sync
those before)

> > Yes, but that only writes the buffers that are dirty. It doesn't fix the
> > lack of a BM_PERMANENT flag on a buffer that ought to have had one. So
> > that page can now get modified AGAIN, after recovery, and not
> > checkpointed.
> 
> Ugh.  Yeah, we need to fix this ASAP.  I've notified pgsql-packagers to
> expect a release this week.
Btw, I played with this some more on Saturday and I think, while definitely a 
bad bug, the actual consequences aren't as bad as at least I initially feared.

Fake relcache entries are currently set in 3 scenarios during recovery:
1. removal of ALL_VISIBLE in heapam.c
2. incomplete splits and incomplete deletions in nbtxlog.c
3. incomplete splits in ginxlog.c

As Jeff nicely showed, it's easy to corrupt the visibility map with this.
Fortunately in < 9.2 that doesn't have too bad consequences and will be fixed
by the next vacuum. In 9.2 it obviously can result in wrong query results, but
will still be fixed by a later (probably full-table) vacuum.

To hit 2) and 3) the server needs to have crashed (or have a strange
recovery_target_* set) while doing a multilevel operation. I have only
cursorily looked at gin, but it looks to me like in both nbtree and gin the
window while logging the multiple steps of a split/deletion is fairly short,
and it's not likely that we crashed exactly during that. Unless recovery reads
an incomplete multistep operation, we won't ever create fake relcache entries
for those relations/buffers! And even if that edge case is hit, it seems
somewhat likely that the few pages that are read with the fake entry are still
in the cache (the incomplete operation has to have been soon before) and thus
won't get the bad relpersistence flag set.

So I think while that bug had the possibility of being really bad we were 
pretty lucky...

Greetings,

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



On 17 September 2012 07:44, Andres Freund <andres@2ndquadrant.com> wrote:

> So I think while that bug had the possibility of being really bad we were
> pretty lucky...

Yes, agreed. The impact is not as severe as I originally thought.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Andres Freund <andres@2ndquadrant.com> writes:
> Btw, I played with this some more on Saturday and I think, while definitely a 
> bad bug, the actual consequences aren't as bad as at least I initially feared.

> Fake relcache entries are currently set in 3 scenarios during recovery:
> 1. removal of ALL_VISIBLE in heapam.c
> 2. incomplete splits and incomplete deletions in nbtxlog.c
> 3. incomplete splits in ginxlog.c
> [ #1 doesn't really hurt in 9.1, and the others are low probability ]

OK, that explains why we've not seen a blizzard of trouble reports.
Still seems like a good idea to fix it ASAP, though.
        regards, tom lane



BTW, what should our advice be for recovering from corruption due to
this bug?  As far as the btree and GIN problems go, we can tell people
that REINDEX will fix it.  And in 9.1, you don't really need to worry
about the visibility map being bad.  But what do you do in 9.2, if
you have a bad visibility map?  Is there any fix short of VACUUM FULL?
        regards, tom lane



On 19 September 2012 18:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, what should our advice be for recovering from corruption due to
> this bug?  As far as the btree and GIN problems go, we can tell people
> that REINDEX will fix it.  And in 9.1, you don't really need to worry
> about the visibility map being bad.  But what do you do in 9.2, if
> you have a bad visibility map?  Is there any fix short of VACUUM FULL?

SET vacuum_freeze_table_age = 0;
VACUUM;
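
Per table, the same whole-table scan can be forced with FREEZE; a minimal sketch, assuming a hypothetical affected table my_table:

VACUUM FREEZE my_table;   -- FREEZE forces every page to be scanned, so the visibility map gets corrected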

I'm writing the full notes out now.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Andres Freund
Date:
On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Btw, I played with this some more on Saturday and I think, while
> > definitely a bad bug, the actual consequences aren't as bad as at least
> > I initially feared.
> > 
> > Fake relcache entries are currently set in 3 scenarios during recovery:
> > 1. removal of ALL_VISIBLE in heapam.c
> > 2. incomplete splits and incomplete deletions in nbtxlog.c
> > 3. incomplete splits in ginxlog.c
> > [ #1 doesn't really hurt in 9.1, and the others are low probability ]
> 
> OK, that explains why we've not seen a blizzard of trouble reports.
> Still seems like a good idea to fix it ASAP, though.
Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the 
IRC Channel that had symptoms matching this bug.

The situation was that he started to get very high IO and xid wraparound shutdown
warnings due to never-finishing, non-cancelable autovacuums. After some
investigation it turned out that btree indexes were being processed at that time.
We found they had cyclic btpo_next pointers leading to an endless loop in
_bt_pagedel.
We solved the issue by forcing leftsib = P_NONE inside the
while (P_ISDELETED(opaque) || opaque->btpo_next != target)
loop, which let a queued DROP INDEX get the necessary locks.

Unfortunately this was on a busy production system with a nearing shutdown, so
not much was kept for further diagnosis.

After this bug was discovered I asked the user, and indeed they had previously
shut down the database twice in quick succession during heavy activity with -m
immediate, which could exactly lead to such a problem due to incompletely
processed page splits.

Greetings,

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Marko Tiikkaja
Date:
On 9/20/12 11:55 PM, Andres Freund wrote:
> On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
>> OK, that explains why we've not seen a blizzard of trouble reports.
>> Still seems like a good idea to fix it ASAP, though.
> Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the
> IRC Channel that had symptoms matching this bug.

Another such user reporting in. :-(

Our slave started accumulating WAL files and ran out of disk space 
yesterday.  After investigation from Andres and Andrew, it turns out 
that we were most likely hit by this very same bug.
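
For anyone watching for the same symptom, a minimal sketch of a check for
runaway WAL retention on the standby (this assumes superuser access, since
pg_ls_dir is restricted, and treats the file count in pg_xlog as a rough proxy):

-- count WAL segment files currently retained in pg_xlog
SELECT count(*) AS wal_segments
  FROM pg_ls_dir('pg_xlog') AS f
 WHERE f ~ '^[0-9A-F]{24}$';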

Here's what they have to say:
"If the db crashes between logging the split and the parent-node insert, 
then in recovery, since relpersistence is not initialized correctly, 
when the recovery process tries to complete the operation, no xlog 
record is written for the insert.  If there's a slave server, then the 
missing xlog record for the insert means that the slave's 
incomplete_actions queue never becomes empty, therefore the slave can no 
longer do recovery restartpoints."

Some relevant information:

[cur:92/314BC870, xid:76872047, rmid:10(Heap), len/tot_len:91/123, 
info:0, prev:92/314BB890] insert: s/d/r:1663/408841/415746 
blk/off:13904/65 header: t_infomask2 8 t_infomask 2050 t_hoff 24
[cur:92/314BC8F0, xid:76872047, rmid:11(Btree), len/tot_len:702/734, 
info:64, prev:92/314BC870] split_r: s/d/r:1663/408841/475676 leftsib 2896
[cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0, 
prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; tli 1; nextxid 
76872048;  nextoid 764990; nextmulti 62062; nextoffset 132044; shutdown 
at 2012-09-11 14:26:26 CEST

2012-09-11 14:26:26.719 CEST,,,44620,,504f2df2.ae4c,5,,2012-09-11 
14:26:26 CEST,,0,LOG,00000,"redo done at 
92/314BC8F0",,,,,,,,"StartupXLOG, xlog.c:6641",""

And apparently the relpersistence check in the RelationNeedsWAL() call in
_bt_insertonpg had a role in this as well.



Regards,
Marko Tiikkaja



Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Andres Freund
Date:
On Friday, September 21, 2012 03:30:31 PM Marko Tiikkaja wrote:
> On 9/20/12 11:55 PM, Andres Freund wrote:
> > On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
> >> OK, that explains why we've not seen a blizzard of trouble reports.
> >> Still seems like a good idea to fix it ASAP, though.
> > 
> > Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user
> > in the IRC Channel that had symptoms matching this bug.
> 
> Another such user reporting in. :-(
> 
> Our slave started accumulating WAL files and ran out of disk space
> yesterday.  After investigation from Andres and Andrew, it turns out
> that we were most likely hit by this very same bug.
> 
> Here's what they have to say:
> "If the db crashes between logging the split and the parent-node insert,
> then in recovery, since relpersistence is not initialized correctly,
> when the recovery process tries to complete the operation, no xlog
> record is written for the insert.  If there's a slave server, then the
> missing xlog record for the insert means that the slave's
> incomplete_actions queue never becomes empty, therefore the slave can no
> longer do recovery restartpoints."
> 
> Some relevant information:
> 
> [cur:92/314BC870, xid:76872047, rmid:10(Heap), ... insert: ...
> [cur:92/314BC8F0, xid:76872047, rmid:11(Btree), ... split_r: ...
> [cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0,
> prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; ... shutdown
> ... "redo done at 92/314BC8F0",,,,,,,,"StartupXLOG, xlog.c:6641",""
Which means that an insert into the heap triggered a btree split. At that
point the database crashed. During recovery the split was supposed to be
finished by the btree cleanup code.

> And apparently the relpersistence check in RelationNeedsWAL() call in
> _bt_insertonpg had a role in this as well.
When detecting an incomplete split, the nbtree cleanup code calls
_bt_insert_parent, which calls _bt_insertonpg, which finishes the split. BUT:
it doesn't log that it finished, because RelationNeedsWAL() says it doesn't
need to.

That means:
* indexes on standbys will *definitely* be corrupted
* a standby won't perform any restartpoints anymore until restarted
* if the primary crashes, corruption is likely.

Hrm. I retract my earlier statement about the low likelihood of corruption due 
to this.

Greetings,

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Robert Haas
Date:
On Fri, Sep 21, 2012 at 10:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hrm. I retract my earlier statement about the low likelihood of corruption due
> to this.

Yeah.  :-(

We've recently had at least one report of autovacuum failing to
terminate due to a series of index pages forming a circular loop, and
at least one case where it appears that the data became not-unique on
a column upon which a unique index existed, in releases that contain
this bug.

It seems therefore that REINDEX + VACUUM with
vacuum_freeze_table_age=0 is not quite sufficient to recover from this
problem.  If your index has come to contain a circularity, vacuum will
fail to terminate, and you'll need to drop it completely to recover.
And if you were relying on your index to enforce a unique constraint
and it didn't, you'll need to do manual data repair before it will be
possible to rebuild or replace that index.
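
To make that concrete, a rough sketch of the sequence, assuming a hypothetical
table t whose unique index t_id_idx on column id is affected:

DROP INDEX t_id_idx;          -- drop the corrupt index entirely, per the above

SELECT id, count(*)           -- find rows that violate the intended uniqueness
  FROM t
 GROUP BY id
HAVING count(*) > 1;

-- after repairing the duplicates by hand, rebuild the unique index
CREATE UNIQUE INDEX t_id_idx ON t (id);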

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Виктор Егоров
Date:
I'm afraid I'm exactly in this situation now.

Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE) was:
INFO:  "meta_version_chunks": found 55363 removable, 32566245 nonremovable row versions in 450292 out of 450292 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 588315 unused item pointers.
0 pages are entirely empty.
CPU 2.44s/5.77u sec elapsed 2150.18 sec.
INFO:  vacuuming "pg_toast.pg_toast_16582"

And here are the locks held by the VACUUM backend:
select oid,relname,relkind,relpages,reltuples::numeric(15,0),reltoastrelid,reltoastidxid
  from pg_class
 where oid in (select relation from pg_locks where pid = 1380);
  oid  |       relname        | relkind | relpages | reltuples | reltoastrelid | reltoastidxid 
-------+----------------------+---------+----------+-----------+---------------+---------------
 16585 | pg_toast_16582       | t       | 16460004 |  58161600 |             0 |         16587
 16587 | pg_toast_16582_index | i       |   188469 |  58161600 |             0 |             0
 16582 | meta_version_chunks  | r       |   450292 |  32566200 |         16585 |             0
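
The same backend can also be checked in pg_stat_activity; a sketch using the
9.1 column names (1380 is the pid from the pg_locks query above):

select procpid, waiting, query_start, current_query
  from pg_stat_activity
 where procpid = 1380;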

I will not touch anything and would like to get some recommendations on how to proceed.


2012/9/26 Robert Haas <robertmhaas@gmail.com>
On Fri, Sep 21, 2012 at 10:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hrm. I retract my earlier statement about the low likelihood of corruption due
> to this.

Yeah.  :-(

We've recently had at least one report of autovacuum failing to
terminate due to a series of index pages forming a circular loop, and
at least one case where it appears that the data became not-unique on
a column upon which a unique index existed, in releases that contain
this bug.

It seems therefore that REINDEX + VACUUM with
vacuum_freeze_table_age=0 is not quite sufficient to recover from this
problem.  If your index has come to contain a circularity, vacuum will
fail to terminate, and you'll need to drop it completely to recover.
And if you were relying on your index to enforce a unique constraint
and it didn't, you'll need to do manual data repair before it will be
possible to rebuild or replace that index.
 

--
Victor Y. Yegorov

Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Виктор Егоров
Date:
Forgot to mention that:
- VACUUM is running on the master;
- current state is unchanged for 20 hours.

2012/9/26 Виктор Егоров <vyegorov@gmail.com>:
> I'm afraid I'm exactly in this situation now.
>
> Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE) was:
> INFO:  "meta_version_chunks": found 55363 removable, 32566245 nonremovable
> row versions in 450292 out of 450292 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 588315 unused item pointers.
> 0 pages are entirely empty.
> CPU 2.44s/5.77u sec elapsed 2150.18 sec.
> INFO:  vacuuming "pg_toast.pg_toast_16582"

--
Victor Y. Yegorov



Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Devrim GÜNDÜZ
Date:
Hi,

On Tue, 2012-09-25 at 20:42 -0400, Robert Haas wrote:

> It seems therefore that REINDEX + VACUUM with
> vacuum_freeze_table_age=0 is not quite sufficient to recover from this
> problem.  If your index has come to contain a circularity, vacuum will
> fail to terminate, and you'll need to drop it completely to recover.

What is the difference between REINDEX and dropping and recreating
index? (or say, creating the same index with another name, dropping old
one, and renaming the new one with old one?)
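
For reference, a sketch of the second approach, assuming a hypothetical index
idx_foo on foo(bar); building the replacement with CONCURRENTLY avoids blocking
writes to the table the way a plain REINDEX would:

CREATE INDEX CONCURRENTLY idx_foo_new ON foo (bar);
DROP INDEX idx_foo;
ALTER INDEX idx_foo_new RENAME TO idx_foo;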

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Andres Freund
Date:
Hi,

On Wednesday, September 26, 2012 07:57:06 AM Виктор Егоров wrote:
> I'm afraid I'm exactly in this situation now.
:(
> Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE)
It recommended doing a REINDEX first though? I guess you didn't do that?

> was: INFO:  "meta_version_chunks": found 55363 removable, 32566245
> nonremovable row versions in 450292 out of 450292 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 588315 unused item pointers.
> 0 pages are entirely empty.
> CPU 2.44s/5.77u sec elapsed 2150.18 sec.
> INFO:  vacuuming "pg_toast.pg_toast_16582"
>
> And here are the locks held by the VACUUM backend:
> select oid,relname,relkind,relpages,reltuples::numeric(15,0),reltoastrelid,reltoastidxid
>   from pg_class
>  where oid in (select relation from pg_locks where pid = 1380);
>   oid  |       relname        | relkind | relpages | reltuples | reltoastrelid | reltoastidxid
> -------+----------------------+---------+----------+-----------+---------------+---------------
>  16585 | pg_toast_16582       | t       | 16460004 |  58161600 |             0 |         16587
>  16587 | pg_toast_16582_index | i       |   188469 |  58161600 |             0 |             0
>  16582 | meta_version_chunks  | r       |   450292 |  32566200 |         16585 |             0
>
> I will not touch anything and would like to get some recommendations on how
> to proceed.

On Wednesday, September 26, 2012 08:12:37 AM Виктор Егоров wrote:
> Forget to mention, that:
> - VACUUM is running on the master;
> - current state is unchanged for 20 hours.
I guess you cannot cancel the vacuum? Last time it was in a cycle without
checking for interrupts in between.

Can you restart the server?

Greetings,

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From
Виктор Егоров
Date:
You're right, REINDEX was not done.

I've stopped the VACUUM, did a proper server restart (pg_ctl -m fast -w restart)
and will work on rebuilding relations.
It seems I also have another issue on my way: a bunch of bloated tables.
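
As a first pass at finding rebuild candidates, a rough sketch based on the
statistics views (not a real bloat estimator; the ordering just surfaces the
tables with the most dead tuples):

select relname, n_live_tup, n_dead_tup,
       pg_size_pretty(pg_total_relation_size(relid)) as total_size
  from pg_stat_user_tables
 order by n_dead_tup desc
 limit 20;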

Thanks for the support.

2012/9/26 Andres Freund <andres@2ndquadrant.com>:
>> Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE)
> It recommended doing a REINDEX first though? I guess you didn't do that?
>
> ...
>
> I guess you cannot cancel the vacuum? Last time it was in a cycle without
> checking interrupts inbetween.
>
> Can you restart the server?


-- 
Victor Y. Yegorov