Re: Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum
Date
Msg-id 4D710ABD.5090103@enterprisedb.com
Whole thread Raw
In response to Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum
List pgsql-hackers
On 04.03.2011 11:00, daveg wrote:
> Thanks, I've applied both patches to one host. I'll probably have to back
> down on the debugging logging soon, as the output is pretty voluminious,
> it is producing 100MB of message log every few minutes. I'll try Merlins
> patch to get the case setting the bit first though.
>
> Anyway, here is a snippit of log with the setting and unsetting of the
> bit on one page. Unfortunately, we don't know which of a dozen odd databases
> the page belongs to, but the timestamps are so close it seems likely to be
> the same one. I've added dbname to the patch and will get that next time
> I can switch binaries.
>
> 2011-03-03 23:28:34.170 PST  2039  WARNING:  debugging: setting PD_ALL_VISIBLE in relation "pg_statistic" on page
5963(OldestXmin 331848998)
 
> ...
> /cv/logs/production_03-20110303_232519.log.gz:2011-03-03 23:29:34.194 PST  2115  WARNING:  PD_ALL_VISIBLE flag was
incorrectlyset in relation "pg_statistic" page 5963 (OldestXmin 331677178)
 
> ...
> 2011-03-03 23:42:38.323 PST  2808  WARNING:  debugging: setting PD_ALL_VISIBLE in relation "pg_attribute" on page
5963(OldestXmin 331677178)
 

Hmm, if these all came from the same database, then it looks OldestXmin 
has moved backwards. That would explain the warnings. First one vacuum 
determines that all the tuples are visible to everyone and sets the 
flag. Then another vacuum runs with an older OldestXmin, and determines 
that there's a tuple on the page with an xmin that is not yet visible to 
everyone, hence it thinks that the flag should not have been set yet.

Looking at the code, I don't see how that situation could arise, though. 
The value calculated by GetOldestXmin() should never move backwards. And 
GetOldestXmin() is called in lazy_vacuum_rel(), after it has acquired a 
lock on the table, which should protect from a race condition where two 
vacuums could run on the table one after another, in a way where the 
later vacuum runs with an OldestXmin calculated before the first vacuum.

Hmm, fiddling with vacuum_defer_cleanup_age on the fly could cause that, 
though. You don't do that, do you?

> Also, I've attached the relevent page image.

Thanks. There seems to be two tuples on the page, both of were HOT 
updated at some point, but now there's only one version of each left:

postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax,  t_ctid, t_infomask, t_infomask2, t_hoff
FROM heap_page_items(loread(lo_open(29924, 262144), 8192)) WHERE 
lp_flags <> 0; lp | lp_off | lp_flags | lp_len |  t_xmin   | t_xmax |  t_ctid  | 
t_infomask | t_infomask2 | t_hoff
----+--------+----------+--------+-----------+--------+----------+------------+-------------+--------  1 |   7608 |
  1 |    580 | 331250141 |      0 | (5963,1) |  10499 |      -32747 |     32  3 |      1 |        2 |      0 |
|        |          |        |             |  4 |   7528 |        1 |     76 | 331735553 |      0 | (5963,4) |  10497 |
    -32747 |     32 19 |      4 |        2 |      0 |           |        |          |        |             |
 
(4 rows)

Deciphering those infomasks, the first tuple at lp 1 has these flags set:
HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_XMIN_COMMITTED | HEAP_HASNULL | 
HEAP_HASVARWIDTH

And the 2nd one at lp 4:
HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_XMIN_COMMITTED | HEAP_HASNULL

So, both of those tuples are live.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: ALTER TABLE deadlock with concurrent INSERT
Next
From: Robert Haas
Date:
Subject: Re: Quick Extensions Question