Re: Reviewing freeze map code - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Reviewing freeze map code
Date
Msg-id CA+TgmobcpoZQo78nSh2aa8or4phY6rk+tmJtWdJ9QsNWuyhb1A@mail.gmail.com
Whole thread Raw
In response to Re: Reviewing freeze map code  (Andres Freund <andres@anarazel.de>)
Responses Re: Reviewing freeze map code  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Reviewing freeze map code  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Fri, Jun 10, 2016 at 1:59 PM, Andres Freund <andres@anarazel.de> wrote:
>> Master, but with an existing standby. So it could be related to
>> hot_standby_feedback or such.
>
> I just managed to trigger it again.
>
>
> #1  0x00007fa1a73778da in __GI_abort () at abort.c:89
> #2  0x00007f9f1395e59c in record_corrupt_item (items=items@entry=0x2137be0, tid=0x7f9fb8681c0c)
>     at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:612
> #3  0x00007f9f1395ead5 in collect_corrupt_items (relid=relid@entry=29449, all_visible=all_visible@entry=0 '\000',
all_frozen=all_frozen@entry=1'\001')
 
>     at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:572
> #4  0x00007f9f1395f476 in pg_check_frozen (fcinfo=0x7ffe5343a200) at
/home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:292
> #5  0x00000000005fdbec in ExecMakeTableFunctionResult (funcexpr=0x2168630, econtext=0x2168320, argContext=<optimized
out>,expectedDesc=0x2168ef0,
 
>     randomAccess=0 '\000') at /home/andres/src/postgresql/src/backend/executor/execQual.c:2211
> #6  0x0000000000616992 in FunctionNext (node=node@entry=0x2168210) at
/home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:94
> #7  0x00000000005ffdcb in ExecScanFetch (recheckMtd=0x6166f0 <FunctionRecheck>, accessMtd=0x616700 <FunctionNext>,
node=0x2168210)
>     at /home/andres/src/postgresql/src/backend/executor/execScan.c:95
> #8  ExecScan (node=node@entry=0x2168210, accessMtd=accessMtd@entry=0x616700 <FunctionNext>,
recheckMtd=recheckMtd@entry=0x6166f0<FunctionRecheck>)
 
>     at /home/andres/src/postgresql/src/backend/executor/execScan.c:145
> #9  0x00000000006169e4 in ExecFunctionScan (node=node@entry=0x2168210) at
/home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:268
>
> the error happened just after I restarted a standby, so it's not
> unlikely to be related to hot_standby_feedback.

After some off-list discussion and debugging, Andres and I have
managed to identify three issues here (so far).  Two are issues in the
testing, and one is a data-corrupting bug in the freeze map code.

1. pg_check_visible keeps on using the same OldestXmin for all its
checks even though the real OldestXmin may advance in the meantime.
This can lead to spurious problem reports: pg_check_visible() thinks
that the tuple isn't all visible yet and reports it as corruption, but
in reality there's no problem.

2. pg_check_visible includes the same check for heap-xmin-committed
that vacuumlazy.c uses, but hint bits aren't crash safe, so this could
lead to a spurious trouble report in a scenario involving a crash.

3. vacuumlazy.c includes this code:
               if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
MultiXactCutoff,&frozen[nfrozen]))                   frozen[nfrozen++].offset = offnum;               else if
(heap_tuple_needs_eventual_freeze(tuple.t_data))                  all_frozen = false;
 

That's wrong, because a "true" return value from
heap_prepare_freeze_tuple() means only that it has done *some*
freezing work on the tuple, not that it's done all of the freezing
work that will ever need to be done.  So, if the tuple's xmin can be
frozen and is aborted but not older than vacuum_freeze_min_age, then
heap_prepare_freeze_tuple() won't free xmax, but the page will still
be marked all-frozen, which is bad.  I think it normally won't matter
because the xmax will probably be hinted invalid anyway, since we just
pruned the page which should have set hint bits everywhere, but if
those hint bits were lost then we'd eventually end up with an
accessible xmax pointing off into space.

My first thought was to just delete the "else" but that would be bad
because we'd fail to set all-frozen immediately in a lot of cases
where we should.  This needs a bit more thought than I have time to
give it right now.

(I will update on the status of this open item again no later than
Monday; probably sooner.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal: PL/Pythonu - function ereport
Next
From: Robert Haas
Date:
Subject: Re: parallel.c is not marked as test covered