Thread: Potential vacuum bug?

Potential vacuum bug?

From
Tom Lane
Date:
While chasing the VACUUM problem reported by Stephen Birch, I noticed
something that looks like a potential trouble spot.  Vacuum's initial
scan routine, vc_scanheap, runs through the table to mark tuples as
known committed or known dead, if possible (consulting the transaction
log for tuples not yet so marked).  It does the right things as far as
marking committed/dead if it sees a tuple marked HEAP_MOVED_OFF or
HEAP_MOVED_IN, which could only be there if a prior VACUUM failed
partway through.  But it doesn't *clear* those bits.  Seems to me that
that will screw up the subsequent vc_rpfheap procedure --- in
particular, leaving a HEAP_MOVED_IN flag set will cause vc_rpfheap to
complain (correctly!) about 'HEAP_MOVED_IN not expected', whereas
leaving HEAP_MOVED_OFF set will confuse vc_rpfheap because it will
think it moved the tuple itself.

In short, if we really want to recover from a failed VACUUM then we'd
better clear those bits during vc_scanheap.  I am thinking that the
code starting at about line 720 ought to look like
           if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))           {               if
(tuple.t_data->t_infomask& HEAP_XMIN_INVALID)                   tupgone = true;               else if
(tuple.t_data->t_infomask& HEAP_MOVED_OFF)               {                   // mark tuple commited or invalid as
appropriate,                  // same as before
 
add >>>             tuple.t_data->t_infomask &= ~HEAP_MOVED_OFF;               }               else if
(tuple.t_data->t_infomask& HEAP_MOVED_IN)               {                   // mark tuple commited or invalid as
appropriate,                  // same as before
 
add >>>             tuple.t_data->t_infomask &= ~HEAP_MOVED_IN;               }               else               {
            // other cases same as before               }           }
 

add >>>     if (tuple.t_data->t_infomask & (HEAP_MOVED_OFF | HEAP_MOVED_IN))
add >>>     {
add >>>         elog(NOTICE, "Clearing unexpected HEAP_MOVED flag");
add >>>         tuple.t_data->t_infomask &= ~(HEAP_MOVED_OFF | HEAP_MOVED_IN);
add >>>         pgchanged = true;
add >>>     }

Comments anyone?
        regards, tom lane


RE: [HACKERS] Potential vacuum bug?

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> 
> While chasing the VACUUM problem reported by Stephen Birch, I noticed
> something that looks like a potential trouble spot.  Vacuum's initial
> scan routine, vc_scanheap, runs through the table to mark tuples as
> known committed or known dead, if possible (consulting the transaction
> log for tuples not yet so marked).  It does the right things as far as
> marking committed/dead if it sees a tuple marked HEAP_MOVED_OFF or
> HEAP_MOVED_IN, which could only be there if a prior VACUUM failed
> partway through.  But it doesn't *clear* those bits.  Seems to me that
> that will screw up the subsequent vc_rpfheap procedure --- in
> particular, leaving a HEAP_MOVED_IN flag set will cause vc_rpfheap to
> complain (correctly!) about 'HEAP_MOVED_IN not expected', whereas
> leaving HEAP_MOVED_OFF set will confuse vc_rpfheap because it will
> think it moved the tuple itself.
>

I'm for your change.
Anyway it's not good to hold useless flags unnecessarily.

However I could hardly find the case that would cause a trouble.
It may occur in the following rare cases though I'm not sure.

HEAP_MOVED_OFF and (neither HEAP_XMIN_COMMITTED nor
HEAP_XMIN_INVALID) and the tuple was recently delete/updated.

This means that the previous VACUUM couldn't remove the tuple
because old transactions were running then and moreover the
VACUUM half successed(i.e aborted between internal commit and
external commit). Now VACUUM marks this tuple as tupgone once
but would turn it off later if old transctions are still running.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] Potential vacuum bug?

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> I'm for your change.
> However I could hardly find the case that would cause a trouble.
> It may occur in the following rare cases though I'm not sure.

> HEAP_MOVED_OFF and (neither HEAP_XMIN_COMMITTED nor
> HEAP_XMIN_INVALID) and the tuple was recently delete/updated.

I'm not sure if HEAP_MOVED_OFF is really dangerous, but I am sure
that HEAP_MOVED_IN is dangerous --- vc_rpfheap will error out if
it hits a tuple marked that way.  So, if a VACUUM fails partway
through vc_rpfheap (I guess this would have to happen after the
internal commit), it'd be possible that later VACUUMs wouldn't
work anymore.
        regards, tom lane


RE: [HACKERS] Potential vacuum bug?

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > I'm for your change.
> > However I could hardly find the case that would cause a trouble.
> > It may occur in the following rare cases though I'm not sure.
> 
> > HEAP_MOVED_OFF and (neither HEAP_XMIN_COMMITTED nor
> > HEAP_XMIN_INVALID) and the tuple was recently delete/updated.
> 
> I'm not sure if HEAP_MOVED_OFF is really dangerous, but I am sure
> that HEAP_MOVED_IN is dangerous --- vc_rpfheap will error out if
> it hits a tuple marked that way.  So, if a VACUUM fails partway
> through vc_rpfheap (I guess this would have to happen after the
> internal commit), it'd be possible that later VACUUMs wouldn't
> work anymore.
>

IIRC,there's no HEAP_MOVED_INd and not HEAP_XMIN_COMMITTED
tuples when vc_rpfheap()  is called because such tuples has already
been marked unsued in vc_scanheap().

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp