Thread: Potential vacuum bug?
While chasing the VACUUM problem reported by Stephen Birch, I noticed something that looks like a potential trouble spot. Vacuum's initial scan routine, vc_scanheap, runs through the table to mark tuples as known committed or known dead, if possible (consulting the transaction log for tuples not yet so marked). It does the right things as far as marking committed/dead if it sees a tuple marked HEAP_MOVED_OFF or HEAP_MOVED_IN, which could only be there if a prior VACUUM failed partway through. But it doesn't *clear* those bits. Seems to me that that will screw up the subsequent vc_rpfheap procedure --- in particular, leaving a HEAP_MOVED_IN flag set will cause vc_rpfheap to complain (correctly!) about 'HEAP_MOVED_IN not expected', whereas leaving HEAP_MOVED_OFF set will confuse vc_rpfheap because it will think it moved the tuple itself. In short, if we really want to recover from a failed VACUUM then we'd better clear those bits during vc_scanheap. I am thinking that the code starting at about line 720 ought to look like if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED)) { if (tuple.t_data->t_infomask& HEAP_XMIN_INVALID) tupgone = true; else if (tuple.t_data->t_infomask& HEAP_MOVED_OFF) { // mark tuple commited or invalid as appropriate, // same as before add >>> tuple.t_data->t_infomask &= ~HEAP_MOVED_OFF; } else if (tuple.t_data->t_infomask& HEAP_MOVED_IN) { // mark tuple commited or invalid as appropriate, // same as before add >>> tuple.t_data->t_infomask &= ~HEAP_MOVED_IN; } else { // other cases same as before } } add >>> if (tuple.t_data->t_infomask & (HEAP_MOVED_OFF | HEAP_MOVED_IN)) add >>> { add >>> elog(NOTICE, "Clearing unexpected HEAP_MOVED flag"); add >>> tuple.t_data->t_infomask &= ~(HEAP_MOVED_OFF | HEAP_MOVED_IN); add >>> pgchanged = true; add >>> } Comments anyone? regards, tom lane
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane > > While chasing the VACUUM problem reported by Stephen Birch, I noticed > something that looks like a potential trouble spot. Vacuum's initial > scan routine, vc_scanheap, runs through the table to mark tuples as > known committed or known dead, if possible (consulting the transaction > log for tuples not yet so marked). It does the right things as far as > marking committed/dead if it sees a tuple marked HEAP_MOVED_OFF or > HEAP_MOVED_IN, which could only be there if a prior VACUUM failed > partway through. But it doesn't *clear* those bits. Seems to me that > that will screw up the subsequent vc_rpfheap procedure --- in > particular, leaving a HEAP_MOVED_IN flag set will cause vc_rpfheap to > complain (correctly!) about 'HEAP_MOVED_IN not expected', whereas > leaving HEAP_MOVED_OFF set will confuse vc_rpfheap because it will > think it moved the tuple itself. > I'm for your change. Anyway it's not good to hold useless flags unnecessarily. However I could hardly find the case that would cause a trouble. It may occur in the following rare cases though I'm not sure. HEAP_MOVED_OFF and (neither HEAP_XMIN_COMMITTED nor HEAP_XMIN_INVALID) and the tuple was recently delete/updated. This means that the previous VACUUM couldn't remove the tuple because old transactions were running then and moreover the VACUUM half successed(i.e aborted between internal commit and external commit). Now VACUUM marks this tuple as tupgone once but would turn it off later if old transctions are still running. Regards. Hiroshi Inoue Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > I'm for your change. > However I could hardly find the case that would cause a trouble. > It may occur in the following rare cases though I'm not sure. > HEAP_MOVED_OFF and (neither HEAP_XMIN_COMMITTED nor > HEAP_XMIN_INVALID) and the tuple was recently delete/updated. I'm not sure if HEAP_MOVED_OFF is really dangerous, but I am sure that HEAP_MOVED_IN is dangerous --- vc_rpfheap will error out if it hits a tuple marked that way. So, if a VACUUM fails partway through vc_rpfheap (I guess this would have to happen after the internal commit), it'd be possible that later VACUUMs wouldn't work anymore. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > I'm for your change. > > However I could hardly find the case that would cause a trouble. > > It may occur in the following rare cases though I'm not sure. > > > HEAP_MOVED_OFF and (neither HEAP_XMIN_COMMITTED nor > > HEAP_XMIN_INVALID) and the tuple was recently delete/updated. > > I'm not sure if HEAP_MOVED_OFF is really dangerous, but I am sure > that HEAP_MOVED_IN is dangerous --- vc_rpfheap will error out if > it hits a tuple marked that way. So, if a VACUUM fails partway > through vc_rpfheap (I guess this would have to happen after the > internal commit), it'd be possible that later VACUUMs wouldn't > work anymore. > IIRC,there's no HEAP_MOVED_INd and not HEAP_XMIN_COMMITTED tuples when vc_rpfheap() is called because such tuples has already been marked unsued in vc_scanheap(). Regards. Hiroshi Inoue Inoue@tpf.co.jp