Thread: Why does VACUUM FULL bother locking pages?
Hackers, I'm reading the vacuum code and I just noticed that the routines move_plain_tuple and move_chain_tuple expect the dest and source blocks to be locked, and unlock them at exit. The only caller of both is repair_frag, whose only caller in turn is full_vacuum_rel. Same thing for update_hint_bits. So, the only callers of both has already acquired appropiate locks at the relation level -- nobody is going to be modifying the blocks while they proceed. So why bother locking the pages at all? Is there a reason or is this an historical accident? -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "Puedes vivir solo una vez, pero si lo haces bien, una vez es suficiente"
Was it relcache related?
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
On 9/16/05, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Hackers,
I'm reading the vacuum code and I just noticed that the routines
move_plain_tuple and move_chain_tuple expect the dest and source blocks
to be locked, and unlock them at exit. The only caller of both is
repair_frag, whose only caller in turn is full_vacuum_rel. Same thing
for update_hint_bits.
So, the only callers of both has already acquired appropiate locks at
the relation level -- nobody is going to be modifying the blocks while
they proceed. So why bother locking the pages at all? Is there a
reason or is this an historical accident?
--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"Puedes vivir solo una vez, pero si lo haces bien, una vez es suficiente"
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
On Fri, Sep 16, 2005 at 04:41:39PM -0400, Jonah H. Harris wrote: > Was it relcache related? I don't see how -- any user of a relcache entry needs to heap_open() or relation_open() the table and acquire an appropiate lock. Any such call would block because of the lock that VACUUM FULL acquires on the relation. > On 9/16/05, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > Hackers, > > > > I'm reading the vacuum code and I just noticed that the routines > > move_plain_tuple and move_chain_tuple expect the dest and source blocks > > to be locked, and unlock them at exit. The only caller of both is > > repair_frag, whose only caller in turn is full_vacuum_rel. Same thing > > for update_hint_bits. > > > > So, the only callers of both has already acquired appropiate locks at > > the relation level -- nobody is going to be modifying the blocks while > > they proceed. So why bother locking the pages at all? Is there a > > reason or is this an historical accident? -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell."
I'm probably wrong, but I thought vacuum may invalidate stuff which semi-required the cache to be flushed. :) I'll go take a look through as-well but it's hard to imagine this being overlooked for so long.
Sorry Alvaro, I haven't gone out to look at vacuum in awhile so I ain't much help.
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Sorry Alvaro, I haven't gone out to look at vacuum in awhile so I ain't much help.
On 9/16/05, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On Fri, Sep 16, 2005 at 04:41:39PM -0400, Jonah H. Harris wrote:
> Was it relcache related?
I don't see how -- any user of a relcache entry needs to heap_open() or
relation_open() the table and acquire an appropiate lock. Any such call
would block because of the lock that VACUUM FULL acquires on the relation.
> On 9/16/05, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > Hackers,
> >
> > I'm reading the vacuum code and I just noticed that the routines
> > move_plain_tuple and move_chain_tuple expect the dest and source blocks
> > to be locked, and unlock them at exit. The only caller of both is
> > repair_frag, whose only caller in turn is full_vacuum_rel. Same thing
> > for update_hint_bits.
> >
> > So, the only callers of both has already acquired appropiate locks at
> > the relation level -- nobody is going to be modifying the blocks while
> > they proceed. So why bother locking the pages at all? Is there a
> > reason or is this an historical accident?
--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"Now I have my system running, not a byte was off the shelf;
It rarely breaks and when it does I fix the code myself.
It's stable, clean and elegant, and lightning fast as well,
And it doesn't cost a nickel, so Bill Gates can go to hell."
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
> Alvaro Herrera wrote > The only caller of both is > repair_frag, whose only caller in turn is full_vacuum_rel. ...bgwriter still needs to access blocks. The WAL system relies on the locking behaviour for recoverability, see comments in LockBuffer() and SyncOneBuffer(). ...I do think there's lots still to optimise in VACUUM FULL though... Best Regards, Simon Riggs
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > So, the only callers of both has already acquired appropiate locks at > the relation level -- nobody is going to be modifying the blocks while > they proceed. So why bother locking the pages at all? Is there a > reason or is this an historical accident? No, because operations such as checkpointing and bgwriter will feel free to write out pages that aren't exclusive-locked; they don't try to get a lock at the table level. Failing to lock the buffer would risk allowing an invalid page state to be written to disk --- which, if we then crashed before writing the WAL record for the vacuum operation, would represent unrecoverable corruption. regards, tom lane
On Fri, Sep 16, 2005 at 11:50:21PM -0700, Simon Riggs wrote: > > Alvaro Herrera wrote > > The only caller of both is > > repair_frag, whose only caller in turn is full_vacuum_rel. > > ...bgwriter still needs to access blocks. The WAL system relies on the > locking behaviour for recoverability, see comments in LockBuffer() and > SyncOneBuffer(). Oh, certainly! In this case, may I point out that scan_heap() does not bother locking pages, mentioning that "we assume that holding exclusive lock on the relation will keep other backends from looking at the page". In particular, it calls PageRepairFragmentation which runs with the page unlocked AFAICT. Seems like a bug to me. -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell."
On Thu, Sep 22, 2005 at 10:36:41AM -0400, Alvaro Herrera wrote: > On Fri, Sep 16, 2005 at 11:50:21PM -0700, Simon Riggs wrote: > > > Alvaro Herrera wrote > > > The only caller of both is > > > repair_frag, whose only caller in turn is full_vacuum_rel. > > > > ...bgwriter still needs to access blocks. The WAL system relies on the > > locking behaviour for recoverability, see comments in LockBuffer() and > > SyncOneBuffer(). > > Oh, certainly! In this case, may I point out that scan_heap() does not > bother locking pages, mentioning that "we assume that holding exclusive > lock on the relation will keep other backends from looking at the page". > In particular, it calls PageRepairFragmentation which runs with the page > unlocked AFAICT. Looking again, PageRepairFragmentation is called on a copy of the page, not on the page itself, so this is not a problem. The page is only modified to exchange old Xids for FrozenTransactionId, or to set some hint bits, so this really shouldn't be too much of a problem. I still think it would be better to lock the page beforehand. -- Alvaro Herrera Architect, http://www.EnterpriseDB.com "Uno puede defenderse de los ataques; contra los elogios se esta indefenso"
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Oh, certainly! In this case, may I point out that scan_heap() does not > bother locking pages, mentioning that "we assume that holding exclusive > lock on the relation will keep other backends from looking at the page". > In particular, it calls PageRepairFragmentation which runs with the page > unlocked AFAICT. > Seems like a bug to me. I agree --- and a pretty silly one considering that there are LockBuffer calls elsewhere in vacuum.c. Wonder how old that code is ... regards, tom lane
On Thu, 2005-09-22 at 10:36 -0400, Alvaro Herrera wrote: > Seems like a bug to me. Well done. This wins the award for best bug found during beta; shame it wasn't 8.0 beta! Just as well we recommend only doing VACUUM FULL when the system is quiet.... Best Regards, Simon Riggs
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Looking again, PageRepairFragmentation is called on a copy of the page, > not on the page itself, so this is not a problem. The page is only > modified to exchange old Xids for FrozenTransactionId, or to set some > hint bits, so this really shouldn't be too much of a problem. I still > think it would be better to lock the page beforehand. Actually, the case that's a bit worrisome is the PageIsNew path: it'd be possible for a partially-valid page header to be written out. This wouldn't result in data loss, exactly, since there's nothing on the page ... but we might have a problem using the page later. The FrozenTransactionId update case is already presumed to be atomic by vacuumlazy.c, so I don't feel too bad about it, but it surely needs a comment at least. On the whole it seems like we might as well just take the exclusive buffer lock and not try to be cute. AFAICT the other routines in vacuum.c all do proper locking when they are modifying pages, so it's just this one place that is taking a short cut. regards, tom lane