Thread: Order of operations in lazy_vacuum_rel

Order of operations in lazy_vacuum_rel

From
Tom Lane
Date:
I see that lazy_vacuum_rel() truncates the heap before it does vacuuming
of the free space map.  Once upon a time this wouldn't have mattered,
but now it means that cancel interrupts are likely to be ignored for
the duration of FreeSpaceMapVacuum().  Is that really necessary?
Would it be okay to swap the two steps?
        regards, tom lane


Re: Order of operations in lazy_vacuum_rel

From
Alvaro Herrera
Date:
Tom Lane wrote:
> I see that lazy_vacuum_rel() truncates the heap before it does vacuuming
> of the free space map.  Once upon a time this wouldn't have mattered,
> but now it means that cancel interrupts are likely to be ignored for
> the duration of FreeSpaceMapVacuum().  Is that really necessary?
> Would it be okay to swap the two steps?

How would it matter?  Interrupts are not enabled until the transaction
is committed anyway, which must happen after both things have finished ..

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Order of operations in lazy_vacuum_rel

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I see that lazy_vacuum_rel() truncates the heap before it does vacuuming
>> of the free space map.  Once upon a time this wouldn't have mattered,
>> but now it means that cancel interrupts are likely to be ignored for
>> the duration of FreeSpaceMapVacuum().  Is that really necessary?
>> Would it be okay to swap the two steps?

> How would it matter?  Interrupts are not enabled until the transaction
> is committed anyway, which must happen after both things have finished ..

The point would be to not disable interrupts till after doing the FSM
vacuuming.  Ignoring CANCEL for longer than we must is bad.

I'm also looking at not disabling the interrupt until lazy_truncate_heap
determines that it's actually going to do RelationTruncate.  The current
coding disables interrupts without any need in a large fraction of cases.
        regards, tom lane


Re: Order of operations in lazy_vacuum_rel

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> I see that lazy_vacuum_rel() truncates the heap before it does vacuuming
> >> of the free space map.  Once upon a time this wouldn't have mattered,
> >> but now it means that cancel interrupts are likely to be ignored for
> >> the duration of FreeSpaceMapVacuum().  Is that really necessary?
> >> Would it be okay to swap the two steps?
> 
> > How would it matter?  Interrupts are not enabled until the transaction
> > is committed anyway, which must happen after both things have finished ..
> 
> The point would be to not disable interrupts till after doing the FSM
> vacuuming.  Ignoring CANCEL for longer than we must is bad.

Oh, I see.  I guess the answer is that it depends on what happens if you
interrupt after vacuuming the FSM.  I have no idea what that is supposed
to accomplish so I'm of no help here.  FreeSpaceMapVacuum says it's
about fixing inconsistencies in the FSM, so I'm guessing that it's not
critical.

> I'm also looking at not disabling the interrupt until lazy_truncate_heap
> determines that it's actually going to do RelationTruncate.  The current
> coding disables interrupts without any need in a large fraction of cases.

Hmm, yeah ... that moves the code to the innards of lazy_truncate_heap.
Seems reasonable.

FWIW I notice that RelationTruncate contains an outdated comment at the
top about the 'fsm' function argument which is nowadays no longer an
argument.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Order of operations in lazy_vacuum_rel

From
Heikki Linnakangas
Date:
Alvaro Herrera wrote:
> Tom Lane wrote:
>> The point would be to not disable interrupts till after doing the FSM
>> vacuuming.  Ignoring CANCEL for longer than we must is bad.
> 
> Oh, I see.  I guess the answer is that it depends on what happens if you
> interrupt after vacuuming the FSM.  I have no idea what that is supposed
> to accomplish so I'm of no help here.  FreeSpaceMapVacuum says it's
> about fixing inconsistencies in the FSM, so I'm guessing that it's not
> critical.

Yeah, interrupting FreeSpaceMapVacuum (or right after it) is harmless.

> FWIW I notice that RelationTruncate contains an outdated comment at the
> top about the 'fsm' function argument which is nowadays no longer an
> argument.

Thanks, fixed.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Order of operations in lazy_vacuum_rel

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> The point would be to not disable interrupts till after doing the FSM
>> vacuuming.  Ignoring CANCEL for longer than we must is bad.

> Oh, I see.  I guess the answer is that it depends on what happens if you
> interrupt after vacuuming the FSM.  I have no idea what that is supposed
> to accomplish so I'm of no help here.  FreeSpaceMapVacuum says it's
> about fixing inconsistencies in the FSM, so I'm guessing that it's not
> critical.

Actually, after thinking about this some more, I realize that this code
has got a significantly bigger problem than just whether it will respond
to CANCEL promptly.  If we truncate the table, and then get an error
sometime before commit, the relcache inval message will not be sent,
leaving other backends at significant risk of strange errors due to
having rd_targblock pointing somewhere past the end of the table.
So we should reorder these operations just to reduce the risk window,
and I've done so.

It might be a good idea to develop a nontransactional signaling method
for causing other backends to reset rd_targblock --- perhaps we could
tie it to the smgr inval signaling that already happens on a truncate?
That would probably require moving rd_targblock down to the smgr
level, but offhand I see no reason that that'd be a bad thing to do.

I'm not going to panic about it right now, since the code has been like
this for a long time and we've had few if any complaints.  But it seems
like something to fix sometime.
        regards, tom lane


Re: Order of operations in lazy_vacuum_rel

From
Alvaro Herrera
Date:
Tom Lane wrote:

> Actually, after thinking about this some more, I realize that this code
> has got a significantly bigger problem than just whether it will respond
> to CANCEL promptly.  If we truncate the table, and then get an error
> sometime before commit, the relcache inval message will not be sent,
> leaving other backends at significant risk of strange errors due to
> having rd_targblock pointing somewhere past the end of the table.
> So we should reorder these operations just to reduce the risk window,
> and I've done so.

Err, that problem was exactly why I added the interrupt holdoff in
there, so if you've got a better/more invasive solution, it's very
welcome.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Order of operations in lazy_vacuum_rel

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Actually, after thinking about this some more, I realize that this code
>> has got a significantly bigger problem than just whether it will respond
>> to CANCEL promptly.

> Err, that problem was exactly why I added the interrupt holdoff in
> there, so if you've got a better/more invasive solution, it's very
> welcome.

Well, that's a pretty incomplete solution :-(.  Maybe we should do
something about this.  There wasn't any obvious solution before,
but now that we have the nontransactional smgr-level sinval messages
being sent on drops and truncates, it seems like tying rd_targblock
clearing to those would fix the problem.  The easiest way to do that
would involve moving rd_targblock down to the SMgrRelation struct.
Probably rd_fsm_nblocks and rd_vm_nblocks too.  Comments?
        regards, tom lane


Re: Order of operations in lazy_vacuum_rel

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> Actually, after thinking about this some more, I realize that this code
> >> has got a significantly bigger problem than just whether it will respond
> >> to CANCEL promptly.
> 
> > Err, that problem was exactly why I added the interrupt holdoff in
> > there, so if you've got a better/more invasive solution, it's very
> > welcome.
> 
> Well, that's a pretty incomplete solution :-(.

Yeah, we were well aware of that :-)  It solved our problem (which was
related to interrupts from autovac)

> Maybe we should do
> something about this.  There wasn't any obvious solution before,
> but now that we have the nontransactional smgr-level sinval messages
> being sent on drops and truncates, it seems like tying rd_targblock
> clearing to those would fix the problem.

Hmm, sounds good, though I confess not having heard about
nontransactional sinval messages before.

> The easiest way to do that
> would involve moving rd_targblock down to the SMgrRelation struct.
> Probably rd_fsm_nblocks and rd_vm_nblocks too.  Comments?

Can't say it doesn't look like a modularity violation from here --
insertion target block doesn't really belong into smgr, does it?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Order of operations in lazy_vacuum_rel

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Maybe we should do
>> something about this.  There wasn't any obvious solution before,
>> but now that we have the nontransactional smgr-level sinval messages
>> being sent on drops and truncates, it seems like tying rd_targblock
>> clearing to those would fix the problem.

> Hmm, sounds good, though I confess not having heard about
> nontransactional sinval messages before.

Hey, they've been in there almost a week ;-)
http://archives.postgresql.org/pgsql-committers/2010-02/msg00026.php

>> The easiest way to do that
>> would involve moving rd_targblock down to the SMgrRelation struct.
>> Probably rd_fsm_nblocks and rd_vm_nblocks too.  Comments?

> Can't say it doesn't look like a modularity violation from here --
> insertion target block doesn't really belong into smgr, does it?

It never belonged in relcache, either.  Trying to keep dynamic state
(not backed by a catalog entry) in the relcache has always been a
pretty klugy thing.  smgr at least has a reasonable excuse for trying
to keep track of physical truncation events, which is the thing we need
here.
        regards, tom lane