Re: Heap truncation without AccessExclusiveLock (9.4) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Heap truncation without AccessExclusiveLock (9.4)
Msg-id 5195DE69.7010301@vmware.com
In response to Re: Heap truncation without AccessExclusiveLock (9.4)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Heap truncation without AccessExclusiveLock (9.4)
List pgsql-hackers
On 16.05.2013 00:18, Robert Haas wrote:
> On Wed, May 15, 2013 at 11:35 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com>  wrote:
>> Shared memory space is limited, but we only need the watermarks for any
>> in-progress truncations. Let's keep them in shared memory, in a small
>> fixed-size array. That limits the number of concurrent truncations that can
>> be in-progress, but that should be ok.
>
> Would it only limit the number of concurrent transactions that can be
> in progress *due to vacuum*?  Or would it limit the total number of
> TOTAL concurrent truncations?  Because a table could have arbitrarily
> many inheritance children, and you might try to truncate the whole
> thing at once...

It would only limit the number of concurrent *truncations*. Vacuums in 
general would not count; a slot is only occupied during the final phase 
of a vacuum, when it tries to truncate the heap.

>> To not slow down common backend
>> operations, the values (or lack thereof) are cached in relcache. To sync the
>> relcache when the values change, there will be a new shared cache
>> invalidation event to force backends to refresh the cached watermark values.
>> A backend (vacuum) can ensure that all backends see the new value by first
>> updating the value in shared memory, sending the sinval message, and waiting
>> until everyone has received it.
>
> AFAIK, the sinval mechanism isn't really well-designed to ensure that
> these kinds of notifications arrive in a timely fashion.  There's no
> particular bound on how long you might have to wait.  Pretty much all
> inner loops have CHECK_FOR_INTERRUPTS(), but they definitely do not
> all have AcceptInvalidationMessages(), nor would that be safe or
> practical.  The sinval code sends catchup interrupts, but only for the
> purpose of preventing sinval overflow, not for timely receipt.

Currently, vacuum will have to wait for all transactions that have 
touched the relation to finish, to get the AccessExclusiveLock. If we 
don't change anything in the sinval mechanism, the wait would be similar 
- until all currently in-progress transactions have finished. It's not 
quite the same; you'd have to wait for all in-progress transactions to 
finish, not only those that have actually touched the relation. But on 
the plus side, you would not block new transactions from accessing the 
relation, so it's not too bad if it takes a long time.

If we could use the catchup interrupts to speed that up though, that 
would be much better. I think vacuum could simply send a catchup 
interrupt, and wait until everyone has caught up. That would 
significantly increase the traffic on the sinval queue and the number of 
catchup interrupts, compared to what it is today, but I think it would 
still be ok. It would still only be a few sinval messages and catchup 
interrupts per truncation (i.e. per vacuum).
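To make the sequencing concrete, here is a minimal standalone C sketch of the "publish the watermark, then wait until every backend has caught up" idea. All names here (WatermarkSlot, backend_msgnum, set_soft_watermark) are invented for illustration; the real implementation would go through PostgreSQL's shared memory, LWLock, and sinval/catchup-interrupt machinery rather than these stand-ins.

```c
#include <stdint.h>

#define MAX_BACKENDS 8

/* One slot in the small fixed-size shared-memory arena. */
typedef struct
{
    uint32_t    soft_watermark;     /* block number we hope to truncate to */
    uint64_t    sinval_msgnum;      /* number of the last watermark message */
} WatermarkSlot;

static WatermarkSlot slot;
/* Per-backend progress: the last message number each backend has processed.
 * In the real system this role is played by the backends' sinval read
 * positions, not an explicit array like this. */
static uint64_t backend_msgnum[MAX_BACKENDS];

/* Vacuum side: publish the new watermark and note the message number.
 * The real code would hold the slot's lock while updating, and then send
 * a sinval message (and a catchup interrupt) to the other backends. */
static uint64_t
set_soft_watermark(uint32_t blkno)
{
    slot.soft_watermark = blkno;
    return ++slot.sinval_msgnum;
}

/* Vacuum polls this until it returns true: once every backend has
 * processed the message, everyone is guaranteed to see the watermark. */
static int
all_backends_caught_up(uint64_t msgnum)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
        if (backend_msgnum[i] < msgnum)
            return 0;
    return 1;
}
```

The point of the message number is only ordering: vacuum does not care which value each backend saw, just that every backend has drained the queue past the point where the watermark update was sent.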

> Another problem is that sinval resets are bad for performance, and
> anything we do that pushes more messages through sinval will increase
> the frequency of resets.  Now if those operations are things that
> are relatively uncommon, it's not worth worrying about - but if it's
> something that happens on every relation extension, I think that's
> likely to cause problems.

It would not be on every relation extension, only on truncation.

>> With the watermarks, truncation works like this:
>>
>> 1. Set soft watermark to the point where we think we can truncate the
>> relation. Wait until everyone sees it (send sinval message, wait).
>
> I'm also concerned about how you plan to synchronize access to this
> shared memory arena.

I was thinking of a simple lwlock, or perhaps one lwlock per slot in the 
arena. It would not be accessed very frequently, because the watermark 
values would be cached in the relcache. It would only need to be 
accessed when:

1. Truncating the relation, by vacuum, to set the watermark values
2. By backends, to update the relcache, when they receive the sinval 
message sent by vacuum.
3. By backends, when writing above the cached watermark value. IOW, when 
extending a relation that's being truncated at the same time.

In particular, it would definitely not be accessed every time a backend 
currently needs to do an lseek, nor every time a backend needs to extend 
a relation.
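The fast path in case 3 above can be sketched as a simple comparison against the relcache-cached watermark; only a write above the watermark falls through to the shared-memory slot. This is a hypothetical standalone sketch: RelCacheEntry, NO_WATERMARK, and extension_needs_slow_path are invented names, and the slow path is reduced to a boolean where the real code would take the slot's lwlock and coordinate with the truncating vacuum.

```c
#include <stdint.h>
#include <stdbool.h>

/* Sentinel meaning "no truncation of this relation is in progress". */
#define NO_WATERMARK UINT32_MAX

/* The watermark as cached in the backend's relcache entry, refreshed
 * when the sinval message from vacuum arrives. */
typedef struct
{
    uint32_t    cached_soft_watermark;
} RelCacheEntry;

/* Decide whether extending the relation to 'new_blkno' can skip the
 * shared-memory arena entirely (the common case), or must take the
 * slot's lock because we'd be writing above the watermark. */
static bool
extension_needs_slow_path(const RelCacheEntry *rel, uint32_t new_blkno)
{
    /* Common case: no truncation in progress, or the write lands below
     * the watermark; no shared-memory access is needed at all. */
    if (rel->cached_soft_watermark == NO_WATERMARK ||
        new_blkno < rel->cached_soft_watermark)
        return false;

    /* Writing at or above the watermark: consult shared memory under
     * the slot's lock, which may force vacuum to back off. */
    return true;
}
```

With the cache, the cost added to ordinary relation extension is one comparison; the lock is touched only in the rare race between an extension and a concurrent truncation.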

- Heikki


