Thread: heap vacuum & cleanup locks

heap vacuum & cleanup locks

From
Robert Haas
Date:
We've occasionally seen problems with VACUUM getting stuck for failure
to acquire a cleanup lock due to, for example, a cursor holding a pin
on the buffer page.  In the worst case, this can cause an undetected
deadlock, if the backend holding the buffer pin blocks trying to
acquire a heavyweight lock that is in turn blocked by VACUUM.  A while
back, someone (Greg Stark? me?) floated the idea of not waiting for
the cleanup lock.  If we can't get it immediately, or within some
short period of time, then we just skip the page and continue on.

Today I had what might be a better idea: don't try to acquire a
cleanup lock at all.  Instead, acquire an exclusive lock.  After
having done so, observe the pin count.  If there are no other buffer
pins, that means our exclusive lock is actually a cleanup lock, and we
proceed as now.  If other buffer pins do exist, then we can't
defragment the page, but that doesn't mean no useful work can be done:
we can still mark used line pointers dead, or dead line pointers
unused.  We cannot defragment, but that can be done either by the next
VACUUM or by a HOT cleanup.  We can even arrange - using existing
mechanism - to leave behind a hint that the page is a good candidate
for a HOT cleanup, by setting pd_prune_xid to, say, FrozenXID.
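
Very roughly, the per-page logic might look like this (sketch only;
IsSoleBufferPin() and lazy_mark_line_pointers() are made-up placeholders
for "check that ours is the only pin" and "flip line pointer states
without moving tuple data", the rest uses the existing APIs):

#include "postgres.h"
#include "access/heapam.h"
#include "access/transam.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

static void
lazy_scan_page_sketch(Relation onerel, Buffer buf, TransactionId OldestXmin)
{
    Page        page = BufferGetPage(buf);
    TransactionId latestRemovedXid = InvalidTransactionId;

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

    if (IsSoleBufferPin(buf))       /* hypothetical: no other pins exist */
    {
        /* our exclusive lock is effectively a cleanup lock: full cleanup */
        heap_page_prune(onerel, buf, OldestXmin, false, &latestRemovedXid);
    }
    else
    {
        /*
         * Someone else holds a pin, so we may not move tuple data around,
         * but we can still flip line pointer states in place ...
         */
        lazy_mark_line_pointers(onerel, buf, OldestXmin);   /* hypothetical */

        /* ... and hint that the page is a good candidate for HOT cleanup */
        ((PageHeader) page)->pd_prune_xid = FrozenTransactionId;
    }

    LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}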

Like the idea of skipping pages on which we can't acquire a cleanup
lock altogether, this should prevent VACUUM from getting stuck trying
to lock a heap page.  While buffer pins can be held for extended
periods of time, I don't think there is any operation that holds a
buffer content lock more than very briefly.  Furthermore, unlike the
idea of skipping the page altogether, we could use this approach even
during an anti-wraparound vacuum.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Itagaki Takahiro
Date:
On Sun, Jun 5, 2011 at 12:03, Robert Haas <robertmhaas@gmail.com> wrote:
> If other buffer pins do exist, then we can't
> defragment the page, but that doesn't mean no useful work can be done:
> we can still mark used line pointers dead, or dead line pointers
> unused.  We cannot defragment, but that can be done either by the next
> VACUUM or by a HOT cleanup.

This is just an idea -- is it possible to use a copy-on-write technique?
VACUUM allocates a duplicate of the pinned page and copies the valid
tuples into the new page. Buffer readers arriving after the VACUUM then
see the cloned page instead of the old pinned one.

Of course, copy-on-write is more complex than skipping pinned pages,
but I worry that with the skipping method we might not be able to vacuum
some pages at all in edge cases.

--
Itagaki Takahiro


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> On Sun, Jun 5, 2011 at 12:03, Robert Haas <robertmhaas@gmail.com> wrote:
>> If other buffer pins do exist, then we can't
>> defragment the page, but that doesn't mean no useful work can be done:
>> we can still mark used line pointers dead, or dead line pointers
>> unused.  We cannot defragment, but that can be done either by the next
>> VACUUM or by a HOT cleanup.
>
> This is just an idea -- is it possible to use a copy-on-write technique?
> VACUUM allocates a duplicate of the pinned page and copies the valid
> tuples into the new page. Buffer readers arriving after the VACUUM then
> see the cloned page instead of the old pinned one.

Heikki suggested the same thing, and it's not a bad idea, but I think
it would be more work to implement than what I proposed.  The caller
would need to be aware that, if it tries to re-acquire a content lock
on the same page, the offset of the tuple within the page might
change.  I'm not sure how much work would be required to cope with
that possibility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Jim Nasby
Date:
On Jun 6, 2011, at 1:00 AM, Robert Haas wrote:
> On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro
> <itagaki.takahiro@gmail.com> wrote:
>> On Sun, Jun 5, 2011 at 12:03, Robert Haas <robertmhaas@gmail.com> wrote:
>>> If other buffer pins do exist, then we can't
>>> defragment the page, but that doesn't mean no useful work can be done:
>>> we can still mark used line pointers dead, or dead line pointers
>>> unused.  We cannot defragment, but that can be done either by the next
>>> VACUUM or by a HOT cleanup.
>>
>> This is just an idea -- is it possible to use a copy-on-write technique?
>> VACUUM allocates a duplicate of the pinned page and copies the valid
>> tuples into the new page. Buffer readers arriving after the VACUUM then
>> see the cloned page instead of the old pinned one.
>
> Heikki suggested the same thing, and it's not a bad idea, but I think
> it would be more work to implement than what I proposed.  The caller
> would need to be aware that, if it tries to re-acquire a content lock
> on the same page, the offset of the tuple within the page might
> change.  I'm not sure how much work would be required to cope with
> that possibility.

I've had a related idea that I haven't looked into... if you're scanning a relation (ie: index scan, seq scan) I've
wondered if it would be more efficient to deal with the entire page at once, possibly by making a copy of it. This would
reduce the number of times you pin the page (often quite dramatically). I realize that means copying the entire page,
but I suspect that would occur entirely in the L1 cache, which would be fast.

So perhaps instead of copy on write we should try for copy on read on all appropriate plan nodes.

On a related note, I've also wondered if it would be useful to allow nodes to deal with more than one tuple at a time;
the idea being that it's better to execute a smaller chunk of code over a bigger chunk of data instead of dribbling
tuples through an entire execution tree one at a time. Perhaps that will only be useful if nodes are executing in
parallel...
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




Re: heap vacuum & cleanup locks

From
Pavan Deolasee
Date:
On Sun, Jun 5, 2011 at 8:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> We've occasionally seen problems with VACUUM getting stuck for failure
> to acquire a cleanup lock due to, for example, a cursor holding a pin
> on the buffer page.  In the worst case, this can cause an undetected
> deadlock, if the backend holding the buffer pin blocks trying to
> acquire a heavyweight lock that is in turn blocked by VACUUM.  A while
> back, someone (Greg Stark? me?) floated the idea of not waiting for
> the cleanup lock.  If we can't get it immediately, or within some
> short period of time, then we just skip the page and continue on.
>

Do we know if this is really a problem, though?  The deadlock, for
example, can happen only when a backend tries to take a conflicting
table-level lock while holding the buffer pin, and I am not sure that we
do that.

The contention issue probably matters mainly for small tables, because
for large to very large tables the probability that a backend and
vacuum would process the same page at the same time is quite low. With
the current default for vac_threshold, small tables can get vacuumed
very frequently, and if they are also heavily accessed, the cleanup
lock can become a bottleneck.

Another issue that might be worth paying attention to is the single
pass vacuum that I am currently working on. The design that we agreed
upon assumes that the index vacuum must clear index pointers to all
the dead line pointers. If we skip any page, we must at least collect
the existing dead line pointers and remove those index pointers. If we
create dead line pointers and we want to vacuum them later, we store
the LSN in the page and that may require defrag. Of course, we can
work around that, but I think it will be useful if we do some tests to
show that the cleanup lock is indeed a major bottleneck.

Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


Re: heap vacuum & cleanup locks

From
Heikki Linnakangas
Date:
On 06.06.2011 09:35, Jim Nasby wrote:
> I've had a related idea that I haven't looked into... if you're scanning a relation (ie: index scan, seq scan) I've
> wondered if it would be more efficient to deal with the entire page at once, possibly by making a copy of it. This would
> reduce the number of times you pin the page (often quite dramatically). I realize that means copying the entire page,
> but I suspect that would occur entirely in the L1 cache, which would be fast.

We already do that. When an index scan moves to an index page, the heap 
tid pointers of all the matching index tuples are copied to 
backend-private memory in one go, and the lock is released. And for a 
seqscan, the visibility of all the tuples on the page is checked in one 
go while holding the lock, then the lock is released but the pin is 
kept. The pin is only released after all the tuples have been read. 
There's no repeated pin-unpin for each tuple.
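
Condensed, the seqscan side of that looks roughly like this (paraphrasing
heapgetpage() in heapam.c, with most declarations and details elided):

    /* inside heapgetpage(), after the page has been read and pinned */
    LockBuffer(buffer, BUFFER_LOCK_SHARE);

    dp = (Page) BufferGetPage(buffer);
    lines = PageGetMaxOffsetNumber(dp);
    ntup = 0;

    for (lineoff = FirstOffsetNumber; lineoff <= lines; lineoff++)
    {
        ItemId      lpp = PageGetItemId(dp, lineoff);
        HeapTupleData loctup;

        if (!ItemIdIsNormal(lpp))
            continue;

        loctup.t_data = (HeapTupleHeader) PageGetItem(dp, lpp);
        loctup.t_len = ItemIdGetLength(lpp);

        /* visibility of every tuple on the page is checked in one pass */
        if (HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer))
            scan->rs_vistuples[ntup++] = lineoff;
    }

    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);   /* drop the lock, keep the pin */
    scan->rs_ntuples = ntup;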

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Sun, Jun 5, 2011 at 4:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> We've occasionally seen problems with VACUUM getting stuck for failure
> to acquire a cleanup lock due to, for example, a cursor holding a pin
> on the buffer page.  In the worst case, this can cause an undetected
> deadlock, if the backend holding the buffer pin blocks trying to
> acquire a heavyweight lock that is in turn blocked by VACUUM.  A while
> back, someone (Greg Stark? me?) floated the idea of not waiting for
> the cleanup lock.  If we can't get it immediately, or within some
> short period of time, then we just skip the page and continue on.
>
> Today I had what might be a better idea: don't try to acquire a
> cleanup lock at all.  Instead, acquire an exclusive lock.  After
> having done so, observe the pin count.  If there are no other buffer
> pins, that means our exclusive lock is actually a cleanup lock, and we
> proceed as now.  If other buffer pins do exist, then we can't
> defragment the page, but that doesn't mean no useful work can be done:
> we can still mark used line pointers dead, or dead line pointers
> unused.  We cannot defragment, but that can be done either by the next
> VACUUM or by a HOT cleanup.  We can even arrange - using existing
> mechanism - to leave behind a hint that the page is a good candidate
> for a HOT cleanup, by setting pd_prune_xid to, say, FrozenXID.
>
> Like the idea of skipping pages on which we can't acquire a cleanup
> lock altogether, this should prevent VACUUM from getting stuck trying
> to lock a heap page.  While buffer pins can be held for extended
> periods of time, I don't think there is any operation that holds a
> buffer content lock more than very briefly.  Furthermore, unlike the
> idea of skipping the page altogether, we could use this approach even
> during an anti-wraparound vacuum.
>
> Thoughts?


Not waiting seems like a good idea.

Not returning to the block while it is in RAM, or not cleaning the
block at all, would cause different performance issues, which I would
wish to avoid.

Hot Standby has specific code to avoid this situation. Perhaps you
could copy that, not sure.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Mon, Jun 6, 2011 at 2:36 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Do we know if this is really a problem, though?  The deadlock, for
> example, can happen only when a backend tries to take a conflicting
> table-level lock while holding the buffer pin, and I am not sure that we
> do that.

The deadlock isn't terribly common, because, as you say, you need the
process holding the buffer pin to try to take a lock on the relation
being vacuumed that is strong enough to conflict with
ShareUpdateExclusiveLock.  That's a slightly unusual thing to do.

But the problem of vacuum stalling out because it can't get the
cleanup lock is a very real one.  I've seen at least one customer hit
this in production, and it was pretty painful.  Now, granted, you need
some bad application design, too: you have to leave a cursor lying
around instead of running it to completion and then stopping.  But
supposing you do make that mistake, you might hope that it wouldn't
cause VACUUM starvation, which is what happens today.  IOW, I'm less
worried about whether the cleanup lock is slowing vacuum down than I
am about eliminating the pathological cases where an autovacuum
worker gets pinned down, stuck waiting for a cleanup lock that never
arrives.  Now the table doesn't get vacuumed (bad) and the system as a
whole is one AV worker short of what it's supposed to have (also bad).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of Mon Jun 06 08:06:10 -0400 2011:

> But the problem of vacuum stalling out because it can't get the
> cleanup lock is a very real one.  I've seen at least one customer hit
> this in production, and it was pretty painful.  Now, granted, you need
> some bad application design, too: you have to leave a cursor lying
> around instead of running it to completion and then stopping.  But
> supposing you do make that mistake, you might hope that it wouldn't
> cause VACUUM starvation, which is what happens today.  IOW, I'm less
> worried about whether the cleanup lock is slowing vacuum down than I
> am about eliminating the pathological cases where an autovacuum
> worker gets pinned down, stuck waiting for a cleanup lock that never
> arrives.  Now the table doesn't get vacuumed (bad) and the system as a
> whole is one AV worker short of what it's supposed to have (also bad).

One of the good things about your proposal is that (AFAICS) you can
freeze tuples without the cleanup lock, so the antiwraparound cleanup
would still work.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Mon, Jun 6, 2011 at 12:49 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of Mon Jun 06 08:06:10 -0400 2011:
>
>> But the problem of vacuum stalling out because it can't get the
>> cleanup lock is a very real one.  I've seen at least one customer hit
>> this in production, and it was pretty painful.  Now, granted, you need
>> some bad application design, too: you have to leave a cursor lying
>> around instead of running it to completion and then stopping.  But
>> supposing you do make that mistake, you might hope that it wouldn't
>> cause VACUUM starvation, which is what happens today.  IOW, I'm less
>> worried about whether the cleanup lock is slowing vacuum down than I
>> am about eliminating the pathological cases where an autovacuum
>> worker gets pinned down, stuck waiting for a cleanup lock that never
>> arrives.  Now the table doesn't get vacuumed (bad) and the system as a
>> whole is one AV worker short of what it's supposed to have (also bad).
>
> One of the good things about your proposal is that (AFAICS) you can
> freeze tuples without the cleanup lock, so the antiwraparound cleanup
> would still work.

Yeah, I think that's a major selling point.  VACUUM getting stuck is
Bad.  Anti-wraparound VACUUM getting stuck is Really Bad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Mon, Jun 6, 2011 at 8:02 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 06.06.2011 09:35, Jim Nasby wrote:
>>
>> I've had a related idea that I haven't looked into... if you're scanning a
>> relation (ie: index scan, seq scan) I've wondered if it would be more
>> efficient to deal with the entire page at once, possibly by making a copy of
>> it. This would reduce the number of times you pin the page (often quite
>> dramatically). I realize that means copying the entire page, but I suspect
>> that would occur entirely in the L1 cache, which would be fast.
>
> We already do that. When an index scan moves to an index page, the heap tid
> pointers of all the matching index tuples are copied to backend-private
> memory in one go, and the lock is released. And for a seqscan, the
> visibility of all the tuples on the page is checked in one go while holding
> the lock, then the lock is released but the pin is kept. The pin is only
> released after all the tuples have been read. There's no repeated pin-unpin
> for each tuple.

But I think you've hit the important point here. The problem is not
whether VACUUM waits for the pin, its that the pins can be held for
extended periods.

It makes more sense to try to limit pin hold times than it does to
come up with pin avoidance techniques.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Greg Stark
Date:
On Mon, Jun 6, 2011 at 11:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> But I think you've hit the important point here. The problem is not
> whether VACUUM waits for the pin, its that the pins can be held for
> extended periods.

Yes

> It makes more sense to try to limit pin hold times than it does to
> come up with pin avoidance techniques.

Well it's super-exclusive-vacuum-lock avoidance techniques. Why
shouldn't it make more sense to try to reduce the frequency and impact
of the single-purpose outlier in a non-critical-path instead of
burdening every other data reader with extra overhead?

I think Robert's plan is exactly right though I would phrase it
differently. We should get the exclusive lock, freeze/kill any xids
and line pointers, then if the pin-count is 1 do the compaction.

I'm really wishing we had more bits in the vm. It looks like we could use:
 - contains not-all-visible tuples
 - contains not-frozen xids
 - in need of compaction

I'm sure we could find a use for one more page-level vm bit too.



-- 
greg


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Tue, Jun 7, 2011 at 8:24 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Mon, Jun 6, 2011 at 11:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> But I think you've hit the important point here. The problem is not
>> whether VACUUM waits for the pin, its that the pins can be held for
>> extended periods.
>
> Yes
>
>> It makes more sense to try to limit pin hold times than it does to
>> come up with pin avoidance techniques.
>
> Well it's super-exclusive-vacuum-lock avoidance techniques. Why
> shouldn't it make more sense to try to reduce the frequency and impact
> of the single-purpose outlier in a non-critical-path instead of
> burdening every other data reader with extra overhead?
>
> I think Robert's plan is exactly right though I would phrase it
> differently. We should get the exclusive lock, freeze/kill any xids
> and line pointers, then if the pin-count is 1 do the compaction.

Would that also be possible during recovery?

A similar problem exists with Hot Standby, so I'm worried fixing just
VACUUMs would be a kluge.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Tue, Jun 7, 2011 at 3:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jun 7, 2011 at 8:24 PM, Greg Stark <gsstark@mit.edu> wrote:
>> On Mon, Jun 6, 2011 at 11:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> But I think you've hit the important point here. The problem is not
>>> whether VACUUM waits for the pin, its that the pins can be held for
>>> extended periods.
>>
>> Yes
>>
>>> It makes more sense to try to limit pin hold times than it does to
>>> come up with pin avoidance techniques.
>>
>> Well it's super-exclusive-vacuum-lock avoidance techniques. Why
>> shouldn't it make more sense to try to reduce the frequency and impact
>> of the single-purpose outlier in a non-critical-path instead of
>> burdening every other data reader with extra overhead?
>>
>> I think Robert's plan is exactly right though I would phrase it
>> differently. We should get the exclusive lock, freeze/kill any xids
>> and line pointers, then if the pin-count is 1 do the compaction.
>
> Would that also be possible during recovery?
>
> A similar problem exists with Hot Standby, so I'm worried fixing just
> VACUUMs would be a kluge.

We have to do the same operation on both the master and standby, so if
the master decides to skip the compaction then the slave will skip it
as well (and need not worry about waiting for pin-count 1).  But if
the master does the compaction then the slave will have to get a
matching cleanup lock, just as now.

Your idea of somehow adjusting things so that we don't hold the buffer
pin for a long period of time would be better in that regard, but I'm
not sure how to do it.  Presumably we could rejigger things to copy
the tuples instead of holding a pin, but that would carry a
performance penalty for the (very common) case where there is no
conflict with VACUUM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Tue, Jun 7, 2011 at 3:24 PM, Greg Stark <gsstark@mit.edu> wrote:
> Well it's super-exclusive-vacuum-lock avoidance techniques. Why
> shouldn't it make more sense to try to reduce the frequency and impact
> of the single-purpose outlier in a non-critical-path instead of
> burdening every other data reader with extra overhead?
>
> I think Robert's plan is exactly right though I would phrase it
> differently. We should get the exclusive lock, freeze/kill any xids
> and line pointers, then if the pin-count is 1 do the compaction.

I wrote a really neat patch to do this today...  and then, as I
thought about it some more, I started to think that it's probably
unsafe.  Here's the trouble: with this approach, we assume that it's
OK to change the contents of the line pointer while holding only an
exclusive lock on the buffer.  But there is a good deal of code out
there that thinks it's OK to examine a line pointer with only a pin on
the buffer (no lock).  You need a content lock to scan the item
pointer array, but once you've identified an item of interest, you're
entitled to assume that it won't be modified while you hold a buffer
pin.  Now, if you've identified a particular tuple as being visible to
your scan, then you might think that VACUUM shouldn't be removing it
anyway.  But I think that's only true for MVCC scans - for example,
what happens under SnapshotNow semantics?  But then, on third
thought, if you've also got an MVCC snapshot taken before the start of
the SnapshotNow scan, you are probably OK, because your advertised
xmin should prevent anyone from removing anything anyway, so how do
you actually provoke a failure?

Anyway, I'm attaching the patch, in case anyone has any ideas on where
to go with this.

> I'm really wishing we had more bits in the vm. It looks like we could use:
>  - contains not-all-visible tuples
>  - contains not-frozen xids
>  - in need of compaction
>
> I'm sure we could find a use for one more page-level vm bit too.

We've got plenty of room for more page-level bits, if we need them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Sun, Jun 5, 2011 at 4:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> We've occasionally seen problems with VACUUM getting stuck for failure
> to acquire a cleanup lock due to, for example, a cursor holding a pin
> on the buffer page.  In the worst case, this can cause an undetected
> deadlock, if the backend holding the buffer pin blocks trying to
> acquire a heavyweight lock that is in turn blocked by VACUUM.

Those deadlocks can be detected in exactly the same way as is used for
Hot Standby.

The cleanup waiter registers its interest in the pin; anyone with a lock
request that must wait then checks whether they hold a pin that would
cause a deadlock.

I'll look at doing a patch for that. Shouldn't take long.

> A while
> back, someone (Greg Stark? me?) floated the idea of not waiting for
> the cleanup lock.  If we can't get it immediately, or within some
> short period of time, then we just skip the page and continue on.

Separately, that sounds like a great idea and it's simple to implement
- patch attached.

As a further enhancement, I don't see any particular reason why the
heap pages need to be vacuumed in exactly sequential order. If they
have to be read from disk, reading sequentially is useful, but in that
case nobody has a pin and so we will continue anyway. If the blocks are
already in shared_buffers, then the sequential order doesn't matter at
all. So we could skip pages and then return to them later on.

Also, ISTM that LockBufferForCleanup() waits for just a single buffer,
but it could equally well wait for multiple buffers at the same time.
By this, we would then be able to simply register our interest in
multiple buffers and get woken as soon as one of them were free. That
way we could read the blocks sequentially, but lock and clean them out
of sequence if necessary. Do this in chunks, so it plays nicely with
buffer strategy. (Patch doesn't do that yet).

(Not sure if the patch handles the visibility map correctly if we skip
the page, but it's a reasonable prototype for discussion.)

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Thu, Nov 3, 2011 at 7:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> A while
>> back, someone (Greg Stark? me?) floated the idea of not waiting for
>> the cleanup lock.  If we can't get it immediately, or within some
>> short period of time, then we just skip the page and continue on.
>
> Separately, that sounds like a great idea and it's simple to implement
> - patch attached.

Oh, that's kind of clever.  I was thinking that you'd have to disable
this entirely for anti-wraparound vacuum, but the way you've done it
avoids that.  You'll still have to wait if there's a competing pin on
a buffer that contains tuples actually in need of freezing, but that
should be relatively rare.
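
Concretely, I'm imagining the relevant part of lazy_scan_heap() ending up
shaped roughly like this (sketch; page_needs_freeze() is a stand-in for
whatever check decides that some tuple on the page has an XID older than
FreezeLimit, and ConditionalLockBufferForCleanup() for a try-but-don't-wait
version of the cleanup lock):

    buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                             RBM_NORMAL, vac_strategy);

    if (!ConditionalLockBufferForCleanup(buf))
    {
        if (!scan_all)
        {
            /* ordinary vacuum: just skip the pinned page */
            ReleaseBuffer(buf);
            continue;
        }

        /*
         * Anti-wraparound case: we may only skip the page if nothing on
         * it actually needs freezing; otherwise wait for the cleanup
         * lock as before.
         */
        LockBuffer(buf, BUFFER_LOCK_SHARE);
        if (!page_needs_freeze(buf, FreezeLimit))       /* stand-in */
        {
            UnlockReleaseBuffer(buf);
            continue;
        }
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
        LockBufferForCleanup(buf);
    }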

> Also, ISTM that LockBufferForCleanup() waits for just a single buffer,
> but it could equally well wait for multiple buffers at the same time.
> By this, we would then be able to simply register our interest in
> multiple buffers and get woken as soon as one of them were free. That
> way we could read the blocks sequentially, but lock and clean them out
> of sequence if necessary. Do this in chunks, so it plays nicely with
> buffer strategy. (Patch doesn't do that yet).

I doubt this would help much.  The real issue is with open cursors,
and those can easily be left open for long enough that those
optimizations won't help.  I think the patch as it stands probably
gets just about all of the benefit that can be had from this approach
while still being reasonably simple.

> (Not sure if the patch handles the visibility map correctly if we skip
> the page, but it's a reasonable prototype for discussion.)

Yeah.  I think that should be OK, but:

- It looks to me like you haven't done anything about the second heap
pass.  That should probably get a similar fix.
- I think that this is going to screw up the reltuples calculation
unless we fudge it somehow.  The number of scanned pages has already
been incremented by the time your new code is reached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I think that should be OK, but:
>
> - It looks to me like you haven't done anything about the second heap
> pass.  That should probably get a similar fix.

I was assuming this worked with Pavan's patch to remove second pass.

Not in any rush to commit this, so will wait till that is thru.

> - I think that this is going to screw up the reltuples calculation
> unless we fudge it somehow.  The number of scanned pages has already
> been incremented by the time your new code is reached.

Yeh, I'll have a look at that in more detail. Thanks for the review.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Thu, Nov 3, 2011 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I think that should be OK, but:
>>
>> - It looks to me like you haven't done anything about the second heap
>> pass.  That should probably get a similar fix.
>
> I was assuming this worked with Pavan's patch to remove second pass.

It's not entirely certain that will make it into 9.2, so I would
rather get this done first.  If you want I can pick up what you've
done and send you back a version that addresses this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Thu, Nov 3, 2011 at 2:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 3, 2011 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>> I think that should be OK, but:
>>>
>>> - It looks to me like you haven't done anything about the second heap
>>> pass.  That should probably get a similar fix.
>>
>> I was assuming this worked with Pavan's patch to remove second pass.
>
> It's not entirely certain that will make it into 9.2, so I would
> rather get this done first.  If you want I can pick up what you've
> done and send you back a version that addresses this.

OK, that seems efficient. Thanks.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Thu, Nov 3, 2011 at 12:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Nov 3, 2011 at 2:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Nov 3, 2011 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>>> I think that should be OK, but:
>>>>
>>>> - It looks to me like you haven't done anything about the second heap
>>>> pass.  That should probably get a similar fix.
>>>
>>> I was assuming this worked with Pavan's patch to remove second pass.
>>
>> It's not entirely certain that will make it into 9.2, so I would
>> rather get this done first.  If you want I can pick up what you've
>> done and send you back a version that addresses this.
>
> OK, that seems efficient. Thanks.

Here's a new version.  I fixed the second pass as discussed (which
turned out to be trivial).  To address the concern about relpages, I
moved this pre-existing line to after we get the buffer lock:

+               vacrelstats->scanned_pages++;

That appears to do the right thing.

I found it kind of confusing that lazy_scan_page_for_wraparound()
returns with the pin either held or not held depending on the return
value, so I rearranged things very slightly so that it doesn't need to
do that.  I'm wondering whether we should rename that function to
something like lazy_check_needs_freeze().

I tested this out and discovered that "VACUUM FREEZE" doesn't set the
for_wraparound flag.  On further review, I think that we should
probably conditionalize the behavior on the scan_all flag that
lazy_vacuum_rel sets, rather than for_wraparound.  Otherwise, there's
no way for the user to manually force relfrozenxid to advance, which
doesn't seem good.  I haven't made that change in this version,
though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Fri, Nov 4, 2011 at 6:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> Here's a new version.  I fixed the second pass as discussed (which
> turned out to be trivial).  To address the concern about relpages, I
> moved this pre-existing line to after we get the buffer lock:
>
> +               vacrelstats->scanned_pages++;
>
> That appears to do the right thing.

I think we need to count skipped pages also. I don't like the idea
that vacuum would just report fewer pages than there are in the table.
We'll just get requests to explain that.

> I found it kind of confusing that lazy_scan_page_for_wraparound()
> returns with the pin either held or not held depending on the return
> value, so I rearranged things very slightly so that it doesn't need to
> do that.  I'm wondering whether we should rename that function to
> something like lazy_check_needs_freeze().

OK

> I tested this out and discovered that "VACUUM FREEZE" doesn't set the
> for_wraparound flag.  On further review, I think that we should
> probably conditionalize the behavior on the scan_all flag that
> lazy_vacuum_rel sets, rather than for_wraparound.  Otherwise, there's
> no way for the user to manually force relfrozenxid to advance, which
> doesn't seem good.  I haven't made that change in this version,
> though.

Agreed, separate patch.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Fri, Nov 4, 2011 at 3:12 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Nov 4, 2011 at 6:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> Here's a new version.  I fixed the second pass as discussed (which
>> turned out to be trivial).  To address the concern about relpages, I
>> moved this pre-existing line to after we get the buffer lock:
>>
>> +               vacrelstats->scanned_pages++;
>>
>> That appears to do the right thing.
>
> I think we need to count skipped pages also. I don't like the idea
> that vacuum would just report fewer pages than there are in the table.
> We'll just get requests to explain that.

It's been skipping pages due to the visibility map since 8.4.  This
seems like small potatoes by comparison, but we could add some
counters if you like.

>> I found it kind of confusing that lazy_scan_page_for_wraparound()
>> returns with the pin either held or not held depending on the return
>> value, so I rearranged things very slightly so that it doesn't need to
>> do that.  I'm wondering whether we should rename that function to
>> something like lazy_check_needs_freeze().
>
> OK
>
>> I tested this out and discovered that "VACUUM FREEZE" doesn't set the
>> for_wraparound flag.  On further review, I think that we should
>> probably conditionalize the behavior on the scan_all flag that
>> lazy_vacuum_rel sets, rather than for_wraparound.  Otherwise, there's
>> no way for the user to manually force relfrozenxid to advance, which
>> doesn't seem good.  I haven't made that change in this version,
>> though.
>
> Agreed, separate patch.

Doing that actually makes the patch simpler, so I went ahead and did
that in the attached version, along with the renaming mentioned above.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Fri, Nov 4, 2011 at 3:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Doing that actually makes the patch simpler, so I went ahead and did
> that in the attached version, along with the renaming mentioned above.

Hearing no objections, I went ahead and committed this version.

Thanks for coding this up; I think this is a very useful improvement.

It would still be nice to fix the case where we need to freeze a tuple
that is on a page someone else has pinned, but I don't have any good
ideas for how to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Tue, Nov 8, 2011 at 2:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> It would still be nice to fix the case where we need to freeze a tuple
> that is on a page someone else has pinned, but I don't have any good
> ideas for how to do that.

I think we need to avoid long pin hold times generally.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Tue, Nov 8, 2011 at 2:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Nov 8, 2011 at 2:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It would still be nice to fix the case where we need to freeze a tuple
>> that is on a page someone else has pinned, but I don't have any good
>> ideas for how to do that.
>
> I think we need to avoid long pin hold times generally.

In the case of a suspended sequential scan, which is the case where
this has most recently bitten me on a production system, it actually
seems rather unnecessary to hold the pin for a long period of time.
If we release the buffer pin, then someone could vacuum the buffer.  I
haven't looked in detail at the issues, but in theory that doesn't
seem like a huge problem: just remember which TIDs you've already
looked at and, when you re-acquire the buffer, pick up where you left
off.  Any tuples that have been vacuumed away meanwhile weren't going
to be visible to your scan anyway.

But there's an efficiency argument against doing it that way.  First,
if we release the pin then we'll have to reacquire the buffer, which
means taking and releasing a BufMappingLock, the buffer header
spinlock, and the buffer content lock.  Second, instead of returning a
pointer to the data in the page, we'll have to copy the data out of
the buffer before releasing the pin.

The situation is similar (perhaps even simpler) for index-only scans.
We could easily release the heap buffer pin after returning a tuple,
but it will make things much slower if the next heap fetch hits the
same page.

I wonder if we could arrange for a vacuum that's waiting for a cleanup
lock to signal the backends that could possibly be holding a
conflicting pin.  Sort of like what the startup process does during
Hot Standby, except that instead of killing the people holding the
pins, we'd send them a signal that says "if at all possible, could you
please release those buffer pins right away?", and then the backends
would try to comply.  Actually making that work seems a bit
tricky, though, and getting it wrong would mean very, very rare,
nearly unreproducible bugs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Tue, Nov 8, 2011 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> But there's an efficiency argument against doing it that way.  First,
> if we release the pin then we'll have to reacquire the buffer, which
> means taking and releasing a BufMappingLock, the buffer header
> spinlock, and the buffer content lock.  Second, instead of returning a
> pointer to the data in the page, we'll have to copy the data out of
> the buffer before releasing the pin.

The only way I can see this working is to optimise this in the
planner, so that when we have a nested loop within a loop, we avoid
having the row on the outer loop pinned while we perform the inner
loop.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Tue, Nov 8, 2011 at 10:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Nov 8, 2011 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> But there's an efficiency argument against doing it that way.  First,
>> if we release the pin then we'll have to reacquire the buffer, which
>> means taking and releasing a BufMappingLock, the buffer header
>> spinlock, and the buffer content lock.  Second, instead of returning a
>> pointer to the data in the page, we'll have to copy the data out of
>> the buffer before releasing the pin.
>
> The only way I can see this working is to optimise this in the
> planner, so that when we have a nested loop within a loop, we avoid
> having the row on the outer loop pinned while we perform the inner
> loop.

Hmm.  I've actually never run into a problem that involved that
particular situation.

In any case, I think the issues are basically the same: keeping the
pin improves performance; dropping it helps VACUUM.  Both are
important.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Nov 8, 2011 at 2:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I think we need to avoid long pin hold times generally.

> In the case of a suspended sequential scan, which is the case where
> this has most recently bitten me on a production system, it actually
> seems rather unnecessary to hold the pin for a long period of time.
> If we release the buffer pin, then someone could vacuum the buffer.

This seems unlikely to be a productive line of thought.  The only way
you could release buffer pin is if you first copied all the tuples you
need out of the page, and that seems like an unacceptable performance
hit.  We should not be penalizing foreground query operations for the
benefit of background maintenance like VACUUM.  (The fact that we do
an equivalent thing in btree index scans isn't an argument for doing
it here, because the tradeoffs are very different.  In the index case,
the amount of data to be copied is a great deal less; the length of
time the lock would have to be held is often a great deal more; and
releasing the lock quickly gives a performance benefit for other
foreground operations, not only background maintenance.)

It strikes me that the only case where vacuum now has to wait is where
it needs to freeze an old XID.  Couldn't it do that without insisting on
exclusive access?  We only need exclusive access if we're going to move
data around, but we could have a code path in vacuum that just replaces
old XIDs with FrozenXID without moving/deleting anything.
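
In sketch form (ignoring xmax/multixact handling, WAL-logging, and hint
bits, and with the surrounding declarations elided), such a path might be
little more than:

    /*
     * Freeze-in-place: replace old committed xmins with FrozenTransactionId
     * without moving or deleting anything, so no cleanup lock is required,
     * only the exclusive content lock.  (Aborted xmins would still need
     * separate handling.)
     */
    maxoff = PageGetMaxOffsetNumber(page);
    for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
    {
        ItemId      itemid = PageGetItemId(page, offnum);
        HeapTupleHeader htup;
        TransactionId xmin;

        if (!ItemIdIsNormal(itemid))
            continue;

        htup = (HeapTupleHeader) PageGetItem(page, itemid);
        xmin = HeapTupleHeaderGetXmin(htup);

        if (TransactionIdIsNormal(xmin) &&
            TransactionIdPrecedes(xmin, cutoff_xid) &&
            TransactionIdDidCommit(xmin))
            HeapTupleHeaderSetXmin(htup, FrozenTransactionId);
    }
    MarkBufferDirty(buf);
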
        regards, tom lane


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Tue, Nov 8, 2011 at 10:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It strikes me that the only case where vacuum now has to wait is where
> it needs to freeze an old XID.  Couldn't it do that without insisting on
> exclusive access?  We only need exclusive access if we're going to move
> data around, but we could have a code path in vacuum that just replaces
> old XIDs with FrozenXID without moving/deleting anything.

Interesting idea.  I think in general we insist that you must have a
buffer content lock to inspect the tuple visibility info, in which
case that would be safe.  But I'm not sure we do that absolutely
everywhere.  For instance, just last night I noticed this:
                        /*
                         * If xmin isn't what we're expecting, the slot must have been
                         * recycled and reused for an unrelated tuple.  This implies that
                         * the latest version of the row was deleted, so we need do
                         * nothing.  (Should be safe to examine xmin without getting
                         * buffer's content lock, since xmin never changes in an existing
                         * tuple.)
                         */
                        if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
                                                 priorXmax))
                        {
                            ReleaseBuffer(buffer);
                            return NULL;
                        }

Maybe we can convince ourselves that that case is OK, or fixable; I'm
not sure whether there are any others.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Interesting idea.  I think in general we insist that you must have a
> buffer content lock to inspect the tuple visibility info, in which
> case that would be safe.  But I'm not sure we do that absolutely
> everywhere.  For instance, just last night I noticed this:

>                         /*
>                          * If xmin isn't what we're expecting, the slot must have been
>                          * recycled and reused for an unrelated tuple.  This implies that
>                          * the latest version of the row was deleted, so we need do
>                          * nothing.  (Should be safe to examine xmin without getting
>                          * buffer's content lock, since xmin never changes in an existing
>                          * tuple.)
>                          */
>                         if

Hmm ... I think that code is OK but the comment needs work.  Here we are
necessarily looking for a pretty recent value of xmin (it has to be
later than GlobalXmin), so there's no need to worry that it might get
changed to FrozenXID.
        regards, tom lane


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Tue, Nov 8, 2011 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Interesting idea.  I think in general we insist that you must have a
>> buffer content lock to inspect the tuple visibility info, in which
>> case that would be safe.  But I'm not sure we do that absolutely
>> everywhere.  For instance, just last night I noticed this:
>
>>                         /*
>>                          * If xmin isn't what we're expecting, the slot must have been
>>                          * recycled and reused for an unrelated tuple.  This implies that
>>                          * the latest version of the row was deleted, so we need do
>>                          * nothing.  (Should be safe to examine xmin without getting
>>                          * buffer's content lock, since xmin never changes in an existing
>>                          * tuple.)
>>                          */
>>                         if
>
> Hmm ... I think that code is OK but the comment needs work.  Here we are
> necessarily looking for a pretty recent value of xmin (it has to be
> later than GlobalXmin), so there's no need to worry that it might get
> changed to FrozenXID.

OK.  Here's another possible concern: what happens if the page we're
freezing contains a dead tuple?  It looks to me like
heap_freeze_tuple() is written so as not to require a cleanup lock -
indeed, the comments claim it's called when holding only a share lock
on the buffer, which doesn't appear to match what lazy_scan_heap() is
actually doing.  But it does seem to assume that any tuples that still
exist are all-visible, which only works if vacuum has already pruned
the page.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Wed, Nov 9, 2011 at 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Holding buffer pins for a long time is a problem in Hot Standby also,
> not just vacuum.

Agreed.

> AFAIK seq scans already work page at a time for normal tables. So the
> issue is when we *aren't* using a seq scan, e.g. nested loop joins.
>
> Is there a way to solve that?

Well, I'm not sure of the details of how page-at-a-time mode works for
seq scans, but I am absolutely 100% sure that you can reproduce this
problem using a cursor over a sequential scan.  Just do this:

create table test (a text);
insert into test values ('aaa'), ('bbb');
delete from test where a = 'aaa';
begin;
declare x cursor for select * from test;
fetch next from x;

Then switch to another session and run "VACUUM test".  Prior to commit
bbb6e559c4ea0fb4c346beda76736451dc24eb4e, this would hang.  Now, it
doesn't.  But "VACUUM FREEZE test" still does.
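
That is, in the second session:

VACUUM test;         -- completes now, skipping the pinned page
VACUUM FREEZE test;  -- still blocks waiting for the cleanup lock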

As for what to do about all this, I think Tom's idea would work for
good tuples, but the current freezing code can't handle dead tuples;
it counts on those having been already removed.  I wonder if we could
just set xmin = InvalidTransactionId and set HEAP_XMIN_INVALID, or
something like that.  I'm worried that there might be code out there
that thinks InvalidTransactionId can never appear in a real tuple.
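
Mechanically that would be something like the following (sketch; whether
any code out there assumes xmin can never be InvalidTransactionId is
exactly the worry):

    /*
     * Tuple is dead (e.g. its inserter aborted): instead of requiring it
     * to have been pruned away first, mark its xmin permanently invalid.
     */
    HeapTupleHeaderSetXmin(tuple, InvalidTransactionId);
    tuple->t_infomask &= ~HEAP_XMIN_COMMITTED;
    tuple->t_infomask |= HEAP_XMIN_INVALID;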

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Tue, Nov 8, 2011 at 3:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Nov 8, 2011 at 2:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> I think we need to avoid long pin hold times generally.
>
>> In the case of a suspended sequential scan, which is the case where
>> this has most recently bitten me on a production system, it actually
>> seems rather unnecessary to hold the pin for a long period of time.
>> If we release the buffer pin, then someone could vacuum the buffer.
>
> This seems unlikely to be a productive line of thought.  The only way
> you could release buffer pin is if you first copied all the tuples you
> need out of the page, and that seems like an unacceptable performance
> hit.  We should not be penalizing foreground query operations for the
> benefit of background maintenance like VACUUM.  (The fact that we do
> an equivalent thing in btree index scans isn't an argument for doing
> it here, because the tradeoffs are very different.  In the index case,
> the amount of data to be copied is a great deal less; the length of
> time the lock would have to be held is often a great deal more; and
> releasing the lock quickly gives a performance benefit for other
> foreground operations, not only background maintenance.)
>
> It strikes me that the only case where vacuum now has to wait is where
> it needs to freeze an old XID.  Couldn't it do that without insisting on
> exclusive access?  We only need exclusive access if we're going to move
> data around, but we could have a code path in vacuum that just replaces
> old XIDs with FrozenXID without moving/deleting anything.


Holding buffer pins for a long time is a problem in Hot Standby also,
not just vacuum.

AFAIK seq scans already work page at a time for normal tables. So the
issue is when we *aren't* using a seq scan, e.g. nested loop joins.

Is there a way to solve that?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Wed, Nov 9, 2011 at 9:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> Well, I'm not sure of the details of how page-at-a-time mode works for
> seq scans, but I am absolutely 100% sure that you can reproduce this
> problem using a cursor over a sequential scan.  Just do this:
>
> create table test (a text);
> insert into test values ('aaa'), ('bbb');
> delete from test where a = 'aaa';
> begin;
> declare x cursor for select * from test;
> fetch next from x;

That's a bug. heapam.c line 1202 says

    /*
     * we can use page-at-a-time mode if it's an MVCC-safe snapshot
     */
    scan->rs_pageatatime = IsMVCCSnapshot(snapshot);

So either the comment or the code is wrong.

Can't see where, as yet.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Wed, Nov 9, 2011 at 9:48 PM, simon <simon@2ndQuadrant.com> wrote:
> On Wed, Nov 9, 2011 at 9:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> Well, I'm not sure of the details of how page-at-a-time mode works for
>> seq scans, but I am absolutely 100% sure that you can reproduce this
>> problem using a cursor over a sequential scan.  Just do this:
>>
>> create table test (a text);
>> insert into test values ('aaa'), ('bbb');
>> delete from test where a = 'aaa';
>> begin;
>> declare x cursor for select * from test;
>> fetch next from x;
>
> That's a bug.

No, I'm wrong. It's not a bug at all.

heapgetpage() gets a page and a pin, but holds the pin until it reads
the next page. Wow!

That is both annoying and very dumb. It should hold the pin long
enough to copy the data and then release the pin.

It looks inefficient from a memory access viewpoint and from a pin
longevity viewpoint. If we copied out relevant data it would be more
cache efficient without affecting visibility. Looking at a patch.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> As for what to do about all this, I think Tom's idea would work for
> good tuples, but the current freezing code can't handle dead tuples;
> it counts on those having been already removed.

I have not gone back to look at the code, but are you worried about the
fact that it doesn't consider replacing xmax with FrozenTransactionId?
Surely we could do that if we wanted.  It just never seemed necessary
before --- but if vacuum is to be allowed to punt repeatedly on the same
page, maybe we do need to cover the case.
        regards, tom lane


Re: heap vacuum & cleanup locks

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> heapgetpage() gets a page and a pin, but holds the pin until it reads
> the next page. Wow!

> That is both annoying and very dumb. It should hold the pin long
> enough to copy the data and then release the pin.

I don't find that anywhere near as obvious as you seem to.  I think you
are trying to optimize for the wrong set of conditions.

I will also note that the behavior of holding pin for as long as we are
stopped on a particular tuple is not specific to seqscans.
        regards, tom lane


Re: heap vacuum & cleanup locks

From
Simon Riggs
Date:
On Wed, Nov 9, 2011 at 10:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> heapgetpage() gets a page and a pin, but holds the pin until it reads
>> the next page. Wow!
>
>> That is both annoying and very dumb. It should hold the pin long
>> enough to copy the data and then release the pin.
>
> I don't find that anywhere near as obvious as you seem to.  I think you
> are trying to optimize for the wrong set of conditions.

ISTM we should optimise to access the cachelines in the buffer once.
Holding a pin and re-accessing the buffer via main memory seems like a
pretty bad plan to me. Which conditions are being optimised by doing that?

> I will also note that the behavior of holding pin for as long as we are
> stopped on a particular tuple is not specific to seqscans.

Agreed. Bad things may happen in more than one place.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Wed, Nov 9, 2011 at 5:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> As for what to do about all this, I think Tom's idea would work for
>> good tuples, but the current freezing code can't handle dead tuples;
>> it counts on those having been already removed.
>
> I have not gone back to look at the code, but are you worried about the
> fact that it doesn't consider replacing xmax with FrozenTransactionId?
> Surely we could do that if we wanted.  It just never seemed necessary
> before --- but if vacuum is to be allowed to punt repeatedly on the same
> page, maybe we do need to cover the case.

No, I'm worried about the fact that that it does this:
        xid = HeapTupleHeaderGetXmin(tuple);
        if (TransactionIdIsNormal(xid) &&
            TransactionIdPrecedes(xid, cutoff_xid))
        {
            if (buf != InvalidBuffer)
            {
                /* trade in share lock for exclusive lock */
                LockBuffer(buf, BUFFER_LOCK_UNLOCK);
                LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
                buf = InvalidBuffer;
            }
            HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);

Note that the ONLY thing we're checking with respect to the tuple xmin
is that it's a normal XID that precedes the cutoff.  We are not
checking whether it's committed.  So there had better not be any
tuples left on the page that are from inserting transactions that
aborted, or this is going to go boom.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Wed, Nov 9, 2011 at 6:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Nov 9, 2011 at 10:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> heapgetpage() gets a page and a pin, but holds the pin until it reads
>>> the next page. Wow!
>>
>>> That is both annoying and very dumb. It should hold the pin long
>>> enough to copy the data and then release the pin.
>>
>> I don't find that anywhere near as obvious as you seem to.  I think you
>> are trying to optimize for the wrong set of conditions.
>
> ISTM we should optimise to access the cachelines in the buffer once.
> Holding a pin and re-accessing the buffer via main memory seems like a
> pretty bad plan to me. Which conditions are being optimised by doing that?

I believe it reduces memory copying.  But we can certainly test some
alternative you may have in mind and see how it shakes out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: heap vacuum & cleanup locks

From
Greg Stark
Date:
Zombie thread arise!

I was searching for old threads on a specific problem and came across
this patch that was dropped due to concerns about SnapshotNow:

On Wed, Nov 2, 2011 at 3:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jun 7, 2011 at 3:24 PM, Greg Stark <gsstark@mit.edu> wrote:
>> Well it's super-exclusive-vacuum-lock avoidance techniques. Why
>> shouldn't it make more sense to try to reduce the frequency and impact
>> of the single-purpose outlier in a non-critical-path instead of
>> burdening every other data reader with extra overhead?
>>
>> I think Robert's plan is exactly right though I would phrase it
>> differently. We should get the exclusive lock, freeze/kill any xids
>> and line pointers, then if the pin-count is 1 do the compaction.
>
> I wrote a really neat patch to do this today...  and then, as I
> thought about it some more, I started to think that it's probably
> unsafe.  Here's the trouble: with this approach, we assume that it's
> OK to change the contents of the line pointer while holding only an
> exclusive lock on the buffer.  But there is a good deal of code out
> there that thinks it's OK to examine a line pointer with only a pin on
> the buffer (no lock).  You need a content lock to scan the item
> pointer array, but once you've identified an item of interest, you're
> entitled to assume that it won't be modified while you hold a buffer
> pin.  Now, if you've identified a particular tuple as being visible to
> your scan, then you might think that VACUUM shouldn't be removing it
> anyway.  But I think that's only true for MVCC scans - for example,
> what happens under SnapshotNow semantics?

Well now that you've done all that amazing work eliminating
SnapshotNow maybe this patch deserves another look? I see
anti-wraparound vacuums more and more often as database transaction
velocity increases and vacuum takes longer and longer as database
sizes increase. Having a faster vacuum that can skip vacuuming pages
but is still guaranteed to freeze every page is pretty tempting.

Of course, the freeze epoch in the header obviating the need for
freezing is an even more attractive goal, but AIUI we're not even sure
that's going to work, and this is a nice easy win.

>  But then, on third
> thought, if you've also got an MVCC snapshot taken before the start of
> the SnapshotNow scan, you are probably OK, because your advertised
> xmin should prevent anyone from removing anything anyway, so how do
> you actually provoke a failure?
>
> Anyway, I'm attaching the patch, in case anyone has any ideas on where
> to go with this.
>
>> I'm really wishing we had more bits in the vm. It looks like we could use:
>>  - contains not-all-visible tuples
>>  - contains not-frozen xids
>>  - in need of compaction

Another idea would be to store the upper 2-4 bits of the oldest xid in
the page. That would let you determine whether it's in need of an
anti-wraparound vacuum.


-- 
greg



Re: heap vacuum & cleanup locks

From
Robert Haas
Date:
On Mon, Jun 30, 2014 at 7:55 PM, Greg Stark <stark@mit.edu> wrote:
> On Wed, Nov 2, 2011 at 3:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Jun 7, 2011 at 3:24 PM, Greg Stark <gsstark@mit.edu> wrote:
>>> Well it's super-exclusive-vacuum-lock avoidance techniques. Why
>>> shouldn't it make more sense to try to reduce the frequency and impact
>>> of the single-purpose outlier in a non-critical-path instead of
>>> burdening every other data reader with extra overhead?
>>>
>>> I think Robert's plan is exactly right though I would phrase it
>>> differently. We should get the exclusive lock, freeze/kill any xids
>>> and line pointers, then if the pin-count is 1 do the compaction.
>>
>> I wrote a really neat patch to do this today...  and then, as I
>> thought about it some more, I started to think that it's probably
>> unsafe.  Here's the trouble: with this approach, we assume that it's
>> OK to change the contents of the line pointer while holding only an
>> exclusive lock on the buffer.  But there is a good deal of code out
>> there that thinks it's OK to examine a line pointer with only a pin on
>> the buffer (no lock).  You need a content lock to scan the item
>> pointer array, but once you've identified an item of interest, you're
>> entitled to assume that it won't be modified while you hold a buffer
>> pin.  Now, if you've identified a particular tuple as being visible to
>> your scan, then you might think that VACUUM shouldn't be removing it
>> anyway.  But I think that's only true for MVCC scans - for example,
>> what happens under SnapshotNow semantics?
>
> Well now that you've done all that amazing work eliminating
> SnapshotNow ...

Thank you.  :-)

> ... maybe this patch deserves another look? I see
> anti-wraparound vacuums more and more often as database transaction
> velocity increases and vacuum takes longer and longer as database
> sizes increase. Having a faster vacuum that can skip vacuuming pages
> but is still guaranteed to freeze every page is pretty tempting.
>
> Of course, the freeze epoch in the header obviating the need for
> freezing is an even more attractive goal, but AIUI we're not even sure
> that's going to work, and this is a nice easy win.

Well, there's still SnapshotSelf, SnapshotAny, SnapshotDirty, ...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company