Thread: lazy_truncate_heap()

lazy_truncate_heap()

From
Simon Riggs
Date:
While watching WAL records float by I noticed some AccessExclusiveLocks
occurring unnecessarily during VACUUMs.

This is caused by lines 186-189 in lazy_vacuum_rel(), vacuumlazy.c
  possibly_freeable = vacrelstats->rel_pages -
                vacrelstats->nonempty_pages;
  if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
      possibly_freeable >= vacrelstats->rel_pages /
                     REL_TRUNCATE_FRACTION)
        lazy_truncate_heap(onerel, vacrelstats);


If you look closely you'll see that if rel_pages is small then we will
attempt to truncate the heap even if possibly_freeable == 0. 
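
For illustration, a tiny standalone program (not code from the tree; it just
assumes the stock values REL_TRUNCATE_MINIMUM = 1000 and
REL_TRUNCATE_FRACTION = 16 from vacuumlazy.c) showing the guard firing with
nothing to free:

#include <stdio.h>

#define REL_TRUNCATE_MINIMUM   1000
#define REL_TRUNCATE_FRACTION  16

int
main(void)
{
    /* A small, fully packed relation: no empty pages at the end. */
    int rel_pages = 15;
    int nonempty_pages = 15;
    int possibly_freeable = rel_pages - nonempty_pages;     /* == 0 */

    /*
     * Integer division: 15 / 16 == 0, so the second test reads
     * "0 >= 0" and succeeds even though there is nothing to truncate.
     */
    if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
        possibly_freeable >= rel_pages / REL_TRUNCATE_FRACTION)
        printf("would call lazy_truncate_heap() and take an "
               "AccessExclusiveLock for nothing\n");

    return 0;
}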

While looking at this some more, I notice there is another bug. When
VACUUM has nothing at all to do, then it appears that
vacrelstats->nonempty_pages is zero, so that possibly_freeable is always
set to vacrelstats->rel_pages. vacrelstats->nonempty_pages doesn't
appear to be used anywhere else, so nobody notices it is wrongly set.

Both of these bugs are minor, but the effect of either/both of them is
to cause more AccessExclusiveLocks than we might expect. 

For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
on relations, which would potentially lead to having queries cancelled
for no reason at all.

Does anybody think any of the above is intentional? Can I fix?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: lazy_truncate_heap()

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> While watching WAL records float by I noticed some AccessExclusiveLocks
> occurring unnecessarily during VACUUMs.
> 
> This is caused by lines 186-189 in lazy_vacuum_rel(), vacuumlazy.c
> 
>   possibly_freeable = vacrelstats->rel_pages -     
>                 vacrelstats->nonempty_pages;
>   if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
>       possibly_freeable >= vacrelstats->rel_pages /
>                      REL_TRUNCATE_FRACTION)
>         lazy_truncate_heap(onerel, vacrelstats);
> 
> If you look closely you'll see that if rel_pages is small then we will
> attempt to truncate the heap even if possibly_freeable == 0. 

Good catch! And it goes all the way back to 7.4.

> While looking at this some more, I notice there is another bug. When
> VACUUM has nothing at all to do, then it appears that
> vacrelstats->nonempty_pages is zero, so that possibly_freeable is always
> set to vacrelstats->rel_pages. vacrelstats->nonempty_pages doesn't
> appear to be used anywhere else, so nobody notices it is wrongly set.

Hmm, this is a new issue with the visibility map. For pages at the end 
that are skipped, we don't know if they're empty or not. Currently we 
assume that they are, but perhaps it would be better to assume they're 
not? On the other hand, that makes it much less likely that we even try 
to truncate a relation, and when we do, we wouldn't truncate it as far 
as we could.

> Does anybody think any of the above is intentional? Can I fix?

No. Yes please.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: lazy_truncate_heap()

From
Greg Stark
Date:
On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote:
>
> Both of these bugs are minor, but the effect of either/both of them is
> to cause more AccessExclusiveLocks than we might expect.
>
> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
> on relations, which would potentially lead to having queries cancelled
> for no reason at all.

Well, by default it would just cause WAL replay to pause briefly until the
queries with those locks finish, no?


Re: lazy_truncate_heap()

From
Heikki Linnakangas
Date:
Greg Stark wrote:
> 
> On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote:
>>
>> Both of these bugs are minor, but the effect of either/both of them is
>> to cause more AccessExclusiveLocks than we might expect.
>>
>> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
>> on relations, which would potentially lead to having queries cancelled
>> for no reason at all.
> 
> Well, by default it would just cause WAL replay to pause briefly until the
> queries with those locks finish, no?

Wait a minute. Why does an AccessExclusiveLock lead to cancelled queries 
or pausing WAL application? I thought it'd just block other queries 
trying to acquire a conflicting lock in the standby, just like holding 
an AccessExclusiveLock on the primary does. It's unrelated to the xmin 
horizon issue.

There is a noteworthy point though. In the primary, vacuum trying to 
truncate takes AccessExclusiveLock conditionally, so that it doesn't 
disturb queries accessing the table, and only truncates the table if it 
got the lock. But in the standby, we have to truncate the table, and 
therefore have to acquire the lock, waiting until we get it. I guess we 
have to stop applying WAL while waiting.
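
The difference is just a conditional versus an unconditional acquire. As a
rough analogy in plain C with a pthread mutex (this is not the real lock
manager API, just a sketch of the two behaviours):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Primary-style: back off politely if anyone else holds the lock. */
static void
truncate_on_primary(void)
{
    if (pthread_mutex_trylock(&table_lock) != 0)
        return;                 /* queries hold the table; skip truncation */
    printf("truncating\n");
    pthread_mutex_unlock(&table_lock);
}

/* Standby-style: the WAL record must be applied, so wait as long as it takes. */
static void
truncate_on_standby(void)
{
    pthread_mutex_lock(&table_lock);    /* WAL apply stalls here if contended */
    printf("truncating\n");
    pthread_mutex_unlock(&table_lock);
}

int
main(void)
{
    truncate_on_primary();
    truncate_on_standby();
    return 0;
}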

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: lazy_truncate_heap()

From
Simon Riggs
Date:
On Wed, 2008-12-31 at 14:45 -0500, Greg Stark wrote:
> On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote:
> >
> > Both of these bugs are minor, but the effect of either/both of them is
> > to cause more AccessExclusiveLocks than we might expect.
> >
> > For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
> > on relations, which would potentially lead to having queries cancelled
> > for no reason at all.
> 
> Well, by default it would just cause WAL replay to pause briefly until the
> queries with those locks finish, no?

Yes, but why allow pointless actions in the first place?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: lazy_truncate_heap()

From
Simon Riggs
Date:
On Wed, 2008-12-31 at 21:45 +0200, Heikki Linnakangas wrote:
> > Can I fix?
>
> Yes please.

Fix attached.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: lazy_truncate_heap()

From
Simon Riggs
Date:
On Thu, 2009-01-01 at 12:00 +0200, Heikki Linnakangas wrote:
> Greg Stark wrote:
> > 
> > On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote:
> >>
> >> Both of these bugs are minor, but the effect of either/both of them is
> >> to cause more AccessExclusiveLocks than we might expect.
> >>
> >> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
> >> on relations, which would potentially lead to having queries cancelled
> >> for no reason at all.
> > 
> > Well, by default it would just cause WAL replay to pause briefly until the
> > queries with those locks finish, no?
> 
> Wait a minute. Why does an AccessExclusiveLock lead to cancelled queries 
> or pausing WAL application? I thought it'd just block other queries 
> trying to acquire a conflicting lock in the standby, just like holding 
> an AccessExclusiveLock on the primary does. It's unrelated to the xmin 
> horizon issue.

Yes, it is unrelated to the xmin horizon issue. There are two reasons
for delaying WAL apply:
* locks
* xmin horizon

When a lock is acquired on the primary it almost always precedes an
action which cannot occur concurrently. For example, if VACUUM did
truncate a table then queries could get errors because parts of their
table disappear from under them. Other examples are DROP TABLE, etc.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: lazy_truncate_heap()

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-01-01 at 12:00 +0200, Heikki Linnakangas wrote:
>> Greg Stark wrote:
>>> On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote:
>>>> Both of these bugs are minor, but the effect of either/both of them is
>>>> to cause more AccessExclusiveLocks than we might expect.
>>>>
>>>> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
>>>> on relations, which would potentially lead to having queries cancelled
>>>> for no reason at all.
>>> Well, by default it would just cause WAL replay to pause briefly until the
>>> queries with those locks finish, no?
>> Wait a minute. Why does an AccessExclusiveLock lead to cancelled queries 
>> or pausing WAL application? I thought it'd just block other queries 
>> trying to acquire a conflicting lock in the standby, just like holding 
>> an AccessExclusiveLock on the primary does. It's unrelated to the xmin 
>> horizon issue.
> 
> Yes, it is unrelated to the xmin horizon issue. There are two reasons
> for delaying WAL apply:
> * locks
> * xmin horizon
> 
> When a lock is acquired on the primary it almost always precedes an
> action which cannot occur concurrently. For example, if VACUUM did
> truncate a table then queries could get errors because parts of their
> table disappear from under them. Other examples are DROP TABLE, etc.

Have you implemented the query cancellation mechanism for that scenario 
too? (I'm cool either way, just curious..)

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: lazy_truncate_heap()

From
Simon Riggs
Date:
On Sun, 2009-01-04 at 13:01 +0200, Heikki Linnakangas wrote:
> >> Why does an AccessExclusiveLock lead to cancelled queries 
> >> or pausing WAL application? I thought it'd just block other queries 
> >> trying to acquire a conflicting lock in the standby, just like holding 
> >> an AccessExclusiveLock on the primary does. It's unrelated to the xmin 
> >> horizon issue.
> > 
> > Yes, it is unrelated to the xmin horizon issue. There are two reasons
> > for delaying WAL apply:
> > * locks
> > * xmin horizon
> > 
> > When a lock is acquired on the primary it almost always precedes an
> > action which cannot occur concurrently. For example, if VACUUM did
> > truncate a table then queries could get errors because parts of their
> > table disappear from under them. Other examples are DROP TABLE, etc.
> 
> Have you implemented the query cancellation mechanism for that scenario 
> too? (I'm cool either way, just curious..)

Yes, they both lead to a conflict between WAL and standby queries, so
are treated the same, currently: if conflict occurs, wait until
max_standby_delay expires, then cancel.

Logically, "xmin horizon" conflicts could be flexible/soft. That is, if
we implemented the idea to store a lastCleanedLSN for each buffer then
"xmin horizon" conflicts would be able to continue executing until they
see a buffer with buffer.lastCleanedLSN > conflictLSN. Whereas the lock
would be a hard limit beyond which a query could not progress.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: lazy_truncate_heap()

From
Zeugswetter Andreas OSB sIT
Date:
> Logically, "xmin horizon" conflicts could be flexible/soft.
> That is, if we implemented the idea to store a lastCleanedLSN for each buffer then
> "xmin horizon" conflicts would be able to continue executing until they
> see a buffer with buffer.lastCleanedLSN > conflictLSN.

I think the trouble is that HOT can put extremely recent lastCleanedLSNs on pages.
It would need some knobs to avoid this, which would most likely reduce the efficiency of HOT.

What about using the page LSN after max_standby_delay?
Using the page LSN cancels queries earlier than the lastCleanedLSN,
but probably in many cases later than an immediate cancel after max_standby_delay.
Of course that only helps when reading static parts of tables :-(

Instead of a cancel message, the replay would need to send (set in shmem) the first
LSN applied after max_standby_delay to the relevant backend for its LSN checks
(if buffer.LSN >= received_max_delay_lsn, cancel).

Andreas

Re: lazy_truncate_heap()

From
Simon Riggs
Date:
On Mon, 2009-01-05 at 18:24 +0100, Zeugswetter Andreas OSB sIT wrote:
> > Logically, "xmin horizon" conflicts could be flexible/soft. 
> > That is, if we implemented the idea to store a lastCleanedLSN for each buffer then
> > "xmin horizon" conflicts would be able to continue executing until they
> > see a buffer with buffer.lastCleanedLSN > conflictLSN.
> 
> I think the trouble is that HOT can put extremely recent lastCleanedLSNs on pages.
> It would need some knobs to avoid this, which would most likely reduce the efficiency of HOT.
> 
> What about using the page LSN after max_standby_delay?
> Using the page LSN cancels queries earlier than the lastCleanedLSN,
> but probably in many cases later than an immediate cancel after max_standby_delay.
> Of course that only helps when reading static parts of tables :-(
> 
> Instead of a cancel message, the replay would need to send (set in shmem) the first 
> LSN applied after max_standby_delay to the relevant backend for its LSN checks
> (if buffer.LSN >= received_max_delay_lsn, cancel).

I like your train of thought there: If HOT is the problem then
lastCleanedLSN approx= LSN on HOT blocks, so having lastCleanedLSN
doesn't help much.

OK, so I'll skip that idea and go with what you suggest.

Design:

When a conflict occurs, we set a RecoveryConflictLSN on the Proc of the
backend to be cancelled. Every time we read a block during recovery we check
MyProc.RecoveryConflictLSN against the LSN on the block, and the backend will
commit suicide (ERROR) if the block LSN is equal or greater.
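
A minimal sketch of that check, with placeholder names and a simplified LSN
type (the real XLogRecPtr of that era is a two-field struct, and the real
PGPROC has much more in it; none of this is the eventual patch):

#include <stdint.h>
#include <stdbool.h>

typedef uint64_t XLogRecPtr;        /* simplified stand-in for the real type */

typedef struct PGPROC
{
    XLogRecPtr  recoveryConflictLSN;    /* 0 means "no conflict pending" */
} PGPROC;

static PGPROC *MyProc;

/* Startup process: mark the victim backend when max_standby_delay expires. */
static void
SetRecoveryConflictLSN(PGPROC *proc, XLogRecPtr conflictLSN)
{
    proc->recoveryConflictLSN = conflictLSN;
}

/*
 * Backend side: called whenever a block is read during Hot Standby.
 * True means the block has been modified past the conflict point and
 * the query must throw an ERROR.
 */
static bool
BlockConflictsWithRecovery(XLogRecPtr pageLSN)
{
    return MyProc->recoveryConflictLSN != 0 &&
           pageLSN >= MyProc->recoveryConflictLSN;
}

int
main(void)
{
    PGPROC proc = { 0 };
    MyProc = &proc;

    SetRecoveryConflictLSN(MyProc, 5000);
    return BlockConflictsWithRecovery(6000) ? 0 : 1;   /* conflicts: exits 0 */
}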

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: lazy_truncate_heap()

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Wed, 2008-12-31 at 21:45 +0200, Heikki Linnakangas wrote:
>>> Can I fix?
>> Yes please.
> 
> Fix attached.

> --- 183,192 ----
>        * number of pages.  Otherwise, the time taken isn't worth it.
>        */
>       possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
> !     if (vacrelstats->tuples_deleted > 0 &&
> !         (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
> !          (possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION &&
> !           possibly_freeable > 0)))
>           lazy_truncate_heap(onerel, vacrelstats);
>   

Where did that "tuples_deleted > 0" condition come from? It seems 
counter-productive; if a previous vacuum failed to acquire the lock, 
subsequent vacuums wouldn't even try if they don't remove any tuples. 
How about simply:

***************
*** 183,190 ****
        * number of pages.  Otherwise, the time taken isn't worth it.
        */
       possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
!     if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
!         possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION)
          lazy_truncate_heap(onerel, vacrelstats);
  
      /* Vacuum the Free Space Map */
--- 183,191 ----
        * number of pages.  Otherwise, the time taken isn't worth it.
        */
       possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
!     if (possibly_freeable > 0 &&
!         (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
!          possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION))
          lazy_truncate_heap(onerel, vacrelstats);
  
      /* Vacuum the Free Space Map */
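
For what it's worth, this guard does reject the degenerate case from the
start of the thread. A quick standalone check (again assuming the stock
constants from vacuumlazy.c):

#include <assert.h>

#define REL_TRUNCATE_MINIMUM   1000
#define REL_TRUNCATE_FRACTION  16

static int
should_truncate(int rel_pages, int possibly_freeable)
{
    return possibly_freeable > 0 &&
           (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
            possibly_freeable >= rel_pages / REL_TRUNCATE_FRACTION);
}

int
main(void)
{
    assert(!should_truncate(15, 0));        /* packed small rel: no lock taken */
    assert(should_truncate(15, 1));         /* 1 >= 15/16 == 0: still truncates */
    assert(should_truncate(100000, 1500));  /* >= REL_TRUNCATE_MINIMUM */
    return 0;
}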


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: lazy_truncate_heap()

From
Simon Riggs
Date:
On Tue, 2009-01-06 at 15:48 +0200, Heikki Linnakangas wrote:
> How about simply:

OK

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: lazy_truncate_heap()

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Tue, 2009-01-06 at 15:48 +0200, Heikki Linnakangas wrote:
>> How about simply:
> 
> OK

Committed and backpatched all the way back to 7.4 stable.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com