Re: vac truncation scan problems - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: vac truncation scan problems
Date
Msg-id CAMkU=1y10tLBS=34iMh6N8VAjKHrcQOM10OrfCDp59sVnihsmQ@mail.gmail.com
Whole thread Raw
In response to Re: vac truncation scan problems  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: vac truncation scan problems
List pgsql-hackers
On Tue, Mar 31, 2015 at 1:29 AM, Michael Paquier <michael.paquier@gmail.com> wrote:


On Tue, Mar 31, 2015 at 3:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Mar 30, 2015 at 8:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
After freeing up the rows at the end of the table so it is eligible for truncation, then running a manual VACUUM to actually release the space, I kept running into the problem that the truncation scan was consistently suspended and then aborted due to a conflicting lock requested/held.  

But the perversity is that that conflicting lock request can only be coming, as far as I can tell, from the autovac process.  I'm not sure how this happens, as I thought autovac never waited for locks but only obtained one if it were instantaneously available, but that it is the only explanation I can think of.

I'm not seeing this in 9.4, but I'm not sure how deterministic it is so maybe that is just luck.


It looks like the culprit is this:

commit 0d831389749a3baaced7b984205b9894a82444b9
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Wed Mar 18 11:52:33 2015 -0300

    Rationalize vacuuming options and parameters

I'd guess the autovac nature of the autovac process is getting lost in there where, but I don't see where.

Hm. I ran a couple of tests and am noticing that a manual VACUUM is not able to truncate all the pages (it should, no?)... For example with your test case on REL9_4_STABLE VACUUM VERBOSE reports that all the pages are truncated:
INFO:  00000: "pgbench_accounts": truncated 16394 to 0 pages

OK, on HEAD this does not seem to work:
INFO:  00000: "pgbench_accounts": truncated 16394 to 13554 pages

But if I try as well with 4559167c (0d831389~1) I am getting a similar result:
INFO:  00000: "pgbench_accounts": truncated 16394 to 3309 pages

I was not seeing that on commits before 0d831389--although often once the truncation scan had run for a while without problem, I would abort the session and call it good, as my failures were always quick ones.  

Did it tell you why?  If it surrendered the lock to a competing process, it should report that as previous INFO messages. If it doesn't give one of those, then it probably just thinks there are some tuples it can't remove yet somewhere.  What did it give earlier up in the verbose output, for the number of removed and nonremovable tuples?
 
Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: vac truncation scan problems
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Bug #10432 failed to re-find parent key in index