Thread: Vacuum Stalling

Vacuum Stalling

From: Brad Nicholson

PostgreSQL 7.4.12 on AIX 5.3

Scenario - a large table was not being vacuumed correctly, and there are
now ~15 million dead tuples that account for approximately 20%-25% of the
table.  Vacuum appears to be stalling - it ran for approximately 10 hours
before I killed it.  I hooked up to the process with gdb, and this looks
a bit suspicious to me.

gdb attach 979354
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-ibm-aix5.1.0.0"...attach: No such file or directory.

Attaching to process 979354
[Switching to Thread 1]
0xd033ce1c in read () from /usr/lib/libc.a(shr.o)
(gdb) bt
#0  0xd033ce1c in read () from /usr/lib/libc.a(shr.o)
#1  0x10034398 in FileRead (file=805585096, buffer=0xb38d0e40 "", amount=8192) at fd.c:973
#2  0x1009eea8 in mdread (reln=0x301aa618, blocknum=160246, buffer=0xb38d0e40 "") at md.c:434
#3  0x1009d3d0 in smgrread (which=8192, reln=0x301aa618, blocknum=160246, buffer=0xffffffff <Address 0xffffffff out of bounds>) at smgr.c:316
#4  0x1002c0a8 in ReadBufferInternal (reln=0x301aa618, blockNum=160246, bufferLockHeld=84 'T') at bufmgr.c:230
#5  0x101e5080 in _bt_getbuf (rel=0x301aa618, blkno=4294967295, access=1) at nbtpage.c:401
#6  0x101e41c4 in btbulkdelete (fcinfo=0xffffffff) at nbtree.c:692
#7  0x10048670 in OidFunctionCall3 (functionId=8192, arg1=807052824, arg2=805496980, arg3=806803160) at fmgr.c:1260
#8  0x100dad04 in index_bulk_delete (indexRelation=0x301aa618, callback=@0x3002e894: 0x10194150 <lazy_tid_reaped>, callback_state=0x3016d6d8) at indexam.c:618
#9  0x10193fcc in lazy_vacuum_index (indrel=0x301aa618, vacrelstats=0x3016d6d8) at vacuumlazy.c:712
#10 0x10194b04 in lazy_vacuum_rel (onerel=0x302357b8, vacstmt=0xffffffff) at vacuumlazy.c:279
#11 0x10192da4 in vacuum_rel (relid=807622584, vacstmt=0x3016b558, expected_relkind=114 'r') at vacuum.c:855
#12 0x10192fb0 in vacuum (vacstmt=0x3016b558) at vacuum.c:290
#13 0x10141670 in ProcessUtility (parsetree=0x3016b558, dest=0x3016b5a8, completionTag=0x2ff1d0c4 "") at utility.c:831
#14 0x10121b20 in PortalRunUtility (portal=0x301f3768, query=0x3016b5f8, dest=0x3016b5a8, completionTag=0x2ff1d0c4 "") at pquery.c:772
#15 0x1012204c in PortalRun (portal=0x301f3768, count=806794920, dest=0x3016b5a8, altdest=0x3016b5a8, completionTag=0x2ff1d0c4 "") at pquery.c:836
#16 0x10010034 in exec_simple_query (query_string=0x3016b278 "VACUUM ANALYZE verbose;") at postgres.c:914
#17 0x100119f8 in PostgresMain (argc=4, argv=0x30086e78, username=0x30086db8 "slony") at postgres.c:2973
#18 0x101b79e4 in ServerLoop () at postmaster.c:2564
#19 0x101b95a0 in PostmasterMain (argc=1, argv=0x300853c8) at postmaster.c:897
#20 0x1000053c in main (argc=1, argv=0x2ff22c40) at main.c:222
#21 0x10000204 in __start ()

Ideas?
--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Vacuum Stalling

From: Tom Lane
Brad Nicholson <bnichols@ca.afilias.info> writes:
> Scenario - a large table was not being vacuumed correctly, and there are
> now ~15 million dead tuples that account for approximately 20%-25% of the
> table.  Vacuum appears to be stalling - it ran for approximately 10 hours
> before I killed it.  I hooked up to the process with gdb, and this looks
> a bit suspicious to me.

Looks perfectly normal to me.  Reads in btbulkdelete are exactly where
I'd expect 7.4's vacuum to be spending the bulk of its wait time on a
large table, because that's a logical-order traversal of the index, and
cannot benefit from any sequential-access advantage.  (As of 8.2 we
are able to do this with a physical-order traversal, which can be a
whole lot faster.)

If you can afford to lock the table against writes for a while,
reindexing might help by bringing the index back into physical order.

            regards, tom lane
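For reference, the reindex suggested above could look like this on 7.4; the table and index names here are hypothetical, and REINDEX locks out writes to the table while it runs:

    -- rebuild all indexes on the table in one pass
    REINDEX TABLE big_table;

    -- or rebuild just one suspect index
    REINDEX INDEX big_table_domain_idx;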

Re: Vacuum Stalling

From: Tom Lane

Oh, I forgot to mention --- you did check that vacuum_mem is set to
a pretty high value, no?  Else you might be doing a lot more
btbulkdelete scans than you need to.

            regards, tom lane

Re: Vacuum Stalling

From: Brad Nicholson

On Tue, 2007-07-10 at 11:19 -0400, Tom Lane wrote:
> Oh, I forgot to mention --- you did check that vacuum_mem is set to
> a pretty high value, no?  Else you might be doing a lot more
> btbulkdelete scans than you need to.
>
>             regards, tom lane

What would you define as high for 7.4?  I bumped it up to ~245MB.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
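On 7.4, vacuum_mem is measured in kilobytes and can also be raised per session just before running the vacuum; a sketch matching the ~245MB figure above (the table name is hypothetical):

    -- in postgresql.conf:  vacuum_mem = 250000     (units are KB, ~245MB)
    SET vacuum_mem = 250000;
    VACUUM VERBOSE ANALYZE big_table;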

Re: Vacuum Stalling

From: Tom Lane
Brad Nicholson <bnichols@ca.afilias.info> writes:
> On Tue, 2007-07-10 at 11:19 -0400, Tom Lane wrote:
>> Oh, I forgot to mention --- you did check that vacuum_mem is set to
>> a pretty high value, no?  Else you might be doing a lot more
>> btbulkdelete scans than you need to.

> What would you define as high for 7.4?  I bumped it up to ~245MB.

That sounds like plenty --- you only need 6 bytes per dead tuple,
so that should be enough to handle all your 15-20M dead tuples in
one scan.

How big is this index again?

            regards, tom lane
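As a quick check on that arithmetic: even at the high end, 20,000,000 dead tuples x 6 bytes per tuple is about 120MB of TID list, comfortably under a ~245MB vacuum_mem, so each index should need only a single btbulkdelete pass.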

Re: Vacuum Stalling

From: Brad Nicholson
On Tue, 2007-07-10 at 11:31 -0400, Tom Lane wrote:
> Brad Nicholson <bnichols@ca.afilias.info> writes:
> > On Tue, 2007-07-10 at 11:19 -0400, Tom Lane wrote:
> >> Oh, I forgot to mention --- you did check that vacuum_mem is set to
> >> a pretty high value, no?  Else you might be doing a lot more
> >> btbulkdelete scans than you need to.
>
> > What would you define as high for 7.4?  I bumped it up to ~245MB.
>
> That sounds like plenty --- you only need 6 bytes per dead tuple,
> so that should be enough to handle all your 15-20M dead tuples in
> one scan.
>
> How big is this index again?

Not sure which one it's working on - there are 6 of them, each ~2.5GB.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
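A quick way to see which indexes are involved and how big they are on 7.4 (relpages counts 8KB blocks and is only updated by VACUUM/ANALYZE, so it's approximate; 'big_table' is a hypothetical name):

    SELECT c.relname, c.relpages, c.relpages * 8 / 1024 AS approx_mb
    FROM pg_class c
    JOIN pg_index i ON i.indexrelid = c.oid
    WHERE i.indrelid = 'big_table'::regclass
    ORDER BY c.relpages DESC;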


Re: Vacuum Stalling

From: Tom Lane
Brad Nicholson <bnichols@ca.afilias.info> writes:
> On Tue, 2007-07-10 at 11:31 -0400, Tom Lane wrote:
>> How big is this index again?

> Not sure which one it's working on - there are 6 of them, each ~2.5GB.

OK, about 300K pages each (2.5GB at the 8KB page size) ... so even
assuming the worst case that each page requires a physical disk seek,
at roughly 10ms per seek that works out to under an hour per index.  So
10 hours is beginning to sound a bit suspicious to me too, though it's
not beyond the threshold of incredulity quite yet.

It's conceivable that that index has been corrupted in such a way
that there's a loop of pages whose right-links point back to each other,
which would cause the btbulkdelete scan to never terminate.  If that's
the case then the best fix is to REINDEX.  But I think I'd counsel
letting the VACUUM run awhile longer first, just in case it will finish;
unless you have clear evidence that it won't, like previous runs having
also gone until killed.  One thing you could try is strace'ing the
vacuum for a while to see if you can detect any evidence of fetching the
same pages over and over.  (This would also help you find out which
index it's working on.)

            regards, tom lane
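A sketch of that check: AIX 5.3 ships truss rather than strace, the pid below is the vacuum backend from the gdb session above, and the relfilenode value is a hypothetical stand-in for whatever number shows up in the traced file names:

    # watch the backend's system calls for repeated reads of the same blocks
    truss -p 979354

    # list the files the backend has open, to see which relation it is in
    procfiles -n 979354

and then map a file name back to an index:

    SELECT relname FROM pg_class WHERE relfilenode = 12345;  -- hypothetical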

Re: Vacuum Stalling

From: "Pavel Stehule"
Hello

I have a similar problem with vacuum on 8.1.

I have a 256MB table, and pgstattuple reports 128MB of free space.  I
stopped the vacuum after 1 hour (maintenance_work_mem = 160MB); I had no
more time.

Regards
Pavel Stehule



2007/7/10, Tom Lane <tgl@sss.pgh.pa.us>:
> Brad Nicholson <bnichols@ca.afilias.info> writes:
> > On Tue, 2007-07-10 at 11:19 -0400, Tom Lane wrote:
> >> Oh, I forgot to mention --- you did check that vacuum_mem is set to
> >> a pretty high value, no?  Else you might be doing a lot more
> >> btbulkdelete scans than you need to.
>
> > What would you define as high for 7.4?  I bumped it up to ~245MB.
>
> That sounds like plenty --- you only need 6 bytes per dead tuple,
> so that should be enough to handle all your 15-20M dead tuples in
> one scan.
>
> How big is this index again?
>
>                         regards, tom lane
>
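For reference, the pgstattuple figures above come from the contrib/pgstattuple module; a minimal call (table name hypothetical) looks like:

    SELECT table_len, dead_tuple_len, free_space, free_percent
    FROM pgstattuple('big_table');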

Re: Vacuum Stalling

From: "Pavel Stehule"
> Hello
>
> I have a similar problem with vacuum on 8.1.
>
> I have a 256MB table, and pgstattuple reports 128MB of free space.  I
> stopped the vacuum after 1 hour (maintenance_work_mem = 160MB); I had
> no more time.
>

I tested it on 8.3 with random data.  Vacuuming the table from 190MB
down to 94MB needed 30 seconds.  That's much better.  It isn't 100%
comparable, but it is one more argument for upgrading.

Regards
Pavel Stehule


>
>
> 2007/7/10, Tom Lane <tgl@sss.pgh.pa.us>:
> > Brad Nicholson <bnichols@ca.afilias.info> writes:
> > > On Tue, 2007-07-10 at 11:19 -0400, Tom Lane wrote:
> > >> Oh, I forgot to mention --- you did check that vacuum_mem is set to
> > >> a pretty high value, no?  Else you might be doing a lot more
> > >> btbulkdelete scans than you need to.
> >
> > > What would you define as high for 7.4?  I bumped it up to ~245MB.
> >
> > That sounds like plenty --- you only need 6 bytes per dead tuple,
> > so that should be enough to handle all your 15-20M dead tuples in
> > one scan.
> >
> > How big is this index again?
> >
> >                         regards, tom lane
> >