Thread: vacuum timings

vacuum timings

From
Bruce Momjian
Date:
I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER);  Table is
400MB and index is 160MB.

With index on the single in4 column, I got: 78 seconds for a vacuum121 seconds for vacuum after deleting a single
row662seconds for vacuum after deleting the entire table
 

With no index, I got: 43 seconds for a vacuum 43 seconds for vacuum after deleting a single row 43 seconds for vacuum
afterdeleting the entire table
 

I find this quite interesting.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: vacuum timings

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER);  Table is
> 400MB and index is 160MB.

> With index on the single in4 column, I got:
>      78 seconds for a vacuum
>     121 seconds for vacuum after deleting a single row
>     662 seconds for vacuum after deleting the entire table

> With no index, I got:
>      43 seconds for a vacuum
>      43 seconds for vacuum after deleting a single row
>      43 seconds for vacuum after deleting the entire table

> I find this quite interesting.

How long does it take to create the index on your setup --- ie,
if vacuum did a drop/create index, would it be competitive?
        regards, tom lane


Re: [HACKERS] vacuum timings

From
Vadim Mikheev
Date:
Bruce Momjian wrote:
> 
> I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER);  Table is
> 400MB and index is 160MB.
> 
> With index on the single in4 column, I got:
>          78 seconds for a vacuum
>         121 seconds for vacuum after deleting a single row
>         662 seconds for vacuum after deleting the entire table
> 
> With no index, I got:
>          43 seconds for a vacuum
>          43 seconds for vacuum after deleting a single row
>          43 seconds for vacuum after deleting the entire table

Wi/wo -F ?

Vadim


RE: [HACKERS] vacuum timings

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian
> 
> I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER);  Table is
> 400MB and index is 160MB.
> 
> With index on the single in4 column, I got:
>      78 seconds for a vacuum    vc_vaconeind() is called once

>     121 seconds for vacuum after deleting a single row    vc_vaconeind() is called twice

Hmmm,vc_vaconeind() takes pretty long time even if it does little. 

>     662 seconds for vacuum after deleting the entire table
>

How about half of the rows deleted case ?
It would take longer time.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] vacuum timings

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> > 
> > I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER);  Table is
> > 400MB and index is 160MB.
> > 
> > With index on the single in4 column, I got:
> >          78 seconds for a vacuum
> >         121 seconds for vacuum after deleting a single row
> >         662 seconds for vacuum after deleting the entire table
> > 
> > With no index, I got:
> >          43 seconds for a vacuum
> >          43 seconds for vacuum after deleting a single row
> >          43 seconds for vacuum after deleting the entire table
> 
> Wi/wo -F ?

With no -F.

I can get you -F times tomorrow.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] vacuum timings

From
Vadim Mikheev
Date:
Hiroshi Inoue wrote:
> 
> >
> > With index on the single in4 column, I got:
> >        78 seconds for a vacuum
>                 vc_vaconeind() is called once                                   ^^^^^^
  not called ?
 
> 
> >       121 seconds for vacuum after deleting a single row
>                 vc_vaconeind() is called twice
> 
> Hmmm,vc_vaconeind() takes pretty long time even if it does little.

It reads all index leaf pages in any case...

Vadim


RE: [HACKERS] vacuum timings

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
> Mikheev
> 
> Hiroshi Inoue wrote:
> > 
> > >
> > > With index on the single in4 column, I got:
> > >        78 seconds for a vacuum
> >                 vc_vaconeind() is called once
>                                     ^^^^^^
>                                     not called ?

Oops,you are right.
vc_scanoneind() is called once.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: vacuum timings

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER);  Table is
> > 400MB and index is 160MB.
> 
> > With index on the single in4 column, I got:
> >      78 seconds for a vacuum
> >     121 seconds for vacuum after deleting a single row
> >     662 seconds for vacuum after deleting the entire table
> 
> > With no index, I got:
> >      43 seconds for a vacuum
> >      43 seconds for vacuum after deleting a single row
> >      43 seconds for vacuum after deleting the entire table
> 
> > I find this quite interesting.
> 
> How long does it take to create the index on your setup --- ie,
> if vacuum did a drop/create index, would it be competitive?

OK, new timings with -F enabled:
index    no index519    same    load    247    "    first vacuum40    "    other vacuums1222    X    index creation90
X    first vacuum80    X    other vacuums<1    90    delete one row121    38    vacuum after delete 1 row346    344
deleteall rows440    44    first vacuum20    <1    other vacuums(index is still same size)
 

Conclusions:
o  indexes never get smallero  drop/recreate index is slower than vacuum of indexes

What other conclusions can be made?

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Re: vacuum timings

From
The Hermit Hacker
Date:
On Fri, 21 Jan 2000, Bruce Momjian wrote:

> OK, new timings with -F enabled:
> 
>     index    no index
>     519    same    load    
>     247    "    first vacuum
>     40    "    other vacuums
>     
>     1222    X    index creation
>     90    X    first vacuum
>     80    X    other vacuums
>     
>     <1    90    delete one row
>     121    38    vacuum after delete 1 row
>     
>     346    344    delete all rows
>     440    44    first vacuum
>     20    <1    other vacuums(index is still same size)
> 
> Conclusions:
> 
>     o  indexes never get smaller

this one, I thought, was a known?  if I remember right, Vadim changed it
so that space was reused, but index never shrunk in size ... no?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: vacuum timings

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Conclusions:
>     o  indexes never get smaller

Which we knew...

>     o  drop/recreate index is slower than vacuum of indexes

Quite a few people have reported finding the opposite in practice.
You should probably try vacuuming after deleting or updating some
fraction of the rows, rather than just the all or none cases.
        regards, tom lane


Re: [HACKERS] Re: vacuum timings

From
Dmitry Samersoff
Date:
Tom Lane wrote:
> 
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Conclusions:
> >       o  indexes never get smaller
> 
> Which we knew...
> 
> >       o  drop/recreate index is slower than vacuum of indexes
> 
> Quite a few people have reported finding the opposite in practice.

I'm one of them. On 1,5 GB table with three indices it about twice
slowly.
Probably becouse vacuuming indices brakes system cache policy.
(FreeBSD 3.3)



-- 
Dmitry Samersoff, DM\S
dms@wplus.net http://devnull.wplus.net
* there will come soft rains


Re: [HACKERS] Re: vacuum timings

From
Bruce Momjian
Date:
[Charset koi8-r unsupported, filtering to ASCII...]
> Tom Lane wrote:
> > 
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Conclusions:
> > >       o  indexes never get smaller
> > 
> > Which we knew...
> > 
> > >       o  drop/recreate index is slower than vacuum of indexes
> > 
> > Quite a few people have reported finding the opposite in practice.
> 
> I'm one of them. On 1,5 GB table with three indices it about twice
> slowly.
> Probably becouse vacuuming indices brakes system cache policy.
> (FreeBSD 3.3)

OK, we are researching what things can be done to improve this.  We are
toying with:
lock table for less duration, or read lockcreating another copy of heap/indexes, and rename() over old filesimproving
heapvacuum speedimproving index vacuum speedmoving analyze out of vacuum
 

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Re: vacuum timings

From
The Hermit Hacker
Date:
On Fri, 21 Jan 2000, Bruce Momjian wrote:

> [Charset koi8-r unsupported, filtering to ASCII...]
> > Tom Lane wrote:
> > > 
> > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > > Conclusions:
> > > >       o  indexes never get smaller
> > > 
> > > Which we knew...
> > > 
> > > >       o  drop/recreate index is slower than vacuum of indexes
> > > 
> > > Quite a few people have reported finding the opposite in practice.
> > 
> > I'm one of them. On 1,5 GB table with three indices it about twice
> > slowly.
> > Probably becouse vacuuming indices brakes system cache policy.
> > (FreeBSD 3.3)
> 
> OK, we are researching what things can be done to improve this.  We are
> toying with:
> 
>     lock table for less duration, or read lock

if there is some way that we can work around the bug that I believe Tom
found with removing the lock altogether (ie. makig use of MVCC), I think
that would be the best option ... if not possible, at least get things
down to a table lock vs the whole database?

a good example is the udmsearch that we are using on the site ... it uses
multiple tables to store the dictionary, each representing words of X size
... if I'm searching on a 4 letter word, and the whole database is locked
while it is working on the dictionary with 8 letter words, I'm sitting
there idle ... at least if we only locked the 8 letter table, everyone not
doing 8 letter searches can go on their merry way ...

Slightly longer vacuum's, IMHO, are acceptable if, to the end users, its
as transparent as possible ... locking per table would be slightly slower,
I think, because once a table is finished, the next table would need to
have an exclusive lock put on it before starting, so you'd have to
possibly wait for that...?

>     creating another copy of heap/indexes, and rename() over old files

sounds to me like introducing a large potential for error here ...

>     moving analyze out of vacuum

I think that should be done anyway ... if we ever get to the point that
we're able to re-use rows in tables, then that would eliminate the
immediate requirement for vacuum, but still retain a requirement for a
periodic analyze ... no?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Re: vacuum timings

From
Mike Mascari
Date:
Dmitry Samersoff wrote:
> 
> Tom Lane wrote:
> >
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Conclusions:
> > >       o  indexes never get smaller
> >
> > Which we knew...
> >
> > >       o  drop/recreate index is slower than vacuum of indexes
> >
> > Quite a few people have reported finding the opposite in practice.
> 
> I'm one of them. On 1,5 GB table with three indices it about twice
> slowly.
> Probably becouse vacuuming indices brakes system cache policy.

I'm another. Do the times increase linearly with each index
added? Do the times increase linearly for each index for each
field in a composite index? Does the field type being indexed
have any affect (varchar vs int)? 

Mike Mascari


Re: [HACKERS] Re: vacuum timings

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
>> lock table for less duration, or read lock

> if there is some way that we can work around the bug that I believe Tom
> found with removing the lock altogether (ie. makig use of MVCC), I think
> that would be the best option ... if not possible, at least get things
> down to a table lock vs the whole database?

Huh?  VACUUM only requires an exclusive lock on the table it is
currently vacuuming; there's no database-wide lock.

Even a single-table exclusive lock is bad, of course, if it's a large
table that's critical to a 24x7 application.  Bruce was talking about
the possibility of having VACUUM get just a write lock on the table;
other backends could still read it, but not write it, during the vacuum
process.  That'd be a considerable step forward for 24x7 applications,
I think.

It looks like that could be done if we rewrote the table as a new file
(instead of compacting-in-place), but there's a problem when it comes
time to rename the new files into place.  At that point you'd need to
get an exclusive lock to ensure all the readers are out of the table too
--- and upgrading from a plain lock to an exclusive lock is a well-known
recipe for deadlocks.  Not sure if this can be solved.
        regards, tom lane


Re: [HACKERS] Re: vacuum timings

From
The Hermit Hacker
Date:
On Fri, 21 Jan 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> >> lock table for less duration, or read lock
> 
> > if there is some way that we can work around the bug that I believe Tom
> > found with removing the lock altogether (ie. makig use of MVCC), I think
> > that would be the best option ... if not possible, at least get things
> > down to a table lock vs the whole database?
> 
> Huh?  VACUUM only requires an exclusive lock on the table it is
> currently vacuuming; there's no database-wide lock.
> 
> Even a single-table exclusive lock is bad, of course, if it's a large
> table that's critical to a 24x7 application.  Bruce was talking about
> the possibility of having VACUUM get just a write lock on the table;
> other backends could still read it, but not write it, during the vacuum
> process.  That'd be a considerable step forward for 24x7 applications,
> I think.
> 
> It looks like that could be done if we rewrote the table as a new file
> (instead of compacting-in-place), but there's a problem when it comes
> time to rename the new files into place.  At that point you'd need to
> get an exclusive lock to ensure all the readers are out of the table too
> --- and upgrading from a plain lock to an exclusive lock is a well-known
> recipe for deadlocks.  Not sure if this can be solved.

What would it take to re-use space vs compacting/truncating the file?  

Right now, ppl vacuum the database to clear out old, deleted records, and
truncate the tables ... if we were to change things so that an
insert/update were to find the next largest contiguous free block in the
table and re-used it, then, theoretically, you would eventually hit a
fixed table size assuming no new inserts, and only updates/deletes, right?

Eventually, you'd have "holes" in the table, where an inserted record was
smaller then the "next largest contiguous free block", but what's left
over is too small for any further additions ... but I would think that
that would greatly reduce how often you'd have to do a vacuum, and, if we
split out ANALYZE, you could use that to update statistics ...

To speed up the search for the "next largest contiguous free block", a
special table.FAT could be used similar to an index?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: vacuum timings

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Conclusions:
>     o  drop/recreate index is slower than vacuum of indexes

BTW, I did some profiling of CREATE INDEX this evening (quite
unintentionally actually; I was interested in COPY IN, but the pg_dump
script I used as driver happened to create some indexes too).  I was
startled to discover that 60% of the runtime of CREATE INDEX is spent in
_bt_invokestrat (which is called from tuplesort.c's comparetup_index,
and exists only to figure out which specific comparison routine to call).
Of this, a whopping 4% was spent in the useful subroutine, int4gt.  All
the rest went into lookup and validation checks that by rights should be
done once per index creation, not once per comparison.

In short: a fairly straightforward bit of optimization will eliminate
circa 50% of the CPU time consumed by CREATE INDEX.  All we need is to
figure out where to cache the lookup results.  The optimization would
improve insertions and lookups in indexes, as well, if we can cache
the lookup results in those scenarios.

This was for a table small enough that tuplesort.c could do the sort
entirely in memory, so I'm sure the gains would be smaller for a large
table that requires a disk-based sort.  Still, it seems worth looking
into...
        regards, tom lane


Re: vacuum timings

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Conclusions:
> >     o  drop/recreate index is slower than vacuum of indexes
> 
> BTW, I did some profiling of CREATE INDEX this evening (quite
> unintentionally actually; I was interested in COPY IN, but the pg_dump
> script I used as driver happened to create some indexes too).  I was
> startled to discover that 60% of the runtime of CREATE INDEX is spent in
> _bt_invokestrat (which is called from tuplesort.c's comparetup_index,
> and exists only to figure out which specific comparison routine to call).
> Of this, a whopping 4% was spent in the useful subroutine, int4gt.  All
> the rest went into lookup and validation checks that by rights should be
> done once per index creation, not once per comparison.

Good job, Tom.  Clearly a huge win.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


RE: [HACKERS] Re: vacuum timings

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: owner-pgsql-hackers@postgresql.org
> [mailto:owner-pgsql-hackers@postgresql.org]On Behalf Of Tom Lane
> 
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Conclusions:
> >     o  indexes never get smaller
> 
> Which we knew...
> 
> >     o  drop/recreate index is slower than vacuum of indexes
> 
> Quite a few people have reported finding the opposite in practice.
> You should probably try vacuuming after deleting or updating some
> fraction of the rows, rather than just the all or none cases.
>

Vacuum after delelting all rows isn't a worst case.
There's no moving in that case and vacuum doesn't need to call
index_insert() corresponding to the moving of heap tuples.

Vacuum after deleting half of rows may be one of the worst case.
In this case,index_delete() is called as many times as 'delete all'
case and expensive index_insert() is called for moved_in tuples.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp 


Re: [HACKERS] Re: vacuum timings

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> Vacuum after deleting half of rows may be one of the worst case.

Or equivalently, vacuum after updating all the rows.
        regards, tom lane


Re: [HACKERS] Re: vacuum timings

From
Bruce Momjian
Date:
> > Quite a few people have reported finding the opposite in practice.
> > You should probably try vacuuming after deleting or updating some
> > fraction of the rows, rather than just the all or none cases.
> >
> 
> Vacuum after delelting all rows isn't a worst case.
> There's no moving in that case and vacuum doesn't need to call
> index_insert() corresponding to the moving of heap tuples.
> 
> Vacuum after deleting half of rows may be one of the worst case.
> In this case,index_delete() is called as many times as 'delete all'
> case and expensive index_insert() is called for moved_in tuples.

I will test that.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Re: vacuum timings

From
Hiroshi Inoue
Date:
Bruce Momjian wrote:

> > > Quite a few people have reported finding the opposite in practice.
> > > You should probably try vacuuming after deleting or updating some
> > > fraction of the rows, rather than just the all or none cases.
> > >
> >
> > Vacuum after delelting all rows isn't a worst case.
> > There's no moving in that case and vacuum doesn't need to call
> > index_insert() corresponding to the moving of heap tuples.
> >
> > Vacuum after deleting half of rows may be one of the worst case.
> > In this case,index_delete() is called as many times as 'delete all'
> > case and expensive index_insert() is called for moved_in tuples.
>
> I will test that.
>

I tried my test case in less scale than Bruce.

CREATE TABLE t (id int4, dt int4);
for (i=0; i < 2500000; i++)   insert into t values ( i, (i * 1009) % 2500000);
delete from t where id < 1250000;

1) vacuum after create index on t(id)   405sec
2) vacuum after create index on t(dt)   > 3600sec   I gave up to continue execution.
3) vacuum and create index on t(id) and t(dt)   90sec + 114sec + 143sec = 347sec.

Seems random index insert is painful for vacuum.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp