Thread: Deleting indexes before vacuum?

Deleting indexes before vacuum?

From
Matthew Hagerty
Date:
Greetings,

I read a post sometime back where someone said their normal routine for
running a vacuum was to delete the indexes first, then recreate them after
the vacuum.  Is this necessary?  If the indexes are gone does vacuum
analyze still make sense?

I am concerned about this because I never removed the indexes prior to
vacuum, but today I executed this simple query:

select note_id from appnotes where note_id=6068;

 note_id
---------
   17768
(1 row)

This was rather alarming, so I deleted all the indexes, ran vacuum,
recreated the indexes.  Now the query works:

select note_id from appnotes where note_id=6068;

 note_id
---------
    6068
(1 row)

What are the general rules about running vacuum, frequency of use, etc.?

I am running pg-6.4 (I know it needs an update) on this particular server
and 6.5.x on others.

Thanks,
Matthew Hagerty


Re: [SQL] Deleting indexes before vacuum?

From
Tom Lane
Date: Sat, 20 Nov 1999 01:14:25 -0500
Matthew Hagerty <matthew@venux.net> writes:
> I read a post sometime back where someone said their normal routine for
> running a vacuum was to delete the indexes first, then recreate them after
> the vacuum.  Is this necessary?  If the indexes are gone does vacuum
> analyze still make sense?

Sure.  Vacuum analyze is mostly about deleting dead tuples, reclaiming
the space they occupied, and computing statistics about column values.
These activities are useful whether there are indexes or not.  If there
are indexes, vacuum also cleans up useless entries in the indexes (ie,
pointers to dead tuples).

The reason for the drop index/vacuum/create index rain dance is that
vacuum's method of cleaning up indexes seems to be horrendously
inefficient.  It actually takes less time to rebuild an index on a
large table *from scratch* than to let vacuum fix up the index.

This is, of course, pretty bogus.  I personally have no idea *why*
vacuum's index-handling code is so slow --- I've been griping about it
freely for a year or more, but have not had time to look into causes or
solutions.  Perhaps someone else has a better idea of what's going on
here (...paging Vadim...)

Anyway, the drop/vacuum/recreate business is nothing more nor less than
a quick-hack workaround for a performance deficiency in the current
implementation of vacuum.  That deficiency should go away eventually,
but in the meantime, if vacuum takes too long on your large tables,
you might give it a try.
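For concreteness, the workaround can be sketched as a small script. This is only illustrative: the table and column names ("appnotes", "note_id") come from Matthew's example query, while the index name "appnotes_note_id_idx" and the database name "mydb" are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch of the drop/vacuum/recreate workaround described above.
# "appnotes"/"note_id" come from the example query earlier in the thread;
# "appnotes_note_id_idx" and "mydb" are hypothetical placeholders.
cat > maintenance.sql <<'EOF'
DROP INDEX appnotes_note_id_idx;
VACUUM ANALYZE appnotes;
CREATE INDEX appnotes_note_id_idx ON appnotes (note_id);
EOF
# Run it outside any transaction block, e.g.:
# psql -tq mydb < maintenance.sql
```

As discussed later in the thread, an insertion that happens between the drop and the recreate escapes any UNIQUE-index checking, so use this with care on tables that rely on unique indexes.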

> I am concerned about this because I never removed the indexes prior to
> vacuum, but today I executed this simple query:
> select note_id from appnotes where note_id=6068;
> note_id
> -------
>   17768
> (1 row)
> This was rather alarming, so I deleted all the indexes, ran vacuum,
> recreated the indexes.  Now the query works:

Ugh.  Definitely a busted index.  Vacuum is *not* designed to recover
from corrupted-data situations.  Dropping/rebuilding indexes will
recover from index corruption problems (with or without a vacuum), so
long as the underlying table is OK.  That seems to be what you saw here.

I have no words of wisdom about what might have caused the index
corruption --- if you can find a sequence that reproduces it, please
file a bug report!

> I am running pg-6.4 (I know it needs an update) on this particular server
> and 6.5.x on others.

6.5 is more stable than 6.4 AFAIK.  It's still got bugs of course...
        regards, tom lane


Re: [SQL] Deleting indexes before vacuum?

From
Oleg Bartunov
Date:
I see another reason for the drop index/vacuum/create sequence.
Currently index files grow indefinitely because vacuum analyze
doesn't truncate them. Vadim has implemented in 6.5.3 a quick hack for
index reuse, but indices still grow. This causes a visible performance
degradation if your table is often updated.
The question is: what's the right way to do drop index/vacuum/create ?
Do I need a transaction ?
Regards,    Oleg

On Sat, 20 Nov 1999, Tom Lane wrote:

> Date: Sat, 20 Nov 1999 01:14:25 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Matthew Hagerty <matthew@venux.net>
> Cc: pgsql-sql@postgreSQL.org
> Subject: Re: [SQL] Deleting indexes before vacuum? 
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: [SQL] Deleting indexes before vacuum?

From
Tom Lane
Date: Sat, 20 Nov 1999 12:01:12 -0500
Oleg Bartunov <oleg@sai.msu.su> writes:
> The question is: what's the right way to do drop index/vacuum/create ?
> Do I need transaction ?

VACUUM should not be run inside a transaction block.  (I am planning to
make the code enforce this for 7.0, but it doesn't yet.)  So you can't
protect the whole sequence with a transaction.  AFAICS the only real
problem is that if you are depending on UNIQUE indexes to catch
attempts to insert duplicate data, an insertion that got in between
the drop and recreate wouldn't get checked.
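Since an insertion in that window escapes UNIQUE checking, one defensive step (my sketch, not something proposed in the thread) is to look for duplicates before recreating the unique index. The table "hits" and column "msg_id" are the names from Oleg's script elsewhere in the thread; "mydb" is a placeholder database name.

```shell
#!/bin/sh
# Sketch: before recreating a UNIQUE index that was dropped for vacuum,
# look for duplicates that may have been inserted in the unprotected window.
# "hits"/"msg_id" are from Oleg's script in this thread; "mydb" is a placeholder.
cat > check_dupes.sql <<'EOF'
SELECT msg_id, count(*) AS n
FROM hits
GROUP BY msg_id
HAVING count(*) > 1;
EOF
# If this returns any rows, CREATE UNIQUE INDEX will fail until the
# duplicates are resolved:
# psql -tq mydb < check_dupes.sql
```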

I have been toying with the notion of ripping out the existing VACUUM
index handling code and putting in fresh code that would simply do an
index drop and rebuild ;-).  This could happen inside VACUUM's exclusive
lock on the table, so it'd be perfectly safe whereas doing it the manual
way is not.  But I do not want to do this unless Vadim approves it as a
good idea --- perhaps the existing index-vacuuming code can be fixed to
be an even better solution than this.  I haven't looked at the code to
understand why it's so slow or whether there's a way to make it better.
        regards, tom lane


Re: [SQL] Deleting indexes before vacuum?

From
Oleg Bartunov
Date: Sat, 20 Nov 1999 21:57:59 +0300 (MSK)
When I did some performance testing I got index corruption several
times. The test was simple: 30 concurrent processes update *one* row
while one process vacuums the same table in parallel.

vacuuming was done using this shell script:

while true ; do
  psql -tq discovery <./vacuum_hits.sql
  rc=$?
  i=$((i+1))
  echo Vacuuming: $i, RC=$rc
  sleep 10
done

and ./vacuum_hits.sql:

drop index hits_pkey;
vacuum analyze hits(msg_id);
create unique index hits_pkey on hits(msg_id);

Message was:

NOTICE:  Index hits_pkey: NUMBER OF INDEX' TUPLES (187) IS NOT THE SAME AS HEAP' (190)

Regards,
    Oleg

On Sat, 20 Nov 1999, Tom Lane wrote:

> Date: Sat, 20 Nov 1999 12:01:12 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Oleg Bartunov <oleg@sai.msu.su>
> Cc: Matthew Hagerty <matthew@venux.net>, pgsql-sql@postgreSQL.org
> Subject: Re: [SQL] Deleting indexes before vacuum? 




Re: [SQL] Deleting indexes before vacuum?

From
Oleg Bartunov
Date:
Well,

running my test right now I got a postgres crash :-(
Linux 2.2.13, 6.5.3
My scenario:

30 httpd processes concurrently update the same row of table hits, and
I use LOCK TABLE hits IN SHARE ROW EXCLUSIVE MODE.
The same table is permanently vacuumed as I described in my previous message.
Also I monitor the database using psql.
After about 2000 requests, in Apache's error log:

NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally
        and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your
        database system connection and exit.
        Please reconnect to the database system and repeat your query.

All persistent connections between httpd and postgres were destroyed,
and, interestingly, my psql client also crashed with this message:
discovery=> select last_hit();
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally
        and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your
        database system connection and exit.
        Please reconnect to the database system and repeat your query.

pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.

We have lost the connection to the backend, so further processing is impossible.  Terminating.

The postmaster is still alive and I could connect to the database,
but I had to restart the httpd servers to reinitiate the db connections.

Here is backtrace from core:
22:12:12[om]:/usr/local/pgsql/data/base/discovery$ gdb /usr/local/pgsql/bin/postgres core
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i486-slackware-linux),
Copyright 1996 Free Software Foundation, Inc...

warning: core file may not match specified executable file.
Core was generated by `/usr/local/pgsql/bin/postgres localhost httpd discovery idle'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libdl.so.1...done.
Reading symbols from /lib/libm.so.5...done.
Reading symbols from /lib/libtermcap.so.2...done.
Reading symbols from /lib/libc.so.5...done.
Reading symbols from /lib/ld-linux.so.1...done.
#0  0x806b29b in heap_endscan ()
(gdb) bt 
#0  0x806b29b in heap_endscan ()
#1  0x80d05a5 in index_info ()
#2  0x80d03b9 in find_secondary_index ()
#3  0x80d02e6 in find_relation_indices ()
#4  0x80c6059 in set_base_rel_pathlist ()
#5  0x80c5fdf in make_one_rel ()
#6  0x80cb105 in subplanner ()
#7  0x80cb05e in query_planner ()
#8  0x80cb3ef in union_planner ()
#9  0x80cb1e4 in planner ()
#10 0x80ed018 in pg_parse_and_plan ()
#11 0x80ed16a in pg_exec_query_dest ()
#12 0x80ed11e in pg_exec_query ()
#13 0x80ee3af in PostgresMain ()
#14 0x80d62a0 in DoBackend ()
#15 0x80d5dba in BackendStartup ()
#16 0x80d5449 in ServerLoop ()
#17 0x80d4f5e in PostmasterMain ()
#18 0x80a654e in main ()
#19 0x8060e3e in _start ()
(gdb)  


Besides this problem I see another:

What happens if one process is doing drop/vacuum/create
while another tries to access the database? It seems I have already
encountered this problem. I see rather rare messages in Apache's
error log:

[Sat Nov 20 22:06:51 1999] [error] DBD::Pg::st execute failed: ERROR:  mdopen: couldn't open hits_pkey: No such file or directory

So, in this environment it would be nice to be able
to do an 'atomic' drop/vacuum/create.

As a reminder: I need vacuuming because of dramatic performance
degradation under constant updates, and I need to drop/create the
indexes because of their unbounded growth (not so fast with 6.5.3,
but still). I am building a Web site which expects about 5 mln req/day,
and all requests are db-driven. Most requests are read-only and I'm not
worried about them, but I'd like to keep a hit count for each message,
and that is where most of the performance drops. Currently my home
machine (P166, 64Mb RAM, Apache 1.3.9 + mod_perl, Postgres 6.5.3) keeps
up with about 2 mln reqs/day, which is rather good performance; the
average load is about 3-4, so I can work. Another question is stability :-)

Regards,
    Oleg

On Sat, 20 Nov 1999, Oleg Bartunov wrote:

> Date: Sat, 20 Nov 1999 21:57:59 +0300 (MSK)
> From: Oleg Bartunov <oleg@sai.msu.su>
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Cc: Matthew Hagerty <matthew@venux.net>, pgsql-sql@postgreSQL.org
> Subject: Re: [SQL] Deleting indexes before vacuum? 




RE: [SQL] Deleting indexes before vacuum?

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: owner-pgsql-sql@postgreSQL.org On Behalf Of Tom Lane
> Sent: Sunday, November 21, 1999 2:01 AM
>
> I have been toying with the notion of ripping out the existing VACUUM
> index handling code and putting in fresh code that would simply do an
> index drop and rebuild ;-).  This could happen inside VACUUM's exclusive
> lock on the table, so it'd be perfectly safe whereas doing it the manual
> way is not.  But I do not want to do this unless Vadim approves it as a
> good idea --- perhaps the existing index-vacuuming code can be fixed to
> be an even better solution than this.  I haven't looked at the code to
> understand why it's so slow or whether there's a way to make it better.
>

Shouldn't vacuum preserve consistency even in case of abort ?
Currently PostgreSQL does little in case of abort (or even commit ?),
so I think this is a strong limitation.

It's well known that rebuilding indexes is faster than inserting/deleting
many index tuples.
I have been thinking about rebuilding indexes in vacuum for half a year,
but I don't have a reasonable solution yet.

Rebuilding indexes in vacuum has a big flaw: the index may vanish in
case of abort, and we may have to recreate the index manually.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [SQL] Deleting indexes before vacuum?

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> I have been toying with the notion of ripping out the existing VACUUM
>> index handling code and putting in fresh code that would simply do an
>> index drop and rebuild ;-).

> Shouldn't vacuum preserve consistency even in case of abort ?

That would be nice, certainly, but the existing code doesn't guarantee
it either, and I don't see how we could guarantee it by any method.
If you're moving tuples around then you are going to have some
inconsistent intermediate states :-(.  However, the big problem in
that respect is an abort while vacuum is reshuffling the data in the
table itself; changing our approach to index vacuuming isn't going
to make any difference there.  Once the table is finished and vacuum
moves on to fixing the indexes, an abort would leave the index(es)
corrupt but the table data should be OK.  So the issue is how to
recover the indexes after that happens.

> Rebuiding indexes in vacuum has a big flaw that index may vanish in
> case of abort and we may have to recreate index manually.

The index won't "vanish" --- I see no reason to touch the system-table
entries for it.  The data in the index might be corrupt, but that can
happen now.

ISTM that a big advantage of the rebuild approach is that if something
does go wrong during the index-fixing phase, you can try to recover just
by doing another vacuum.  That strikes me as less "manual" than dropping
and rebuilding the indexes, which is the only available recovery path
now.  It might even work for an index on a system table...
        regards, tom lane


RE: [SQL] Deleting indexes before vacuum?

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, November 22, 1999 12:18 PM
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> I have been toying with the notion of ripping out the existing VACUUM
> >> index handling code and putting in fresh code that would simply do an
> >> index drop and rebuild ;-).
> 
> > Shouldn't vacuum preserve consistency even in case of abort ?
> 
> That would be nice, certainly, but the existing code doesn't guarantee
> it either, and I don't see how we could guarantee it by any method.
> If you're moving tuples around then you are going to have some
> inconsistent intermediate states :-(.

AFAIK, moving tuples in vacuum doesn't cause inconsistency.
Time qualification doesn't regard newly insert(mov)ed tuples as valid
until the internal commit of vacuum.
Index tuples are inserted corresponding to the heap tuples being inserted,
i.e. index tuples are held for both the heap tuples being deleted and
those being inserted.
After the internal commit of vacuum, the old delet(mov)ed tuples are
no longer valid and the newly insert(mov)ed tuples become valid,
and from that point we can freely delete the index tuples corresponding
to the deleted heap tuples.

> However, the big problem in
> that respect is an abort while vacuum is reshuffling the data in the
> table itself; changing our approach to index vacuuming isn't going
> to make any difference there.  Once the table is finished and vacuum
> moves on to fixing the indexes, an abort would leave the index(es)
> corrupt but the table data should be OK.  So the issue is how to
> recover the indexes after that happens.
> 
> > Rebuiding indexes in vacuum has a big flaw that index may vanish in
> > case of abort and we may have to recreate index manually.
> 
> The index won't "vanish" --- I see no reason to touch the system-table
> entries for it.  The data in the index might be corrupt, but that can
> happen now.
>

I have been thinking about rebuilding indexes in vacuum only for
speeding things up.  Removing (or invalidating) indexes is necessary if
we rebuild indexes in vacuum and want to preserve consistency between
heap and index relations even in case of abort.
There's no inconsistency if there's no index.

> ISTM that a big advantage of the rebuild approach is that if something
> does go wrong during the index-fixing phase, you can try to recover just
> by doing another vacuum.  That strikes me as less "manual" than dropping
> and rebuilding the indexes, which is the only available recovery path
> now.  It might even work for an index on a system table...
>

Hmm, if system indexes are corrupt, we may not be able to start the
postmaster, as Jan says.  There should be a utility which is executed in
bootstrap mode if index recovery is really needed.
Regards.

Hiroshi Inoue
Inoue@tpf.co.jp