Thread: Strange slow database

Strange slow database

From
David Mitchell
Date:
Hi all,

Recently our databases started to experience a significant slowdown.
Queries that were taking 500ms now take up to 20 seconds. An insert can
take 150ms or more. This is strange since we are still hitting indexes
and we vacuum regularly. Here is the description of our system:

A single server has a medium size database with around 30-40 tables.
Some tables have ~1000 rows. These tables tend to have lots of inserts
and deletes, so we vacuum them regularly (every ten minutes). One table
has 15 million rows, but it only every has inserts, never updated or
deleted, so this table never gets vacuumed. Vacuuming is very low impact:

vacuum_cost_delay = 50
vacuum_cost_limit = 100

We have fsync = true and max_connections = 150. There are 12 servers
each with the same schema and config, but different data (although the
number of rows in the tables are very similar). We assign users to a
server when we set them up so all the data for a ser is on one server.
This whole setup typically works well and is speedy, we have checked all
the indexes and they are hit when appropriate.

The table with 15million is the table on which inserts can take 150ms or
more (I've seen 800ms for a single insert before).

Strangely, this slowdown is on all 12 servers. Perhaps this is a
configuration issue? If anyone has any ideas we'd love to hear them.
Since we vacuum regularly and all the indexes get hit I don't know where
to go next.

Machines are Pentium 4 3.2Ghz running Linux kernel 2.6.10. Postgres
version 8.0.1. Machines have 2Gb ram and two 10k RPM disks in a RAID-0
configuration.

Regards
--
David Mitchell
Software Engineer
Telogis

Re: Strange slow database

From
Andrew Sullivan
Date:
On Fri, Nov 18, 2005 at 10:05:47AM +1300, David Mitchell wrote:
> Strangely, this slowdown is on all 12 servers. Perhaps this is a
> configuration issue? If anyone has any ideas we'd love to hear them.

It sounds like a use-pattern issue.  Did something in your use
change?  Any time you get repeatable change like that, look for the
common changed factor.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
When my information changes, I alter my conclusions.  What do you do sir?
        --attr. John Maynard Keynes

Re: Strange slow database

From
Tom Lane
Date:
David Mitchell <david.mitchell@telogis.com> writes:
> Recently our databases started to experience a significant slowdown.
> Queries that were taking 500ms now take up to 20 seconds. An insert can
> take 150ms or more. This is strange since we are still hitting indexes
> and we vacuum regularly. Here is the description of our system:

Have you checked to see whether you are encountering table or index
bloat?  Your vacuuming policy doesn't sound unreasonable on its face,
but there's no substitute for actually looking at physical file sizes
when you aren't sure what's wrong.  Try doing a pg_dump and reload into
a virgin database, then compare file sizes with the original
table-by-table to see if anything is drastically out of line.
Don't forget to look at the system catalogs too.

            regards, tom lane

Re: Strange slow database

From
David Mitchell
Date:
Tom Lane wrote:
> Have you checked to see whether you are encountering table or index
> bloat?  Your vacuuming policy doesn't sound unreasonable on its face,
> but there's no substitute for actually looking at physical file sizes
> when you aren't sure what's wrong.  Try doing a pg_dump and reload into
> a virgin database, then compare file sizes with the original
> table-by-table to see if anything is drastically out of line.
> Don't forget to look at the system catalogs too.

I dumped and loaded then wrote a script to compare the sizes of tables.
I noticed that a few of the indexes, especially the indexes on the
15million table shrunk a lot (~50% reduction - ~400Mb). Also, the
pg_attribute_relid_attnam_index index shrank loads, to 1% of its
original size. From this, we tried reindexing indexes and found we got
the same reduction in index size. Strangely though, our queries appear
to have sped up not just on the servers we reindexed, but across the
whole cluster - so apparently something else is affecting the speed of
this all, not just bloat. We're scratching our heads, this is nothing if
not a little frustrating.

--
David Mitchell
Software Engineer
Telogis

Re: Strange slow database

From
Tom Lane
Date:
David Mitchell <david.mitchell@telogis.com> writes:
> I dumped and loaded then wrote a script to compare the sizes of tables.
> I noticed that a few of the indexes, especially the indexes on the
> 15million table shrunk a lot (~50% reduction - ~400Mb). Also, the
> pg_attribute_relid_attnam_index index shrank loads, to 1% of its
> original size.

Hm, better check whether your vacuuming policy is taking care of the
system catalogs ...

> From this, we tried reindexing indexes and found we got
> the same reduction in index size. Strangely though, our queries appear
> to have sped up not just on the servers we reindexed, but across the
> whole cluster - so apparently something else is affecting the speed of
> this all, not just bloat.

Odd indeed.

            regards, tom lane

Re: Strange slow database

From
David Mitchell
Date:
Tom Lane wrote:
>>Also, the
>>pg_attribute_relid_attnam_index index shrank loads, to 1% of its
>>original size.
>
> Hm, better check whether your vacuuming policy is taking care of the
> system catalogs ...

Yes, fair point. So I did:

18/11/2005 03:02:29 INFO:  vacuuming "pg_catalog.pg_attribute"
18/11/2005 03:02:29 INFO:  index "pg_attribute_relid_attnam_index" now
contains 2861 row versions in 11900 pages
18/11/2005 03:02:29 DETAIL:  11834 index pages have been deleted, 11834
are currently reusable.
18/11/2005 03:02:29 CPU 0.11s/0.03u sec elapsed 0.45 sec.
18/11/2005 03:02:30 INFO:  index "pg_attribute_relid_attnum_index" now
contains 2861 row versions in 2942 pages
18/11/2005 03:02:30 DETAIL:  2917 index pages have been deleted, 2917
are currently reusable.
18/11/2005 03:02:30 CPU 0.01s/0.00u sec elapsed 0.22 sec.
18/11/2005 03:02:30 INFO:  "pg_attribute": found 0 removable, 2861
nonremovable row versions in 48 pages
18/11/2005 03:02:30 DETAIL:  0 dead row versions cannot be removed yet.
18/11/2005 03:02:30 There were 19 unused item pointers.
18/11/2005 03:02:30 0 pages are entirely empty.
18/11/2005 03:02:30 CPU 0.12s/0.03u sec elapsed 0.72 sec.
18/11/2005 03:02:30 VACUUM

This is from our vacuum process log. The first index there is the one in
question. Hmm, any clues? Of course everything is running at full speed
currently.
--
David Mitchell
Software Engineer
Telogis

Re: Strange slow database

From
Tom Lane
Date:
David Mitchell <david.mitchell@telogis.com> writes:
> 18/11/2005 03:02:29 INFO:  index "pg_attribute_relid_attnam_index" now
> contains 2861 row versions in 11900 pages
> 18/11/2005 03:02:29 DETAIL:  11834 index pages have been deleted, 11834
> are currently reusable.

> 18/11/2005 03:02:30 INFO:  index "pg_attribute_relid_attnum_index" now
> contains 2861 row versions in 2942 pages
> 18/11/2005 03:02:30 DETAIL:  2917 index pages have been deleted, 2917
> are currently reusable.

> 18/11/2005 03:02:30 INFO:  "pg_attribute": found 0 removable, 2861
> nonremovable row versions in 48 pages

Those indexes are pretty bloated :-( but pg_attribute itself seems to be
nice and tight.  I speculate that you did a VACUUM FULL on it recently.

The condition of the indexes suggests strongly that you've not been
vacuuming pg_attribute often enough (and perhaps not any of the other
system catalogs, either?).  Heavy use of temp tables will cause
pg_attribute and pg_class to bloat if you don't keep after them.

At the moment, a REINDEX seems to be indicated.

            regards, tom lane

Re: Strange slow database

From
David Mitchell
Date:
Tom Lane wrote:
> I speculate that you did a VACUUM FULL on it recently.

You speculate right, we do a vacuum full every sunday night as a safety
net. So the vacuum full was a week old.
>
> The condition of the indexes suggests strongly that you've not been
> vacuuming pg_attribute often enough (and perhaps not any of the other
> system catalogs, either?).  Heavy use of temp tables will cause
> pg_attribute and pg_class to bloat if you don't keep after them.

That's interesting. We stopped using temporary tables because they were
causing us lots of trouble, including table bloat. We vacuum
pg_attribute (and every other table with an entry in pg_tables) every
ten minutes. What other that temp tables could bloat pg_attribute? We
use refcursors to return data from our stored procedures. We also have a
few stored procedures that return SETOF.

Thanks for helping us out on this.
--
David Mitchell
Software Engineer
Telogis