Thread: Database size growing over time and leads to performance impact

Database size growing over time and leads to performance impact

From
"Gnanakumar"
Date:

Hi,

 

We're using PostgreSQL 8.2.  Recently, in our production database, there was a severe performance impact..  Even though, we're regularly doing both:

1.     VACUUM FULL ANALYZE once in a week during low-usage time and

2.     ANALYZE everyday at low-usage time

 

Also, we noticed that the physical database size has grown upto 30 GB.  But, if I dump the database in the form of SQL and import it locally in my machine, it was only 3.2 GB.  Then while searching in Google to optimize database size, I found the following useful link:

 

http://www.linuxinsight.com/optimize_postgresql_database_size.html

 

It says that even vacuumdb or reindexdb doesn't really compact database size, only dump/restore does because of MVCC architecture feature in PostgreSQL and this has been proven here.

 

So, finally we decided to took our production database offline and performed dump/restore.  After this, the physical database size has also reduced from  30 GB to 3.5 GB and the performance was also very good than it was before.

 

Physical database size was found using the following command:

du -sh /usr/local/pgsql/data/base/<database-oid>

 

I also cross-checked this size using "pg_size_pretty(pg_database_size(datname))".

 

Questions

1.     Is there any version/update of PostgreSQL addressing this issue?

2.     How in real time, this issues are handled by other PostgreSQL users without taking to downtime?

3.     Any ideas or links whether this is addressed in upcoming PostgreSQL version 9.0 release?

 

Re: Database size growing over time and leads to performance impact

From
Andy Colson
Date:
On 03/27/2010 08:00 AM, Gnanakumar wrote:
> Hi,
>
> We're using PostgreSQL 8.2. Recently, in our production database, there
> was a severe performance impact.. Even though, we're regularly doing both:
>
> 1. VACUUM FULL ANALYZE once in a week during low-usage time and
>
> 2. ANALYZE everyday at low-usage time
>
> Also, we noticed that the physical database size has grown upto 30 GB.
> But, if I dump the database in the form of SQL and import it locally in
> my machine, it was only 3.2 GB. Then while searching in Google to
> optimize database size, I found the following useful link:
>
> http://www.linuxinsight.com/optimize_postgresql_database_size.html
>
> It says that even vacuumdb or reindexdb doesn't really compact database
> size, only dump/restore does because of MVCC architecture feature in
> PostgreSQL and this has been proven here.
>
> So, finally we decided to took our production database offline and
> performed dump/restore. After this, the physical database size has also
> reduced from 30 GB to 3.5 GB and the performance was also very good than
> it was before.
>
> Physical database size was found using the following command:
>
> du -sh /usr/local/pgsql/data/base/<database-oid>
>
> I also cross-checked this size using
> "pg_size_pretty(pg_database_size(datname))".
>
> Questions
>
> 1. Is there any version/update of PostgreSQL addressing this issue?
>
> 2. How in real time, this issues are handled by other PostgreSQL users
> without taking to downtime?
>
> 3. Any ideas or links whether this is addressed in upcoming PostgreSQL
> version 9.0 release?
>

The "issue" is not with PG's.  Any newer version of PG will act exactly the same.  I don't think you understand.
Vacuumis not meant to reduce size of the db, its meant to mark pages for reuse.  VACUUM FULL is almost never needed.
Thefact it didnt reduce your db size is probably because of something else, like an open transaction.  If you have a
transactionleft open, then your db will never be able to shrink or re-use pages.  You'd better fix that issue first.
(runps -ax|grep postgres  and look for "idle in transaction") 

You need to vacuum way more often than once a week.  Just VACUUM ANALYZE, two, three times a day.  Or better yet, let
autovacuumdo its thing.  (if you do have autovacuum enabled, then the only problem is the open transaction thing). 

Dont "VACUUM FULL", its not helping you, and is being removed in newer versions.

-Andy

Re: Database size growing over time and leads to performance impact

From
"Pierre C"
Date:
> 1.     VACUUM FULL ANALYZE once in a week during low-usage time and

VACUUM FULL compacts tables, but tends to bloat indexes. Running it weekly
is NOT RECOMMENDED.

A correctly configured autovacuum (or manual vacuum in some circumstances)
should maintain your DB healthy and you shouldn't need VACUUM FULL.

If you realize you got a bloat problem, for instance due to a
misconfigured vacuum, use CLUSTER, which re-generates table AND index
data, and besides, having your table clustered on an index of your choice
can boost performance quite a lot in some circumstances.

8.2 is so old I don't remember if autovacuum is even included. Please try
upgrading to the latest version...

Since your database probably fits in RAM, CLUSTER will be pretty fast.
You can schedule it weekly, if you need clustering. If you don't,
autovacuum will suffice.
Hint : add a "SELECT count(*) FROM yourtable;" before "CLUSTER yourtable;"
so that the table is pulled in the OS disk cache, it'll make CLUSTER
faster.


Re: Database size growing over time and leads to performance impact

From
Greg Smith
Date:
Pierre C wrote:
> If you realize you got a bloat problem, for instance due to a
> misconfigured vacuum, use CLUSTER, which re-generates table AND index
> data, and besides, having your table clustered on an index of your
> choice can boost performance quite a lot in some circumstances.
>
> 8.2 is so old I don't remember if autovacuum is even included. Please
> try upgrading to the latest version...

In 8.2, it's included, but not turned on by default.  And it can only
have a single autovacuum worker, which limits its ability to keep up
with more difficult workloads.

As for CLUSTER, the implementation in 8.2 is limited compared to the 8.3
one.  If you look at
http://www.postgresql.org/docs/8.2/static/sql-cluster.html you'll see a
scary paragraph starting with "CLUSTER loses all visibility information
of tuples..." that is missing from later versions, because that problem
was fixed in 8.3.  I try to avoid using CLUSTER on 8.2 or earlier
versions unless I can block all clients during the maintenance window
it's running in.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Database size growing over time and leads to performance impact

From
"Gnanakumar"
Date:
We're using pgpool-II version 2.0.1 for PostgreSQL connection management.

pgpool configurations are:
num_init_children = 450
child_life_time = 300
connection_life_time = 120
child_max_connections = 30

As you recommended, I ran "ps -ax|grep postgres" at almost a busy
transaction time and I can find "idle" entries:
[root@newuser ~]# ps -ax|grep postgres
 2664 ?        Ss     0:00 postgres: newuser mydb 192.168.0.200(43545) idle
 2783 ?        Ss     0:00 postgres: newuser mydb 192.168.0.200(43585) idle
 2806 ?        Ss     0:02 postgres: newuser mydb 192.168.0.200(43588) idle
 2807 ?        Ss     0:01 postgres: newuser mydb 192.168.0.200(43589) idle
 2818 ?        Ss     0:00 postgres: newuser mydb 192.168.0.200(43601) idle
 2819 ?        Ss     0:00 postgres: newuser mydb 192.168.0.200(43602) idle
 2833 ?        Ss     0:02 postgres: newuser mydb 192.168.0.200(43603) idle
 2856 ?        Ss     0:03 postgres: newuser mydb 192.168.0.200(43614) idle

Based on pgpool documentation, and also as far as I know, even though
application layer returns/closes the application, pgpool will only handle
actual closing of connections based on the connection_life_time parameter
defined.  And if this timeout, it goes to "wait for connection request"
state.

Can you throw some light on this?  Is there any better way that we need to
re-configure our pgpool parameters?

-----Original Message-----
From: Andy Colson [mailto:andy@squeakycode.net]
Sent: Saturday, March 27, 2010 7:06 PM
To: Gnanakumar; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Database size growing over time and leads to
performance impact

On 03/27/2010 08:00 AM, Gnanakumar wrote:
> Hi,
>
> We're using PostgreSQL 8.2. Recently, in our production database, there
> was a severe performance impact.. Even though, we're regularly doing both:
>
> 1. VACUUM FULL ANALYZE once in a week during low-usage time and
>
> 2. ANALYZE everyday at low-usage time
>
> Also, we noticed that the physical database size has grown upto 30 GB.
> But, if I dump the database in the form of SQL and import it locally in
> my machine, it was only 3.2 GB. Then while searching in Google to
> optimize database size, I found the following useful link:
>
> http://www.linuxinsight.com/optimize_postgresql_database_size.html
>
> It says that even vacuumdb or reindexdb doesn't really compact database
> size, only dump/restore does because of MVCC architecture feature in
> PostgreSQL and this has been proven here.
>
> So, finally we decided to took our production database offline and
> performed dump/restore. After this, the physical database size has also
> reduced from 30 GB to 3.5 GB and the performance was also very good than
> it was before.
>
> Physical database size was found using the following command:
>
> du -sh /usr/local/pgsql/data/base/<database-oid>
>
> I also cross-checked this size using
> "pg_size_pretty(pg_database_size(datname))".
>
> Questions
>
> 1. Is there any version/update of PostgreSQL addressing this issue?
>
> 2. How in real time, this issues are handled by other PostgreSQL users
> without taking to downtime?
>
> 3. Any ideas or links whether this is addressed in upcoming PostgreSQL
> version 9.0 release?
>

The "issue" is not with PG's.  Any newer version of PG will act exactly the
same.  I don't think you understand.  Vacuum is not meant to reduce size of
the db, its meant to mark pages for reuse.  VACUUM FULL is almost never
needed.  The fact it didnt reduce your db size is probably because of
something else, like an open transaction.  If you have a transaction left
open, then your db will never be able to shrink or re-use pages.  You'd
better fix that issue first.  (run ps -ax|grep postgres  and look for "idle
in transaction")

You need to vacuum way more often than once a week.  Just VACUUM ANALYZE,
two, three times a day.  Or better yet, let autovacuum do its thing.  (if
you do have autovacuum enabled, then the only problem is the open
transaction thing).

Dont "VACUUM FULL", its not helping you, and is being removed in newer
versions.

-Andy


Re: Database size growing over time and leads to performance impact

From
Andy Colson
Date:
On 3/30/2010 6:17 AM, Gnanakumar wrote:
> We're using pgpool-II version 2.0.1 for PostgreSQL connection management.
>
> pgpool configurations are:
> num_init_children = 450
> child_life_time = 300
> connection_life_time = 120
> child_max_connections = 30
>
> As you recommended, I ran "ps -ax|grep postgres" at almost a busy
> transaction time and I can find "idle" entries:
> [root@newuser ~]# ps -ax|grep postgres
>   2664 ?        Ss     0:00 postgres: newuser mydb 192.168.0.200(43545) idle
>   2783 ?        Ss     0:00 postgres: newuser mydb 192.168.0.200(43585) idle
>   2806 ?        Ss     0:02 postgres: newuser mydb 192.168.0.200(43588) idle
>   2807 ?        Ss     0:01 postgres: newuser mydb 192.168.0.200(43589) idle
>   2818 ?        Ss     0:00 postgres: newuser mydb 192.168.0.200(43601) idle
>   2819 ?        Ss     0:00 postgres: newuser mydb 192.168.0.200(43602) idle
>   2833 ?        Ss     0:02 postgres: newuser mydb 192.168.0.200(43603) idle
>   2856 ?        Ss     0:03 postgres: newuser mydb 192.168.0.200(43614) idle
>
> Based on pgpool documentation, and also as far as I know, even though
> application layer returns/closes the application, pgpool will only handle
> actual closing of connections based on the connection_life_time parameter
> defined.  And if this timeout, it goes to "wait for connection request"
> state.
>
> Can you throw some light on this?  Is there any better way that we need to
> re-configure our pgpool parameters?
>

Connections are ok.  Connection is different than transaction.  The
output above looks good, that's what you want to see.  (If it had said
"idle in transaction" that would be a problem).  I dont think you need
to change anything.

Hopefully just vacuuming more often will help.

-Andy


Re: Database size growing over time and leads to performance impact

From
Greg Smith
Date:
Please don't cc two of the lists here.  It makes things difficult for
users who only subscribe to one list or the other who reply--their post
to the other list will be held for moderation.  And that's a pain for
the moderators too.  In this case, either the pgsql-admin or
pgsql-performance list would have been appropriate for this question,
but not both at the same time.  The suggested approach when unsure is to
try the most obvious list, and if you don't get a response after a day
or two then try a second one.

Gnanakumar wrote:
>
> We're using PostgreSQL 8.2.  Recently, in our production database,
> there was a severe performance impact..  Even though, we're regularly
> doing both:
>
> 1.     VACUUM FULL ANALYZE once in a week during low-usage time and
>
> 2.     ANALYZE everyday at low-usage time
>
>
> Also, we noticed that the physical database size has grown upto 30
> GB.  But, if I dump the database in the form of SQL and import it
> locally in my machine, it was only 3.2 GB.
>

Most VACUUM problems are caused by not running VACUUM often enough.  A
weekly VACUUM is really infrequent.  And it's rarely ever a good idea to
run VACUUM FULL.

You should switch over to running a regular VACUUM, not a full one, on
something closer to a daily or more frequent basis instead.

> Then while searching in Google to optimize database size, I found the
> following useful link:
>
> http://www.linuxinsight.com/optimize_postgresql_database_size.html
>
> It says that even vacuumdb or reindexdb doesn't really compact
> database size, only dump/restore does because of MVCC architecture
> feature in PostgreSQL and this has been proven here.
>

That article covers PostgreSQL as of V7.4, and much of it is outdated
information that doesn't apply to the 8.2 you're running.  It's a pretty
bad description even of that version.  You should try to forget
everything you read there and instead look at
http://www.postgresql.org/docs/8.2/interactive/maintenance.html for an
accurate introduction to this topic.  I'm sorry you've been misled by it.

> Physical database size was found using the following command:
>
> du -sh /usr/local/pgsql/data/base/<database-oid>
>
> I also cross-checked this size using
> "pg_size_pretty(pg_database_size(datname))".
>

You should use the queries shown at
http://wiki.postgresql.org/wiki/Disk_Usage instead of this, which will
break down where the disk space is going by table and index.  You will
discover one of two things:

1) As the database grows, most of the disk space is being taken up by
the tables themselves.  In this case, a more frequent VACUUM is likely
to make that go away.  You might also need to bump up one of the
parameters in the postgresql.conf file, max_fsm_pages

2) Lots of disk space is being taken up by indexes on the tables.  If
this is the case, the fact that you're running VACUUM FULL all the time
is the likely cause of your problem.


> Questions
>
> 1.     Is there any version/update of PostgreSQL addressing this issue?
>
> 2.     How in real time, this issues are handled by other PostgreSQL
> users without taking to downtime?
>
> 3.     Any ideas or links whether this is addressed in upcoming
> PostgreSQL version 9.0 release?
>
>
>

PostgreSQL 8.3 turns on a better tuned autovacuum by default so that
it's more likely VACUUM will run often enough to keep the problem you're
having from happening.  8.4 removes an additional source of problems
that can cause VACUUM to stop working.  As of 8.4, most of the problems
in this area are gone in the default configuration.  Just looking at
newer versions of the associated documentation will give you an idea
what's changed;
http://www.postgresql.org/docs/current/interactive/maintenance.html is
the 8.4 version.  The problems with VACUUM FULL are so bad that as of
9.0, the old implementation of that (the one you're probably getting bad
behavior from) has been replaced by a more efficient one.

The main situation newer PostgreSQL versions can still run into a
problem where the indexes get large if you're deleting records in some
ways; http://www.postgresql.org/docs/8.2/static/routine-reindex.html
describes that issue, and that bit of documentation and the underlying
behavior is unchanged in later releases.  It's much more likely that
you're running into the very common situation instead where you're
running VACUUM FULL infrequently, where you should be running regular
VACUUM frequently instead.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Database size growing over time and leads to performance impact

From
Scott Carey
Date:
On Mar 27, 2010, at 6:35 AM, Andy Colson wrote:
>
> Dont "VACUUM FULL", its not helping you, and is being removed in newer versions.
>

Off topic:  How is that going to work?  CLUSTER doesn't work on tables without an index.  I would love to be able to
CLUSTERon some column set that doesn't necessarily have an index. 

> -Andy
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: Database size growing over time and leads to performance impact

From
Robert Haas
Date:
On Wed, Mar 31, 2010 at 4:37 PM, Scott Carey <scott@richrelevance.com> wrote:
> On Mar 27, 2010, at 6:35 AM, Andy Colson wrote:
>>
>> Dont "VACUUM FULL", its not helping you, and is being removed in newer versions.
>>
>
> Off topic:  How is that going to work?  CLUSTER doesn't work on tables without an index.  I would love to be able to
CLUSTERon some column set that doesn't necessarily have an index. 

I believe the new VF implementation just rewrites the data in the same
physical order as it was in previously, but without the dead space.
So it's sort of like cluster-by-no-index-at-all.

...Robert

Re: Database size growing over time and leads to performance impact

From
Alvaro Herrera
Date:
Scott Carey wrote:
>
> On Mar 27, 2010, at 6:35 AM, Andy Colson wrote:
> >
> > Dont "VACUUM FULL", its not helping you, and is being removed in newer versions.
> >
>
> Off topic:  How is that going to work?  CLUSTER doesn't work on tables
> without an index.  I would love to be able to CLUSTER on some column
> set that doesn't necessarily have an index.

VACUUM FULL has been rewritten in 9.0 so that it uses the CLUSTER logic,
except that it doesn't require an index.

If you want to do it in earlier versions, you can use a no-op SET TYPE
command, like so:

ALTER TABLE foo ALTER COLUMN bar SET TYPE baz;

assuming that table foo has a column bar which is already of type baz.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Database size growing over time and leads to performance impact

From
Scott Carey
Date:
On Mar 31, 2010, at 1:47 PM, Robert Haas wrote:

> On Wed, Mar 31, 2010 at 4:37 PM, Scott Carey <scott@richrelevance.com> wrote:
>> On Mar 27, 2010, at 6:35 AM, Andy Colson wrote:
>>>
>>> Dont "VACUUM FULL", its not helping you, and is being removed in newer versions.
>>>
>>
>> Off topic:  How is that going to work?  CLUSTER doesn't work on tables without an index.  I would love to be able to
CLUSTERon some column set that doesn't necessarily have an index. 
>
> I believe the new VF implementation just rewrites the data in the same
> physical order as it was in previously, but without the dead space.
> So it's sort of like cluster-by-no-index-at-all.
>

Still off topic:

Will CLUSTER/VF respect FILLFACTOR in 9.0?

As far as I can tell in 8.4, it does not.  CLUSTER on a table with FILLFACTOR=100, then alter the table to
FILLFACTOR=90,cluster again -- the file size reported by \dt+ is the same.  This is a fairly big performance issue
sinceit means that HOT doesn't function well on a table just CLUSTERed. 

> ...Robert


Re: Database size growing over time and leads to performance impact

From
Tom Lane
Date:
Scott Carey <scott@richrelevance.com> writes:
> Still off topic:

> Will CLUSTER/VF respect FILLFACTOR in 9.0?

> As far as I can tell in 8.4, it does not.

Works for me, in both branches.

            regards, tom lane

Re: Database size growing over time and leads to performance impact

From
Scott Carey
Date:
On Apr 1, 2010, at 1:42 PM, Tom Lane wrote:

> Scott Carey <scott@richrelevance.com> writes:
>> Still off topic:
>
>> Will CLUSTER/VF respect FILLFACTOR in 9.0?
>
>> As far as I can tell in 8.4, it does not.
>
> Works for me, in both branches.
>

I stand corrected.  I must have done something wrong in my test.  On a different system I tried FILLFACTOR=45 and
FILLFACTOR=90and the resulting size was nearly a factor of two different. 

>             regards, tom lane