Thread: Postgresql Automatic vacuum

Postgresql Automatic vacuum

From
"Shridhar Daithankar"
Date:
Hello All,

I have written a small daemon that can automatically vacuum PostgreSQL
database, depending upon activity per table.

It sits on top of postgres statistics collector. The postgres installation
should have per row statistics collection enabled.

Features are,

* Vacuuming based on activity on the table
* Per table vacuum. So only heavily updated tables are vacuumed.
* multiple databases supported
* Performs 'vacuum analyze' only, so it will not block the database


The project location is
http://gborg.postgresql.org/project/pgavd/projdisplay.php

Let me know for bugs/improvements and comments..

I am sure real world postgres installations has some sort of scripts doing
similar thing. This is an attempt to provide a generic interface to periodic
vacuum.


Bye
 Shridhar

--
The Abrams' Principle:    The shortest distance between two points is off the
wall.


Re: Postgresql Automatic vacuum

From
"Shridhar Daithankar"
Date:
On 23 Sep 2002 at 14:50, Lee Kindness wrote:

> Shridhar,
> 
> Might be useful to add a .tag.gz to the downloads, so people do not
> have to use CVS to take a look.

There is a development snapshot..


ByeShridhar

--
In most countries selling harmful things like drugs is punishable.Then howcome 
people can sell Microsoft software and go unpunished?(By hasku@rost.abo.fi, 
Hasse Skrifvars)



Re: Postgresql Automatic vacuum

From
John Buckman
Date:
Just an FYI - this kind of thing would be a *great* feature addition to the generic PostgresSQL release.  We at Lyris
oftenhear that "postgressql is very slow, and the files are getting larger" and then "wow! it's so much faster now that
we'reregularly vacuuming!" after we let them know about this need (the RPM install of PostgresSQL is so easy that most
peopledon't read any docs).  Automatic maintenance of database tables is a Good Thing (tm) and would make more people
weintroduce to pgsql favorably disposed toward it. 

-john


> I have written a small daemon that can automatically vacuum PostgreSQL
> database, depending upon activity per table.

> It sits on top of postgres statistics collector. The postgres installation
> should have per row statistics collection enabled.

> Features are,

> * Vacuuming based on activity on the table
> * Per table vacuum. So only heavily updated tables are vacuumed.
> * multiple databases supported
> * Performs 'vacuum analyze' only, so it will not block the database


> The project location is
> http://gborg.postgresql.org/project/pgavd/projdisplay.php

> Let me know for bugs/improvements and comments..

> I am sure real world postgres installations has some sort of scripts doing
> similar thing. This is an attempt to provide a generic interface to periodic
> vacuum.


> Bye
> Shridhar

> --
> The Abrams' Principle:    The shortest distance between two points is off the
> wall.


> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly


Re: Postgresql Automatic vacuum

From
"Shridhar Daithankar"
Date:
On 23 Sep 2002 at 13:28, Matthew T. O'Connor wrote:

> On Monday 23 September 2002 09:43 am, Shridhar Daithankar wrote:
> > Hello All,
> >
> > I have written a small daemon that can automatically vacuum PostgreSQL
> > database, depending upon activity per table.
> 
> Hello Shridhar, sorry I didn't respond to the email you sent me a while back.  
> Anyway, I saw this post, and just started taking a look a it.  I wasn't 
> thinking of doing this as a totally separate executable / code base, but 
> perhaps that has advantages I need to think more.  
> 
> A couple of quick questions, you are using C++, but all postgres source code 
> is in C, do you want this to eventually be included as part of the postgres 
> distribution?  If so, I think that C might be a better choice.

Well, I wrote it in C++ because I like it. I have lost habit of writing pure C 
code. Nothing else.

As far as getting into base postgresql distro. I don't mind it rewriting but I 
have some reservations.

1) As it is postgresql source code is huge. Adding functions to it which 
directly taps into it's nervous system e.g. cache, would take far more time to 
perfect in all conditions.

My application as it is is an external client app. It enjoys all the isolation 
provided by postgresql. Besides this is a low priority functionality at 
runtime, unlike real time replication. It would rarely matter it vacuum is 
triggered after 6 seconds instead of configuerd 5 seconds, for example.

Less code, less bugs is my thinking. 

I wanted this functionality out fast. I didn't want to invest in learning 
postgresql source code because I didn't have time. So I wrote a separate app. 
Besides it would run on all previous postgresql versions which supports 
statistics collection. That's a huge plus if you ask me.

2) Consider this. No other database offers built in tool to clean the things.  
Is it that nobody needs it? No everybody needs it. And then you end up cleaning 
database by taking it down.

If people take for granted that postgresql does not need manual cleaning, by 
deploying apps. like pgavd, vacuum will be a big feature of postgres. Clean the 
database without taking it down..


> I will play with it more and give you some more feedback.

Awaiting that.

I am Cc'ing this to Hackers because I am sure some people might have same 
doubts.
ByeShridhar

--
intoxicated, adj.:    When you feel sophisticated without being able to pronounce 
it.



Re: Postgresql Automatic vacuum

From
Mario Weilguni
Date:
Am Dienstag, 24. September 2002 08:16 schrieb Shridhar Daithankar:
>
> > I will play with it more and give you some more feedback.
>
> Awaiting that.
>

IMO there are still several problems with that approach, namely:
* every database will get "polluted" with the autovacuum table, which is undesired
* the biggest problem is the ~/.pgavrc file. I think it should work like other postgres utils do, e.g. supporting -U,
-d,.... 
* it's not possible to use without activly administration the config file. it should be able to work without
adminstratorassistance. 

When this is a daemon, why not store the data in memory? Even with several thousands of tables the memory footprint
wouldstillbe small. And it should be possible to use for all databases without modifying a config file. 

Two weeks ago I began writing a similar daemon, but had no time yet to finish it. I've tried to avoid using fixed
numbers(namely "vacuum table 
after 1000 updates") and tried to make my own heuristic based on the statistics data and the size of the table. The
reasonis, for a large table 1000 entries might be  
a small percentage and vacuum is not necessary, while for small tables 10 updates might be sufficient.

Best regards,Mario Weilguni



Re: Postgresql Automatic vacuum

From
"Shridhar Daithankar"
Date:
On 24 Sep 2002 at 8:42, Mario Weilguni wrote:

> Am Dienstag, 24. September 2002 08:16 schrieb Shridhar Daithankar:
> IMO there are still several problems with that approach, namely:
> * every database will get "polluted" with the autovacuum table, which is undesired 

I agree. But that was the best alternative I could see. explanation 
follows..Besides I didn't want to touch PG meta data..

> * the biggest problem is the ~/.pgavrc file. I think it should work like other postgres utils do, e.g. supporting -U,
-d,....
 

Shouldn't be a problem. The config stuff is working and I can add that. I would 
rather term it a minor issue. On personal preference, I would just fire it 
without any arguments. It's not a thing that you change daily. Configure it in 
config file and done..

> * it's not possible to use without activly administration the config file. it should be able to work without
>   adminstrator assistance.

Well. I would call that tuning. Each admin can tune it. Yes it's an effort but 
certainly not an active administration.
> When this is a daemon, why not store the data in memory? Even with several thousands of tables the memory footprint
would
>  still be small. And it should be possible to use for all databases without modifying a config file.

Well. When postgresql has ability to deal with arbitrary number of rows, it 
seemed redundant to me to duplicate all those functionality. Why write lists 
and arrays again and again? Let postgresql do it.


> Two weeks ago I began writing a similar daemon, but had no time yet to finish it. I've tried to avoid using fixed
numbers(namely "vacuum table
 
> after 1000 updates") and tried to make my own heuristic based on the statistics data and the size of the table. The
reasonis, for a large table 1000 entries might be 
 
> a small percentage and vacuum is not necessary, while for small tables 10 updates might be sufficient.

Well, that fixed number is not really fixed but admin tunable, that too per 
database. These are just defaults. Tune it to suit your needs.

The objective of whole exercise is to get rid of periodic vacuum as this app. 
shifts threshold to activity rather than time.

Besides a table should be vacuumed when it starts affecting performance. On an 
installation if a table a 1M rows and change 1K rows affects performance, there 
will be a similar performance hit for a 100K rows table for 1K rows update. 
Because overhead involved would be almost same.(Not disk space. pgavd does not 
target vacuum full but tuple size should matter).

At least me thinks so..

I plan to implement per table threshold in addition to per database thresholds. 
But right now, it seems like overhead to me. Besides there is an item in TODO, 
to shift unit of work from rows to blocks affected. I guess that takes care of 
some of your points..
ByeShridhar

--
Jones' Second Law:    The man who smiles when things go wrong has thought of 
someone    to blame it on.



Re: Postgresql Automatic vacuum

From
John Buckman
Date:
> As far as getting into base postgresql distro. I don't mind it rewriting but I
> have some reservations.
> 1) As it is postgresql source code is huge. Adding functions to it which
> directly taps into it's nervous system e.g. cache, would take far more time to
> perfect in all conditions.

It doesn't have to make its way into the postgresql daemon itself -- in fact since some people like tuning the
vacuuming,it makes more sense to make this a daemon. No, my suggestion is simple that some sort of auto-vacuumer be
compiledas a stand-alone app and included in the standard postgresql tar.gz file, and the install instructions
recommendthe site adding it as a cron job.  

On linux, it'd be good if the RPM install it automatically (or else it ran as a mostly-asleep daemon) because so many
ofthe Linux/Postgresql users we see have just no clue about Postgresql, and no intention of reading anything. 

Just an FYI, a message I received today from the postmaster at a major telco about their postgresql experience:
> We have experienced some problems but they have generally
> cleared up after a database vacuum. However, sometimes I
> have found that the vacuum itself (especially a vacuum analyze)
> will go into the CPU consumption loop.

-john


Re: Postgresql Automatic vacuum

From
"Matthew T. O'Connor"
Date:
> It doesn't have to make its way into the postgresql daemon itself -- in
fact since some people like tuning the vacuuming, it makes more sense to
make this a daemon. No, my suggestion is simple that some sort of
auto-vacuumer be compiled as a stand-alone app and included in the standard
postgresql tar.gz file, and the install instructions recommend the site
adding it as a cron job.

unless I missed something.... the point of a daemon is so that we don't need
to use cron.

I also think that some type of daemon should be included in the pg sources,
and installed with the rest of the system, and if configured to do so, the
postmaster launches the auto vac daemon.  I think this still makes sense
even with the proposed setup (autovac client is just special client app).



Re: Postgresql Automatic vacuum

From
"Shridhar Daithankar"
Date:
On 25 Sep 2002 at 1:10, Matthew T. O'Connor wrote:

> > It doesn't have to make its way into the postgresql daemon itself -- in
> fact since some people like tuning the vacuuming, it makes more sense to
> make this a daemon. No, my suggestion is simple that some sort of
> auto-vacuumer be compiled as a stand-alone app and included in the standard
> postgresql tar.gz file, and the install instructions recommend the site
> adding it as a cron job.
> 
> unless I missed something.... the point of a daemon is so that we don't need
> to use cron.
> 
> I also think that some type of daemon should be included in the pg sources,
> and installed with the rest of the system, and if configured to do so, the
> postmaster launches the auto vac daemon.  I think this still makes sense
> even with the proposed setup (autovac client is just special client app).

I would suggest adding it to pg_ctl. Best place to put it IMO..

I can make available rpm of pgavd but with checkinstall, I guess people will 
can have more distro. spcecific rpms. One rpm is obviously not going to install 
on all distros..



ByeShridhar

--
Moore's Constant:    Everybody sets out to do something, and everybody    does 
something, but no one does what he sets out to do.