Thread: detection of VACUUM in progress

detection of VACUUM in progress

From

Bohdan Linda

Date:

30 August 2005, 12:17:44

Hello,

Is there any way how to detect running command VACUUM by reading pg_* tables?

The idea is to detectect when table is not accessible due maintainance. The approach of explicitely setting a flag into
statustable is not very convenient, while I want to cover also non-systematic launching of this command 

Regards,
Bohdan

Re: detection of VACUUM in progress

From

Tom Lane

Date:

30 August 2005, 12:54:13

Bohdan Linda <bohdan.linda@seznam.cz> writes:
> Is there any way how to detect running command VACUUM by reading pg_* tables?

> The idea is to detectect when table is not accessible due maintainance.

Um, ordinary VACUUM doesn't render the table "not accessible".  If
you're using VACUUM FULL, maybe the right answer is to not do that.

But to answer your question, you could look in pg_locks to see if
there's an exclusive lock on the table.

            regards, tom lane

Re: detection of VACUUM in progress

From

Michael Fuhr

Date:

30 August 2005, 13:08:36

On Tue, Aug 30, 2005 at 05:35:13PM +0200, Bohdan Linda wrote:
>
> Is there any way how to detect running command VACUUM by reading pg_* tables?

If you have stats_command_string enabled then you could query
pg_stat_activity, but be aware that the results will be stale by
the time you see them (there's a slight lag in updating the stats
tables, and a VACUUM might start or complete immediately after you
issue the query but before you read the results).  This method is
therefore unreliable.

> The idea is to detectect when table is not accessible due maintainance.

In modern versions of PostgreSQL a plain VACUUM (without FULL)
should have little impact on a table's accessibility unless you're
doing something that needs a strong lock (DDL, etc.).  But VACUUM
FULL and a few other commands do prevent other transactions from
accessing a table, so if you want to check for accessibility then
you need to check for more than just VACUUM.  You might be able to
use statement_timeout to make statements fail if they take longer
than expected, and infer from the failure that the table is
unavailable.  But again, that information could be stale by the
time you see it -- the table might become available immediately
after you decide that it isn't.

What problem are you trying to solve?  If we knew what you're really
trying to do then we might be able to make suggestions.

--
Michael Fuhr

Re: detection of VACUUM in progress

From

Bohdan Linda

Date:

31 August 2005, 05:22:16

On Tue, Aug 30, 2005 at 06:07:24PM +0200, Michael Fuhr wrote:
> tables, and a VACUUM might start or complete immediately after you
> issue the query but before you read the results).  This method is
> therefore unreliable.

I intend to do the VACUUM FULL during quiet hours, thus the chance of
fitting exactly to the time that VACUUM started and it is not reflected in
the tables is quite small. And even if it would happend, very likely it will
affect only one user, who may get around hitting "refresh" button.
>
>
> What problem are you trying to solve?  If we knew what you're really
> trying to do then we might be able to make suggestions.

I have database, which gets around 240 000 new lines each day and about the
same is also deleted each day. The table has something around 8M lines in
average and simple query takes about 70s to complete(V210 1x
UltraSPARC-IIIi). As this time is quite high, I need "defragment" database
on daily basis. These queries get visualized in web application. My
problem is, how to make the web application aware that maintainace (VACUUM
FULL) is in place, but the database is not down. I really would not like
to do it via extra status table, while sometimes it may happend, that
someone will run VACUUM FULL ad-hoc-ly in good-faith and will forget to
update the status table.

Re: detection of VACUUM in progress

From

Ben-Nes Yonatan

Date:

31 August 2005, 07:04:19

Bohdan Linda wrote:
> On Tue, Aug 30, 2005 at 06:07:24PM +0200, Michael Fuhr wrote:
>
>>tables, and a VACUUM might start or complete immediately after you
>>issue the query but before you read the results).  This method is
>>therefore unreliable.
>
>
> I intend to do the VACUUM FULL during quiet hours, thus the chance of
> fitting exactly to the time that VACUUM started and it is not reflected in
> the tables is quite small. And even if it would happend, very likely it will
> affect only one user, who may get around hitting "refresh" button.
>
>>
>>What problem are you trying to solve?  If we knew what you're really
>>trying to do then we might be able to make suggestions.
>
>
> I have database, which gets around 240 000 new lines each day and about the
> same is also deleted each day. The table has something around 8M lines in
> average and simple query takes about 70s to complete(V210 1x
> UltraSPARC-IIIi). As this time is quite high, I need "defragment" database
> on daily basis. These queries get visualized in web application. My
> problem is, how to make the web application aware that maintainace (VACUUM
> FULL) is in place, but the database is not down. I really would not like
> to do it via extra status table, while sometimes it may happend, that
> someone will run VACUUM FULL ad-hoc-ly in good-faith and will forget to
> update the status table.
>

 From the postgresql manual
http://www.postgresql.org/docs/8.0/interactive/maintenance.html :
" The standard form of VACUUM is best used with the goal of maintaining
a fairly level steady-state usage of disk space. If you need to return
disk space to the operating system you can use VACUUM FULL — but what's
the point of releasing disk space that will only have to be allocated
again soon? Moderately frequent standard VACUUM runs are a better
approach than infrequent VACUUM FULL runs for maintaining
heavily-updated tables."

 From this I conclude that an ordinary VACUUM is sufficent to your
purpose cause you insert/delete almost the same amount of data daily.

But then again I can be mistaken so if anyone can back me up here or
throw the manual on me will be nice ;P

Cheers

Re: detection of VACUUM in progress

From

Bohdan Linda

Date:

31 August 2005, 09:01:05

>
> From the postgresql manual
> http://www.postgresql.org/docs/8.0/interactive/maintenance.html :
> " The standard form of VACUUM is best used with the goal of maintaining
> a fairly level steady-state usage of disk space. If you need to return
> disk space to the operating system you can use VACUUM FULL ? but what's
> the point of releasing disk space that will only have to be allocated
> again soon? Moderately frequent standard VACUUM runs are a better
> approach than infrequent VACUUM FULL runs for maintaining
> heavily-updated tables."
>
> From this I conclude that an ordinary VACUUM is sufficent to your
> purpose cause you insert/delete almost the same amount of data daily.
>
> But then again I can be mistaken so if anyone can back me up here or
> throw the manual on me will be nice ;P


If I vacuum only the table, the records may be used by new lines, that is
fine. Problem is, that when creating select on such table, it takes more
pages to be read from the IO (it will read laso deleted rows) thus the
select will last a bit longer.

regards,
Bohdan

Re: detection of VACUUM in progress

From

Patrick.FICHE@AQSACOM.COM

Date:

31 August 2005, 09:24:20

Your table size should stabilize to the size you currently get before VACUUM
FULL....
If you're afraid about having too many "deleted" rows, you just have to run
VACUUM more often...

I think that you have to test how often you have to run VACUUM so that your
performance is not impacted....
Or perhaps you could try pg_autovacuum which will run VACUUM for you based
on statistics...

Regards,
Patrick

----------------------------------------------------------------------------
---------------
Patrick Fiche
email : patrick.fiche@aqsacom.com
tel : 01 69 29 36 18
----------------------------------------------------------------------------
---------------




-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Bohdan Linda
Sent: mercredi 31 aout 2005 14:19
To: Ben-Nes Yonatan
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] detection of VACUUM in progress


>
> From the postgresql manual
> http://www.postgresql.org/docs/8.0/interactive/maintenance.html :
> " The standard form of VACUUM is best used with the goal of maintaining
> a fairly level steady-state usage of disk space. If you need to return
> disk space to the operating system you can use VACUUM FULL ? but what's
> the point of releasing disk space that will only have to be allocated
> again soon? Moderately frequent standard VACUUM runs are a better
> approach than infrequent VACUUM FULL runs for maintaining
> heavily-updated tables."
>
> From this I conclude that an ordinary VACUUM is sufficent to your
> purpose cause you insert/delete almost the same amount of data daily.
>
> But then again I can be mistaken so if anyone can back me up here or
> throw the manual on me will be nice ;P


If I vacuum only the table, the records may be used by new lines, that is
fine. Problem is, that when creating select on such table, it takes more
pages to be read from the IO (it will read laso deleted rows) thus the
select will last a bit longer.

regards,
Bohdan

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: detection of VACUUM in progress

From

Ben-Nes Yonatan

Date:

31 August 2005, 12:16:37

Bohdan Linda wrote:
>From the postgresql manual
>>http://www.postgresql.org/docs/8.0/interactive/maintenance.html :
>>" The standard form of VACUUM is best used with the goal of maintaining
>>a fairly level steady-state usage of disk space. If you need to return
>>disk space to the operating system you can use VACUUM FULL ? but what's
>>the point of releasing disk space that will only have to be allocated
>>again soon? Moderately frequent standard VACUUM runs are a better
>>approach than infrequent VACUUM FULL runs for maintaining
>>heavily-updated tables."
>>
>From this I conclude that an ordinary VACUUM is sufficent to your
>>purpose cause you insert/delete almost the same amount of data daily.
>>
>>But then again I can be mistaken so if anyone can back me up here or
>>throw the manual on me will be nice ;P
>
>
>
> If I vacuum only the table, the records may be used by new lines, that is
> fine. Problem is, that when creating select on such table, it takes more
> pages to be read from the IO (it will read laso deleted rows) thus the
> select will last a bit longer.
>
> regards,
> Bohdan
>

As far as I understand the vacuum process, it does delete the "deleted
rows" so the next queries after the vacuum will not read those rows.
Of course that the table will be the same size as with the "deleted
rows" but I dont think that with a proper index it will result in any
meaningful overhead...

But then again its just my opinion and anyway lately im feeling
increasingly amateur at this subject... :)

Cheers!
Ben-Nes Yonatan

Re: detection of VACUUM in progress

From

Scott Marlowe

Date:

31 August 2005, 12:35:21

On Wed, 2005-08-31 at 07:18, Bohdan Linda wrote:
> >
> > From the postgresql manual
> > http://www.postgresql.org/docs/8.0/interactive/maintenance.html :
> > " The standard form of VACUUM is best used with the goal of maintaining
> > a fairly level steady-state usage of disk space. If you need to return
> > disk space to the operating system you can use VACUUM FULL ? but what's
> > the point of releasing disk space that will only have to be allocated
> > again soon? Moderately frequent standard VACUUM runs are a better
> > approach than infrequent VACUUM FULL runs for maintaining
> > heavily-updated tables."
> >
> > From this I conclude that an ordinary VACUUM is sufficent to your
> > purpose cause you insert/delete almost the same amount of data daily.
> >
> > But then again I can be mistaken so if anyone can back me up here or
> > throw the manual on me will be nice ;P
>
>
> If I vacuum only the table, the records may be used by new lines, that is
> fine. Problem is, that when creating select on such table, it takes more
> pages to be read from the IO (it will read laso deleted rows) thus the
> select will last a bit longer.

It really depends on what percentage of rows you're updating.

If you are updating 240,000 rows a day, and have a database with 100M
rows, then that's not too bad.  Regular vacuums once a day would be
plenty.

If you're updating 240,000 rows a day, spread out over the day, and you
have a table that has 10,000 rows, then you will need to run vacuum far
more often to keep the table at a steady state of 10,000 to 20,000 rows.

If you're doing the 240,000 updates all at once on a small table, then
you might well be a candidate for a vacuum full.

So, again, it's about percentages really.  If 240k rows represents 1% of
your table, then daily, regular vacuums will do fine.

Personally, I just install pg_autovacuum and check on it once a week or
so to make sure it's doing its job.