Thread: pg_stat_get_last_vacuum_time(): why non-FULL?

pg_stat_get_last_vacuum_time(): why non-FULL?

From
CR Lender
Date:
According to the manual (9.1), pg_stat_get_last_vacuum_time() returns

    timestamptz | Time of the last non-FULL vacuum initiated by the
                | user on this table

Why are full vacuums excluded from this statistic? It looks like there's
no way to get the date of the last manual vacuum, if only full vacuums
are performed.


regards,
crl


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
Kevin Grittner
Date:
CR Lender <crlender@gmail.com> wrote:

> According to the manual (9.1), pg_stat_get_last_vacuum_time() returns
>
>     timestamptz | Time of the last non-FULL vacuum initiated by the
>                 | user on this table
>
> Why are full vacuums excluded from this statistic? It looks like there's
> no way to get the date of the last manual vacuum, if only full vacuums
> are performed.

Because FULL is a bit of a misnomer -- there are important things a
non-FULL vacuum does which a FULL vacuum does not.  In general, a
VACUUM FULL should be followed by a non-FULL vacuum to keep the
database in good shape.  Also, a VACUUM FULL is an extreme form of
maintenance which should rarely be needed; if you find that you
need to run VACUUM FULL, something is probably being done wrong
which should be fixed so that you don't need to continue to do such
extreme maintenance.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
CR Lender
Date:
On 2013-03-26 19:28, Kevin Grittner wrote:
>> Why are full vacuums excluded from this statistic? It looks like there's
>> no way to get the date of the last manual vacuum, if only full vacuums
>> are performed.
>
> Because FULL is a bit of a misnomer -- there are important things a
> non-FULL vacuum does which a FULL vacuum does not.  In general, a
> VACUUM FULL should be followed by a non-FULL vacuum to keep the
> database in good shape.

Thank you, that's very helpful. I wasn't aware of that.

> Also, a VACUUM FULL is an extreme form of
> maintenance which should rarely be needed; if you find that you
> need to run VACUUM FULL, something is probably being done wrong
> which should be fixed so that you don't need to continue to do such
> extreme maintenance.

In this case I was only trying to make sense of an existing database
(8.3). The statistics in pg_stats were way off for some tables, so I
wanted to see if (auto)vacuum and (auto)analyze were being run.
pg_stat_all_tables() showed last_autoanalyze at >400 days for some of
the larger tables. There used to be a weekly cron job with VACUUM FULL
ANALYZE, and I was trying to find out if that cron job was still active.

Thanks,
crl


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
Martín Marqués
Date:
2013/3/27 CR Lender <crlender@gmail.com>:
>
>> Also, a VACUUM FULL is an extreme form of
>> maintenance which should rarely be needed; if you find that you
>> need to run VACUUM FULL, something is probably being done wrong
>> which should be fixed so that you don't need to continue to do such
>> extreme maintenance.
>
> In this case I was only trying to make sense of an existing database
> (8.3). The statistics in pg_stats were way off for some tables, so I
> wanted to see if (auto)vacuum and (auto)analyze were being run.
> pg_stat_all_tables() showed last_autoanalyze at >400 days for some of
> the larger tables. There used to be a weekly cron job with VACUUM FULL
> ANALYZE, and I was trying to find out if that cron job was still active.
>

What's your autovacuum configuration? autovacuum_vacuum_threshold?
autovacuum_analyze_threshold? autovacuum_vacuum_scale_factor?
autovacuum_analyze_scale_factor?

Related to your 400+ days not vacuumed tables, are you sure those
tables have data changes (INSERT/UPDATE/DELETE)? I have some static
tables with over a year of no vacuum (and autovacuum field never ran
on that relation).

What does n_dead_tup show?

--
Martín Marqués
select 'martin.marques' || '@' || 'gmail.com'
DBA, Programador, Administrador


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
CR Lender
Date:
On 2013-03-28 13:11, Martín Marqués wrote:
> 2013/3/27 CR Lender <crlender@gmail.com>:
>> In this case I was only trying to make sense of an existing database
>> (8.3). The statistics in pg_stats were way off for some tables, so I
>> wanted to see if (auto)vacuum and (auto)analyze were being run.
>> pg_stat_all_tables() showed last_autoanalyze at >400 days for some of
>> the larger tables. There used to be a weekly cron job with VACUUM FULL
>> ANALYZE, and I was trying to find out if that cron job was still active.
>
> What's your autovacuum configuration? autovacuum_vacuum_threshold?
> autovacuum_analyze_threshold? autovacuum_vacuum_scale_factor?
> autovacuum_analyze_scale_factor?

 autovacuum                      | on
 autovacuum_analyze_scale_factor | 0.1
 autovacuum_analyze_threshold    | 50
 autovacuum_freeze_max_age       | 200000000
 autovacuum_max_workers          | 3
 autovacuum_naptime              | 1min
 autovacuum_vacuum_cost_delay    | 20ms
 autovacuum_vacuum_cost_limit    | -1
 autovacuum_vacuum_scale_factor  | 0.2
 autovacuum_vacuum_threshold     | 50

The database is running on PostgreSQL 8.3.6.

I don't maintain this server, and my knowledge about the autovacuum
feature and its settings are sketchy. The values above could be the
defaults for 8.3.6, or they may have been adjusted by the admin.

> Related to your 400+ days not vacuumed tables, are you sure those
> tables have data changes (INSERT/UPDATE/DELETE)? I have some static
> tables with over a year of no vacuum (and autovacuum field never ran
> on that relation).

Yes. Autovacuum and autoanalyze are active, and tables with frequent
DELETEs and UPDATEs appear to be vacuumed and analyzed at least once a
day. Other tables with relatively frequent INSERTs, but irregular
UPDATEs and rare DELETEs go without vacuum/analyze for long periods of
time. Static tables never get analyzed or vacuumed (as expected).

> What does n_dead_tup show?

Here are the statistics for three exemplary tables:

         relname: | r____      | oe____     | mv____
                  |            |            |
 n_tup_ins        |      35335 |     179507 |       9562
 n_tup_upd        |      46727 |     824898 |          0
 n_tup_del        |          0 |       9709 |       3567
 n_tup_hot_upd    |       2016 |     793169 |          0
 n_live_tup       |     206086 |    1132164 |      57964
 n_dead_tup       |      35583 |      46932 |       5436
 last_autovacuum  | 2011-05-25 |       NULL |       NULL
 last_autoanalyze | 2013-01-07 | 2012-12-27 | 2012-04-16

I'm not saying that autovacuum/autoanalyze aren't working as designed, I
was just surprised by the long delays.


Concerning the earlier reply to my question...

> On 2013-03-26 19:28, Kevin Grittner wrote:
>> Because FULL is a bit of a misnomer -- there are important things a
>> non-FULL vacuum does which a FULL vacuum does not.  In general, a
>> VACUUM FULL should be followed by a non-FULL vacuum to keep the
>> database in good shape.

I've read the manual more carefully now, and I can't see any mention of
what VACUUM does that VACUUM FULL does not. The point about extreme
maintainance is taken, but from what I read, VACUUM FULL should include
everything a normal VACUUM does.


Thanks,
crl


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
Kevin Grittner
Date:
CR Lender <crlender@gmail.com> wrote:

> The database is running on PostgreSQL 8.3.6.

> I've read the manual more carefully now, and I can't see any mention of
> what VACUUM does that VACUUM FULL does not. The point about extreme
> maintainance is taken, but from what I read, VACUUM FULL should include
> everything a normal VACUUM does.

Prior to release 9.0 that is probably true.  Sorry for not asking
about the version first.  But you should read this page:

http://www.postgresql.org/support/versioning/

8.3 is out of support now.  Even for the 8.3 release, 8.3.6 is
missing over four years of fixes for bugs and security
vulnerabilities.  There is a very good chance that any problem you
see already fixed and you are just choosing to run without the fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
Martín Marqués
Date:
2013/3/28 CR Lender <crlender@gmail.com>:
> On 2013-03-28 13:11, Martín Marqués wrote:
>> 2013/3/27 CR Lender <crlender@gmail.com>:
>>> In this case I was only trying to make sense of an existing database
>>> (8.3). The statistics in pg_stats were way off for some tables, so I
>>> wanted to see if (auto)vacuum and (auto)analyze were being run.
>>> pg_stat_all_tables() showed last_autoanalyze at >400 days for some of
>>> the larger tables. There used to be a weekly cron job with VACUUM FULL
>>> ANALYZE, and I was trying to find out if that cron job was still active.
>>
>> What's your autovacuum configuration? autovacuum_vacuum_threshold?
>> autovacuum_analyze_threshold? autovacuum_vacuum_scale_factor?
>> autovacuum_analyze_scale_factor?
>
>  autovacuum                      | on
>  autovacuum_analyze_scale_factor | 0.1
>  autovacuum_analyze_threshold    | 50
>  autovacuum_freeze_max_age       | 200000000
>  autovacuum_max_workers          | 3
>  autovacuum_naptime              | 1min
>  autovacuum_vacuum_cost_delay    | 20ms
>  autovacuum_vacuum_cost_limit    | -1
>  autovacuum_vacuum_scale_factor  | 0.2
>  autovacuum_vacuum_threshold     | 50
>
> The database is running on PostgreSQL 8.3.6.

What you have to look at is autovacuum_analyze_scale_factor which in
your case is 10% of the total tuples of the relation. So when 50
tuples (the threshold) over the 10% of the total tuples are
analyzable, autovacuum analyzes the relation.

Same thing with vacuum, but with 20% in that case.

If you want autovacuum to really clean and analize more, you will have
to lower autovacuum_analyze_scale_factor and
autovacuum_vacuum_scale_factor.

--
Martín Marqués
select 'martin.marques' || '@' || 'gmail.com'
DBA, Programador, Administrador


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
CR Lender
Date:
On 2013-03-28 20:44, Kevin Grittner wrote:
> CR Lender <crlender@gmail.com> wrote:
>
>> The database is running on PostgreSQL 8.3.6.
>
>> I've read the manual more carefully now, and I can't see any mention of
>> what VACUUM does that VACUUM FULL does not. The point about extreme
>> maintainance is taken, but from what I read, VACUUM FULL should include
>> everything a normal VACUUM does.
>
> Prior to release 9.0 that is probably true.

Hm, I can't find it, even in the manual for 9.2.
http://www.postgresql.org/docs/current/static/sql-vacuum.html

If VACUUM FULL is just a more aggressive VACCUM (including writing new
data files), then I don't understand the "non-FULL" restriction in
pg_stat_get_last_vacuum_time()... unless that information is somehow
lost when table files are rewritten.

> 8.3 is out of support now.  Even for the 8.3 release, 8.3.6 is
> missing over four years of fixes for bugs and security
> vulnerabilities.  There is a very good chance that any problem you
> see already fixed and you are just choosing to run without the fix.

You're right of course, the PostgreSQL version on the server is rather
old. We're redesigning the whole application, and migrating to 9.2 will
be part of the process (I'm running 9.1 locally).

Thanks,
crl


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
CR Lender
Date:
On 2013-03-31 18:31, CR Lender wrote:
> On 2013-03-28 20:44, Kevin Grittner wrote:
>> CR Lender <crlender@gmail.com> wrote:
>>> I've read the manual more carefully now, and I can't see any mention of
>>> what VACUUM does that VACUUM FULL does not. The point about extreme
>>> maintainance is taken, but from what I read, VACUUM FULL should include
>>> everything a normal VACUUM does.
>>
>> Prior to release 9.0 that is probably true.
>
> Hm, I can't find it, even in the manual for 9.2.
> http://www.postgresql.org/docs/current/static/sql-vacuum.html
>
> If VACUUM FULL is just a more aggressive VACCUM (including writing new
> data files), then I don't understand the "non-FULL" restriction in
> pg_stat_get_last_vacuum_time()... unless that information is somehow
> lost when table files are rewritten.

I don't mean to be pushy, but I have a meeting with the admin of that
database tomorrow, and it would be nice if I had something concrete to
tell him. I still don't know what it is that VACCUM does but VACUUM full
doesn't do. There's nothing in the manual about that.

Thanks,
crl


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
Amit Kapila
Date:
On Monday, April 08, 2013 4:40 AM CR Lender wrote:
> On 2013-03-31 18:31, CR Lender wrote:
> > On 2013-03-28 20:44, Kevin Grittner wrote:
> >> CR Lender <crlender@gmail.com> wrote:
> >>> I've read the manual more carefully now, and I can't see any
> mention of
> >>> what VACUUM does that VACUUM FULL does not. The point about extreme
> >>> maintainance is taken, but from what I read, VACUUM FULL should
> include
> >>> everything a normal VACUUM does.
> >>
> >> Prior to release 9.0 that is probably true.
> >
> > Hm, I can't find it, even in the manual for 9.2.
> > http://www.postgresql.org/docs/current/static/sql-vacuum.html
> >
> > If VACUUM FULL is just a more aggressive VACCUM (including writing
> new
> > data files), then I don't understand the "non-FULL" restriction in
> > pg_stat_get_last_vacuum_time()... unless that information is somehow
> > lost when table files are rewritten.
>
> I don't mean to be pushy, but I have a meeting with the admin of that
> database tomorrow, and it would be nice if I had something concrete to
> tell him. I still don't know what it is that VACCUM does but VACUUM
> full
> doesn't do. There's nothing in the manual about that.

One of the important difference is that during the time VACUUM FULL is operating on a relation,
no other operations will be allowed on that relation. Most of admin care about this point, because
they don't want to stop operations for background garbage collect.
VACUUM FULL is only done in rare cases when the relation size has grown too bigger than it's actual
Contents.


With Regards,
Amit Kapila.



Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
Jeff Janes
Date:
On Sun, Apr 7, 2013 at 8:55 PM, Amit Kapila <amit.kapila@huawei.com> wrote:

One of the important difference is that during the time VACUUM FULL is operating on a relation,
no other operations will be allowed on that relation. Most of admin care about this point, because
they don't want to stop operations for background garbage collect.

While that is true, it is not a reason not to update pg_stat_get_last_vacuum_time.

I'm having a hard time coming up with a reason not to update pg_stat_get_last_vacuum_time with a full vacuum.

On version 8.4 and below, you could justify it by saying that VACUUM FULL bloated the indexes and then left them that way, and so we shouldn't update the time field.  But that is no longer the case.  And even then, doing a ordinary vacuum afterwards isn't going to fix the index bloat, so even that argument is a bit sketchy.

Cheers,

Jeff

Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
Kevin Grittner
Date:
Jeff Janes <jeff.janes@gmail.com> wrote:
> Amit Kapila <amit.kapila@huawei.com> wrote:

>> One of the important difference is that during the time VACUUM
>> FULL is operating on a relation, no other operations will be
>> allowed on that relation. Most of admin care about this point,
>> because they don't want to stop operations for background
>> garbage collect.

> While that is true, it is not a reason not to update
> pg_stat_get_last_vacuum_time.  I'm having a hard time coming up
> with a reason not to update pg_stat_get_last_vacuum_time with a
> full vacuum.
>
> On version 8.4 and below, you could justify it by saying that
> VACUUM FULL bloated the indexes and then left them that way, and
> so we shouldn't update the time field.  But that is no longer the
> case.  And even then, doing a ordinary vacuum afterwards isn't
> going to fix the index bloat, so even that argument is a bit
> sketchy.

I'm not sure that what we're doing now is correct, but updating
things as if a normal vacuum had been done would *not* be the thing
to do.  For starters, VACUUM FULL blows away the free space map and
visibility map for a table.  Among other things, that means that
index-only scans will cease to work until the table has a normal
vacuum.  A normal vacuum (or autovacuum) will restore those, so a
VACUUM FULL should probably set things up to show that the table is
in need of a vacuum soon.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
Jeff Janes
Date:
On Monday, April 8, 2013, Kevin Grittner wrote:
Jeff Janes <jeff.janes@gmail.com> wrote:
> Amit Kapila <amit.kapila@huawei.com> wrote:

>> One of the important difference is that during the time VACUUM
>> FULL is operating on a relation, no other operations will be
>> allowed on that relation. Most of admin care about this point,
>> because they don't want to stop operations for background
>> garbage collect.

> While that is true, it is not a reason not to update
> pg_stat_get_last_vacuum_time.  I'm having a hard time coming up
> with a reason not to update pg_stat_get_last_vacuum_time with a
> full vacuum.
>
> On version 8.4 and below, you could justify it by saying that
> VACUUM FULL bloated the indexes and then left them that way, and
> so we shouldn't update the time field.  But that is no longer the
> case.  And even then, doing a ordinary vacuum afterwards isn't
> going to fix the index bloat, so even that argument is a bit
> sketchy.

I'm not sure that what we're doing now is correct, but updating
things as if a normal vacuum had been done would *not* be the thing
to do.  For starters, VACUUM FULL blows away the free space map and
visibility map for a table. 

Ah, OK, that is obvious in retrospect.  I was wracking my brain for stats-collector-aspects and completely forgot about those.

I don't know how hard (for the hackers) or extra work (for the server) it would be to make VACUUM FULL reset those things to reasonable values.  But it should be fairly easy to at least document them.

One often uses VACUUM FULL when one is up to ones elbows in alligators, so it is understandable that we would not want to impose another burden on the server at that particular moment.  So I'm leaning towards documenting the issue.  Or are they already, and I'm just missing it?

Cheers,

Jeff

Re: pg_stat_get_last_vacuum_time(): why non-FULL?

From
CR Lender
Date:
On 2013-04-09 00:09, Kevin Grittner wrote:
> I'm not sure that what we're doing now is correct, but updating
> things as if a normal vacuum had been done would *not* be the thing
> to do.  For starters, VACUUM FULL blows away the free space map and
> visibility map for a table.  Among other things, that means that
> index-only scans will cease to work until the table has a normal
> vacuum.

Ah, now it makes sense. Thank you, that's what I was looking for.

And I agree with Jeff that this could be documented in more detail.

Thanks,
crl