Re: More detail on settings for pgavd? - Mailing list pgsql-performance

From Shridhar Daithankar
Subject Re: More detail on settings for pgavd?
Date
Msg-id 3FBC6BED.9090809@myrealbox.com
In response to Re: More detail on settings for pgavd?  (Josh Berkus <josh@agliodbs.com>)
Responses Re: More detail on settings for pgavd?
List pgsql-performance
Josh Berkus wrote:

> Shridhar,
>>However I do not agree with this logic entirely. It pegs the next vacuum
>>w.r.t. the current table size, which is not always a good thing.
>
>
> No, I think the logic's fine, it's the numbers which are wrong.   We want to
> vacuum when updates reach between 5% and 15% of total rows.   NOT when
> updates reach 110% of total rows ... that's much too late.

Well, it looks like thresholds below 1 should be the norm rather than the exception.

> Hmmm ... I also think the threshold level needs to be lowered; I guess the
> purpose was to prevent continuous re-vacuuming of small tables?
> Unfortunately, in the current implementation, the result is that small tables
> never get vacuumed at all.
>
> So for defaults, I would peg -V at 0.1 and -v at 100, so our default
> calculation for a table with 10,000 rows is:
>
> 100 +  ( 0.1 * 10,000 ) = 1100 rows.

I would say -V values of 0.2 to 0.4 could work well too. The point to emphasize
is that thresholds less than 1 should be used.
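
To make the arithmetic concrete, here is a small sketch of the threshold formula used in the example above: the base value (-v) plus the scaling factor (-V) times the number of rows in the table. The function and argument names are illustrative only and are not taken from the pg_autovacuum source.

```python
def vacuum_threshold(base, scale, row_count):
    """Changed-row count at which a vacuum would be triggered.

    base      -- the -v setting (argument name is hypothetical)
    scale     -- the -V setting (argument name is hypothetical)
    row_count -- current number of rows in the table
    """
    return base + scale * row_count


if __name__ == "__main__":
    # The proposed defaults (-v 100, -V 0.1) and the 0.2-0.4 range,
    # all for a table with 10,000 rows.
    for scale in (0.1, 0.2, 0.4):
        print(scale, vacuum_threshold(100, scale, 10_000))
```

With any scale below 1, the vacuum fires well before the number of changed rows reaches the size of the table.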

>>Furthermore, the analyze threshold depends upon inserts+updates. I think it
>>should also depend upon deletes for obvious reasons.
> Yes.  Vacuum threshold is counting deletes, I hope?

It does.

> My comment about the frequency of vacuums vs. analyze is that currently the
> *default* is to analyze twice as often as you vacuum.    Based on my
> experience as a PG admin on a variety of databases, I believe that the default
> should be to analyze half as often as you vacuum.

OK.

>>I am all for experimentation. If you have real life data to play with, I
>>can give you some patches to play around.
> I will have real data very soon .....

I will submit a patch that accounts for deletes in the analyze threshold. Since
you want to delay the analyze, I would calculate the analyze count as

n = updates + inserts *-* deletes

rather than the current "n = updates + inserts". I will also update the README
with examples and a note on analyze frequency.
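
As a rough sketch of the difference between the two counts (the function names are made up for illustration and do not come from the pg_autovacuum source):

```python
def analyze_count_current(updates, inserts):
    """Current behaviour as described above: deletes are ignored."""
    return updates + inserts


def analyze_count_proposed(updates, inserts, deletes):
    """Proposed behaviour: deletes are subtracted, which delays the analyze."""
    return updates + inserts - deletes


if __name__ == "__main__":
    # Hypothetical activity counters for one table.
    updates, inserts, deletes = 500, 300, 200
    print(analyze_count_current(updates, inserts))
    print(analyze_count_proposed(updates, inserts, deletes))
```

For the same activity, the proposed count is never larger than the current one, so a table with deletes reaches the analyze threshold later.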

What do the statistics gather, BTW? Just the number of rows, or something else
as well? I think I will raise that on Hackers separately.

I am still wary of inverting the vacuum/analyze frequency. Do you think it is
better to set an inverted default rather than documenting it?

  Shridhar

