Thread: autovacuum suggestions for 500,000,000+ row tables?

autovacuum suggestions for 500,000,000+ row tables?

From: Alex Stapleton
Date:
Hi, I'm trying to optimise our autovacuum configuration so that it
vacuums / analyzes some of our larger tables better. It has been set
to the default settings for quite some time. We never delete
anything  (well not often, and not much) from the tables, so I am not
so worried about the VACUUM status, but I am wary of XID wraparound
nuking us at some point if we don't sort vacuuming out so we VACUUM
at least once every year ;) However not running ANALYZE for such huge
periods of time is probably impacting the statistics accuracy
somewhat, and I have seen some unusually slow queries at times.
Anyway, does anyone think we might benefit from a more aggressive
autovacuum configuration?

Re: autovacuum suggestions for 500,000,000+ row tables?

From: Jacques Caron
Date:
Hi,

At 16:44 20/06/2005, Alex Stapleton wrote:
>We never delete
>anything  (well not often, and not much) from the tables, so I am not
>so worried about the VACUUM status

DELETEs are not the only reason you might need to VACUUM. UPDATEs are
important as well, if not more. Tables that are constantly updated
(statistics, session data, queues...) really need to be VACUUMed a lot.
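
For instance, running VACUUM VERBOSE on a heavily-updated table (the table
name below is purely illustrative) reports how many dead row versions those
UPDATEs have left behind and how many of them could be reclaimed:

-- "sessions" stands in for any frequently-updated table
VACUUM VERBOSE sessions;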

>but I am wary of XID wraparound
>nuking us at some point if we don't sort vacuuming out so we VACUUM
>at least once every year ;)

That would give you a maximum average of 31 transactions/sec... Don't know
if that's high or low for you.
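
That figure assumes the rule of thumb of a database-wide VACUUM at least once
per billion transactions; a rough back-of-the-envelope check:

-- one billion transactions divided by the seconds in a year
SELECT 1000000000 / (365 * 24 * 60 * 60.0) AS max_avg_tps;  -- roughly 31.7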

>  However not running ANALYZE for such huge
>periods of time is probably impacting the statistics accuracy
>somewhat, and I have seen some unusually slow queries at times.
>Anyway, does anyone think we might benefit from a more aggressive
>autovacuum configuration?

ANALYZE is not a very expensive operation, however VACUUM can definitely be
a big strain and take a looooong time on big tables, depending on your
setup. I've found that partitioning tables (at the application level) can
be quite helpful if you manage to keep each partition to a reasonable size
(under or close to available memory), especially if the partitioning scheme
is somehow time-related. YMMV.

Jacques.



Re: autovacuum suggestions for 500,000,000+ row tables?

From: Alex Stapleton
Date:
On 20 Jun 2005, at 15:59, Jacques Caron wrote:

> Hi,
>
> At 16:44 20/06/2005, Alex Stapleton wrote:
>
>> We never delete
>> anything  (well not often, and not much) from the tables, so I am not
>> so worried about the VACUUM status
>>
>
> DELETEs are not the only reason you might need to VACUUM. UPDATEs
> are important as well, if not more. Tables that are constantly
> updated (statistics, session data, queues...) really need to be
> VACUUMed a lot.

We UPDATE them even less often.

>
>> but I am wary of XID wraparound
>> nuking us at some point if we don't sort vacuuming out so we VACUUM
>> at least once every year ;)
>>
>
> That would give you a maximum average of 31 transactions/sec...
> Don't know if that's high or low for you.

It's high as far as inserts go for us; our data inserters do them all in a
batch at the end of each minute.

>
>>  However not running ANALYZE for such huge
>> periods of time is probably impacting the statistics accuracy
>> somewhat, and I have seen some unusually slow queries at times.
>> Anyway, does anyone think we might benefit from a more aggressive
>> autovacuum configuration?
>>
>
> ANALYZE is not a very expensive operation, however VACUUM can
> definitely be a big strain and take a looooong time on big tables,
> depending on your setup. I've found that partitioning tables (at
> the application level) can be quite helpful if you manage to keep
> each partition to a reasonable size (under or close to available
> memory), especially if the partitioning scheme is somehow time-
> related. YMMV.
>
> Jacques.

That's not currently an option as it would require a pretty large
amount of work to implement. I think we will have to keep that in
mind though.


Re: autovacuum suggestions for 500,000,000+ row tables?

From: John Arbash Meinel
Date:
Alex Stapleton wrote:

>
> On 20 Jun 2005, at 15:59, Jacques Caron wrote:
>
...

>> ANALYZE is not a very expensive operation, however VACUUM can
>> definitely be a big strain and take a looooong time on big tables,
>> depending on your setup. I've found that partitioning tables (at the
>> application level) can be quite helpful if you manage to keep each
>> partition to a reasonable size (under or close to available memory),
>> especially if the partitioning scheme is somehow time-related. YMMV.
>>
>> Jacques.
>
>
> That's not currently an option as it would require a pretty large
> amount of work to implement. I think we will have to keep that in
> mind though.

Remember, you can fake it with a low-level set of tables, and then wrap
them into a UNION ALL view.
So you get something like:

CREATE VIEW orig_table AS
    SELECT * FROM table_2005_04            -- one underlying table per month
    UNION ALL SELECT * FROM table_2005_05
    UNION ALL SELECT * FROM table_2005_06
...
;

Then at least your individual operations are fast. As you insert, you
can create a rule that on insert into orig_table do instead ... insert
into table_2005_07 (or whatever the current table is).
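
A minimal sketch of such a rule (table names follow the example above; it has
to be repointed at whatever the current month's table is):

CREATE RULE orig_table_insert AS
    ON INSERT TO orig_table
    DO INSTEAD
    INSERT INTO table_2005_07 VALUES (NEW.*);
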
It takes a little bit of maintenance on the DB admin's part, since every
month they have to create a new table, and then update all of the views
and triggers. But it is pretty straightforward.
If you are doing append-only inserting, then you have the nice feature
that only the last table is ever modified, which means that the older
tables don't really need to be vacuumed or analyzed.
And even if you have to have each table modified as you go, you still
can break up a VACUUM into only doing one of the sub tables at a time.
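
That way routine maintenance can be aimed at one month's table at a time, e.g.
(again using the names from the sketch above):

VACUUM ANALYZE table_2005_06;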

I don't know your db schema, but I thought I would mention that while true
partitioning isn't implemented yet, you can still get something very
similar with views, triggers and rules.

John
=:->



Re: autovacuum suggestions for 500,000,000+ row tables?

From: Josh Berkus
Date:
Alex,

> Hi, I'm trying to optimise our autovacuum configuration so that it
> vacuums / analyzes some of our larger tables better. It has been set
> to the default settings for quite some time. We never delete
> anything  (well not often, and not much) from the tables, so I am not
> so worried about the VACUUM status, but I am wary of XID wraparound
> nuking us at some point if we don't sort vacuuming out so we VACUUM
> at least once every year ;)

I personally don't use autovacuum on very large databases.   For DW,
vacuuming is far better tied to ETL operations or a clock schedule of
downtime.

XID wraparound may be further away than you think.   Try checking
pg_controldata, which will give you the current XID, and you can calculate
how long you are away from wraparound.  I just tested a 200G data warehouse
and figured out that we are 800 months away from wraparound, despite hourly
ETL.
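
If pg_controldata isn't handy, a related check can be done from SQL; age()
here shows how many transactions old each database's frozen-XID mark is, and
wraparound only becomes a danger as that number approaches two billion:

SELECT datname, age(datfrozenxid) AS xact_age
FROM pg_database;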

> However not running ANALYZE for such huge
> periods of time is probably impacting the statistics accuracy
> somewhat, and I have seen some unusually slow queries at times.
> Anyway, does anyone think we might benefit from a more aggressive
> autovacuum configuration?

Hmmm, good point, you could use autovacuum for ANALYZE only.  Just set the
VACUUM settings preposterously high (like 10x) so it never runs.   Then it'll
run ANALYZE only.   I generally use a threshold of 200 and a multiplier of 0.1
for analyze; that is, re-analyze after 200 rows plus 10% of the table have changed.
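
In configuration terms that might look something like the following (a sketch
using the integrated-autovacuum parameter names that arrived in 8.1; the
contrib pg_autovacuum daemon shipped with 8.0 takes equivalent thresholds as
command-line options):

autovacuum_analyze_threshold    = 200          # re-analyze after 200 rows...
autovacuum_analyze_scale_factor = 0.1          # ...plus 10% of the table have changed
autovacuum_vacuum_threshold     = 1000000000   # vacuum thresholds set absurdly high
autovacuum_vacuum_scale_factor  = 10           # so autovacuum effectively never VACUUMs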

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: autovacuum suggestions for 500,000,000+ row tables?

From: Alex Stapleton
Date:
On 20 Jun 2005, at 18:46, Josh Berkus wrote:


> Alex,
>
>
>
>> Hi, I'm trying to optimise our autovacuum configuration so that it
>> vacuums / analyzes some of our larger tables better. It has been set
>> to the default settings for quite some time. We never delete
>> anything  (well not often, and not much) from the tables, so I am not
>> so worried about the VACUUM status, but I am wary of XID wraparound
>> nuking us at some point if we don't sort vacuuming out so we VACUUM
>> at least once every year ;)
>>
>>
>
> I personally don't use autovacuum on very large databases.   For DW,
> vacuuming is far better tied to ETL operations or a clock schedule of
> downtime.
>

Downtime is something I'd rather avoid if possible. Do you think we
will need to run VACUUM FULL occasionally? I'd rather not lock tables
up unless I can't avoid it. We can probably squeeze in an automated
vacuum tied to our data inserters every now and then though.


> XID wraparound may be further away than you think.   Try checking
> pg_controldata, which will give you the current XID, and you can
> calculate
> how long you are away from wraparound.  I just tested a 200G data
> warehouse
> and figured out that we are 800 months away from wraparound,
> despite hourly
> ETL.
>

Is this an 8.0 thing? I don't have a pg_controldata from what I can
see. That's nice to hear though.


>
>
>
>> However not running ANALYZE for such huge
>> periods of time is probably impacting the statistics accuracy
>> somewhat, and I have seen some unusually slow queries at times.
>> Anyway, does anyone think we might benefit from a more aggressive
>> autovacuum configuration?
>>
>>
>
> Hmmm, good point, you could use autovacuum for ANALYZE only.  Just
> set the
> VACUUM settings preposterously high (like 10x) so it never runs.
> Then it'll
> run ANALYZE only.   I generally threshold 200, multiple 0.1x for
> analyze;
> that is, re-analyze after 200+10% of rows have changed.
>

I will try those settings out, that sounds good to me though.


> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
>
>



Re: autovacuum suggestions for 500,000,000+ row tables?

From: Josh Berkus
Date:
Alex,

> Downtime is something I'd rather avoid if possible. Do you think we
> will need to run VACUUM FULL occasionally? I'd rather not lock tables
> up unless I can't avoid it. We can probably squeeze in an automated
> vacuum tied to our data inserters every now and then though.

As long as your update/deletes are less than 10% of the table for all time,
you should never have to vacuum, pending XID wraparound.

> Is this an 8.0 thing? I don't have a pg_controldata from what I can
> see. That's nice to hear though.

'fraid so, yes.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: autovacuum suggestions for 500,000,000+ row tables?

From: Alex Stapleton
Date:
On 21 Jun 2005, at 18:13, Josh Berkus wrote:

> Alex,
>
>
>> Downtime is something I'd rather avoid if possible. Do you think we
>> will need to run VACUUM FULL occasionally? I'd rather not lock tables
>> up unless I can't avoid it. We can probably squeeze in an automated
>> vacuum tied to our data inserters every now and then though.
>>
>
> As long as your update/deletes are less than 10% of the table for
> all time,
> you should never have to vacuum, pending XID wraparound.
>

Hmm, I guess since we have hundreds of millions of rows, and when we do
delete things it's only a few thousand rows, and rarely, VACUUMing
shouldn't need to happen too often. That's good. Thanks a lot for the
advice.

>> Is this an 8.0 thing? I don't have a pg_controldata from what I can
>> see. That's nice to hear though.
>>
>
> 'fraid so, yes.

Bloody Debian stable. I might have to experiment with building from
source or using alien on Debian to convert the RPMs. Fun. Oh well.

> --
> --Josh
>
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>


Re: autovacuum suggestions for 500,000,000+ row tables?

From: "Steinar H. Gunderson"
Date:
On Tue, Jun 21, 2005 at 11:08:43PM +0100, Alex Stapleton wrote:
> Bloody Debian stable. I might have to experiment with building from
> source or using alien on Debian to convert the RPMs. Fun. Oh well.

Or just pull in postgresql-8.0 from unstable; sid is close enough to sarge
for it to work quite well in practice, AFAIK.

You'll lose the security support, though you'd lose that by building from
source or using alien anyhow :-)

/* Steinar */
--
Homepage: http://www.sesse.net/