Thread: best practice to avoid table bloat?

best practice to avoid table bloat?

From: "Anibal David Acosta"

Hi,

If I have a table from which about 8 million rows are deleted every night (the table has maybe 9 million rows), is it recommended to do a VACUUM ANALYZE after the delete completes, or can I leave this job to autovacuum?

 

This table is very active during the day but less active at night.

 

I think the only thing where Postgres is weak is in this area (table and index bloat).

 

For some reason, for the same amount of data, Postgres consumes a little more space every day.

 

Thanks!

Re: best practice to avoid table bloat?

From: Andrew Dunstan

On 08/16/2012 04:33 PM, Anibal David Acosta wrote:
>
> Hi,
>
> If I have a table from which about 8 million rows are deleted every
> night (the table has maybe 9 million rows), is it recommended to do a
> VACUUM ANALYZE after the delete completes, or can I leave this job to
> autovacuum?
>
> This table is very active during the day but less active at night.
>
> I think the only thing where Postgres is weak is in this area (table
> and index bloat).
>
> For some reason, for the same amount of data, Postgres consumes a
> little more space every day.
>
>


Check out pg_reorg.

cheers

andrew


Re: best practice to avoid table bloat?

From: "Kevin Grittner"

"Anibal David Acosta" <aa@devshock.com> wrote:

> If I have a table from which about 8 million rows are deleted every
> night (the table has maybe 9 million rows), is it recommended to do a
> VACUUM ANALYZE after the delete completes, or can I leave this job
> to autovacuum?

Deleting a high percentage of the rows should cause autovacuum to
deal with the table the next time it wakes up, so an explicit VACUUM
ANALYZE shouldn't be needed.
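
If you want to confirm that it actually did, you can check the per-table
statistics; a minimal sketch, assuming your table is called daily_data:

  -- when was this table last vacuumed/analyzed, and by whom?
  SELECT relname, last_vacuum, last_autovacuum,
         last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
   WHERE relname = 'daily_data';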

> For some reason, for the same amount of data, Postgres consumes a
> little more space every day.

How are you measuring the data and how are you measuring the space?
And what version of PostgreSQL is this?

-Kevin


Re: best practice to avoid table bloat?

From: "Anibal David Acosta"

Thanks Kevin.
The Postgres version is 9.1.4 (latest).

Every day the table receives about 7 million new rows.
The table holds data for 60 days, so the total should be around 420
million rows.
Every night a delete process runs and removes rows older than 60 days.

So the space used by Postgres should not increase drastically, because
7 million rows arrive every day but the same quantity is deleted; yet my
disk runs out of space every 4 months.
I must copy the tables off the server, drop the local table and create it
again; after this process I have space for about another 4 months.

Maybe it is a wrong autovacuum config, but it is really complicated to
understand which values are correct to avoid a performance penalty while
keeping the table in good shape.

I think the autovacuum configuration should have something like an
"auto-config" that recalculates every day which configuration is best for
the current server conditions.

Thanks!



Re: best practice to avoid table bloat?

From: "Kevin Grittner"

[please don't top-post]

"Anibal David Acosta" <aa@devshock.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> "Anibal David Acosta" <aa@devshock.com> wrote:
>>
>>> If I have a table from which about 8 million rows are deleted
>>> every night (the table has maybe 9 million rows), is it
>>> recommended to do a VACUUM ANALYZE after the delete completes,
>>> or can I leave this job to autovacuum?
>>
>> Deleting a high percentage of the rows should cause autovacuum to
>> deal with the table the next time it wakes up, so an explicit
>> VACUUM ANALYZE shouldn't be needed.

> Every day the table receives about 7 million new rows.
> The table holds data for 60 days, so the total should be around
> 420 million rows.
> Every night a delete process runs and removes rows older than 60
> days.

Oh, I thought you were saying the table grew to 9 million rows each
day and you deleted 8 million of them each night.  That would
definitely trigger autovacuum.  Deleting 7 million rows from a table
of 420 million rows would not, so an explicit VACUUM ANALYZE after
the delete might be helpful.  Even better, with a pattern like that,
you might want to consider partitioning the table:

http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
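
A rough sketch of what that could look like on 9.1 (inheritance-based
partitioning, with hypothetical table and column names; routing of
inserts into the right child is covered in the docs above):

  -- parent table: queries go against this one
  CREATE TABLE daily_data (
      id         bigint,
      created_at timestamptz NOT NULL,
      payload    text
  );

  -- one child per month, with a CHECK constraint so the planner
  -- can skip children that can't contain the requested rows
  CREATE TABLE daily_data_2012_08 (
      CHECK (created_at >= TIMESTAMPTZ '2012-08-01'
         AND created_at <  TIMESTAMPTZ '2012-09-01')
  ) INHERITS (daily_data);

  CREATE INDEX ON daily_data_2012_08 (created_at);

  -- expiring old data is then a metadata-only operation
  DROP TABLE daily_data_2012_06;

With children like that, the nightly purge stops being a huge DELETE
that leaves dead rows behind; you just drop the child that has aged out.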

>>> For some reason, for the same amount of data, Postgres consumes
>>> a little more space every day.
>>
>> How are you measuring the data and how are you measuring the
>> space?

> [no answer]

Without knowing what is increasing, it's hard to say why it is
increasing.  For all we know you are logging all statements and
never deleting log files.  The solution for that would be entirely
different from the solution for some other problem.

> So the space used by Postgres should not increase drastically,
> because 7 million rows arrive every day but the same quantity is
> deleted; yet my disk runs out of space every 4 months.

What is getting bigger over time?
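
One way to find out (a rough sketch; adjust the schema filter to
taste) is to track the largest relations over time:

  -- ten largest tables, counting heap, TOAST, and indexes together
  SELECT n.nspname AS schema_name, c.relname AS table_name,
         pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
   WHERE c.relkind = 'r'
     AND n.nspname NOT IN ('pg_catalog', 'information_schema')
   ORDER BY pg_total_relation_size(c.oid) DESC
   LIMIT 10;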

> I must copy the tables off the server, drop the local table and
> create it again; after this process I have space for about another
> 4 months.

How do you do that?  pg_dump, DROP TABLE, restore the dump?  Have
you captured sizes of heap, toast, indexes, etc. before and after
this aggressive maintenance?  Is the size going up by orders of
magnitude, or are you running really tight and getting killed by a
10% increase?  We don't know unless you tell us.
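
For example, something along these lines (with a hypothetical table
name), run before and after the nightly delete and before and after
your copy-and-recreate step, would tell us a lot:

  -- total, heap+TOAST, and index sizes for one table
  SELECT pg_size_pretty(pg_total_relation_size('daily_data')) AS total,
         pg_size_pretty(pg_table_size('daily_data'))          AS heap_plus_toast,
         pg_size_pretty(pg_indexes_size('daily_data'))        AS indexes;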

> Maybe it is a wrong autovacuum config, but it is really complicated
> to understand which values are correct to avoid a performance
> penalty while keeping the table in good shape.

Please show us the entire result from running this query:

http://wiki.postgresql.org/wiki/Server_Configuration
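
The query on that page boils down to showing every setting that has
been changed from its default, roughly:

  -- list all settings that are not at their default value
  SELECT name, current_setting(name), source
    FROM pg_settings
   WHERE source NOT IN ('default', 'override');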

-Kevin