Re: Vacuuming strategy - Mailing list pgsql-general

From Elanchezhiyan Elango
Subject Re: Vacuuming strategy
Date
Msg-id CALqA5kiJGPJ_8iXjwjM0g95rmq2cbO3cHS0_g5gyuTh+Sv1r3A@mail.gmail.com
In response to Re: Vacuuming strategy  (Elanchezhiyan Elango <elanelango@gmail.com>)
Responses Re: Vacuuming strategy
List pgsql-general

Each table has 4 indices. The updates are to the indexed columns.
Here is the schema of a table: http://pastebin.com/ffu0dUjS All tables have this same schema, except that some tables don't have a port column and so will have one less index.
What indexes exist?  Are the updates to indexed columns?  

Sorry I was wrong when I said the updates are to the indexed columns. The updates are to the 'data[]' column which doesn't have any index.
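(Side note: since the updated data[] column carries no index, many of these updates may qualify as heap-only tuple (HOT) updates, which skip index maintenance entirely. One way to check the ratio — a sketch, with 'xyz' standing in for an actual table name:)

```sql
-- Compare total updates vs. heap-only (HOT) updates for a table.
-- 'xyz' is a placeholder table name, not one from the actual schema.
SELECT relname,
       n_tup_upd,      -- total row updates
       n_tup_hot_upd   -- updates that avoided touching the indexes
FROM pg_stat_user_tables
WHERE relname = 'xyz';
```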



On Wed, Apr 30, 2014 at 10:45 AM, Elanchezhiyan Elango <elanelango@gmail.com> wrote:
I missed answering this one:
 Is the 5GB for the table plus indexes, or just the table itself?
No, the 5GB does not include the indices. Including indices it's actually 17GB!


On Wed, Apr 30, 2014 at 10:40 AM, Elanchezhiyan Elango <elanelango@gmail.com> wrote:
Sergey,
Thanks for the aggressive settings. I have listed some settings I am planning to try below. Please review and let me know your feedback.

Francisco,
Thanks for the partitioning idea. I used to have the tables partitioned, but now that I have moved to a schema where data is split across ~90 tables I have moved away from partitioning. It's something I have to reconsider, at least for the high-traffic tables.
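(For reference, on 9.1 partitioning would be inheritance-based, and its main payoff here is that the nightly DELETE becomes a DROP TABLE of an expired child, leaving no dead tuples to vacuum. A minimal sketch, with made-up table and column names rather than the actual schema:)

```sql
-- Hypothetical parent table; each child holds one month of data.
CREATE TABLE stats (ts timestamptz NOT NULL, data int[]);

CREATE TABLE stats_2014_04 (
    CHECK (ts >= '2014-04-01' AND ts < '2014-05-01')
) INHERITS (stats);

-- Retention becomes a cheap metadata operation instead of a bulk
-- DELETE: dropping the expired child frees its space immediately.
DROP TABLE stats_2014_03;
```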

What indexes exist?  Are the updates to indexed columns?  
Each table has 4 indices. The updates are to the indexed columns.
Here is the schema of a table: http://pastebin.com/ffu0dUjS All tables have this same schema except that some tables don't have a port column and so will have one less index.

Which day's tuples are deleted every night?  Is it the day a week ago, or a month ago, or something else?
Tables1, Tables2: Anything older than 1 month is deleted.
Tables3, Tables4: Anything older than 3 months is deleted.

If only 50,000 get inserted daily and 1.2M get deleted, this table will soon be empty!  I think you said daily when you meant hourly somewhere in there.
 Sorry that was wrong.
For Tables3 it should have been 50000 rows (not 1.2M) getting deleted daily.
And for Tables4 it should have been 2000 rows (not 48000) getting deleted daily.

Why do you have a 4 minute timeout?  That seems counter-productive.
Oh, should it be shorter or longer?

 You delete a bunch of tuples every night, so of course a vacuum full after that is going to return a lot of space.  But that space is probably just going to be needed again the next day.  If you don't do the vacuum full, does the *peak* space keep increasing, or does it stabilize?
I haven't tested to see if the space keeps on increasing.
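(One way to test that empirically would be to record the table-plus-index footprint on a schedule and watch whether the high-water mark grows. A sketch, with 'xyz' as a placeholder table name:)

```sql
-- Log the on-disk footprint over time; if freed space is being
-- reused, pg_total_relation_size should plateau rather than grow.
SELECT now() AS sampled_at,
       pg_size_pretty(pg_total_relation_size('xyz')) AS table_plus_indexes;
```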

I ran pgstattuple() on one of the tables:

managed_target_stats=# select * from pgstattuple('xyz');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 5642526720 |    18880283 | 4042874387 |         71.65 |           122037 |       21495560 |               0.38 | 1239598044 |        21.97
This is one of the Tables1 tables, after running for a week or so with the default autovacuum settings. The dead_tuple_percent looks good, but the free_percent looks high. Is this normal? Also, when I enabled autovacuum logging, I saw the autovacuums triggering very rarely. Maybe that's the reason for so much free space? I am going to try the following settings:

log_autovacuum_min_duration = 0
autovacuum_vacuum_scale_factor = 0
autovacuum_vacuum_threshold = 40000
autovacuum_vacuum_cost_delay = 10ms
autovacuum_max_workers = 5
autovacuum_analyze_scale_factor = 0
autovacuum_analyze_threshold = 40000

Do these settings look ok? I am setting autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor to 0 and the thresholds to a constant 40000. My reasoning is that updates happen in a burst every 5 minutes, with an upper bound of 50000 rows per burst, so I have hardcoded a threshold of 40000. I am also leaving autovacuum_naptime at the default of 1 min, because updates happen at most every 5 minutes, so I don't see a point in running autovacuum more often than that. Correct me if my settings look wrong.
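(Since only the ~8 high-traffic tables need aggressive settings, an alternative to changing the global defaults is per-table storage parameters, so the lower-traffic tables keep the stock behavior. A sketch; 'xyz' is a placeholder table name:)

```sql
-- Per-table autovacuum tuning: overrides the global defaults for
-- this one table only, leaving the other ~80 tables untouched.
ALTER TABLE xyz SET (
    autovacuum_vacuum_scale_factor  = 0,
    autovacuum_vacuum_threshold     = 40000,
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold    = 40000
);
```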

Thanks,

Elan. 


On Wed, Apr 30, 2014 at 8:36 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Apr 29, 2014 at 4:59 PM, Elanchezhiyan Elango <elanelango@gmail.com> wrote:
Hi,

I need help on deciding my vacuuming strategy. I need to know if I ever need to do 'vacuum full' for my tables.

Tables1: Following is the query patterns on 4 high traffic table in my database:
1. Every 5 minutes about 50000 rows in the table are updated. And for a given clock hour the same 50000 rows are updated again and again.
2. When a new hour begins, another 50000 rows get inserted and they get updated every 5 minutes.
3. Every night a day's worth of old stats is deleted. So this would be 24 * 50000 = 1.2M records that get deleted every night.

What indexes exist?  Are the updates to indexed columns?  Which day's tuples are deleted every night?  Is it the day a week ago, or a month ago, or something else?

 
Tables3: Another 4 tables have the following pattern:
1. Every 1 hour 50000 rows get updated. For a given day the same 50000 rows are updated again and again.
2. When a new day begins, another 50000 rows get inserted and they get updated every hour.
3. Every night 1.2M records get deleted.

If only 50,000 get inserted daily and 1.2M get deleted, this table will soon be empty!  I think you said daily when you meant hourly somewhere in there.
 

Tables4: Another 39 tables have the following pattern:
1. Every 1 hour 2000 rows get updated. For a given day the same 2000 rows are updated again and again.
2. When a new day begins, another 2000 rows get inserted and they get updated every hour.
3. Every night 48000 rows get deleted.

With the above query pattern with intensive updates and deletes, I need to do some aggressive vacuuming.

Current strategy: I am running with the default autovacuum settings (postgres 9.1.9), and I tried doing a 'vacuum full' on the 8 high-traffic tables (Tables1 and Tables3) every night. But after a point, the 'vacuum full's started timing out (with a 4-minute timeout) every night.

Why do you have a 4 minute timeout?  That seems counter-productive.

 
I think this is because the table is growing bigger (~5GB) and doing a vacuum full every night is probably not feasible.

It is probably not necessary, but it certainly seems feasible.  4 min * 8 tables = 32 minutes.  Call it one hour, since the 4 minute timeout has started not being enough.  Is the 5GB for the table plus indexes, or just the table itself?
 

Going with the default autovacuum settings and not doing 'vacuum full' at all is also not enough for my usecase. Whenever vacuum full succeeded every night, it did seem to reclaim a considerable amount of space.

You delete a bunch of tuples every night, so of course a vacuum full after that is going to return a lot of space.  But that space is probably just going to be needed again the next day.  If you don't do the vacuum full, does the *peak* space keep increasing, or does it stabilize?

Cheers,

Jeff


