Thread: autovacuum suggestions for 500,000,000+ row tables?
Hi, I'm trying to optimise our autovacuum configuration so that it vacuums/analyzes some of our larger tables better. It has been set to the default settings for quite some time. We never delete anything (well, not often, and not much) from the tables, so I am not so worried about the VACUUM status, but I am wary of XID wraparound nuking us at some point if we don't sort vacuuming out so that we VACUUM at least once every year ;) However, not running ANALYZE for such huge periods of time is probably impacting the statistics accuracy somewhat, and I have seen some unusually slow queries at times. Anyway, does anyone think we might benefit from a more aggressive autovacuum configuration?
Hi,

At 16:44 20/06/2005, Alex Stapleton wrote:
> We never delete anything (well not often, and not much) from the
> tables, so I am not so worried about the VACUUM status

DELETEs are not the only reason you might need to VACUUM. UPDATEs are
important as well, if not more. Tables that are constantly updated
(statistics, session data, queues...) really need to be VACUUMed a lot.

> but I am wary of XID wraparound nuking us at some point if we don't
> sort vacuuming out so we VACUUM at least once every year ;)

That would give you a maximum average of 31 transactions/sec... Don't
know if that's high or low for you.

> However not running ANALYZE for such huge periods of time is probably
> impacting the statistics accuracy somewhat, and I have seen some
> unusually slow queries at times. Anyway, does anyone think we might
> benefit from a more aggressive autovacuum configuration?

ANALYZE is not a very expensive operation; however, VACUUM can
definitely be a big strain and take a looooong time on big tables,
depending on your setup. I've found that partitioning tables (at the
application level) can be quite helpful if you manage to keep each
partition to a reasonable size (under or close to available memory),
especially if the partitioning scheme is somehow time-related. YMMV.

Jacques.
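For what it's worth, the 31 transactions/sec figure falls out of the usual rule of thumb (not stated explicitly in the thread) that every table should be vacuumed at least once per ~1 billion transactions to stay clear of XID wraparound:

```sql
-- Sketch of where "31 transactions/sec" comes from: the common
-- guidance is to VACUUM every table at least once per ~1 billion
-- transactions, so vacuuming once a year caps your average rate at:
SELECT 1000000000 / (365.0 * 24 * 60 * 60) AS max_avg_tps;
-- roughly 31.7 transactions/sec
```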
On 20 Jun 2005, at 15:59, Jacques Caron wrote:
> DELETEs are not the only reason you might need to VACUUM. UPDATEs
> are important as well, if not more. Tables that are constantly
> updated (statistics, session data, queues...) really need to be
> VACUUMed a lot.

We UPDATE it even less often.

> That would give you a maximum average of 31 transactions/sec...
> Don't know if that's high or low for you.

It's high as far as inserts go for us. It does them all at the end of
each minute.

> ANALYZE is not a very expensive operation, however VACUUM can
> definitely be a big strain and take a looooong time on big tables,
> depending on your setup. I've found that partitioning tables (at
> the application level) can be quite helpful if you manage to keep
> each partition to a reasonable size (under or close to available
> memory), especially if the partitioning scheme is somehow
> time-related. YMMV.
>
> Jacques.

That's not currently an option as it would require a pretty large
amount of work to implement. I think we will have to keep that in
mind though.
Alex Stapleton wrote:
> On 20 Jun 2005, at 15:59, Jacques Caron wrote:
>> ANALYZE is not a very expensive operation, however VACUUM can
>> definitely be a big strain and take a looooong time on big tables,
>> depending on your setup. I've found that partitioning tables (at the
>> application level) can be quite helpful if you manage to keep each
>> partition to a reasonable size (under or close to available memory),
>> especially if the partitioning scheme is somehow time-related. YMMV.
>>
>> Jacques.
>
> That's not currently an option as it would require a pretty large
> amount of work to implement. I think we will have to keep that in
> mind though.

Remember, you can fake it with a low-level set of tables, and then
wrap them into a UNION ALL view. So you get something like:

CREATE VIEW orig_table AS
    SELECT * FROM table_2005_04
    UNION ALL SELECT * FROM table_2005_05
    UNION ALL SELECT * FROM table_2005_06
    ...
;

Then at least your individual operations are fast. As you insert, you
can create a rule so that an insert into orig_table instead does an
insert into table_2005_07 (or whatever the current table is).

It takes a little bit of maintenance on the DB admin's part, since
every month they have to create a new table, and then update all of
the views and triggers. But it is pretty straightforward.

If you are doing append-only inserting, then you have the nice feature
that only the last table is ever modified, which means that the older
tables don't really need to be vacuumed or analyzed. And even if you
do have each table modified as you go, you can still break up a VACUUM
into only doing one of the sub-tables at a time.

I don't know your db schema, but I thought I would mention that while
true partitioning isn't implemented yet, you can still get something
very similar with views, triggers and rules.

John
=:->
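A rough sketch of the monthly rollover described above, in case it helps. The table names follow the example view; the column names (ts, value) are made up for illustration, and the view itself would also need recreating each month to include the new table:

```sql
-- Hypothetical monthly rollover: create next month's table and
-- repoint the insert-redirection rule at it. Column names are
-- placeholders; substitute your real schema.
CREATE TABLE table_2005_07 (LIKE table_2005_06);

CREATE OR REPLACE RULE orig_table_insert AS
    ON INSERT TO orig_table
    DO INSTEAD
    INSERT INTO table_2005_07 (ts, value)
         VALUES (NEW.ts, NEW.value);
```

Since the rule is CREATE OR REPLACE'd, re-running the second statement with the new table name each month is all the redirection maintenance needed.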
Alex,

> Hi, I'm trying to optimise our autovacuum configuration so that it
> vacuums/analyzes some of our larger tables better. It has been set
> to the default settings for quite some time. We never delete
> anything (well not often, and not much) from the tables, so I am not
> so worried about the VACUUM status, but I am wary of XID wraparound
> nuking us at some point if we don't sort vacuuming out so we VACUUM
> at least once every year ;)

I personally don't use autovacuum on very large databases. For a DW,
vacuuming is far better tied to ETL operations or a clock schedule of
downtime.

XID wraparound may be further away than you think. Try checking
pg_controldata, which will give you the current XID, and you can
calculate how far you are from wraparound. I just tested a 200G data
warehouse and figured out that we are 800 months away from wraparound,
despite hourly ETL.

> However, not running ANALYZE for such huge periods of time is
> probably impacting the statistics accuracy somewhat, and I have seen
> some unusually slow queries at times. Anyway, does anyone think we
> might benefit from a more aggressive autovacuum configuration?

Hmmm, good point. You could use autovacuum for ANALYZE only: just set
the VACUUM settings preposterously high (like 10x) so it never runs;
then it'll run ANALYZE only. I generally use a threshold of 200 and a
multiplier of 0.1 for analyze; that is, re-analyze after 200 + 10% of
rows have changed.

--
Josh Berkus
Aglio Database Solutions
San Francisco
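The calculation Josh describes can be sketched like this. The NextXID and transaction-rate figures below are made-up examples, not numbers from his warehouse; the current XID comes from the "Latest checkpoint's NextXID" line of pg_controldata output:

```sql
-- Months of headroom before XID wraparound, with hypothetical inputs:
--   NextXID from pg_controldata ......... 50,000,000
--   observed transaction rate ........... ~15 tx/sec
-- assuming the ~1 billion transaction safety budget between vacuums
SELECT (1000000000 - 50000000)
       / (15 * 86400 * 30.0) AS months_to_wraparound;
-- roughly 24 months at that rate
```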
On 20 Jun 2005, at 18:46, Josh Berkus wrote:
> I personally don't use autovacuum on very large databases. For DW,
> vacuuming is far better tied to ETL operations or a clock schedule
> of downtime.

Downtime is something I'd rather avoid if possible. Do you think we
will need to run VACUUM FULL occasionally? I'd rather not lock tables
up unless I can't avoid it. We can probably squeeze in an automated
vacuum tied to our data inserters every now and then, though.

> XID wraparound may be further away than you think. Try checking
> pg_controldata, which will give you the current XID, and you can
> calculate how long you are away from wraparound. I just tested a
> 200G data warehouse and figured out that we are 800 months away from
> wraparound, despite hourly ETL.

Is this an 8.0 thing? I don't have a pg_controldata from what I can
see. That's nice to hear, though.

> Hmmm, good point, you could use autovacuum for ANALYZE only. Just
> set the VACUUM settings preposterously high (like 10x) so it never
> runs. Then it'll run ANALYZE only. I generally threshold 200,
> multiple 0.1x for analyze; that is, re-analyze after 200+10% of rows
> have changed.

I will try those settings out; that sounds good to me.
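For reference, with the contrib pg_autovacuum daemon shipped alongside 8.0, Josh's "analyze often, vacuum never" settings might look roughly like this. The flag names are taken from the contrib README (-a/-A are the analyze base threshold and scale factor, -v/-V the vacuum equivalents); double-check them against your version before relying on this:

```
# analyze after 200 + 10% of a table's rows have changed;
# vacuum thresholds set preposterously high so VACUUM never triggers
pg_autovacuum -D -a 200 -A 0.1 -v 1000000000 -V 10
```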
Alex,

> Downtime is something I'd rather avoid if possible. Do you think we
> will need to run VACUUM FULL occasionally? I'd rather not lock
> tables up unless I can't avoid it. We can probably squeeze an
> automated vacuum tied to our data inserters every now and then
> though.

As long as your updates/deletes are less than 10% of the table for all
time, you should never have to vacuum, barring XID wraparound.

> Is this an 8.0 thing? I don't have a pg_controldata from what I can
> see. That's nice to hear though.

'fraid so, yes.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
On 21 Jun 2005, at 18:13, Josh Berkus wrote:
> As long as your update/deletes are less than 10% of the table for
> all time, you should never have to vacuum, pending XID wraparound.

Hmm, I guess as we have hundreds of millions of rows, and when we do
delete things it's only a few thousand rows, and rarely, VACUUMing
shouldn't need to happen too often. That's good. Thanks a lot for the
advice.

>> Is this an 8.0 thing? I don't have a pg_controldata from what I can
>> see. That's nice to hear though.
>
> 'fraid so, yes.

Bloody Debian stable. I might have to experiment with building from
source or using alien on Debian to convert the RPMs. Fun. Oh well.
On Tue, Jun 21, 2005 at 11:08:43PM +0100, Alex Stapleton wrote:
> Bloody Debian stable. I might have to experiment with building from
> source or using alien on Debian to convert the RPMs. Fun. Oh well.

Or just pull in postgresql-8.0 from unstable; sid is close enough to
sarge for it to work quite well in practice, AFAIK. You'll lose the
security support, though, but you will with building from source or
using alien anyhow :-)

/* Steinar */
--
Homepage: http://www.sesse.net/