Thread: Table growing faster than autovacuum can vacuum
Hello.

I've got a database with a very large table (currently holding 23.5 billion rows, the output of various data loggers over the course of my PhD so far). The table itself has a trivial structure (see below), is partitioned by data time/date, and has quite acceptable INSERT/SELECT performance.

  CREATE TABLE rawdata (
    value REAL NOT NULL,
    sample_time TIMESTAMP WITH TIME ZONE NOT NULL,
    -- Dataset reference.
    dataset INTEGER NOT NULL
  );

The data loggers are collected every month or so and uploaded into the database, resulting in another 1-2 billion rows in the table each time. Data is never deleted from the table and so it's essentially read-only.

My problem is that the autovacuum system isn't keeping up with INSERTs and I keep running out of transaction IDs. SELECT performance also drops off the more I insert, which from looking at the output of iostat seems to be because the autovacuum tasks are taking up a lot of the disk bandwidth - the disks are commodity items in a software RAID and not terribly fast. My current workaround is to stop the server after a batch of inserts and then manually VACUUM FULL all the databases, letting this run over a weekend.

I'm a complete newbie when it comes to PostgreSQL system settings and it isn't obvious to me what I'd need to change to improve the autovacuum. Is there any way to manually freeze the rows of the table after they've been inserted so that the autovacuum doesn't need to touch the table? The rest of the database contains metadata about the main data, about 250 million rows at the moment, which I guess is a small enough amount that I can let PostgreSQL handle it automatically.

The server is running PostgreSQL 8.4.2 (under FreeBSD) at the moment but it wouldn't be a problem to upgrade to 9.1 if that was helpful.

Asher.
On Wed, Feb 15, 2012 at 18:46, Asher Hoskins <asher@piceur.com> wrote:
> My problem is that the autovacuum system isn't keeping up with INSERTs and I
> keep running out of transaction IDs.

This is usually not a problem with vacuum, but a problem with consuming too many transaction IDs. I suspect you're loading that data with individual INSERT statements and no explicit transaction management -- which means every data-modifying query gets its own transaction ID.

In most applications, the simplest solution is to batch up lots of INSERTs (say, 10k) and run them all in a single transaction between BEGIN and COMMIT -- thus consuming just one transaction ID for 10k rows. You could also look at multi-row inserts or the COPY command to get better performance. Here's an overview of different data loading approaches:
http://www.depesz.com/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/

But it's also possible that transaction IDs are being consumed by other queries. Any PL/pgSQL procedure that makes use of exception handling consumes a transaction ID, and so does every SAVEPOINT command.

> I'm a complete newbie when it comes to PostgreSQL system settings and it
> isn't obvious to me what I'd need to change to improve the autovacuum. Is
> there any way to manually freeze the rows of the table after they've been
> inserted so that the autovacuum doesn't need to touch the table?

VACUUM FREEZE is the command. :)

You may need to tune the vacuum settings to be more aggressive if you want it to run faster (reducing vacuum_cost_delay, increasing vacuum_cost_limit), but I don't have much experience with these myself:
http://www.postgresql.org/docs/8.4/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST

Note that VACUUM FREEZE is only necessary when you have a shortage of transaction IDs. Normally Postgres doesn't vacuum insert-only tables at all (it just analyzes them). VACUUM FULL is extremely inefficient in PostgreSQL 8.4 and older.

> The server is running PostgreSQL 8.4.2 (under FreeBSD) at the moment but it
> wouldn't be a problem to upgrade to 9.1 if that was helpful.

Upgrading to 9.1 certainly lets you take advantage of several performance improvements, but I don't know how useful those are for your workload.

----

Additionally, it's often a good idea to partition large tables into smaller partitions (e.g. a separate partition for each day's or week's worth of data). This way maintenance commands (VACUUM, ANALYZE) don't have to scan the whole huge table, but work with smaller individual tables, and you can drop old chunks with a simple DROP TABLE. BUT it also has its problems: some types of query optimization are impossible with partitioned tables (version 9.1 relaxed those limitations somewhat). So if you find your current performance to be sufficient, then this might not be worth doing.

Regards,
Marti
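[As a rough illustration of the batching described above -- the values, timestamps and batch size are made up; the point is only the BEGIN/COMMIT wrapping:]

    BEGIN;
    INSERT INTO rawdata VALUES (1.01, '2012-01-01 00:00:00+00', 1);
    INSERT INTO rawdata VALUES (1.02, '2012-01-01 00:00:10+00', 1);
    -- ... roughly 10,000 more rows ...
    COMMIT;   -- the whole batch consumed a single transaction ID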
On Wed, Feb 15, 2012 at 19:25, Marti Raudsepp <marti@juffo.org> wrote:
> VACUUM FULL is extremely inefficient in PostgreSQL 8.4 and older.

Oh, a word of warning: PostgreSQL 9.0+ has a faster VACUUM FULL implementation, but it now requires roughly twice your table's disk space while the vacuum is running.

Regards,
Marti
On 02/15/12 8:46 AM, Asher Hoskins wrote:
> I've got a database with a very large table (currently holding 23.5
> billion rows,

a table that large should probably be partitioned, likely by time. maybe a partition for each month. as each partition is filled, it can be VACUUM FREEZE'd since it will never be modified again. if you're doing your batch inserts of 1-2 billion rows once a month or so, perhaps that should be the partition right there.

since you're inserting the new data sequentially, and not doing UPDATEs, there's no point in doing VACUUM FULL, just a plain vacuum.....

so, your ~ monthly batch run could be something like...

create new partition table
copy/insert your 1-2 billion rows
vacuum analyze (NOT full) new table
vacuum freeze new table
update master partition table rules

this way, autovacuum won't even bother with these tables.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
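[For illustration, that monthly run could look roughly like the following sketch under 8.4-style inheritance partitioning; the partition name, CHECK constraint dates and file path are all made up for the example:]

    -- new partition for this batch (illustrative names and dates)
    CREATE TABLE rawdata_2012_02 (
        CHECK (sample_time >= '2012-02-01' AND sample_time < '2012-03-01')
    ) INHERITS (rawdata);

    -- bulk-load the month's logger output straight into the new partition
    COPY rawdata_2012_02 FROM '/path/to/logger_dump.csv' WITH CSV;

    VACUUM ANALYZE rawdata_2012_02;  -- plain vacuum plus fresh planner statistics
    VACUUM FREEZE rawdata_2012_02;   -- freeze the rows so autovacuum can leave them alone

    -- then update the trigger/rule on the parent table so new inserts and
    -- constraint exclusion know about rawdata_2012_02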
On Wed, Feb 15, 2012 at 12:38 PM, John R Pierce <pierce@hogranch.com> wrote:
> so, your ~ monthly batch run could be something like...
>
> create new partition table
> copy/insert your 1-2 billion rows
> vacuum analyze (NOT full) new table
> vacuum freeze new table
> update master partition table rules

Just FYI, you can "vacuum freeze analyze;" all at once.
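[In other words, the two vacuum passes in the sketch above could be collapsed into one statement; the partition name is again just an example:]

    VACUUM FREEZE ANALYZE rawdata_2012_02;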
On 2012-02-15, Asher Hoskins <asher@piceur.com> wrote:
> Hello.
>
> I've got a database with a very large table (currently holding 23.5
> billion rows, the output of various data loggers over the course of my
> PhD so far). The table itself has a trivial structure (see below), is
> partitioned by data time/date, and has quite acceptable INSERT/SELECT
> performance.
>
>   CREATE TABLE rawdata (
>     value REAL NOT NULL,
>     sample_time TIMESTAMP WITH TIME ZONE NOT NULL,
>     -- Dataset reference.
>     dataset INTEGER NOT NULL
>   );
>
> The data loggers are collected every month or so and uploaded into the
> database, resulting in another 1-2 billion rows in the table each time.
> Data is never deleted from the table and so it's essentially read-only.
>
> My problem is that the autovacuum system isn't keeping up with INSERTs
> and I keep running out of transaction IDs. SELECT performance also drops
> off the more I insert, which from looking at the output of iostat seems
> to be because the autovacuum tasks are taking up a lot of the disk
> bandwidth - the disks are commodity items in a software RAID and not
> terribly fast.

have you tried using COPY instead of INSERT (you'll have to insert into the correct partition) or alternatively putting several rows in the VALUES part of the insert (that should work with trigger based partitioning)

insert into rawdata values
  (1.0,'2012-03-25 16:29:01 +13',1),
  (1.1,'2012-03-25 16:29:02 +13',1),
  (1.15,'2012-03-25 16:29:03 +13',1),
  (1.17,'2012-03-25 16:29:04 +13',1),
  (1.18,'2012-03-25 16:29:05 +13',1);

etc...

--
⚂⚃ 100% natural
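[A client-side variant of the COPY suggestion, run from psql, might look like this; the partition name and file name are hypothetical, and \copy reads the file on the client machine whereas plain COPY reads it on the server:]

    \copy rawdata_2012_02 (value, sample_time, dataset) from 'logger_dump.csv' with csv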
On Sat, Mar 24, 2012 at 9:40 PM, Jasen Betts <jasen@xnet.co.nz> wrote:
>
> have you tried using COPY instead of INSERT (you'll have to insert
> into the correct partition)

triggers fire on copy, but rules do not. So if he has partitioning triggers they'll fire when copying into the parent table etc. HOWEVER, that'll be slower than copying to the proper partition to begin with.