Thread: Table growing faster than autovacuum can vacuum

Table growing faster than autovacuum can vacuum

From: Asher Hoskins

Hello.

I've got a database with a very large table (currently holding 23.5
billion rows, the output of various data loggers over the course of my
PhD so far). The table itself has a trivial structure (see below) and is
partitioned by data time/date and has quite acceptable INSERT/SELECT
performance.

   CREATE TABLE rawdata (
     value REAL NOT NULL,
     sample_time TIMESTAMP WITH TIME ZONE NOT NULL,
     -- Dataset reference.
     dataset INTEGER NOT NULL
   );

The data loggers are collected every month or so and uploaded into the
database, resulting in another 1-2 billion rows in the table each time.
Data is never deleted from the table and so it's essentially read-only.

My problem is that the autovacuum system isn't keeping up with INSERTs
and I keep running out of transaction IDs. SELECT performance also drops
off the more I insert, which from looking at the output of iostat seems
to be because the autovacuum tasks are taking up a lot of the disk
bandwidth - the disks are commodity items in a software RAID and not
terribly fast.

My current workaround is to stop the server after a batch of inserts and
then manually VACUUM FULL all the databases, letting this run over a
weekend.

I'm a complete newbie when it comes to PostgreSQL system settings and it
isn't obvious to me what I'd need to change to improve the autovacuum.
Is there any way to manually freeze the rows of the table after they've
been inserted so that the autovacuum doesn't need to touch the table?
The rest of the database contains metadata about the main data, about
250 million rows at the moment, which I guess is a small enough amount
that I can let PostgreSQL handle it automatically.

The server is running PostgreSQL 8.4.2 (under FreeBSD) at the moment but
it wouldn't be a problem to upgrade to 9.1 if that was helpful.


Asher.

Re: Table growing faster than autovacuum can vacuum

From: Marti Raudsepp

On Wed, Feb 15, 2012 at 18:46, Asher Hoskins <asher@piceur.com> wrote:
> My problem is that the autovacuum system isn't keeping up with INSERTs and I
> keep running out of transaction IDs.

This is usually not a problem with vacuum, but a problem with
consuming too many transaction IDs. I suspect you're loading that data
with individual INSERT statements with no explicit transaction
management -- which means every data-modifying query gets its own
transaction ID.

In most applications, the simplest solution is batching up lots of
INSERTs (say, 10k) and running them all in a single transaction between
BEGIN and COMMIT -- thus consuming just 1 transaction ID for 10k rows.
You could also look at multi-row inserts or the COPY command to get
better performance. Here's an overview of different data loading
approaches: http://www.depesz.com/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/
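
Something like this, just to sketch the idea (the values are made up):

   BEGIN;
   INSERT INTO rawdata VALUES (1.0, '2012-01-01 00:00:01+00', 1);
   INSERT INTO rawdata VALUES (1.1, '2012-01-01 00:00:02+00', 1);
   -- ... roughly 10k more inserts ...
   COMMIT;  -- the whole batch used a single transaction ID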

But it's also possible that transaction IDs are consumed by other
queries. Any PL/pgSQL procedure that makes use of exception handling
consumes a transaction ID. So does every SAVEPOINT command.
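
A contrived illustration -- each savepoint here wraps a write, so each
one starts a subtransaction that draws an ID from the same counter:

   BEGIN;
   SAVEPOINT s1;
   INSERT INTO rawdata VALUES (1.0, '2012-01-01 00:00:01+00', 1);
   RELEASE SAVEPOINT s1;   -- that subtransaction used up an ID
   SAVEPOINT s2;
   INSERT INTO rawdata VALUES (1.1, '2012-01-01 00:00:02+00', 1);
   RELEASE SAVEPOINT s2;   -- and so did this one
   COMMIT;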

> I'm a complete newby when it comes to PostgreSQL system settings and it
> isn't obvious to me what I'd need to change to improve the autovacuum. Is
> there any way to manually freeze the rows of the table after they've been
> inserted so that the autovacuum doesn't need to touch the table?

VACUUM FREEZE is the command. :)

You may need to tune the vacuum settings to be more aggressive if you
want it to run faster (reducing cost_delay, increasing cost_limit).
But I don't have much experience with these myself.
http://www.postgresql.org/docs/8.4/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST
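
For example (untested numbers, just to show the shape of it; since 8.4
you can also set these per table as storage parameters instead of
globally in postgresql.conf):

   ALTER TABLE rawdata SET (
     autovacuum_vacuum_cost_delay = 0,      -- don't sleep between pages
     autovacuum_vacuum_cost_limit = 10000   -- allow more work per round
   );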

Note that vacuum freeze is only necessary when you have a shortage of
transaction IDs. Normally Postgres doesn't vacuum insert-only tables
at all (it just analyzes them).

VACUUM FULL is extremely inefficient in PostgreSQL 8.4 and older.

> The server is running PostgreSQL 8.4.2 (under FreeBSD) at the moment but it
> wouldn't be a problem to upgrade to 9.1 if that was helpful.

Upgrading to 9.1 certainly lets you take advantage of several
performance improvements, but I don't know how useful those are in
your workload.

----

Additionally, it's often a good idea to split a large table into
smaller partitions (e.g. a separate partition for each day's or week's
worth of data). That way maintenance commands (VACUUM, ANALYZE) don't
have to scan the whole huge table, but work with smaller individual
tables, and you can drop old chunks with a simple DROP TABLE.
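
A rough sketch of how that looks with inheritance-based partitioning on
8.4/9.1 (the child-table names are just examples):

   -- one child table per month; the CHECK constraint lets
   -- constraint_exclusion skip chunks that can't match a query
   CREATE TABLE rawdata_2012_02 (
     CHECK (sample_time >= '2012-02-01' AND sample_time < '2012-03-01')
   ) INHERITS (rawdata);

   -- maintenance now only touches the small child table
   VACUUM ANALYZE rawdata_2012_02;

   -- and retiring an old chunk is instant
   DROP TABLE rawdata_2010_01;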

BUT it also has its problems: some types of query optimizations are
impossible with partitioned tables (version 9.1 relaxed those
limitations somewhat). So if you find your current performance to be
sufficient, then this might not be worth doing.

Regards,
Marti

Re: Table growing faster than autovacuum can vacuum

From: Marti Raudsepp

On Wed, Feb 15, 2012 at 19:25, Marti Raudsepp <marti@juffo.org> wrote:
> VACUUM FULL is extremely inefficient in PostgreSQL 8.4 and older.

Oh, a word of warning: PostgreSQL 9.0+ has a faster VACUUM FULL
implementation, but it now requires roughly twice your table's disk
space while the vacuum is running.

Regards,
Marti

Re: Table growing faster than autovacuum can vacuum

From: John R Pierce

On 02/15/12 8:46 AM, Asher Hoskins wrote:
> I've got a database with a very large table (currently holding 23.5
> billion rows,

a table that large should probably be partitioned, likely by time.
maybe a partition for each month.  as each partition is filled, it can
be VACUUM FREEZE'd since it will never be modified again.   if you're
doing your batch inserts of 1-2 billion rows once a month or so, perhaps
that should be the partition right there.

since you're inserting the new data sequentially, and not doing UPDATEs,
there's no point in doing VACUUM FULL, just a plain vacuum.....

so, your ~ monthly batch run could be something like...

     create new partition table
     copy/insert your 1-2 billion rows
     vacuum analyze (NOT full) new table
     vacuum freeze new table
     update master partition table rules

this way, autovacuum won't even bother with these tables.
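
a minimal sketch of that run in SQL, assuming the new partition for the
batch is called rawdata_2012_02 and the file path is just a placeholder:

     -- bulk load straight into the freshly created partition
     -- (\copy from psql works too if you aren't superuser)
     COPY rawdata_2012_02 FROM '/path/to/logger_batch.csv' WITH CSV;

     VACUUM ANALYZE rawdata_2012_02;   -- plain vacuum plus fresh statistics
     VACUUM FREEZE rawdata_2012_02;    -- freeze it so autovacuum can ignore it

     -- then point the parent's partitioning rule/trigger at the new table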



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast


Re: Table growing faster than autovacuum can vacuum

From: Scott Marlowe

On Wed, Feb 15, 2012 at 12:38 PM, John R Pierce <pierce@hogranch.com> wrote:
> so, your ~ monthly batch run could be something like...
>
>    create new partition table
>    copy/insert your 1-2 billion rows
>    vacuum analyze (NOT full) new table
>    vacuum freeze new table
>    update master partition table rules

Just FYI, you can "vacuum freeze analyze;" all at once.

Re: Table growing faster than autovacuum can vacuum

From: Jasen Betts

On 2012-02-15, Asher Hoskins <asher@piceur.com> wrote:
> Hello.
>
> I've got a database with a very large table (currently holding 23.5
> billion rows, the output of various data loggers over the course of my
> PhD so far). The table itself has a trivial structure (see below) and is
> partitioned by data time/date and has quite acceptable INSERT/SELECT
> performance.
>
>    CREATE TABLE rawdata (
>      value REAL NOT NULL,
>      sample_time TIMESTAMP WITH TIME ZONE NOT NULL,
>      -- Dataset reference.
>      dataset INTEGER NOT NULL
>    );

> The data loggers are collected every month or so and uploaded into the
> database, resulting in another 1-2 billion rows in the table each time.
> Data is never deleted from the table and so it's essentially read-only.
>
> My problem is that the autovacuum system isn't keeping up with INSERTs
> and I keep running out of transaction IDs. SELECT performance also drops
> off the more I insert, which from looking at the output of iostat seems
> to be because the autovacuum tasks are taking up a lot of the disk
> bandwidth - the disks are commodity items in a software RAID and not
> terribly fast.

have you tried using COPY instead of INSERT? (you'll have to insert
into the correct partition)

or alternatively putting several rows in the VALUES part of the insert
(that should work with trigger-based partitioning)

insert into rawdata
values
     (1.0,'2012-03-25 16:29:01 +13',1),
     (1.1,'2012-03-25 16:29:02 +13',1),
     (1.15,'2012-03-25 16:29:03 +13',1),
     (1.17,'2012-03-25 16:29:04 +13',1),
     (1.18,'2012-03-25 16:29:05 +13',1);

etc...
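
and for COPY, something along these lines (the partition name and file
path are just placeholders):

copy rawdata_2012_03 (value, sample_time, dataset)
  from '/path/to/logger_dump.csv' with csv;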


--
⚂⚃ 100% natural

Re: Table growing faster than autovacuum can vacuum

From: Scott Marlowe

On Sat, Mar 24, 2012 at 9:40 PM, Jasen Betts <jasen@xnet.co.nz> wrote:
>
> have you tried using COPY instead of INSERT (you'll have to insert
> into the correct partition)

triggers fire on copy, but rules do not.  So if he has partitioning
triggers they'll fire on the parent table etc.

HOWEVER, that'll be slower than copying to the proper partition to begin with.