Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases - Mailing list pgsql-bugs

From Sakari A. Maaranen
Subject Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases
Date
Msg-id AANLkTinHhAAT5bKIpBYDrRJMozm5ImukgIUp2OV4jkMy@mail.gmail.com
Whole thread Raw
In response to Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases  (Greg Stark <gsstark@mit.edu>)
Responses Re: Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases
List pgsql-bugs
2010/5/29 Greg Stark <gsstark@mit.edu>:
> On Fri, May 28, 2010 at 2:02 PM, Sakari Maaranen <sam@iki.fi> wrote:
>> The documentation says that the best way to use autovacuum is to let the
>> database vacuum often. However when the tables are very large, this can take
>> a very long time.
>
> In addition to Tom's comments, as of 8.4 this is no longer true as
> well. The running time of VACUUM should be related to the amount of
> the table which has been dirtied, not the total size of the table.
>
> It's not perfect (yet) though. If you have large indexes they have to
> be scanned as well, and periodically a vacuum freeze has to happen
> which does have to scan the whole table but that's at a much longer
> interval than the frequent vacuums.
>
> --
> greg
>

In my case (at this time) the bulk import actually did dirty basically
the whole table. The failed bulk load was one huge transaction with
tens of millions of rows, and that transaction failed (because the
server process died).

I discussed this with Tom in detail, and he helped me find and
understand the cause of the crash. That was PostgreSQL running out of
memory because of large bulk operations on existing tables with
foreign key constraints. I was doing it in a way that PostgreSQL is
not really designed to support. Couldn't have solved it without Tom's
advice.

Tom Lane wrote:  "...you're doing a data-only restore into a table
with pre-existing foreign key constraints.  Each new row will create a
pending AFTER trigger event (to fire the FK check trigger) and after
enough rows get loaded you'll be out of memory.  Even if you didn't
run out of memory, firing all those triggers will take forever, so
this isn't the way to approach the task."

So, now I'm altering my target tables to drop the foreign key
constraints before update. Will alter again to rebuild them after the
operation. Not very convenient, but doesn't matter as long as it
works.

Later on, my planned use case is:

 -  a big database in use 24/7 (mostly reads) with hundreds of millions of rows

 -  large bulk updates applied weekly, potentially with millions (or
at least 100's of thousands) of changes. Or maybe smaller updates more
often, if that's an easier load profile for PostgreSQL.

This means there will be millions of dirty rows whenever the bulk
updates happen. Still the database needs to be in constant use, so I
can't really remove its foreign keys for the bulk loads. Would be
great if PostgreSQL could support very large batch updates without the
need to remove foreign keys. In my case a performance penalty is
acceptable, but a crash is not.

According to Tom, the current implementation can't really handle very
large batch updates on existing tables with foreign keys. Trying this
makes PostgreSQL server run out of memory when the dataset gets large
enough. I think a performance penalty would be acceptable (can't be
avoided), but the server shouldn't run out of memory like it does now.
Some people might consider this a feature, some see it as a bug...

For now, I can work around this on the client side by splitting the
updates into a million separate transactions instead of a single big
one. Will be slow, but it should work.

In conclusion, this bug was apparently not so much about vacuuming
after all. It was more about very large batch updates on pre-existing
tables with foreign key constraints causing too many pending AFTER
triggers thus exhausting all available memory and taking forever to
process. The hanging lock was probably just a side effect when the
system was running out of memory.

--
Br, Sakari

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #5483: PQescapeStringConn behaviour ??
Next
From: Robert Haas
Date:
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation