BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases - Mailing list pgsql-bugs

From Sakari Maaranen
Subject BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases
Date
Msg-id 201005281302.o4SD2vjk058189@wwwmaster.postgresql.org
Responses Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases
Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5480
Logged by:          Sakari Maaranen
Email address:      sam@iki.fi
PostgreSQL version: 8.4.4
Operating system:   CentOS 5.5
Description:        Autovacuum interferes with operations (e.g. truncate) on
very large databases
Details:

I'm using PostgreSQL to maintain a very large database that holds the full
OpenStreetMap database. For example, the table for geospatial nodes (points
on the map) contains hundreds of millions of rows.

The documentation says that the best way to use autovacuum is to let the
database vacuum often. However, when the tables are very large, a single
vacuum run can take a very long time.
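
For what it's worth, 8.4's per-table storage parameters look like one way to
make those runs smaller and gentler on a table this size. This is only a
sketch -- the values below are guesses, not something I have verified on this
database:

-- Trigger autovacuum on a fixed number of dead rows instead of a fraction
-- of hundreds of millions, and throttle it harder (illustrative values only).
ALTER TABLE osm_way_nodes SET (
    autovacuum_vacuum_scale_factor = 0.0,
    autovacuum_vacuum_threshold    = 1000000,
    autovacuum_vacuum_cost_delay   = 20
);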

The documentation also says that the quickest way to empty a whole table is
the TRUNCATE command. However, if the autovacuum daemon happens to be working
on the same table, the TRUNCATE command hangs for a very long time waiting for
the vacuum to finish. This makes no sense, because the table is going to be
emptied anyway, so there's no point in vacuuming it first (or is there?)
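
While the TRUNCATE is hanging, the lock queue is visible in pg_locks: TRUNCATE
is waiting for an ACCESS EXCLUSIVE lock while the autovacuum worker still holds
its lock on the same table. The query below only illustrates how one could look
at this; it is not taken from the failed session:

-- Shows the autovacuum worker's granted lock and the ungranted TRUNCATE request.
SELECT l.pid, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'osm_way_nodes';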

For some reason PostgreSQL failed when I had two separate processes working
on different tables of the same very large database:

1. I was restoring one table via psql from a pg_dump that was created
earlier in the default (COPY) mode.

2. At the same time there was another process inserting data in another
table via a JDBC connection.

Both tables are in the same database. Both processes were long-running
operations -- (1) the COPY restoration taking several hours and (2) the JDBC
connection running several days. At some point the database failed for an
unknown reason.

This is the output from the restoration process:

bash-3.2$ 7za x -so /var/lib/pgsql/backups/osm_way_nodes.pgdump.7z | pv |
psql -d routing

7-Zip (A) 4.61 beta  Copyright (c) 1999-2008 Igor Pavlov  2008-11-23
p7zip Version 4.61 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs)

Processing archive: /var/lib/pgsql/backups/osm_way_nodes.pgdump.7z

Extracting  osm_way_nodes.pgdump
SET
SET
SET
SET
<...running for several hours here, then suddenly...>
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
CONTEXT:  COPY osm_way_nodes, line 161870753: "17043232 176883291 356"
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost


The JDBC connection had been running for over a week, and was then restarted
about 3 h 12 min 50 s before this crash happened.

This is the output from the JDBC connection at the time of crash:

12:27 osm2postgis.core.Monitor run INFO: Time elapsed 0 d 03:12:50; Committed up to line 718713520;
12:27 osm2postgis.core.Monitor run INFO: Cumulative: public.osm_changesets:ignored=3607228 public.osm_nodes:created=3996789 public.osm_nodes:ignored=493155015 public.osm_nodes:verified=4050
12:48 postgis.dao.OutputTarget keepAlive WARNING: The database connection failed.
12:48 postgis.dao.OutputTarget keepAlive FINE: org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
Exception in thread "Monitor" java.lang.IllegalStateException: Database connection died.
        at net.sourceforge.osm2postgis.core.Monitor.run(Monitor.java:149)
        at net.sourceforge.osm2postgis.Shell.main(Shell.java:207)
12:48 postgis.dao.OutputTarget rollback WARNING: Rollback failed.
org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:283)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.executeTransactionCommand(AbstractJdbc2Connection.java:671)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.rollback(AbstractJdbc2Connection.java:708)
        at net.sourceforge.osm2postgis.dao.OutputTarget.rollback(OutputTarget.java:335)
        at net.sourceforge.osm2postgis.dao.OutputTarget.write(OutputTarget.java:311)
        at net.sourceforge.osm2postgis.dao.OutputTarget.retryWrite(OutputTarget.java:358)
        at net.sourceforge.osm2postgis.core.PostGISWriter.run(PostGISWriter.java:91)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Stream closed
        at sun.nio.cs.StreamEncoder.ensureOpen(Unknown Source)
        at sun.nio.cs.StreamEncoder.flush(Unknown Source)
        at java.io.OutputStreamWriter.flush(Unknown Source)
        at org.postgresql.core.PGStream.flush(PGStream.java:507)
        at org.postgresql.core.v3.QueryExecutorImpl.sendSync(QueryExecutorImpl.java:1107)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:256)
        ... 7 more
12:48 postgis.dao.OutputTarget retryWrite WARNING: Problems writing OSMNode{id=577394474,changeset=3268325,time=1259707636000,version=1,lat=44.194435,lon=9.903339}. Retry attempt 0 of 5.
org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:283)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:367)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:321)
        at net.sourceforge.osm2postgis.dao.NodeDAO.create(NodeDAO.java:104)
        at net.sourceforge.osm2postgis.dao.NodeDAO.create(NodeDAO.java:30)
        at net.sourceforge.osm2postgis.dao.OutputTarget.write(OutputTarget.java:288)
        at net.sourceforge.osm2postgis.dao.OutputTarget.retryWrite(OutputTarget.java:358)
        at net.sourceforge.osm2postgis.core.PostGISWriter.run(PostGISWriter.java:91)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
        at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:261)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1620)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
        ... 9 more
12:48 postgis.dao.OutputTarget keepAlive WARNING: The database connection is closed.



Soon after the crash I decided to truncate the osm_way_nodes table that was the
target of the restoration process (1). When I tried to do that (TRUNCATE
osm_way_nodes;) the command simply hung. I checked pg_locks and found that the
autovacuum process had locked the table. I had to kill -KILL the autovacuum
process to release the lock so that I could truncate the table.
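
In hindsight, a gentler alternative to kill -KILL would presumably have been to
cancel just that backend from SQL, using the pid that pg_locks reported as
holding the lock (12345 below is only a placeholder):

-- Ask the server to cancel the autovacuum worker's current operation.
SELECT pg_cancel_backend(12345);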

I still don't know why the database failed in the first place.

Regardless of that, I think PostgreSQL should automatically stop autovacuuming
when the user truncates a table. The TRUNCATE command should interrupt
autovacuum on that table and empty the table immediately; autovacuum can
continue afterwards if it is still needed. This is probably not a problem on
small tables, but mine had hundreds of millions of rows. PostgreSQL should work
with very large databases as well.

Maybe the autovacuum was the reason why those processes failed in the first
place, but I'm not sure about that.

Please either make PostgreSQL work with very large databases, or give
recommendations on the maximum size of tables that should work well with
PostgreSQL autovacuum and all. Thank you!
