Large object loading stalls - Mailing list pgsql-general

From Michael Akinde
Subject Large object loading stalls
Msg-id 499C0CCD.7010304@met.no
List pgsql-general
Hi,

We have been running into a strange problem with the loading of large
objects on the production servers of a system we are about to roll out. At
various times (usually separated by a few days), the loading process
against the server will suddenly either take an extremely long time
(going from the usual 15-20 minutes up to 45 minutes or more) or simply
hang altogether (i.e., after 8+ hours we have not had the patience to
wait and see whether it eventually finishes). This is on an 8.3.3 server
running on Debian Etch; most of our previous testing was done on 8.1.10,
where we did not encounter this problem (though we had other issues on
that platform).

I have run some tests loading large objects into a PostgreSQL 8.3.3
database with a more rapid turnaround than in production (loading,
deleting, vacuuming directly, and then starting the load over again) and
have been able to provoke what seems to be the same issue there, usually
within 12 hours.
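
For reference, each test iteration has roughly the following shape
(expressed here as server-side SQL; the real loader writes the data from
C++, and the 1,000-blob batch size is only illustrative):

-- load: create a batch of large objects (the loader also writes data into them)
SELECT lo_creat(-1) FROM generate_series(1, 1000);

-- delete: unlink every large object again
SELECT lo_unlink(loid)
FROM (SELECT DISTINCT loid FROM pg_largeobject) AS blobs;

-- vacuum directly, then start the next load
VACUUM ANALYZE pg_largeobject;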

Essentially, we have a C++ program (using pqxx) loading a large number of
large objects (somewhere around 80,000) that exhibits strange behaviour.
It varies quite a lot in loading time (from 1,600 to 2,500 seconds), and
at random times it will simply hang (at least it seems so, given that it
has not progressed at all after 7-8 hours). It displays this behaviour
irrespective of whether we load all 80,000 fields in one go or break the
load into smaller consecutive batches of 1,000 blobs at a time (though
the problem does seem easier to provoke with the former approach). It can
run fine for hours at a time and then suddenly lock up (in the latest
test I did, it loaded the data correctly 22 times in a row before locking
up on the 23rd run-through).
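
For each blob, the loader's calls boil down to the sequence below (shown
as the equivalent server-side SQL; pqxx issues these for us, the OID
16402 and descriptor 0 are illustrative return values, and 131072 is
INV_WRITE):

BEGIN;
SELECT lo_creat(-1);            -- create the large object; returns its OID, e.g. 16402
SELECT lo_open(16402, 131072);  -- open it for writing; returns a descriptor, e.g. 0
SELECT lowrite(0, '...');       -- write the field data through the descriptor (data elided)
SELECT lo_close(0);
COMMIT;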

The pattern seems to be identical each time: ps aux shows the process
(wdb) idle in transaction, with no apparent activity on the system:

postgres 19965  0.0  0.0  72628  5584 ?        S    Feb17   0:01 /usr/lib/postgresql/8.3/bin/postgres -D /var/lib/postgresql/8.3/main -c config_file=/etc/postgresql/8.3/main/postgresql.conf
postgres 19967  0.4  0.0  72748 27636 ?        Ss   Feb17   6:38 postgres: writer process
postgres 19968  0.0  0.0  72628  1784 ?        Ss   Feb17   0:09 postgres: wal writer process
postgres 19969  0.0  0.0  72940  1924 ?        Ss   Feb17   0:00 postgres: autovacuum launcher process
postgres 19970  0.0  0.0  42352  1736 ?        Ss   Feb17   0:02 postgres: stats collector process
postgres 16617  0.0  0.0  95156 10948 ?        Ss   08:42   0:00 postgres: autovacuum worker process   wdb
postgres 18834  0.0  0.1 118632 37320 ?        Ss   09:06   0:01 postgres: wdb wdb [local] idle in transaction


A core dump of the program that we took earlier indicated that it was
waiting on the database to return from the insert statement. top shows
that both postgres and the application are idling (no load on the server
whatsoever).

pg_locks and pg_stat_activity seem to indicate that there may be some
issue with autovacuum?

  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |           mode           | granted
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+--------------------------+---------
 relation   |    16385 |     2683 |      |       |            |               |         |       |          | 2/1122117          | 16617 | RowExclusiveLock         | t
 relation   |    16385 |    10969 |      |       |            |               |         |       |          | 3/1105             | 27895 | AccessShareLock          | t
 relation   |    16385 |     2613 |      |       |            |               |         |       |          | 2/1122117          | 16617 | ShareUpdateExclusiveLock | t
 virtualxid |          |          |      |       | 3/1105     |               |         |       |          | 3/1105             | 27895 | ExclusiveLock            | t
 virtualxid |          |          |      |       | 2/1122117  |               |         |       |          | 2/1122117          | 16617 | ExclusiveLock            | t
(5 rows)

 datid | datname | procpid | usesysid | usename  |                    current_query                     | waiting |          xact_start           |          query_start          |         backend_start         | client_addr | client_port
-------+---------+---------+----------+----------+------------------------------------------------------+---------+-------------------------------+-------------------------------+-------------------------------+-------------+-------------
 16385 | wdb     |   16617 |       10 | postgres | autovacuum: VACUUM ANALYZE pg_catalog.pg_largeobject | f       | 2009-02-18 08:42:45.066922+00 | 2009-02-18 08:42:45.066922+00 | 2009-02-18 08:42:45.065857+00 |             |
 16385 | wdb     |   27895 |    16384 | wdb      | select * from pg_stat_activity ;                     | f       | 2009-02-18 12:21:47.925813+00 | 2009-02-18 12:21:47.925813+00 | 2009-02-18 11:09:31.201507+00 |             |          -1
 16385 | wdb     |   18834 |    16384 | wdb      | <IDLE> in transaction                                | f       |                               | 2009-02-18 09:06:26.479633+00 | 2009-02-18 09:06:23.443698+00 |             |          -1
(3 rows)
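
Since every lock in the pg_locks output shows granted = t, a standard
check for ungranted locks (a pg_locks self-join; the aliases are mine,
and it only covers relation-level locks) should come back empty here,
which suggests this is not a plain lock conflict:

SELECT w.pid AS waiting_pid,
       h.pid AS holding_pid,
       w.locktype,
       w.relation::regclass AS relation,
       w.mode
FROM pg_locks w
JOIN pg_locks h ON h.granted
               AND h.locktype = w.locktype
               AND h.relation = w.relation
WHERE NOT w.granted;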

Any leads on how to diagnose the problem, or alternatively how to work
around it, would be much appreciated.
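
If autovacuum on pg_largeobject does turn out to be the trigger, one
workaround we have been considering is to disable autovacuum for just
that table; on 8.3 this is done by inserting a row into the pg_autovacuum
system catalog (the column list below is from the 8.3 documentation as I
recall it, so please double-check before running; -1 means "use the
system default"):

INSERT INTO pg_autovacuum
    (vacrelid, enabled,
     vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor,
     vac_cost_delay, vac_cost_limit,
     freeze_min_age, freeze_max_age)
VALUES
    ('pg_catalog.pg_largeobject'::regclass, false,
     -1, -1, -1, -1, -1, -1, -1, -1);

We would then have to make sure pg_largeobject gets vacuumed manually
instead.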

I noticed that the test machine seemed to be running low on disk space,
caused by an error in the free space map (FSM) settings, but this was not
the case on the production systems, so I assume it is not the problem
here either (I will fix it for future test runs, though).

Regards,

Michael Akinde



