Re: Vacuum Full Analyze Stalled - Mailing list pgsql-admin

From Kevin Grittner
Subject Re: Vacuum Full Analyze Stalled
Date
Msg-id s3413ba7.062@gwmta.wicourts.gov
In response to Vacuum Full Analyze Stalled  ("Jeff Kirby" <Jeff.Kirby@wicourts.gov>)
List pgsql-admin
The previous report was from the same databases -- I'm a consultant
tasked with evaluating the various open source options and making
one of them work with our existing framework.  Jeff developed a new
app (now in beta testing), which is our first use of PostgreSQL
under real production load for prolonged periods of time.

Jeff was monitoring the app over the weekend, and posted here when
he had problems.  Unfortunately, he didn't capture enough information
to diagnose much, but I'm passing along the sequence of events,
since they led up to a failure this morning for which we have more
detail.

Saturday, 9:00 a.m. -- Noticed performance beginning to degrade.
Saturday, 1:00 p.m. -- After continued degradation, stopped the client app
   and ran vacuum full analyze on both the Windows and Linux boxes.
Saturday, 3:00 p.m. -- Windows box completed the vacuum.
Sunday, 1:00 p.m. -- Vacuum on Linux still not done.  CPU on the Linux
   box completely idle.  Killed the vacuum process (see the sketch
   below).  Ran vacuum full analyze on each table individually.  All
   fine except for the 72-row table with very frequent updates.  Left
   this one running.
Sunday, 9:00 p.m. -- Vacuum of the 72-row table still running.  Killed all
   processes and restarted postgres.  Started a vacuum full analyze
   of the 72-row table again.
Sunday, 11:00 p.m. -- Vacuum of the 72-row table finished.  Started the
   applications, which had been stopped since Saturday, 1:00 p.m.
Monday, 9:20 a.m. -- Performance is beginning to degrade again.
   Detailed information from this point below.
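
Before the detail, here is roughly how we go about spotting and
cancelling a stuck backend such as that vacuum -- an illustrative
sketch, assuming stats_command_string is on so current_query is
populated, not a transcript of what we ran:

-- Which backends have been running their current query the longest?
SELECT procpid, usename, query_start, current_query
  FROM pg_stat_activity
 WHERE current_query <> '<IDLE>'
 ORDER BY query_start;

-- Ask a specific backend to cancel its current query
-- (gentler than killing the process from the shell).
SELECT pg_cancel_backend(12345);  -- 12345 is a placeholder pid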

dtr=# select * from pg_locks order by pid;
   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid  |           mode           | granted 
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+--------------------------+---------
 transactionid |          |          |      |       |      36430565 |         |       |          |    36430565 | 2771 | ExclusiveLock            | t
 relation      |    30793 |    30976 |      |       |               |         |       |          |    36430565 | 2771 | AccessShareLock          | t
 relation      |    30793 |    30976 |      |       |               |         |       |          |    36430565 | 2771 | RowExclusiveLock         | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    36430565 | 2771 | AccessShareLock          | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    36430565 | 2771 | RowExclusiveLock         | t
 relation      |    30793 |    30976 |      |       |               |         |       |          |    36430561 | 2777 | AccessShareLock          | t
 relation      |    30793 |    30976 |      |       |               |         |       |          |    36430561 | 2777 | RowExclusiveLock         | t
 transactionid |          |          |      |       |      36430561 |         |       |          |    36430561 | 2777 | ExclusiveLock            | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    36430561 | 2777 | AccessShareLock          | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    36430561 | 2777 | RowExclusiveLock         | t
 transactionid |          |          |      |       |      36430563 |         |       |          |    36430563 | 2783 | ExclusiveLock            | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    36430563 | 2783 | AccessShareLock          | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    36430563 | 2783 | RowExclusiveLock         | t
 relation      |    30793 |    30976 |      |       |               |         |       |          |    36430563 | 2783 | AccessShareLock          | t
 relation      |    30793 |    30976 |      |       |               |         |       |          |    36430563 | 2783 | RowExclusiveLock         | t
 relation      |    30793 |    30976 |      |       |               |         |       |          |    36430559 | 2790 | AccessShareLock          | t
 relation      |    30793 |    30976 |      |       |               |         |       |          |    36430559 | 2790 | RowExclusiveLock         | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    36430559 | 2790 | AccessShareLock          | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    36430559 | 2790 | RowExclusiveLock         | t
 transactionid |          |          |      |       |      36430559 |         |       |          |    36430559 | 2790 | ExclusiveLock            | t
 relation      |    30793 |    30976 |      |       |               |         |       |          |    35633343 | 3158 | ShareUpdateExclusiveLock | t
 transactionid |          |          |      |       |      35633343 |         |       |          |    35633343 | 3158 | ExclusiveLock            | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    35633343 | 3158 | ShareUpdateExclusiveLock | t
 relation      |    30793 |    10340 |      |       |               |         |       |          |    36430567 | 4238 | AccessShareLock          | t
 transactionid |          |          |      |       |      36430567 |         |       |          |    36430567 | 4238 | ExclusiveLock            | t
(25 rows)

dtr=# \q
postgres@linpost:/var/pgsql/data> ps aux|grep ^postgres
postgres 28937  0.0  0.0  2436  992 pts/1    S    Sep30   0:00 su postgres
postgres 28938  0.0  0.0  2944 1620 pts/1    S    Sep30   0:00 bash
postgres  2612  0.0  0.0 170968 4952 ?       S    Oct02   0:00 /usr/local/pgsql/bin/postmaster
postgres  2614  0.0  1.9 171200 165528 ?     S    Oct02   0:08 postgres: writer process
postgres  2615  0.0  0.0  7240 2788 ?        S    Oct02   0:27 postgres: stats buffer process
postgres  2617  0.0  0.0  6556 1988 ?        S    Oct02   0:22 postgres: stats collector process
postgres  2724  0.0  0.0 171828 7156 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(32986) idle
postgres  2771  6.4  2.0 171996 167176 ?     S    Oct02  44:33 postgres: dtr dtr 127.0.0.1(32990) idle
postgres  2773  0.0  0.0 171460 5824 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(32991) idle
postgres  2774  0.0  0.0 171460 5824 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(32992) idle
postgres  2775  0.0  0.0 171460 5824 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(32993) idle
postgres  2776  0.0  0.0 171720 6848 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(32994) idle
postgres  2777  6.4  2.0 171996 167228 ?     R    Oct02  44:29 postgres: dtr dtr 127.0.0.1(32995) UPDATE
postgres  2778  0.0  0.0 171460 5824 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(32996) idle
postgres  2779  0.0  0.0 171460 5824 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(32997) idle
postgres  2780  0.0  0.0 171460 5824 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(32998) idle
postgres  2781  0.0  0.0 171460 5824 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(32999) idle
postgres  2782  0.0  0.0 171460 5824 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(33000) idle
postgres  2783  6.4  2.0 172016 167292 ?     S    Oct02  44:38 postgres: dtr dtr 127.0.0.1(33001) idle
postgres  2784  0.0  0.0 171460 5824 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(33002) idle
postgres  2785  0.0  0.0 171460 5828 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(33003) idle
postgres  2786  6.4  2.0 172000 167216 ?     S    Oct02  44:38 postgres: dtr dtr 127.0.0.1(33004) idle
postgres  2787  6.4  2.0 172000 167236 ?     R    Oct02  44:33 postgres: dtr dtr 127.0.0.1(33005) UPDATE
postgres  2788  0.0  0.0 171460 5828 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(33006) idle
postgres  2789  0.0  0.0 171460 5828 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(33007) idle
postgres  2790  6.4  2.0 171992 167280 ?     R    Oct02  44:27 postgres: dtr dtr 127.0.0.1(33008) UPDATE
postgres  2791  0.0  0.0 171460 5828 ?       S    Oct02   0:00 postgres: dtr dtr 127.0.0.1(33009) idle
postgres  3158  0.0  0.0 188324 8044 ?       S    Oct02   0:00 postgres: autovacuum process   dtr
postgres  4289  0.0  0.0  2148  672 pts/1    R+   09:21   0:00 ps aux
postgres  4290  0.0  0.0  1688  688 pts/1    R+   09:21   0:00 grep ^postgres

OID 30871 is the 72-row table.
OID 30976 is the index on its primary key.
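
For anyone following along, the OID-to-name mapping can be pulled
straight from the catalogs; a sketch of the sort of query we use,
rather than a transcript of what we ran:

SELECT l.pid, c.relname, l.mode, l.granted
  FROM pg_locks l
  LEFT JOIN pg_class c ON c.oid = l.relation
 ORDER BY l.pid;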

We stopped the applications, and the autovacuum process didn't
wake up.

dtr=# select * from pg_locks order by pid;
   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid  |           mode           | granted 
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+--------------------------+---------
 relation      |    30793 |    30976 |      |       |               |         |       |          |    35633343 | 3158 | ShareUpdateExclusiveLock | t
 transactionid |          |          |      |       |      35633343 |         |       |          |    35633343 | 3158 | ExclusiveLock            | t
 relation      |    30793 |    30871 |      |       |               |         |       |          |    35633343 | 3158 | ShareUpdateExclusiveLock | t
 relation      |    30793 |    10340 |      |       |               |         |       |          |    36600517 | 4498 | AccessShareLock          | t
 transactionid |          |          |      |       |      36600517 |         |       |          |    36600517 | 4498 | ExclusiveLock            | t
(5 rows)

dtr=# \q
postgres@linpost:/var/pgsql/data> ps aux|grep ^postgres
postgres  2612  0.0  0.0 170968 4952 ?       S    Oct02   0:00 /usr/local/pgsql/bin/postmaster
postgres  2614  0.0  1.9 171200 165528 ?     S    Oct02   0:10 postgres: writer process
postgres  2615  0.0  0.0  7240 2788 ?        S    Oct02   0:32 postgres: stats buffer process
postgres  2617  0.0  0.0  6556 1988 ?        S    Oct02   0:26 postgres: stats collector process
postgres  3158  0.0  0.0 188324 8044 ?       S    Oct02   0:00 postgres: autovacuum process   dtr
postgres  4481  0.0  0.0  2432  988 pts/1    S    10:48   0:00 su postgres
postgres  4482  0.0  0.0  2944 1604 pts/1    S    10:48   0:00 bash
postgres  4499  0.0  0.0  2148  672 pts/1    R+   10:49   0:00 ps aux
postgres  4500  0.0  0.0  1680  636 pts/1    S+   10:49   0:00 grep ^postgres
postgres@linpost:/var/pgsql/data> uptime
 10:48am  up 6 days 19:20,  2 users,  load average: 0.00, 0.28, 0.73
postgres@linpost:/var/pgsql/data> vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      8 278376  62676 7770684    0    0    15     7   17    18 11  1 87  1
 0  0      8 278376  62700 7770660    0    0     0    11 1035     0  1  0 99  0
 0  0      8 278440  62700 7770660    0    0     0     0 1013     0  0  0 100  0
 0  0      8 278440  62700 7770660    0    0     0     0 1016     0  1  0 99  0
 0  0      8 278440  62700 7770660    0    0     0     4 1018     0  1  0 99  0

We found the 72-row table somewhat bloated, so we cleaned it up.
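
For reference, the quickest sign of that bloat is the page count
relative to the live row count in pg_class (a sketch; relpages and
reltuples are as of the last VACUUM or ANALYZE, and 72 rows of 68
bytes should fit comfortably in a single page):

SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('DbTranImageStatus', 'DbTranImageStatusPK');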

dtr=# vacuum analyze verbose "DbTranImageStatus";
INFO:  vacuuming "public.DbTranImageStatus"
INFO:  index "DbTranImageStatusPK" now contains 72 row versions in 6665 pages
DETAIL:  6555 index pages have been deleted, 6555 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 4244 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 479137 unused item pointers.
0 pages are entirely empty.
CPU 0.01s/0.01u sec elapsed 0.02 sec.
INFO:  vacuuming "pg_toast.pg_toast_272793"
INFO:  index "pg_toast_272793_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.DbTranImageStatus"
INFO:  "DbTranImageStatus": scanned 3000 of 4244 pages, containing 68 live rows and 0 dead rows; 68 rows in sample, 96
estimatedtotal rows 
VACUUM
dtr=# vacuum full analyze verbose "DbTranImageStatus";
INFO:  vacuuming "public.DbTranImageStatus"
INFO:  "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 4244 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 68 to 68 bytes long.
There were 479137 unused item pointers.
Total free space (including removable row versions) is 32760236 bytes.
4227 pages are or will become empty, including 0 at the end of the table.
4244 pages containing 32760236 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "DbTranImageStatusPK" now contains 72 row versions in 6654 pages
DETAIL:  0 index row versions were removed.
6544 index pages have been deleted, 6544 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "DbTranImageStatus": moved 22 row versions, truncated 4244 to 1 pages
DETAIL:  CPU 0.00s/0.02u sec elapsed 0.03 sec.
INFO:  index "DbTranImageStatusPK" now contains 72 row versions in 6654 pages
DETAIL:  22 index row versions were removed.
6544 index pages have been deleted, 6544 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_272793"
INFO:  "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_272793_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.DbTranImageStatus"
INFO:  "DbTranImageStatus": scanned 1 of 1 pages, containing 72 live rows and 0 dead rows; 72 rows in sample, 72
estimatedtotal rows 
VACUUM
dtr=# reindex table "DbTranImageStatus";
REINDEX
dtr=# vacuum analyze verbose "DbTranImageStatus";
INFO:  vacuuming "public.DbTranImageStatus"
INFO:  index "DbTranImageStatusPK" now contains 72 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 48 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_272793"
INFO:  index "pg_toast_272793_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.DbTranImageStatus"
INFO:  "DbTranImageStatus": scanned 1 of 1 pages, containing 72 live rows and 0 dead rows; 72 rows in sample, 72
estimatedtotal rows 
VACUUM

These all ran in under a second.  We updated postgresql.conf with
more aggressive autovacuum settings, restarted postgres, and then
restarted the application.
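
By "more aggressive" I mean settings along these lines (the exact
values here are illustrative rather than a record of what we set):

autovacuum = on
autovacuum_naptime = 60                  # seconds between autovacuum rounds
autovacuum_vacuum_threshold = 500        # min updated/deleted tuples before a vacuum
autovacuum_vacuum_scale_factor = 0.2     # plus this fraction of the table size
autovacuum_analyze_threshold = 250
autovacuum_analyze_scale_factor = 0.1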

This failure seems rather different from the one I previously posted,
since it was blocking on the application table rather than on
pg_constraint_contypid_index, and it did not wake up when all
other processes were stopped.

As before, both boxes are running 8.1beta2.  Windows is not showing
these autovacuum blocking problems; only Linux is.

-Kevin


>>> Tom Lane <tgl@sss.pgh.pa.us> 10/02/05 8:53 PM >>>
"Jeff Kirby" <Jeff.Kirby@wicourts.gov> writes:
> the Linux box however is still chugging away this morning... and
> appears to be stuck on vacuuming "pg_constraint_contypid_index".  How
> do I know... well I don't really... I'm inferring based on the order
> of the log output on the Windows box.

Looking in pg_locks would give you a more reliable indicator of what the
VACUUM is currently working on.

Is the Linux box otherwise idle?  There was another report recently of a
vacuum hung up for a long time on pg_constraint_contypid_index, but it
seemed to be due to extremely heavy usage of domain types ...

            regards, tom lane


