Re: Vacuum Full Analyze Stalled - Mailing list pgsql-admin
From:           Jim C. Nasby
Subject:        Re: Vacuum Full Analyze Stalled
Msg-id:         20051003194114.GD40138@pervasive.com
In response to: Re: Vacuum Full Analyze Stalled ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses:      Re: Vacuum Full Analyze Stalled
List:           pgsql-admin
I haven't looked in detail at autovacuum in 8.1 yet, but I know that in
previous versions it was a bad idea to depend on it to vacuum a small,
heavily-updated table frequently enough. The issue is that autovacuum is
single-threaded: if it starts vacuuming a large table, it could be hours
before it gets around to the small table again. In a case like this, your
best bet is to have the application vacuum the small table periodically,
or just do it once a minute via cron. Vacuuming a 72 row table should take
next to no time, so running the vacuum once a minute adds very little
overhead even when it isn't needed.
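For example, something like this in the postgres user's crontab would do
it. This is only a sketch -- I'm assuming the dtr database and the
"DbTranImageStatus" table from the report below; adjust the connection
options for your setup:

    # vacuum the small, high-churn table once a minute;
    # -q keeps cron from mailing the output of every run
    * * * * * psql -q -d dtr -c 'VACUUM ANALYZE "DbTranImageStatus"'

vacuumdb with its --table option would work just as well.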
On Mon, Oct 03, 2005 at 02:09:37PM -0500, Kevin Grittner wrote:
> The previous report was from the same databases -- I'm a consultant
> tasked with evaluating the various open source options and making
> one of them work with our existing framework. Jeff developed a new
> app (which is in beta testing) which is our first use of PostgreSQL
> with real production load for prolonged periods of time.
>
> Jeff was monitoring the app over the weekend, and posted here when
> he had problems. Unfortunately, he didn't capture enough information
> to diagnose much, but I'm passing along the sequence of events
> since they lead up to a failure this morning for which we have more
> detail.
>
> Saturday, 9:00 a.m. -- Noticed performance beginning to degrade.
> Saturday, 1:00 p.m. -- After continued degradation, stopped the client app
> and ran vacuum full analyze on both the Windows and Linux boxes.
> Saturday, 3:00 p.m. -- Windows box completed the vacuum.
> Sunday, 1:00 p.m. -- Vacuum on Linux still not done. CPU on the Linux
> box is completely idle. Killed the vacuum process. Ran vacuum
> full analyze on each table individually. All fine except for the 72 row
> table with very frequent updates. Left this one running.
> Sunday, 9:00 p.m. -- Vacuum of the 72 row table still running. Killed all
> processes and restarted postgres. Started a vacuum full analyze
> of the 72 row table again.
> Sunday, 11:00 p.m. -- Vacuum of the 72 row table finished. Started the
> applications, which had been stopped since Saturday, 1:00 p.m.
> Monday, 9:20 a.m. -- Performance is beginning to degrade again.
> Detailed information from this point below.
>
> dtr=# select * from pg_locks order by pid;
> locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted
> ---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+--------------------------+---------
> transactionid | | | | | 36430565 | | | | 36430565 | 2771 | ExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 36430565 | 2771 | AccessShareLock | t
> relation | 30793 | 30976 | | | | | | | 36430565 | 2771 | RowExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 36430565 | 2771 | AccessShareLock | t
> relation | 30793 | 30871 | | | | | | | 36430565 | 2771 | RowExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 36430561 | 2777 | AccessShareLock | t
> relation | 30793 | 30976 | | | | | | | 36430561 | 2777 | RowExclusiveLock | t
> transactionid | | | | | 36430561 | | | | 36430561 | 2777 | ExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 36430561 | 2777 | AccessShareLock | t
> relation | 30793 | 30871 | | | | | | | 36430561 | 2777 | RowExclusiveLock | t
> transactionid | | | | | 36430563 | | | | 36430563 | 2783 | ExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 36430563 | 2783 | AccessShareLock | t
> relation | 30793 | 30871 | | | | | | | 36430563 | 2783 | RowExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 36430563 | 2783 | AccessShareLock | t
> relation | 30793 | 30976 | | | | | | | 36430563 | 2783 | RowExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 36430559 | 2790 | AccessShareLock | t
> relation | 30793 | 30976 | | | | | | | 36430559 | 2790 | RowExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 36430559 | 2790 | AccessShareLock | t
> relation | 30793 | 30871 | | | | | | | 36430559 | 2790 | RowExclusiveLock | t
> transactionid | | | | | 36430559 | | | | 36430559 | 2790 | ExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
> transactionid | | | | | 35633343 | | | | 35633343 | 3158 | ExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
> relation | 30793 | 10340 | | | | | | | 36430567 | 4238 | AccessShareLock | t
> transactionid | | | | | 36430567 | | | | 36430567 | 4238 | ExclusiveLock | t
> (25 rows)
>
> dtr=# \q
> postgres@linpost:/var/pgsql/data> ps aux|grep ^postgres
> postgres 28937 0.0 0.0 2436 992 pts/1 S Sep30 0:00 su postgres
> postgres 28938 0.0 0.0 2944 1620 pts/1 S Sep30 0:00 bash
> postgres 2612 0.0 0.0 170968 4952 ? S Oct02 0:00 /usr/local/pgsql/bin/postmaster
> postgres 2614 0.0 1.9 171200 165528 ? S Oct02 0:08 postgres: writer process
> postgres 2615 0.0 0.0 7240 2788 ? S Oct02 0:27 postgres: stats buffer process
> postgres 2617 0.0 0.0 6556 1988 ? S Oct02 0:22 postgres: stats collector process
> postgres 2724 0.0 0.0 171828 7156 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32986) idle
> postgres 2771 6.4 2.0 171996 167176 ? S Oct02 44:33 postgres: dtr dtr 127.0.0.1(32990) idle
> postgres 2773 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32991) idle
> postgres 2774 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32992) idle
> postgres 2775 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32993) idle
> postgres 2776 0.0 0.0 171720 6848 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32994) idle
> postgres 2777 6.4 2.0 171996 167228 ? R Oct02 44:29 postgres: dtr dtr 127.0.0.1(32995) UPDATE
> postgres 2778 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32996) idle
> postgres 2779 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32997) idle
> postgres 2780 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32998) idle
> postgres 2781 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32999) idle
> postgres 2782 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33000) idle
> postgres 2783 6.4 2.0 172016 167292 ? S Oct02 44:38 postgres: dtr dtr 127.0.0.1(33001) idle
> postgres 2784 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33002) idle
> postgres 2785 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33003) idle
> postgres 2786 6.4 2.0 172000 167216 ? S Oct02 44:38 postgres: dtr dtr 127.0.0.1(33004) idle
> postgres 2787 6.4 2.0 172000 167236 ? R Oct02 44:33 postgres: dtr dtr 127.0.0.1(33005) UPDATE
> postgres 2788 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33006) idle
> postgres 2789 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33007) idle
> postgres 2790 6.4 2.0 171992 167280 ? R Oct02 44:27 postgres: dtr dtr 127.0.0.1(33008) UPDATE
> postgres 2791 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33009) idle
> postgres 3158 0.0 0.0 188324 8044 ? S Oct02 0:00 postgres: autovacuum process dtr
> postgres 4289 0.0 0.0 2148 672 pts/1 R+ 09:21 0:00 ps aux
> postgres 4290 0.0 0.0 1688 688 pts/1 R+ 09:21 0:00 grep ^postgres
>
> oid 30871 is the 72 row table.
> oid 30976 is the index for its primary key.
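(As an aside: rather than tracking those OIDs by hand, you can have the
server resolve them for you. Something along these lines should work,
joining pg_locks to pg_class -- though names only resolve for relations
in the database you're connected to:)

    -- show which relations each backend currently has locked
    SELECT l.pid, c.relname, l.mode, l.granted
      FROM pg_locks l
      JOIN pg_class c ON c.oid = l.relation
     WHERE l.relation IS NOT NULL
     ORDER BY l.pid;

That also makes it easier to watch what the autovacuum process (pid 3158
here) is holding.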
> We stopped the applications, and the autovacuum process didn't
> wake up.
>
> dtr=# select * from pg_locks order by pid;
> locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted
> ---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+--------------------------+---------
> relation | 30793 | 30976 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
> transactionid | | | | | 35633343 | | | | 35633343 | 3158 | ExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
> relation | 30793 | 10340 | | | | | | | 36600517 | 4498 | AccessShareLock | t
> transactionid | | | | | 36600517 | | | | 36600517 | 4498 | ExclusiveLock | t
> (5 rows)
>
> dtr=# \q
> postgres@linpost:/var/pgsql/data> ps aux|grep ^postgres
> postgres 2612 0.0 0.0 170968 4952 ? S Oct02 0:00 /usr/local/pgsql/bin/postmaster
> postgres 2614 0.0 1.9 171200 165528 ? S Oct02 0:10 postgres: writer process
> postgres 2615 0.0 0.0 7240 2788 ? S Oct02 0:32 postgres: stats buffer process
> postgres 2617 0.0 0.0 6556 1988 ? S Oct02 0:26 postgres: stats collector process
> postgres 3158 0.0 0.0 188324 8044 ? S Oct02 0:00 postgres: autovacuum process dtr
> postgres 4481 0.0 0.0 2432 988 pts/1 S 10:48 0:00 su postgres
> postgres 4482 0.0 0.0 2944 1604 pts/1 S 10:48 0:00 bash
> postgres 4499 0.0 0.0 2148 672 pts/1 R+ 10:49 0:00 ps aux
> postgres 4500 0.0 0.0 1680 636 pts/1 S+ 10:49 0:00 grep ^postgres
> postgres@linpost:/var/pgsql/data> uptime
> 10:48am up 6 days 19:20, 2 users, load average: 0.00, 0.28, 0.73
> postgres@linpost:/var/pgsql/data> vmstat 10
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so   bi   bo   in   cs  us sy id wa
>  0  0      8 278376  62676 7770684    0    0   15    7   17   18  11  1 87  1
>  0  0      8 278376  62700 7770660    0    0    0   11 1035    0   1  0 99  0
>  0  0      8 278440  62700 7770660    0    0    0    0 1013    0   0  0 100 0
>  0  0      8 278440  62700 7770660    0    0    0    0 1016    0   1  0 99  0
>  0  0      8 278440  62700 7770660    0    0    0    4 1018    0   1  0 99  0
>
> We found the 72 row table somewhat bloated, so we cleaned it up.
>
> dtr=# vacuum analyze verbose "DbTranImageStatus";
> INFO: vacuuming "public.DbTranImageStatus"
> INFO: index "DbTranImageStatusPK" now contains 72 row versions in 6665 pages
> DETAIL: 6555 index pages have been deleted, 6555 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.01 sec.
> INFO: "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 4244 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 479137 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.01s/0.01u sec elapsed 0.02 sec.
> INFO: vacuuming "pg_toast.pg_toast_272793"
> INFO: index "pg_toast_272793_index" now contains 0 row versions in 1 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: analyzing "public.DbTranImageStatus"
> INFO: "DbTranImageStatus": scanned 3000 of 4244 pages, containing 68 live rows and 0 dead rows; 68 rows in sample, 96 estimated total rows
> VACUUM
> dtr=# vacuum full analyze verbose "DbTranImageStatus";
> INFO: vacuuming "public.DbTranImageStatus"
> INFO: "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 4244 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> Nonremovable row versions range from 68 to 68 bytes long.
> There were 479137 unused item pointers.
> Total free space (including removable row versions) is 32760236 bytes.
> 4227 pages are or will become empty, including 0 at the end of the table.
> 4244 pages containing 32760236 free bytes are potential move destinations.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "DbTranImageStatusPK" now contains 72 row versions in 6654 pages
> DETAIL: 0 index row versions were removed.
> 6544 index pages have been deleted, 6544 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "DbTranImageStatus": moved 22 row versions, truncated 4244 to 1 pages
> DETAIL: CPU 0.00s/0.02u sec elapsed 0.03 sec.
> INFO: index "DbTranImageStatusPK" now contains 72 row versions in 6654 pages
> DETAIL: 22 index row versions were removed.
> 6544 index pages have been deleted, 6544 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: vacuuming "pg_toast.pg_toast_272793"
> INFO: "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> Nonremovable row versions range from 0 to 0 bytes long.
> There were 0 unused item pointers.
> Total free space (including removable row versions) is 0 bytes.
> 0 pages are or will become empty, including 0 at the end of the table.
> 0 pages containing 0 free bytes are potential move destinations.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "pg_toast_272793_index" now contains 0 row versions in 1 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: analyzing "public.DbTranImageStatus"
> INFO: "DbTranImageStatus": scanned 1 of 1 pages, containing 72 live rows and 0 dead rows; 72 rows in sample, 72 estimated total rows
> VACUUM
> dtr=# reindex table "DbTranImageStatus";
> REINDEX
> dtr=# vacuum analyze verbose "DbTranImageStatus";
> INFO: vacuuming "public.DbTranImageStatus"
> INFO: index "DbTranImageStatusPK" now contains 72 row versions in 2 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 1 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 48 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: vacuuming "pg_toast.pg_toast_272793"
> INFO: index "pg_toast_272793_index" now contains 0 row versions in 1 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: analyzing "public.DbTranImageStatus"
> INFO: "DbTranImageStatus": scanned 1 of 1 pages, containing 72 live rows and 0 dead rows; 72 rows in sample, 72 estimated total rows
> VACUUM
>
> These all ran sub-second. We updated postgresql.conf for more
> aggressive autovacuum, and restarted postgres, then restarted the
> application.
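(For what it's worth, "more aggressive" in 8.1 generally means turning
autovacuum on and lowering the thresholds and naptime in postgresql.conf.
The numbers below are just a sketch of the kind of settings I'd start
from, not necessarily what Kevin used:)

    stats_start_collector = on            # autovacuum needs row-level stats
    stats_row_level = on
    autovacuum = on                       # off by default in 8.1
    autovacuum_naptime = 30               # default 60 seconds between runs
    autovacuum_vacuum_threshold = 200     # default 1000; far too high for a 72 row table
    autovacuum_vacuum_scale_factor = 0.2  # default 0.4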
> This failure seems rather different from the one I previously posted,
> since it was blocking on the application table, rather than
> pg_constraint_contypid_index, and it did not wake up when all
> other processes were stopped.
>
> As before, both boxes are running 8.1beta2. Windows is not showing
> these problems with the autovacuum blocking; just Linux.
>
> -Kevin
>
> >>> Tom Lane <tgl@sss.pgh.pa.us> 10/02/05 8:53 PM >>>
> "Jeff Kirby" <Jeff.Kirby@wicourts.gov> writes:
> > the Linux box however is still chugging away this morning... and
> > appears to be stuck on vacuuming "pg_constraint_contypid_index". How
> > do I know... well I don't really... I'm inferring based on the order
> > of the log output on the Windows box.
>
> Looking in pg_locks would give you a more reliable indicator of what the
> VACUUM is currently working on.
>
> Is the Linux box otherwise idle? There was another report recently of a
> vacuum hung up for a long time on pg_constraint_contypid_index, but it
> seemed to be due to extremely heavy usage of domain types ...
>
>     regards, tom lane
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf      cell: 512-569-9461