Problems Vacuum'ing - Mailing list pgsql-hackers

From: jseymour@LinxNet.com (Jim Seymour)
Subject: Problems Vacuum'ing
Msg-id: 20040401215058.3FBB14307@jimsun.LinxNet.com
List: pgsql-hackers
Hi,

[Just so y'all know: This has been discussed extensively in #PostgreSQL,
and I tried asking the question in both -admin and -general first.  Also
did some web searching.]

Environment:
   PostgreSQL 7.4.2
   Locally built with GCC 3.3.1
   Solaris 8 (Sparc)

I have a relatively simple database created with...

create table ethers (
    hostname   varchar(64) unique not null,
    mac        macaddr not null,
    created    timestamp (0) not null default current_timestamp,
    changed    timestamp (0),
    last_seen  timestamp (0) not null default current_timestamp
);

create table host_mac_hist (
    hostname   varchar(64) not null,
    mac        macaddr not null,
    created    timestamp (0) not null default current_timestamp,
    last_seen  timestamp (0) not null
);

I'm populating the data from bunches of existing flat files in such a
manner that the "ethers" table, in particular, is getting updated
literally thousands of times.  It got slow, so I stopped the updating
and went to vacuum.  (Using psql as the user/owner of the db.)
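
The load loop boils down to statements along these lines (a simplified
sketch of the access pattern, not the literal Perl):

UPDATE ethers
   SET last_seen = current_timestamp
 WHERE hostname = 'somehost';
-- if no row was updated, insert instead; under MVCC every such
-- UPDATE leaves behind a dead row version for vacuum to reclaim
INSERT INTO ethers (hostname, mac)
    VALUES ('somehost', '08:00:20:0a:0b:0c');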

The problem is that attempts to vacuum these tables resulted in "NNN
dead row versions cannot be removed yet."  I went through a lot of
analysis (e.g., "Any hanging txns?") and tried different things with
folks on the #PostgreSQL IRC channel, all to no avail.
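
(For reference, the check for hanging transactions amounted to
something like this; note that in 7.4, current_query only shows
"<IDLE> in transaction" if stats_command_string is turned on:)

SELECT procpid, datname, usename, current_query
  FROM pg_stat_activity
 WHERE current_query LIKE '<IDLE> in transaction%';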

There is a WebObjects application that is the only other thing
accessing pgsql.  It is not accessing the same database, much less
those tables.  (This was confirmed by enabling connection logging and
checking the log.)  Yet the only way I can successfully vacuum these
tables is to shut down WebObjects *or* to vacuum before there are
"too many" dead rows.  (Or so I thought!  Additional info later...)

Output of one attempt...

$ vacuumdb -U sysagent -t ethers --verbose --analyze sysagent
Password: 
INFO:  vacuuming "public.ethers"
INFO:  index "ethers_hostname_key" now contains 114002 row versions in 2389 pages
DETAIL:  1865 index pages have been deleted, 1865 are currently reusable.
CPU 0.18s/0.09u sec elapsed 0.41 sec.
INFO:  "ethers": found 0 removable, 114002 nonremovable row versions in 1114 pages
DETAIL:  113590 dead row versions cannot be removed yet.
There were 2184 unused item pointers.
0 pages are entirely empty.
CPU 0.20s/0.18u sec elapsed 0.54 sec.
INFO:  analyzing "public.ethers"
INFO:  "ethers": 1114 pages, 412 rows sampled, 412 estimated total rows
VACUUM

And...

$ vacuumdb -U sysagent -t ethers --verbose --analyze --full sysagent
Password: 
INFO:  vacuuming "public.ethers"
INFO:  "ethers": found 0 removable, 114002 nonremovable row versions in 1114 pages
DETAIL:  113590 dead row versions cannot be removed yet.
Nonremovable row versions range from 64 to 88 bytes long.
There were 2184 unused item pointers.
Total free space (including removable row versions) is 169880 bytes.
0 pages are or will become empty, including 0 at the end of the table.
816 pages containing 162192 free bytes are potential move destinations.
CPU 0.06s/2.03u sec elapsed 2.11 sec.
INFO:  index "ethers_hostname_key" now contains 114002 row versions in 2389 pages
DETAIL:  0 index row versions were removed.
1865 index pages have been deleted, 1865 are currently reusable.
CPU 0.22s/0.45u sec elapsed 0.73 sec.
INFO:  "ethers": moved 1745 row versions, truncated 1114 to 1114 pages
DETAIL:  CPU 0.39s/0.80u sec elapsed 2.79 sec.
INFO:  index "ethers_hostname_key" now contains 115740 row versions in 2389 pages
DETAIL:  7 index row versions were removed.
1856 index pages have been deleted, 1856 are currently reusable.
CPU 0.30s/0.15u sec elapsed 0.53 sec.
INFO:  analyzing "public.ethers"
INFO:  "ethers": 1114 pages, 412 rows sampled, 412 estimated total rows
VACUUM

I can understand how a non-full vacuum might fail if I have
insufficient FSM.  But "full" should get around that, should it not?
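
(The relevant knobs, for what it's worth; if I remember right, the 7.4
defaults are 20000 pages and 1000 relations:)

SHOW max_fsm_pages;
SHOW max_fsm_relations;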

Besides: I did a new test today.  I added to my Perl code a bit that
would vacuum every 10 files read in.  That would amount to about 5000
dead rows and, IIRC, fewer than 300 pages, which is much less than the
default FSM capacity.  So even a non-full vacuum should be succeeding,
no?  This new script would succeed in getting everything vacuumed up
for a while, and then at some point (I failed to notice exactly when)
the "dead row versions cannot be removed yet" message came back and
steadily incremented each time vacuum was run.  If I had to guess, I'd
*guess* this started happening about the time somebody queried the
WebObjects application, thus causing it to connect, but I've no way of
knowing after the fact.  (Sorry for the vagueness here.)

Any idea of what might be going on here?

TIA,
Jim
-- 
Jim Seymour                  | PGP Public Key available at:
jseymour@LinxNet.com         | http://www.uk.pgp.net/pgpnet/pks-commands.html
http://jimsun.LinxNet.com    |

