Thread: vacuum problem
Got something really odd happening here. Simple test program, in java, with jdbc, postgres 7.4.5, on a redhat linux system... app does a heavy loop of a prepared UPDATE, then Commit, 10000s of times. the table has a few columns, nothing fancy at all. On our Redhat Enterprise 2.1 server (dual xeon, 3GB ram, etc), I can't vacuum the table it generates, it won't free the 'dead' rows... $ vacuumdb -f -d test -t test -v INFO: vacuuming "public.test" INFO: "test": found 0 removable, 373553 nonremovable row versions in 3492 pages DETAIL: 373533 dead row versions cannot be removed yet. Nonremovable row versions range from 72 to 72 bytes long. There were 0 unused item pointers. Total free space (including removable row versions) is 146596 bytes. 0 pages are or will become empty, including 0 at the end of the table. 1 pages containing 6956 free bytes are potential move destinations. CPU 0.05s/0.37u sec elapsed 0.41 sec. INFO: index "test_pkey" now contains 373553 row versions in 1305 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "test": moved 0 row versions, truncated 3492 to 3492 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM yet, on an otherwise identical RH Linux 8 system (same java, same app, same postgres version), it works fine. vacuum gets the free space, I don't even have to use -f there are no other connections to this database. It doesn't matter if the app is still running or not (on the RHL8 system, I can vacuum it while the app is running, and the space is freed immediately, and the app returns to full speed, while on the RHEL2.1 system, I can't vacuum it any which ways, app running or not). now, if I STOP the postgres server process, and restart it, THEN I can vacuum it, but thats not acceptible to my actual application, its a 24/7 data extraction process. we *have* to use RHEL2.1 for this app due to some 3rd party vendor crud thats only supported there (had glibc problems on RHEL3 with this crudware which couldn't be resolved by the libc-compatible libraries) box that works... $ uname -a Linux svxeon1 2.4.20-28.8smp #1 SMP Thu Dec 18 12:25:21 EST 2003 i686 i686 i386 GNU/Linux box that don't work... $ uname -a Linux svxeon2.netsys.stsv.seagate.com 2.4.9-e.43enterprise #1 SMP Tue Jun 29 13:57:26 EDT 2004 i686 unknown And, finally, here's the schema for the test case that shows this problem... CREATE TABLE test ( subject varchar(128) NOT NULL, seq int4 NOT NULL, reset_time varchar(32) NOT NULL, CONSTRAINT test_pkey PRIMARY KEY (subject) ) WITH OIDS; and, here's the java app (compiled and run with Sun JRE 1.4.2_something).. (invoked with command: java -cp . PGTest -r 20 -l 50 -c 4000 //package JavaDBI; //import Flags; import java.io.*; import java.sql.*; public class PGTest { public static void main(String[] args) { int rowCnt = 20, loopCnt = 50, cycle = 4000, delay = 0; try { Connection con_; String driver_; Class.forName( "org.postgresql.Driver" ); con_ = DriverManager.getConnection( "jdbc:postgresql://localhost:5432/test", "test", "test"); PreparedStatement tstStat, updStmt, insStmt, delStmt; tstStat = con_.prepareStatement("SELECT subject, seq, reset_time FROM test"); insStmt = con_.prepareStatement("INSERT INTO test VALUES (?,?,?)"); updStmt = con_.prepareStatement("UPDATE test SET seq = ?, reset_time = ? WHERE subject = ?"); delStmt = con_.prepareStatement("DELETE FROM test"); ResultSet rs1_; Timestamp ts1; java.util.Date dt1 , dt2, st1, st2; st1 = new java.util.Date(); dt1 = new java.util.Date(); ts1 = new Timestamp(dt1.getTime() ); rs1_ = tstStat.executeQuery(); System.out.println( "PGTest App" ); /* while (rs1_.next()) { System.out.println("-row read-"); System.out.println("control_id:"+rs1_.getString(1)); } rs1_.close(); System.out.println( "Read from db"); */ delStmt.execute(); char tmp = 'A'; String[] rowS = new String[rowCnt]; for (int j = 0; j < rowCnt; j++ ) { rowS[j] = Character.toString(tmp); insStmt.setString(1, rowS[j]); insStmt.setInt(2,0); insStmt.setString(3, dt1.toString()); insStmt.execute(); tmp++; } con_.commit(); for (int h = 0; h < cycle; h++ ) { dt1 = new java.util.Date(); for (int j = 0; j < loopCnt; j++ ) { for (int i = 0; i < rowCnt; i++ ) { updStmt.setInt(1, i); updStmt.setString(2, dt1.toString() ); updStmt.setString(3, rowS[i]); updStmt.execute(); con_.commit(); //System.out.println( i ); if (delay > 0) { try { Thread.sleep(delay); // 100 ms sleep } catch (Exception ex) { System.out.println("Waiting"); //blah } } } } dt2 = new java.util.Date(); System.out.println( "Processing took:" + (dt2.getTime() - dt1.getTime()) + " for " + (rowCnt*loopCnt) + " rows"); } con_.commit(); st2 = new java.util.Date(); System.out.println( "Total time:" + (st2.getTime() - st1.getTime()) ); } catch (Exception ex) { System.out.println( "Error WR" ); System.out.println( ex.toString() ); } } } // end PGTest
"John R Pierce" <pierce@hogranch.com> writes: > Got something really odd happening here. > Simple test program, in java, with jdbc, postgres 7.4.5, on a redhat linux > system... app does a heavy loop of a prepared UPDATE, then Commit, 10000s > of times. the table has a few columns, nothing fancy at all. On our > Redhat Enterprise 2.1 server (dual xeon, 3GB ram, etc), I can't vacuum the > table it generates, it won't free the 'dead' rows... You've got some background client holding a transaction open. Or else the test program isn't really committing when you think it is. regards, tom lane
> "John R Pierce" <pierce@hogranch.com> writes: >> Got something really odd happening here. >> Simple test program, in java, with jdbc, postgres 7.4.5, on a redhat >> linux >> system... app does a heavy loop of a prepared UPDATE, then Commit, >> 10000s >> of times. the table has a few columns, nothing fancy at all. On our >> Redhat Enterprise 2.1 server (dual xeon, 3GB ram, etc), I can't vacuum >> the >> table it generates, it won't free the 'dead' rows... > > You've got some background client holding a transaction open. Or else > the test program isn't really committing when you think it is. there are no background programs. I've done all the usual checking of `ps' outputs looking for such. in the test case I mailed to this list, I had created a standalone database with one table, and run the test program directly against it. *HOWEVER*... About an hour after mailing that, I realized that the buggy RHEL2.1 system was running with Intel Hyperthreading enabled (so it appeared as 4 CPUs) while the no-problems RH8.0 system had hyperthreading enabled (we'd previously been benchmarking some multithreaded stuff both ways). So, its *just* possible that the earlier RHEL2.1 (kernel 2.4.9-e.43enterprise) had issues which the later RH8 (2.4.20-28.8smp) were resolved. I'll not be able to test this hypothesis until monday morning.
"John R Pierce" <pierce@hogranch.com> writes: >> You've got some background client holding a transaction open. Or else >> the test program isn't really committing when you think it is. > there are no background programs. I've done all the usual checking of `ps' > outputs looking for such. > in the test case I mailed to this list, I had created a standalone database > with one table, and run the test program directly against it. That sounds suspiciously like you think that only background clients connected to the same database count. Actually, any other client connected to the same *postmaster* can look like an open transaction to VACUUM ... regards, tom lane
John R Pierce wrote: > *HOWEVER*... > > About an hour after mailing that, I realized that the buggy RHEL2.1 > system was running with Intel Hyperthreading enabled (so it appeared as > 4 CPUs) while the no-problems RH8.0 system had hyperthreading enabled > (we'd previously been benchmarking some multithreaded stuff both > ways). So, its *just* possible that the earlier RHEL2.1 (kernel > 2.4.9-e.43enterprise) had issues which the later RH8 (2.4.20-28.8smp) > were resolved. I'll not be able to test this hypothesis until monday > morning. I run RHEL2.1 on the same your HW with HT enabled and I never had a problem with that. Regards Gaetano Mendola
>> About an hour after mailing that, I realized that the buggy RHEL2.1 >> system was running with Intel Hyperthreading enabled (so it appeared as 4 >> CPUs) while the no-problems RH8.0 system had hyperthreading enabled (we'd >> previously been benchmarking some multithreaded stuff both ways). So, >> its *just* possible that the earlier RHEL2.1 (kernel >> 2.4.9-e.43enterprise) had issues which the later RH8 (2.4.20-28.8smp) >> were resolved. I'll not be able to test this hypothesis until monday >> morning. > > I run RHEL2.1 on the same your HW with HT enabled and I never had a > problem > with that. are you running PGSql 7.4.x ? (we have this same issue with 7.4.2 and 7.4.5). fyi, my RHEL2.1 has all available up2date fixes, and I did *NOT* install RH's postgres RPM, rather, I fetched and built 7.4.5 from source could you perhaps give my test program a try? create a database 'test', a user 'test' with password 'test', and table 'test' as described in my original message, and compile and run this program: http://www.hogranch.com/files/PGTest.java compile the java program with Sun J2SE JDK 1.4.x.. $ javac PGTest.java Temporarily disable any autovacuum cronjobs you may have, and invoke the program with $ java -cp . PGTest -r 20 -l 50 -c 4000 (postgresql.jar has to be in your CLASSPATH, or you can specify it as -cp .:/path/postgresql.jar .. I'm using the one that compiled with 7.4.5 and my jdk 1.4.x) and let it run for several hours. On our dual 2.8Ghz Xeon (Intel SE7501WV2 server), it starts out around 13000mS per thousand inserts, after a hour it slows to 20000, and a few more hours later, its up to 40000, thats when I try a `vacuumdb -v -d test` (or any other combination of vacuum options that include this table in this database). It doesn't matter if the app is running or not. I am unable to vacuum the freespace in this table until I stop and restart the postgresql service. put your autovacuum back online when you're done messing. I can not reproduce this problem on a RH8 (2.4.20.x) system
John R Pierce wrote: >>> About an hour after mailing that, I realized that the buggy RHEL2.1 >>> system was running with Intel Hyperthreading enabled (so it appeared >>> as 4 CPUs) while the no-problems RH8.0 system had hyperthreading >>> enabled (we'd previously been benchmarking some multithreaded stuff >>> both ways). So, its *just* possible that the earlier RHEL2.1 >>> (kernel 2.4.9-e.43enterprise) had issues which the later RH8 >>> (2.4.20-28.8smp) were resolved. I'll not be able to test this >>> hypothesis until monday morning. >> >> >> I run RHEL2.1 on the same your HW with HT enabled and I never had a >> problem >> with that. > > > are you running PGSql 7.4.x ? (we have this same issue with 7.4.2 and > 7.4.5). fyi, my RHEL2.1 has all available up2date fixes, and I did > *NOT* install RH's postgres RPM, rather, I fetched and built 7.4.5 from > source I was running 7.4.2 and since 3 weeks we are running a 7.4.5 Suddenly after the upgrade an ill coded servlets ( in place long time ) started to drive Postgres to PANIC; now the servlet is fixed and the problem disappeared. See my post: "invalid string enlargement". Tom Lane also was able to reproduce it and I bet already he fixed the problem. > could you perhaps give my test program a try? create a database > 'test', a user 'test' with password 'test', and table 'test' as > described in my original message, and compile and run this program: > > http://www.hogranch.com/files/PGTest.java > > compile the java program with Sun J2SE JDK 1.4.x.. > > $ javac PGTest.java Unfortunatelly I can not try it for two reasons: the machine is a production machine and I can not use it for experiments and the other reason is that our Java reference is the 1.3.x :-( > (postgresql.jar has to be in your CLASSPATH, or you can specify it as > -cp .:/path/postgresql.jar .. I'm using the one that compiled with 7.4.5 > and my jdk 1.4.x) Which one ? I hope you are using pg74.213.jdbc3.jar > I am unable to vacuum the freespace in this table until I stop and > restart the postgresql service. put your autovacuum back online when > you're done messing. I can not reproduce this problem on a RH8 > (2.4.20.x) system Do you mean that is not enough stop your java application in order to retrieve the space with a vacuum ? Do you have some connection in status: "Idle in transaction" ? Regards Gaetano Mendola
>> there are no background programs. I've done all the usual checking of >> `ps' >> outputs looking for such. > >> in the test case I mailed to this list, I had created a standalone >> database >> with one table, and run the test program directly against it. > > That sounds suspiciously like you think that only background clients > connected to the same database count. Actually, any other client > connected to the same *postmaster* can look like an open transaction > to VACUUM ... indeed, I did think this. A completely different development project on the same server was in fact leaving a postmaster 'idle in transaction'... I didn't realize this would matter. I had that developer kill his 'idle in transaction' process, and voila, I can vacuum my table now on the fly. I told him to fix his logic to STOP doing that, he's a SQL novice. so, my huge apologies for wasting the buglist's time on this.