Thread: Vacuum Question
I have a Postgres application that must run 24x7. If postgres needs to be
vacuumed periodically, must I take the application offline completely, or
is it enough to disallow write (INSERT/UPDATE) access while allowing read
access?

I hope it is the latter, as I have a large amount of data and there are
transactions going on all the time.

Brian
-- 
Brian Baquiran <brianb@edsamail.com>
http://www.baquiran.com/
AIM: bbaquiran
> I have a Postgres application that must run 24x7. If postgres needs to be
> vacuumed periodically, must I take the application offline completely, or
> is it enough to disallow write (INSERT/UPDATE) access while allowing read
> access?
>
> I hope it is the latter, as I have a large amount of data and there are
> transactions going on all the time.

Applications will hang until the vacuum is completed. No changes required.

-- 
Bruce Momjian                      |  http://www.op.net/~candle
pgman@candle.pha.pa.us             |  (610) 853-3000
+  If your life is a hard drive,   |  830 Blythe Avenue
+  Christ can be your backup.      |  Drexel Hill, Pennsylvania 19026
"brianb" <brianb-pggeneral@evoserve.com> writes:
> I have a Postgres application that must run 24x7. If postgres needs to be
> vacuumed periodically, must I take the application offline completely, or
> is it enough to disallow write (INSERT/UPDATE) access while allowing read
> access?

You're wasting your time thinking about allowing or disallowing access;
VACUUM is quite capable of locking out other accesses without help ;-)

The short answer is that a table being vacuumed is locked against any
other access, read or write; but only that one table. You might as
well leave the rest of the database open for use.

			regards, tom lane
Tom Lane wrote:
>
> "brianb" <brianb-pggeneral@evoserve.com> writes:
> > I have a Postgres application that must run 24x7. If postgres needs to be
> > vacuumed periodically, must I take the application offline completely, or
> > is it enough to disallow write (INSERT/UPDATE) access while allowing read
> > access?
>
> The short answer is that a table being vacuumed is locked against any
> other access, read or write; but only that one table. You might as
> well leave the rest of the database open for use.

I take it you don't give much credence to the previous reports that
concurrently writing during vacuuming caused corruption? Was that a
previous problem that's fixed now?

Regards,
Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes:
>> The short answer is that a table being vacuumed is locked against any
>> other access, read or write; but only that one table. You might as
>> well leave the rest of the database open for use.

> I take it you don't give much credence to the previous reports that
> concurrently writing during vacuuming caused corruption?

No, not really, for the simple reason that there aren't going to *be*
any concurrent writes (or reads either) during a vacuum ... unless our
table locking system fails, in which case we've got lots worse problems
than vacuum.

There are scenarios involving known (and fixed!) bugs that might be
stressed by vacuum running concurrently with operations on other tables.
I'm thinking particularly of shared cache invalidation: a vacuum on a
system table will spit out a lot of SI messages for the tuples it moves
around, and it used to be that we had bugs that could be triggered by
SI buffer overflow, or more accurately by the cache flush that's used
to recover from SI overflow. (Those bugs seem to be fixed in 7.0. I've
successfully run parallel regression tests with the SI buffer reduced
to a mere dozen messages or so. It spits out an impressive number of
"SI buffer overflow" notices, but keeps going...)

If you are feeling conservative you might want to vacuum just selected
user tables during normal usage, and do a database-wide vacuum only at
low load --- if there are any low-probability bugs still lurking in
there, they're almost surely things that would be triggered by system
table vacuums and not user-table vacuums. That'd probably be a good
idea for performance reasons anyway, since locking a system table is
much more likely to block other backends than locking any one user
table.

			regards, tom lane
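A minimal sketch of the conservative strategy above, assuming a database named `pdb` and user tables named `foo` and `bar` (both assumptions for illustration, not from the thread). The script only builds and prints the psql commands, so it can be rehearsed without a server; pipe its output to sh to actually run the vacuums.

```shell
#!/bin/sh
# Per-table vacuums lock only the named table, so the rest of the
# database stays open; a bare "vacuum" also hits the system tables,
# which is far more likely to block other backends.
DB=pdb
USER_TABLES="foo bar"    # assumed table names

vacuum_cmd() {
    # Print the psql invocation for one table, or for the whole
    # database (system tables included) when called with no argument.
    if [ -n "$1" ]; then
        printf 'psql -d %s -c "vacuum analyze %s;"\n' "$DB" "$1"
    else
        printf 'psql -d %s -c "vacuum analyze;"\n' "$DB"
    fi
}

# Normal hours: one user table at a time.
for t in $USER_TABLES; do
    vacuum_cmd "$t"
done

# Low-load window only: the database-wide vacuum.
vacuum_cmd
```

Run as-is this prints one vacuum command per user table followed by the database-wide command; only the last one needs the quiet period.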
Tom Lane wrote:
>
> Ed Loehr <eloehr@austin.rr.com> writes:
> >> The short answer is that a table being vacuumed is locked against any
> >> other access, read or write; but only that one table. You might as
> >> well leave the rest of the database open for use.
>
> > I take it you don't give much credence to the previous reports that
> > concurrently writing during vacuuming caused corruption?
>
> No, not really, for the simple reason that there aren't going to *be*
> any concurrent writes (or reads either) during a vacuum ... unless our
> table locking system fails, in which case we've got lots worse problems
> than vacuum.

On my way to some performance testing I ran into this repeatable crash
scenario. Just wondering if anyone else can reproduce the same or
explain what is happening. Could very well be something peculiar/buggy
about the bash/perl commands...

First, start up this rather obfuscated perl/bash command, which just
inserts random records forever...

% createdb pdb; psql -d pdb -c "create table foo(id serial, t timestamp
not null default current_timestamp, f float, s varchar, i integer, l
int8);"; perl -e 'for (my $j = 0; $j < 10000000; $j++ ) { print "--
Insert #$j\n"; $f = rand(2000000); $i = int(rand(2000000)) % 16000; $l =
int(rand(2000000000)); $s = chr(65 + int(rand(20))) x int(rand(128));
$delim = chr(39); print "insert into foo (f,s,i,l) values ($f,
$delim${s}$delim, $i, $l);\n"; }' | psql -d pdb

Then, start this one in another bash window/terminal/whatever...

% while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
select count(*) from foo;"; sleep 3; done

This seems to consistently crash after the first vacuum with the
following message:

NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died
        abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to
        terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
connection to server was lost

Regards,
Ed Loehr
Ed Loehr wrote:
>
> On my way to some performance testing I ran into this repeatable crash
> scenario. Just wondering if anyone else can reproduce the same or
> explain what is happening. Could very well be something peculiar/buggy
> about the bash/perl commands...
>
> First, start up this rather obfuscated perl/bash command which just
> inserts random records forever...
>
> % createdb pdb; psql -d pdb -c "create table foo(id serial, t timestamp
> not null default current_timestamp, f float, s varchar, i integer, l
> int8);"; perl -e 'for (my $j = 0; $j < 10000000; $j++ ) { print "--
> Insert #$j\n"; $f = rand(2000000); $i = int(rand(2000000)) % 16000; $l =
> int(rand(2000000000)); $s = chr(65 + int(rand(20))) x int(rand(128));
> $delim = chr(39); print "insert into foo (f,s,i,l) values ($f,
> $delim${s}$delim, $i, $l);\n"; }' | psql -d pdb

Sorry...word-wrap killed that one's cut-n-paste-ability...try this if
you're interested:

createdb pdb;
psql -d pdb -c "create table foo(
    id serial,
    t timestamp not null default current_timestamp,
    f float,
    s varchar,
    i integer,
    l int8);";
perl -e 'for (my $j = 0; $j < 10000000; $j++ ) {
    print "-- Insert #$j\n";
    $f = rand(2000000);
    $i = int(rand(2000000)) % 16000;
    $l = int(rand(2000000000));
    $s = chr(65 + int(rand(20))) x int(rand(128));
    $delim = chr(39);
    print "insert into foo (f,s,i,l) values \n";
    print "($f, $delim${s}$delim, $i, $l);\n";
}' | psql -d pdb

> Then, start this one in another bash window/terminal/whatever...
>
> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
> select count(*) from foo;"; sleep 3; done
>
> This seems to consistently crash after the first vacuum with the
> following message:
>
> NOTICE:  Message from PostgreSQL backend:
>         The Postmaster has informed me that some other backend died
>         abnormally and possibly corrupted shared memory.
>         I have rolled back the current transaction and am going to
>         terminate your database system connection and exit.
>         Please reconnect to the database system and repeat your query.
> pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally
>         before or while processing the request.
> connection to server was lost
>
> Regards,
> Ed Loehr
Ed Loehr wrote:
>
> On my way to some performance testing I ran into this repeatable crash
> scenario. Just wondering if anyone else can reproduce the same or
> explain what is happening. Could very well be something peculiar/buggy
> about the bash/perl commands...
>
> First, start up this rather obfuscated perl/bash command which just
> inserts random records forever...
>
> % createdb pdb; psql -d pdb -c "create table foo(id serial, t timestamp
> not null default current_timestamp, f float, s varchar, i integer, l
> int8);"; perl -e 'for (my $j = 0; $j < 10000000; $j++ ) { print "--
> Insert #$j\n"; $f = rand(2000000); $i = int(rand(2000000)) % 16000; $l =
> int(rand(2000000000)); $s = chr(65 + int(rand(20))) x int(rand(128));
> $delim = chr(39); print "insert into foo (f,s,i,l) values ($f,
> $delim${s}$delim, $i, $l);\n"; }' | psql -d pdb
>
> Then, start this one in another bash window/terminal/whatever...
>
> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
> select count(*) from foo;"; sleep 3; done
>
> This seems to consistently crash after the first vacuum ...

Oh, and the possibly relevant snippet from the server log:

20000606.01:13:27.116 [8659] query: insert into foo (f,s,i,l) values
    (941636.60146296, 'HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH',
    7223, 754988800);
20000606.01:13:27.164 [8679] DEBUG:  --Relation pg_aggregate--
20000606.01:13:27.164 [8679] DEBUG:  Pages 1: Changed 0, reaped 0, Empty 0,
    New 0; Tup 45: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 100,
    MaxLen 107; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0.
    CPU 0.00s/0.00u sec.
20000606.01:13:27.165 [8679] DEBUG:  Index pg_aggregate_name_type_index:
    Pages 2; Tuples 45. CPU 0.00s/0.00u sec.
20000606.01:13:27.172 [8679] DEBUG:  --Relation pg_ipl--
20000606.01:13:27.172 [8679] DEBUG:  Pages 0: Changed 0, reaped 0, Empty 0,
    New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0,
    MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0.
    CPU 0.00s/0.00u sec.
20000606.01:13:27.175 [8679] DEBUG:  --Relation pg_inheritproc--
20000606.01:13:27.176 [8679] DEBUG:  Pages 0: Changed 0, reaped 0, Empty 0,
    New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0,
    MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0.
    CPU 0.00s/0.00u sec.
20000606.01:13:27.180 [8679] DEBUG:  --Relation pg_rewrite--
20000606.01:13:27.184 [8679] DEBUG:  Pages 1: Changed 0, reaped 0, Empty 0,
    New 0; Tup 5: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 903,
    MaxLen 1236; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0.
    CPU 0.00s/0.00u sec.
20000606.01:13:27.186 [8679] DEBUG:  Index pg_rewrite_rulename_index:
    Pages 2; Tuples 5. CPU 0.00s/0.00u sec.
20000606.01:13:27.187 [8679] DEBUG:  Index pg_rewrite_oid_index: Pages 2;
    Tuples 5. CPU 0.00s/0.00u sec.
20000606.01:13:27.188 [8659] ERROR:  Function '<FF><FF><FF><FF>^Ap(int4)'
    does not exist
        Unable to identify a function that satisfies the given argument types
        You may need to add explicit typecasts

One day maybe I'll learn to include all the relevant stuff in one
post...nah.

Regards,
Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes:
>> Then, start this one in another bash window/terminal/whatever...
>>
>> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
>> select count(*) from foo;"; sleep 3; done
>>
>> This seems to consistently crash after the first vacuum with the
>> following message:

This is a known gotcha that's got nothing to do with any sort of
concurrency. You can't safely send a VACUUM followed by anything
else in a single query string. The problem is that VACUUM forces a
transaction commit, which releases all transiently allocated memory in
its backend ... including the already-parsed querytrees for the rest of
the query string. Oops.

(cf. comment near line 560 in src/backend/tcop/postgres.c)

You won't see the problem if you enter "vacuum analyze; select ..."
interactively or as a script in psql, because psql chops up the commands
into separate query submittals. But apparently psql doesn't chop up
a -c string. Non-psql frontends can expose the bug as well.

It's possible that this will get cleaned up as a byproduct of the
proposed rework of transaction-local memory contexts. But it's not a
real high-priority problem, at least not IMHO. For now, the answer is
"if it hurts, don't do it ;-)"

			regards, tom lane
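The workaround follows directly from Tom's explanation: never let VACUUM share a query string with another statement. A sketch using the database and table names from Ed's repro; the function only prints the two separated psql invocations (pipe them to sh to execute), since each -c flag reaches the backend as its own query string.

```shell
#!/bin/sh
# Unsafe on 7.0: both statements travel in ONE query string, and the
# commit forced by VACUUM frees the already-parsed "select" behind it:
#   psql -d pdb -c "vacuum analyze; select count(*) from foo;"

# Safe: one statement per -c invocation, hence one per query string.
emit_safe_cmds() {
    printf '%s\n' 'psql -d pdb -c "vacuum analyze;"'
    printf '%s\n' 'psql -d pdb -c "select count(*) from foo;"'
}
emit_safe_cmds

# Also safe: feed the statements to psql on stdin; psql splits a script
# into separate submittals just as it does interactively:
#   printf 'vacuum analyze;\nselect count(*) from foo;\n' | psql -d pdb
```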
Tom Lane wrote:
>
> Ed Loehr <eloehr@austin.rr.com> writes:
> >> Then, start this one in another bash window/terminal/whatever...
> >>
> >> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
> >> select count(*) from foo;"; sleep 3; done
> >>
> >> This seems to consistently crash after the first vacuum with the
> >> following message:
>
> This is a known gotcha that's got nothing to do with any sort of
> concurrency. You can't safely send a VACUUM followed by anything
> else in a single query string.

Well, I thought that the select count(*) might've been causing a
problem, so I experimented without it and found the same problem. It
doesn't seem to happen with 'vacuum' or 'vacuum analyze foo', only
'vacuum analyze'...

Regards,
Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes:
>>>>> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
>>>>> select count(*) from foo;"; sleep 3; done
>>>>>
>>>>> This seems to consistently crash after the first vacuum with the
>>>>> following message:
>>
>> This is a known gotcha that's got nothing to do with any sort of
>> concurrency. You can't safely send a VACUUM followed by anything
>> else in a single query string.

> Well, I thought that select count(*) might've been causing a problem, so
> I experimented without it and found the same problem. Doesn't seem to
> happen with 'vacuum' or 'vacuum analyze foo', only 'vacuum analyze'...

I can't reproduce any problem with just a "vacuum" (with or without
analyze) and no following command.

I did, however, notice that very occasionally the inserting process
would spit out weird error messages like "Function '(int4)' does not
exist" and "init_fcache: null probin for procedure 481". This seems
to be due to VACUUM (on system tables) causing syscache entries to be
flushed at unexpected times. I've committed patches for the two cases
I observed, but there may be more lurking...

			regards, tom lane
Tom Lane wrote:
>
> Ed Loehr <eloehr@austin.rr.com> writes:
> > Well, I thought that select count(*) might've been causing a problem, so
> > I experimented without it and found the same problem. Doesn't seem to
> > happen with 'vacuum' or 'vacuum analyze foo', only 'vacuum analyze'...
>
> I can't reproduce any problem with just a "vacuum" (with or without
> analyze) and no following command.
>
> I did, however, notice that very occasionally the inserting process
> would spit out weird error messages like "Function '(int4)' does not
> exist" and "init_fcache: null probin for procedure 481". This seems
> to be due to VACUUM (on system tables) causing syscache entries to be
> flushed at unexpected times. I've committed patches for the two cases
> I observed, but there may be more lurking...

Yes, I was getting a similar Function error message, though I never saw
the init_fcache message. And it appeared the backend would crash
coincidentally with the function error message. If the patch can be
applied to 7.0 (beta3), and you don't mind posting it, I could test it
here...

Regards,
Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes:
> Tom Lane wrote:
>> I can't reproduce any problem with just a "vacuum" (with or without
>> analyze) and no following command.
>>
>> I did, however, notice that very occasionally the inserting process
>> would spit out weird error messages like "Function '(int4)' does not
>> exist" and "init_fcache: null probin for procedure 481". This seems
>> to be due to VACUUM (on system tables) causing syscache entries to be
>> flushed at unexpected times. I've committed patches for the two cases
>> I observed, but there may be more lurking...

> Yes, I was getting a similar Function error message, though I never saw
> the init_fcache message. And it appeared the backend would crash
> coincidentally with the function error message. If the patch can be
> applied to 7.0 (beta3), and you don't mind posting it, I could test it
> here...

Hmm, I only saw error messages, no crashes --- but I suppose a crash is
possible, since the root of the problem here is a dangling pointer.

Patches for 7.0.2 are attached. Not sure if they will apply perfectly
cleanly to beta3, but you should be able to make the right mods by hand
if patch doesn't cope...

			regards, tom lane

*** src/backend/parser/parse_type.c.orig	Tue May 30 00:24:49 2000
--- src/backend/parser/parse_type.c	Tue Jun  6 11:41:08 2000
***************
*** 48,54 ****
  		return NULL;
  	}
  	typetuple = (Form_pg_type) GETSTRUCT(tup);
! 	return NameStr(typetuple->typname);
  }
  
  /* return a Type structure, given a type id */
--- 48,55 ----
  		return NULL;
  	}
  	typetuple = (Form_pg_type) GETSTRUCT(tup);
! 	/* pstrdup here because result may need to outlive the syscache entry */
! 	return pstrdup(NameStr(typetuple->typname));
  }
  
  /* return a Type structure, given a type id */
***************
*** 119,125 ****
  	Form_pg_type typ;
  
  	typ = (Form_pg_type) GETSTRUCT(t);
! 	return NameStr(typ->typname);
  }
  
  /* given a type, return its typetype ('c' for 'c'atalog types) */
--- 120,127 ----
  	Form_pg_type typ;
  
  	typ = (Form_pg_type) GETSTRUCT(t);
! 	/* pstrdup here because result may need to outlive the syscache entry */
! 	return pstrdup(NameStr(typ->typname));
  }
  
  /* given a type, return its typetype ('c' for 'c'atalog types) */
*** src/backend/utils/cache/fcache.c~	Wed Apr 12 13:15:53 2000
--- src/backend/utils/cache/fcache.c	Tue Jun  6 13:39:03 2000
***************
*** 14,19 ****
--- 14,20 ----
   */
  #include "postgres.h"
  
+ #include "access/heapam.h"
  #include "catalog/pg_language.h"
  #include "catalog/pg_proc.h"
  #include "catalog/pg_type.h"
***************
*** 89,97 ****
  	if (!use_syscache)
  		elog(ERROR, "what the ????, init the fcache without the catalogs?");
  
! 	procedureTuple = SearchSysCacheTuple(PROCOID,
! 										 ObjectIdGetDatum(foid),
! 										 0, 0, 0);
  
  	if (!HeapTupleIsValid(procedureTuple))
  		elog(ERROR, "init_fcache: Cache lookup failed for procedure %u",
--- 90,98 ----
  	if (!use_syscache)
  		elog(ERROR, "what the ????, init the fcache without the catalogs?");
  
! 	procedureTuple = SearchSysCacheTupleCopy(PROCOID,
! 											 ObjectIdGetDatum(foid),
! 											 0, 0, 0);
  
  	if (!HeapTupleIsValid(procedureTuple))
  		elog(ERROR, "init_fcache: Cache lookup failed for procedure %u",
***************
*** 258,263 ****
--- 259,266 ----
  	}
  	else
  		retval->func.fn_addr = (func_ptr) NULL;
+ 
+ 	heap_freetuple(procedureTuple);
  
  	return retval;
  }
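A mailed context diff like the one above is applied with patch -p0 from the top of the source tree, since the paths in its headers are relative to that directory; --dry-run shows whether the beta3 sources still match before anything is modified. The snippet below rehearses the same mechanics on an invented one-line file (file name, contents, and patch file name are all made up), so it can run without a PostgreSQL tree at hand.

```shell
#!/bin/sh
# Rehearsal of "patch -p0"; the real fix would be saved from the mail
# (e.g. as vacuum-syscache.patch) and applied the same way from the
# directory containing src/.
mkdir -p demo/src
printf 'return NameStr(typetuple->typname);\n' > demo/src/parse_type.c

cat > demo/fix.patch <<'EOF'
--- demo/src/parse_type.c
+++ demo/src/parse_type.c
@@ -1 +1,2 @@
-return NameStr(typetuple->typname);
+/* pstrdup so the result outlives the syscache entry */
+return pstrdup(NameStr(typetuple->typname));
EOF

patch -p0 --dry-run < demo/fix.patch   # check the hunks apply cleanly
patch -p0 < demo/fix.patch             # then apply for real
grep pstrdup demo/src/parse_type.c     # confirm the changed line
```

Hunks that fail against a beta3 tree can be made by hand, as Tom notes.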