Vacuum Question

From: "brianb"

I have a Postgres application that must run 24x7. If postgres needs to be
vacuumed periodically, must I take the application offline completely, or
is it enough to disallow write (INSERT/UPDATE) access while allowing read
access?

I hope it is the latter, as I have a large amount of data and there are
transactions going on all the time.

Brian
--
Brian Baquiran <brianb@edsamail.com>
http://www.baquiran.com/
AIM: bbaquiran

Re: Vacuum Question

From: Bruce Momjian

>
> I have a Postgres application that must run 24x7. If postgres needs to be
> vacuumed periodically, must I take the application offline completely, or
> is it enough to disallow write (INSERT/UPDATE) access while allowing read
> access?
>
> I hope it is the latter, as I have a large amount of data and there are
> transactions going on all the time.

Applications will hang until the vacuum is completed.  No changes
required.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Vacuum Question

From: Tom Lane

"brianb" <brianb-pggeneral@evoserve.com> writes:
> I have a Postgres application that must run 24x7. If postgres needs to be
> vacuumed periodically, must I take the application offline completely, or
> is it enough to disallow write (INSERT/UPDATE) access while allowing read
> access?

You're wasting your time thinking about allowing or disallowing access;
VACUUM is quite capable of locking out other accesses without help ;-)

The short answer is that a table being vacuumed is locked against any
other access, read or write; but only that one table.  You might as
well leave the rest of the database open for use.
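
For instance, vacuuming one table at a time from a shell leaves every
other table fully usable (a sketch only; "mydb" and "foo" are
placeholder names here):

% psql -d mydb -c "vacuum analyze foo;"   # locks foo, and only foo, until it commits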

            regards, tom lane

Re: Vacuum Question

From: Ed Loehr

Tom Lane wrote:
>
> "brianb" <brianb-pggeneral@evoserve.com> writes:
> > I have a Postgres application that must run 24x7. If postgres needs to be
> > vacuumed periodically, must I take the application offline completely, or
> > is it enough to disallow write (INSERT/UPDATE) access while allowing read
> > access?
>
> The short answer is that a table being vacuumed is locked against any
> other access, read or write; but only that one table.  You might as
> well leave the rest of the database open for use.

I take it you don't give much credence to the previous reports that
concurrently writing during vacuuming caused corruption?  Was that a
previous problem that's fixed now?

Regards,
Ed Loehr

Re: Vacuum Question

From: Tom Lane

Ed Loehr <eloehr@austin.rr.com> writes:
>> The short answer is that a table being vacuumed is locked against any
>> other access, read or write; but only that one table.  You might as
>> well leave the rest of the database open for use.

> I take it you don't give much credence to the previous reports that
> concurrently writing during vacuuming caused corruption?

No, not really, for the simple reason that there aren't going to *be*
any concurrent writes (or reads either) during a vacuum ... unless our
table locking system fails, in which case we've got lots worse problems
than vacuum.
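
If you want to convince yourself, try something like this in two
terminals ("mydb", "foo", and column "f" are placeholders for a
scratch database and table):

% psql -d mydb -c "vacuum analyze foo;"               # terminal 1: holds foo's lock
% psql -d mydb -c "insert into foo (f) values (1);"   # terminal 2: just waits for the
                                                      # lock, then proceeds normally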

There are scenarios involving known (and fixed!) bugs that might be
stressed by vacuum running concurrently with operations on other tables.
I'm thinking particularly of shared cache invalidation: a vacuum on a
system table will spit out a lot of SI messages for the tuples it moves
around, and it used to be that we had bugs that could be triggered by
SI buffer overflow, or more accurately by the cache flush that's used to
recover from SI overflow.  (Those bugs seem to be fixed in 7.0.  I've
successfully run parallel regression tests with the SI buffer reduced to
a mere dozen messages or so.  It spits out an impressive number of "SI
buffer overflow" notices, but keeps going...)

If you are feeling conservative you might want to vacuum just selected
user tables during normal usage, and do a database-wide vacuum only at
low load --- if there are any low-probability bugs still lurking in
there, they're almost surely things that would be triggered by system
table vacuums and not user-table vacuums.  That'd probably be a good
idea for performance reasons anyway, since locking a system table is
much more likely to block other backends than locking any one user
table.
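
Sketched out, that schedule might look like this (database and table
names are placeholders; note each statement gets its own psql call):

% psql -d mydb -c "vacuum analyze accounts;"   # normal hours: busy user tables,
% psql -d mydb -c "vacuum analyze orders;"     # one at a time
% psql -d mydb -c "vacuum analyze;"            # low-load window only: database-wide,
                                               # which includes the system tables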

            regards, tom lane

Re: Vacuum Question

From: Ed Loehr

Tom Lane wrote:
>
> Ed Loehr <eloehr@austin.rr.com> writes:
> >> The short answer is that a table being vacuumed is locked against any
> >> other access, read or write; but only that one table.  You might as
> >> well leave the rest of the database open for use.
>
> > I take it you don't give much credence to the previous reports that
> > concurrently writing during vacuuming caused corruption?
>
> No, not really, for the simple reason that there aren't going to *be*
> any concurrent writes (or reads either) during a vacuum ... unless our
> table locking system fails, in which case we've got lots worse problems
> than vacuum.

On my way to some performance testing I ran into this repeatable crash
scenario.  Just wondering if anyone else can reproduce the same or
explain what is happening.  Could very well be something peculiar/buggy
about the bash/perl commands...

First, start up this rather obfuscated perl/bash command which just
inserts random records forever...

%  createdb pdb; psql -d pdb -c "create table foo(id serial, t timestamp
not null default current_timestamp, f float, s varchar, i integer, l
int8);"; perl -e 'for (my $j = 0; $j < 10000000; $j++ ) { print "--
Insert #$j\n"; $f = rand(2000000); $i = int(rand(2000000)) % 16000; $l =
int(rand(2000000000)); $s = chr(65 + int(rand(20))) x int(rand(128));
$delim = chr(39); print "insert into foo (f,s,i,l) values ($f,
$delim${s}$delim, $i, $l);\n"; }' | psql -d pdb

Then, start this one in another bash window/terminal/whatever...

% while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
select count(*) from foo;"; sleep 3; done

This seems to consistently crash after the first vacuum with the
following message:

NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died
abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to
terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
connection to server was lost


Regards,
Ed Loehr

Re: Vacuum Question

From: Ed Loehr

Ed Loehr wrote:
>
> Tom Lane wrote:
> >
> > Ed Loehr <eloehr@austin.rr.com> writes:
> > >> The short answer is that a table being vacuumed is locked against any
> > >> other access, read or write; but only that one table.  You might as
> > >> well leave the rest of the database open for use.
> >
> > > I take it you don't give much credence to the previous reports that
> > > concurrently writing during vacuuming caused corruption?
> >
> > No, not really, for the simple reason that there aren't going to *be*
> > any concurrent writes (or reads either) during a vacuum ... unless our
> > table locking system fails, in which case we've got lots worse problems
> > than vacuum.
>
> On my way to some performance testing I ran into this repeatable crash
> scenario.  Just wondering if anyone else can reproduce the same or
> explain what is happening.  Could very well be something peculiar/buggy
> about the bash/perl commands...
>
> First, start up this rather obfuscated perl/bash command which just
> inserts random records forever...
>
> %  createdb pdb; psql -d pdb -c "create table foo(id serial, t timestamp
> not null default current_timestamp, f float, s varchar, i integer, l
> int8);"; perl -e 'for (my $j = 0; $j < 10000000; $j++ ) { print "--
> Insert #$j\n"; $f = rand(2000000); $i = int(rand(2000000)) % 16000; $l =
> int(rand(2000000000)); $s = chr(65 + int(rand(20))) x int(rand(128));
> $delim = chr(39); print "insert into foo (f,s,i,l) values ($f,
> $delim${s}$delim, $i, $l);\n"; }' | psql -d pdb

Sorry...word-wrap killed that one's cut-n-paste-ability...try this if
you're interested:

createdb pdb;
psql -d pdb -c "create table foo(
    id serial,
    t timestamp not null default current_timestamp,
    f float,
    s varchar,
    i integer,
    l int8);";
perl -e 'for (my $j = 0; $j < 10000000; $j++ ) {
    print "-- Insert #$j\n";
    $f = rand(2000000);
    $i = int(rand(2000000)) % 16000;
    $l = int(rand(2000000000));
    $s = chr(65 + int(rand(20))) x int(rand(128));
    $delim = chr(39);
    print "insert into foo (f,s,i,l) values \n";
    print "($f, $delim${s}$delim, $i, $l);\n"; }' | psql -d pdb
>
> Then, start this one in another bash window/terminal/whatever...
>
> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
> select count(*) from foo;"; sleep 3; done
>
> This seems to consistently crash after the first vacuum with the
> following message:
>
> NOTICE:  Message from PostgreSQL backend:
>         The Postmaster has informed me that some other backend died
> abnormally and possibly corrupted shared memory.
>         I have rolled back the current transaction and am going to
> terminate your database system connection and exit.
>         Please reconnect to the database system and repeat your query.
> pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally
>         before or while processing the request.
> connection to server was lost
>
> Regards,
> Ed Loehr

Re: Vacuum Question

From: Ed Loehr

Ed Loehr wrote:
>
> On my way to some performance testing I ran into this repeatable crash
> scenario.  Just wondering if anyone else can reproduce the same or
> explain what is happening.  Could very well be something peculiar/buggy
> about the bash/perl commands...
>
> First, start up this rather obfuscated perl/bash command which just
> inserts random records forever...
>
> %  createdb pdb; psql -d pdb -c "create table foo(id serial, t timestamp
> not null default current_timestamp, f float, s varchar, i integer, l
> int8);"; perl -e 'for (my $j = 0; $j < 10000000; $j++ ) { print "--
> Insert #$j\n"; $f = rand(2000000); $i = int(rand(2000000)) % 16000; $l =
> int(rand(2000000000)); $s = chr(65 + int(rand(20))) x int(rand(128));
> $delim = chr(39); print "insert into foo (f,s,i,l) values ($f,
> $delim${s}$delim, $i, $l);\n"; }' | psql -d pdb
>
> Then, start this one in another bash window/terminal/whatever...
>
> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
> select count(*) from foo;"; sleep 3; done
>
> This seems to consistently crash after the first vacuum ...

Oh, and the possibly relevant snippet from the server log:

20000606.01:13:27.116  [8659] query: insert into foo (f,s,i,l) values (941636.60146296, 'HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH', 7223, 754988800);
20000606.01:13:27.164  [8679] DEBUG:  --Relation pg_aggregate--
20000606.01:13:27.164  [8679] DEBUG:  Pages 1: Changed 0, reaped 0, Empty 0, New 0; Tup 45: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 100, MaxLen 107; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
20000606.01:13:27.165  [8679] DEBUG:  Index pg_aggregate_name_type_index: Pages 2; Tuples 45. CPU 0.00s/0.00u sec.
20000606.01:13:27.172  [8679] DEBUG:  --Relation pg_ipl--
20000606.01:13:27.172  [8679] DEBUG:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
20000606.01:13:27.175  [8679] DEBUG:  --Relation pg_inheritproc--
20000606.01:13:27.176  [8679] DEBUG:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
20000606.01:13:27.180  [8679] DEBUG:  --Relation pg_rewrite--
20000606.01:13:27.184  [8679] DEBUG:  Pages 1: Changed 0, reaped 0, Empty 0, New 0; Tup 5: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 903, MaxLen 1236; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
20000606.01:13:27.186  [8679] DEBUG:  Index pg_rewrite_rulename_index: Pages 2; Tuples 5. CPU 0.00s/0.00u sec.
20000606.01:13:27.187  [8679] DEBUG:  Index pg_rewrite_oid_index: Pages 2; Tuples 5. CPU 0.00s/0.00u sec.
20000606.01:13:27.188  [8659] ERROR:  Function '<FF><FF><FF><FF>^Ap(int4)' does not exist
        Unable to identify a function that satisfies the given argument types
        You may need to add explicit typecasts

One day maybe I'll learn to include all the relevant stuff in one
post...nah.

Regards,
Ed Loehr

Re: Vacuum Question

From: Tom Lane

Ed Loehr <eloehr@austin.rr.com> writes:
>> Then, start this one in another bash window/terminal/whatever...
>>
>> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
>> select count(*) from foo;"; sleep 3; done
>>
>> This seems to consistently crash after the first vacuum with the
>> following message:

This is a known gotcha that's got nothing to do with any sort of
concurrency.  You can't safely send a VACUUM followed by anything
else in a single query string.  The problem is that VACUUM forces a
transaction commit, which releases all transiently allocated memory
in its backend ... including the already-parsed querytrees for the
rest of the query string.  Oops.  (cf. comment near line 560 in
src/backend/tcop/postgres.c)

You won't see the problem if you enter "vacuum analyze; select ..."
interactively or as a script in psql, because it chops up the
commands into separate query submittals.  But apparently psql
doesn't chop up a -c string.  Non-psql frontends can expose the bug
as well.
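
The workaround is to give each statement its own query submittal; for
instance, Ed's loop no longer trips over this particular gotcha if the
two commands go over separately:

% while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;"; psql -d pdb -c "select count(*) from foo;"; sleep 3; done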

It's possible that this will get cleaned up as a byproduct of the
proposed rework of transaction-local memory contexts.  But it's
not a real high-priority problem, at least not IMHO.  For now,
the answer is "if it hurts, don't do it ;-)"

            regards, tom lane

Re: Vacuum Question

From: Ed Loehr

Tom Lane wrote:
>
> Ed Loehr <eloehr@austin.rr.com> writes:
> >> Then, start this one in another bash window/terminal/whatever...
> >>
> >> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
> >> select count(*) from foo;"; sleep 3; done
> >>
> >> This seems to consistently crash after the first vacuum with the
> >> following message:
>
> This is a known gotcha that's got nothing to do with any sort of
> concurrency.  You can't safely send a VACUUM followed by anything
> else in a single query string.

Well, I thought that select count(*) might've been causing a problem, so
I experimented without it and found the same problem.  Doesn't seem to
happen with 'vacuum' or 'vacuum analyze foo', only 'vacuum analyze'...

Regards,
Ed Loehr

Re: Vacuum Question

From: Tom Lane

Ed Loehr <eloehr@austin.rr.com> writes:
>>>>> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
>>>>> select count(*) from foo;"; sleep 3; done
>>>>>
>>>>> This seems to consistently crash after the first vacuum with the
>>>>> following message:
>>
>> This is a known gotcha that's got nothing to do with any sort of
>> concurrency.  You can't safely send a VACUUM followed by anything
>> else in a single query string.

> Well, I thought that select count(*) might've been causing a problem, so
> I experimented without it and found the same problem.  Doesn't seem to
> happen with 'vacuum' or 'vacuum analyze foo', only 'vacuum analyze'...

I can't reproduce any problem with just a "vacuum" (with or without
analyze) and no following command.

I did, however, notice that very occasionally the inserting process
would spit out weird error messages like "Function '(int4)' does not
exist" and "init_fcache: null probin for procedure 481".  This seems
to be due to VACUUM (on system tables) causing syscache entries to be
flushed at unexpected times.  I've committed patches for the two cases
I observed, but there may be more lurking...

            regards, tom lane

Re: Vacuum Question

From: Ed Loehr

Tom Lane wrote:
>
> Ed Loehr <eloehr@austin.rr.com> writes:
> >>>>> % while test 1; do echo -n "`date`: "; psql -d pdb -c "vacuum analyze;
> >>>>> select count(*) from foo;"; sleep 3; done
> >>>>>
> >>>>> This seems to consistently crash after the first vacuum with the
> >>>>> following message:
> >>
> >> This is a known gotcha that's got nothing to do with any sort of
> >> concurrency.  You can't safely send a VACUUM followed by anything
> >> else in a single query string.
>
> > Well, I thought that select count(*) might've been causing a problem, so
> > I experimented without it and found the same problem.  Doesn't seem to
> > happen with 'vacuum' or 'vacuum analyze foo', only 'vacuum analyze'...
>
> I can't reproduce any problem with just a "vacuum" (with or without
> analyze) and no following command.
>
> I did, however, notice that very occasionally the inserting process
> would spit out weird error messages like "Function '(int4)' does not
> exist" and  null probin for procedure 481".  This seems
> to be due to VACUUM (on system tables) causing syscache entries to be
> flushed at unexpected times.  I've committed patches for the two cases
> I observed, but there may be more lurking...

Yes, I was getting a similar Function error message, though I never saw
the init_fcache message.  And it appeared the backend would crash
coincident with the function error message.  If the patch can be
applied to 7.0 (beta3), and you don't mind posting it, I could test it
here...

Regards,
Ed Loehr

Re: Vacuum Question

From: Tom Lane

Ed Loehr <eloehr@austin.rr.com> writes:
> Tom Lane wrote:
>> I can't reproduce any problem with just a "vacuum" (with or without
>> analyze) and no following command.
>>
>> I did, however, notice that very occasionally the inserting process
>> would spit out weird error messages like "Function '(int4)' does not
>> exist" and  null probin for procedure 481".  This seems
>> to be due to VACUUM (on system tables) causing syscache entries to be
>> flushed at unexpected times.  I've committed patches for the two cases
>> I observed, but there may be more lurking...

> Yes, I was getting a similar Function error message, though I never saw
> the init_fcache message.  And it appeared the backend would crash
> coincident with the function error message.  If the patch can be
> applied to 7.0 (beta3), and you don't mind posting it, I could test it
> here...

Hmm, I only saw error messages, no crashes --- but I suppose a crash is
possible, since the root of the problem here is a dangling pointer.

Patches for 7.0.2 are attached.  Not sure if they will apply perfectly
cleanly to beta3, but you should be able to make the right mods by hand
if patch doesn't cope...
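
Assuming you save the diffs below as, say, vacuum-fixes.patch at the
top of your source tree, applying them would look something like:

% cd postgresql-7.0beta3
% patch -p0 < vacuum-fixes.patch
% # then rebuild and reinstall the backend as usual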

            regards, tom lane

*** src/backend/parser/parse_type.c.orig    Tue May 30 00:24:49 2000
--- src/backend/parser/parse_type.c    Tue Jun  6 11:41:08 2000
***************
*** 48,54 ****
          return NULL;
      }
      typetuple = (Form_pg_type) GETSTRUCT(tup);
!     return NameStr(typetuple->typname);
  }

  /* return a Type structure, given a type id */
--- 48,55 ----
          return NULL;
      }
      typetuple = (Form_pg_type) GETSTRUCT(tup);
!     /* pstrdup here because result may need to outlive the syscache entry */
!     return pstrdup(NameStr(typetuple->typname));
  }

  /* return a Type structure, given a type id */
***************
*** 119,125 ****
      Form_pg_type typ;

      typ = (Form_pg_type) GETSTRUCT(t);
!     return NameStr(typ->typname);
  }

  /* given a type, return its typetype ('c' for 'c'atalog types) */
--- 120,127 ----
      Form_pg_type typ;

      typ = (Form_pg_type) GETSTRUCT(t);
!     /* pstrdup here because result may need to outlive the syscache entry */
!     return pstrdup(NameStr(typ->typname));
  }

  /* given a type, return its typetype ('c' for 'c'atalog types) */

*** src/backend/utils/cache/fcache.c~    Wed Apr 12 13:15:53 2000
--- src/backend/utils/cache/fcache.c    Tue Jun  6 13:39:03 2000
***************
*** 14,19 ****
--- 14,20 ----
   */
  #include "postgres.h"

+ #include "access/heapam.h"
  #include "catalog/pg_language.h"
  #include "catalog/pg_proc.h"
  #include "catalog/pg_type.h"
***************
*** 89,97 ****
      if (!use_syscache)
          elog(ERROR, "what the ????, init the fcache without the catalogs?");

!     procedureTuple = SearchSysCacheTuple(PROCOID,
!                                          ObjectIdGetDatum(foid),
!                                          0, 0, 0);

      if (!HeapTupleIsValid(procedureTuple))
          elog(ERROR, "init_fcache: Cache lookup failed for procedure %u",
--- 90,98 ----
      if (!use_syscache)
          elog(ERROR, "what the ????, init the fcache without the catalogs?");

!     procedureTuple = SearchSysCacheTupleCopy(PROCOID,
!                                              ObjectIdGetDatum(foid),
!                                              0, 0, 0);

      if (!HeapTupleIsValid(procedureTuple))
          elog(ERROR, "init_fcache: Cache lookup failed for procedure %u",
***************
*** 258,263 ****
--- 259,266 ----
      }
      else
          retval->func.fn_addr = (func_ptr) NULL;
+
+     heap_freetuple(procedureTuple);

      return retval;
  }