Thread: delete from ..;vacuum crashes

delete from ..;vacuum crashes

From
Tatsuo Ishii
Date:
Here is a report regarding the backend-crash from a user in Japan.
Included shell script should reproduce the phenomenon.
Note that
    select * from getting; vacuum;
does cause a crash, while
    select * from getting;
    vacuum;
not.
--
Tatsuo Ishii
t-ishii@sra.co.jp

========================================================================
#!/bin/sh

DBNAME=ptest

destroydb $DBNAME
createdb $DBNAME
psql -e $DBNAME <<EOF
create table header
(
    host    text    not null,
    port    int    not null,
    path    text    not null,
    file    text    not null,
    extra    text    not null,
    name    text    not null,
    value    text    not null
);
create index header_url_idx on header (host, port, path, file, extra);
create unique index header_uniq_idx on header (host, port, path, file, extra, name);

create table reference
(
    f_url    text    not null,
    t_url    text    not null
);
create index reference_from_idx on reference (f_url);
create index reference_to_idx on reference (t_url);
create unique index reference_uniq_idx on reference (f_url, t_url);

create table extension
(
    ext    text    not null,
    note    text
);
create unique index extension_ext_idx on extension (ext);

create table getting
(
    host    text    not null,
    port    int    not null,
    ip    text    not null,
    when    datetime    not null
);
create unique index getting_ip_idx on getting (ip);
EOF
#psql -c "delete from getting; vacuum;" $DBNAME
psql -c "select * from getting; vacuum;" $DBNAME
#psql -c "delete from getting;" $DBNAME
#psql -c "select * from getting;" $DBNAME
#psql -c "vacuum;" $DBNAME
#psql -c "vacuum; vacuum;" $DBNAME

Re: [HACKERS] delete from ..;vacuum crashes

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Here is a report regarding the backend-crash from a user in Japan.
> Included shell script should reproduce the phenomenon.

On which postgres version(s)?

> Note that
>     select * from getting; vacuum;
> does cause a crash, while
>     select * from getting;
>     vacuum;
> not.

Specifically I see you are using

> psql -c "select * from getting; vacuum;" $DBNAME

rather than entering the commands at the psql prompt.  The -c option
works differently from entering multiple commands at psql's prompt.
In ordinary interactive use, psql will break what you type at
semicolon boundaries and send each SQL command to the backend
separately, even if you typed several commands on one line.
*But* the -c option doesn't work that way --- it just sends the
whole given string to the backend as one query.

The implication of this is that psql -c "select * from getting; vacuum;"
executes the select and the vacuum as part of a single transaction,
whereas any other way of doing it with psql will make the commands be
two separate transactions.  I speculate that this has something to do
with the different behavior you see.

Exactly what the bug is is beyond my abilities, but perhaps that tidbit
will help someone more competent to find it.

            regards, tom lane

Re: [HACKERS] delete from ..;vacuum crashes

From
t-ishii@sra.co.jp (Tatsuo Ishii)
Date:
At 2:14 AM 98.10.2 -0400, Tom Lane wrote:
>Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Here is a report regarding the backend-crash from a user in Japan.
>> Included shell script should reproduce the phenomenon.
>
>On which postgres version(s)?

The latest cvs source tree (I have not tried this for 6.3.2).

>The implication of this is that psql -c "select * from getting; vacuum;"
>executes the select and the vacuum as part of a single transaction,
>whereas any other way of doing it with psql will make the commands be
>two separate transactions.  I speculate that this has something to do
>with the different behavior you see.

So basically the backend treats multiple SQL statements conjucted with ";"
as a single transaction?

If this is the cause of the problem, following SQLs should produce
the backend death too. I will try this.

begin;
select * from getting;
vacuum;
end;
--
Tatsuo Ishii
t-ishii@sra.co.jp


Re: [HACKERS] delete from ..;vacuum crashes

From
Tom Lane
Date:
t-ishii@sra.co.jp (Tatsuo Ishii) writes:
> So basically the backend treats multiple SQL statements conjucted with ";"
> as a single transaction?

If they arrive in a single query string, as you can cause with psql -c
or with a direct PQexec() call.

Actually it's even more subtle than that: multiple statements in a
single query string act like a *single statement* as far as the
transaction mechanism is concerned.  There's one StartTransactionCommand
in postgres.c before the string starts to execute, and one
CommitTransactionCommand after it's done.

It is possible that that is a bug, and that we need to take the start/
commit calls out of the postgres.c main loop and put them somewhere
down inside the parsing/execution code, at a point where the software
has parsed off a single SQL statement.  As it stands, if there are any
statements in the system that assume there is a StartTransactionCommand
just before they begin and a CommitTransactionCommand just after they
finish, those statements will break when executed as part of a
multi-statement query.  Question for the gurus: would you consider this
a bug in the particular statement (it shouldn't assume that), or a bug
in the outer layers (they should make that be true)?

In particular, since vacuum.c does some peculiar things with transaction
boundaries, it seems to me that it might be an example of such a
statement and that what I just described is the root cause of your bug.
But someone who knows the system better than me will have to figure out
just what's going on.

> If this is the cause of the problem, following SQLs should produce
> the backend death too. I will try this.

> begin;
> select * from getting;
> vacuum;
> end;

You should try it and let us know.  But that is a different test case,
because there will be CommitTransactionCommand & StartTransactionCommand
between the select and the vacuum.

            regards, tom lane

select * from ..;vacuum crashes

From
Tatsuo Ishii
Date:
(I have changed the subject "delete from" to "select * from" )

As I reported,

    select * from getting; vacuum;

does crash the backend with included test data.

This time I have tried:

begin;
select * from getting;
vacuum;
end;

and have a crash too.

(using current source tree + FreeBSD)

I think this should be added to the Open 6.4 items list.

>Here is a report regarding the backend-crash from a user in Japan.
>Included shell script should reproduce the phenomenon.
>Note that
>    select * from getting; vacuum;
>does cause a crash, while
>    select * from getting;
>    vacuum;
>not.
>--
>Tatsuo Ishii
>t-ishii@sra.co.jp
>
>========================================================================
>#!/bin/sh
>
>DBNAME=ptest
>
>destroydb $DBNAME
>createdb $DBNAME
>psql -e $DBNAME <<EOF
>create table header
>(
>    host    text    not null,
>    port    int    not null,
>    path    text    not null,
>    file    text    not null,
>    extra    text    not null,
>    name    text    not null,
>    value    text    not null
>);
>create index header_url_idx on header (host, port, path, file, extra);
>create unique index header_uniq_idx on header (host, port, path, file, extra, name);
>
>create table reference
>(
>    f_url    text    not null,
>    t_url    text    not null
>);
>create index reference_from_idx on reference (f_url);
>create index reference_to_idx on reference (t_url);
>create unique index reference_uniq_idx on reference (f_url, t_url);
>
>create table extension
>(
>    ext    text    not null,
>    note    text
>);
>create unique index extension_ext_idx on extension (ext);
>
>create table getting
>(
>    host    text    not null,
>    port    int    not null,
>    ip    text    not null,
>    when    datetime    not null
>);
>create unique index getting_ip_idx on getting (ip);
>EOF
>#psql -c "delete from getting; vacuum;" $DBNAME
>psql -c "select * from getting; vacuum;" $DBNAME
>#psql -c "delete from getting;" $DBNAME
>#psql -c "select * from getting;" $DBNAME
>#psql -c "vacuum;" $DBNAME
>#psql -c "vacuum; vacuum;" $DBNAME
>


Re: [HACKERS] select * from ..;vacuum crashes

From
Bruce Momjian
Date:
> (I have changed the subject "delete from" to "select * from" )
>
> As I reported,
>
>     select * from getting; vacuum;
>
> does crash the backend with included test data.
>
> This time I have tried:
>
> begin;
> select * from getting;
> vacuum;
> end;

I am attaching the original test script that will crash the backend.
The backtrace is:

    exceptionP=0x8152500, detail=0x0, fileName=0x8113761 "heapam.c",
    lineNumber=1055) at assert.c:74
#6  0x805a3ea in heap_fetch (relation=0x8187310, snapshot=0x0, tid=0x82f1128,
    userbuf=0x8045430) at heapam.c:1055
#7  0x8081986 in vc_updstats (relid=141974, num_pages=0, num_tuples=0,
    hasindex=1 '\001', vacrelstats=0x8186890) at vacuum.c:1767
#8  0x807ef8d in vc_vacone (relid=141974, analyze=0, va_cols=0x0)
    at vacuum.c:579
#9  0x807e6f1 in vc_vacuum (VacRelP=0x0, analyze=0 '\000', va_cols=0x0)
    at vacuum.c:257
#10 0x807e5ce in vacuum (vacrel=0x0, verbose=0, analyze=0 '\000', va_spec=0x0)
    at vacuum.c:160
#11 0x80e2d07 in ProcessUtility (parsetree=0x8185950, dest=Debug)
    at utility.c:644
#12 0x80e0745 in pg_exec_query_dest (query_string=0x80455f8 "vacuum;\n",
    dest=Debug, aclOverride=0) at postgres.c:758
#13 0x80e0664 in pg_exec_query (query_string=0x80455f8 "vacuum;\n")
    at postgres.c:699
#14 0x80e1708 in PostgresMain (argc=4, argv=0x8047644, real_argc=4,
    real_argv=0x8047644) at postgres.c:1622
#15 0x809ae39 in main (argc=4, argv=0x8047644) at main.c:103
#16 0x804a96c in __start ()

Something in the heap fetch it does not like.  I am kind of lost in this
part of the code.

The Assert line is:

    Assert(ItemIdIsUsed(lp));

which is checking for:

    (bool) (((itemId)->lp_flags & LP_USED) != 0)

which is saying the disk identifer should be in use, but is not during
the vacuum, for some reason.

You must enable Assert to see the crash.

The cause may be because you are doing a vacuum INSIDE a transaction.  I
think that also explains the psql -e thing, because that does both
commands in the same transaction.

Perhaps we need to disable vacuum inside transactions.  Vadim?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#!/bin/sh

DBNAME=ptest

destroydb $DBNAME
createdb $DBNAME
psql -e $DBNAME <<EOF
create table header
(
    host    text    not null,
    port    int    not null,
    path    text    not null,
    file    text    not null,
    extra    text    not null,
    name    text    not null,
    value    text    not null
);
create index header_url_idx on header (host, port, path, file, extra);
create unique index header_uniq_idx on header (host, port, path, file, extra, name);

create table reference
(
    f_url    text    not null,
    t_url    text    not null
);
create index reference_from_idx on reference (f_url);
create index reference_to_idx on reference (t_url);
create unique index reference_uniq_idx on reference (f_url, t_url);

create table extension
(
    ext    text    not null,
    note    text
);
create unique index extension_ext_idx on extension (ext);

create table getting
(
    host    text    not null,
    port    int    not null,
    ip    text    not null,
    when    datetime    not null
);
create unique index getting_ip_idx on getting (ip);
EOF
#psql -c "delete from getting; vacuum;" $DBNAME
psql -c "select * from getting; vacuum;" $DBNAME
#psql -c "delete from getting;" $DBNAME
#psql -c "select * from getting;" $DBNAME
#psql -c "vacuum;" $DBNAME
#psql -c "vacuum; vacuum;" $DBNAME



RE: [HACKERS] select * from ..;vacuum crashes

From
"Taral"
Date:
> begin;
> select * from getting;
> vacuum;
> end;

Question: Does the following really make sense?

begin transaction;
select * from table;
vacuum;
abort transaction;

Taral

Re: [HACKERS] select * from ..;vacuum crashes

From
Tatsuo Ishii
Date:
>You must enable Assert to see the crash.

I saw the crash without assertion enabled? This is FreeBSD 2.2.6.

>The cause may be because you are doing a vacuum INSIDE a transaction.  I
>think that also explains the psql -e thing, because that does both
>commands in the same transaction.
>
>Perhaps we need to disable vacuum inside transactions.  Vadim?

FYI, it is reported that 6.3.2 does not have the crash.
--
Tatsuo Ishii
t-ishii@sra.co.jp

Re: [HACKERS] select * from ..;vacuum crashes

From
Nick Bastin
Date:
Taral wrote:
>
> > begin;
> > select * from getting;
> > vacuum;
> > end;
>
> Question: Does the following really make sense?

I'm glad somebody asked this, since I was beginning to wonder if I was missing
something.  My vote would  be to disable vacuum in a transaction, since it
doesn't make a whole lot of sense anyhow.

--
Nick Bastin
RBB Systems, Inc.

Re: [HACKERS] select * from ..;vacuum crashes

From
Bruce Momjian
Date:
> >You must enable Assert to see the crash.
>
> I saw the crash without assertion enabled? This is FreeBSD 2.2.6.
>
> >The cause may be because you are doing a vacuum INSIDE a transaction.  I
> >think that also explains the psql -e thing, because that does both
> >commands in the same transaction.
> >
> >Perhaps we need to disable vacuum inside transactions.  Vadim?
>
> FYI, it is reported that 6.3.2 does not have the crash.

Of course, you are right.  I can reproduce it with the SELECT, then
VACUUM, with no transactions at all.  VACUUM alone works, but not with
the SELECT before it.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] select * from ..;vacuum crashes

From
"Thomas G. Lockhart"
Date:
> >       select * from getting; vacuum;
> > does crash the backend with included test data.
> > This time I have tried:
> > begin;
> > select * from getting;
> > vacuum;
> > end;

Oleg, can you try compiling with asserts enabled and see if you can
learn anything new about your vacuum problems? It's probably not related
to this report, but you never know for sure. I can send you this
original message if you did not receive it...

                  - Tom

Re: [HACKERS] select * from ..;vacuum crashes

From
Bruce Momjian
Date:
> >You must enable Assert to see the crash.
>
> I saw the crash without assertion enabled? This is FreeBSD 2.2.6.
>
> >The cause may be because you are doing a vacuum INSIDE a transaction.  I
> >think that also explains the psql -e thing, because that does both
> >commands in the same transaction.
> >
> >Perhaps we need to disable vacuum inside transactions.  Vadim?
>
> FYI, it is reported that 6.3.2 does not have the crash.

I think I will be able to fix this if no one gets to it first.  Looks
like a problem with the cache lookup and updating class statistics.
Could take me a few days until I can get to it.  If someone else wants
to debug it, go ahead.

I am on jury duty.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] select * from ..;vacuum crashes

From
Bruce Momjian
Date:
> >You must enable Assert to see the crash.
>
> I saw the crash without assertion enabled? This is FreeBSD 2.2.6.
>
> >The cause may be because you are doing a vacuum INSIDE a transaction.  I
> >think that also explains the psql -e thing, because that does both
> >commands in the same transaction.
> >
> >Perhaps we need to disable vacuum inside transactions.  Vadim?

Turns out getting is being removed from pg_class.  If you run the
script, without the final psql command, and go into ptest, and look do:

    select oid, relname from pg_class

you see the table getting.  If you then run the 'select * from getting'
and 'vacuum' you will see from the backtrace it is trying to update
statistics on the 'getting' table, but it is gone.  If you go back into
ptest after the vacuum crash, 'getting' is gone from pg_class.

Looks like I may need help on this one.  How does that happen?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] select * from ..;vacuum crashes

From
Vadim Mikheev
Date:
Taral wrote:
>
> > begin;
> > select * from getting;
> > vacuum;
> > end;
>
> Question: Does the following really make sense?
>
> begin transaction;
> select * from table;
> vacuum;
> abort transaction;

Using vacuum inside BEGIN/END is Bad Idea!!!
MUST be disabled: vacuum uses MANY transaction but
CommitTransactionCommand() used by vacuum does nothing
inside BEGIN/END (only increments command counter).

Vadim

Re: [HACKERS] select * from ..;vacuum crashes

From
Bruce Momjian
Date:
> >You must enable Assert to see the crash.
>
> I saw the crash without assertion enabled? This is FreeBSD 2.2.6.
>
> >The cause may be because you are doing a vacuum INSIDE a transaction.  I
> >think that also explains the psql -e thing, because that does both
> >commands in the same transaction.
> >
> >Perhaps we need to disable vacuum inside transactions.  Vadim?
>
> FYI, it is reported that 6.3.2 does not have the crash.

I am still working on it, but it appears that the vacuuming of pg_class
is causing the failure of the vacuum on 'getting'.  Probably the
vacuuming of pg_class it not invalidating the pg_class cache values.  I
added additional debugging statements to find out exactly when the cache
lookups start to fail.


    UnUsed 0, MinLen 110, MaxLen 144; Re-using: Free/Avail. Space 1496/1496; EndEmpty/Avail. Pages 0/1. Elapsed 0/0
sec.
    DEBUG:  Index pg_class_relname_index: Pages 2; Tuples 58: Deleted 12. Elapsed 0/0 sec.
    DEBUG:  Index pg_class_oid_index: Pages 2; Tuples 58: Deleted 12. Elapsed 0/0 sec.
    DEBUG:  Rel pg_class: Pages: 2 --> 1; Tuple(s) moved: 2. Elapsed 0/0 sec.
    DEBUG:  Index pg_class_relname_index: Pages 2; Tuples 58: Deleted 2. Elapsed 0/0 sec.
    DEBUG:  Index pg_class_oid_index: Pages 2; Tuples 58: Deleted 2. Elapsed 0/0 sec.
    TRAP: Failed Assertion("!(( (void)((bool) ((! assert_enabled) || (! !((bool)((void*)(lp) != 0))) ||
(ExceptionalCondition("!((bool)((void*)(lp)!= 0))", &( FailedAssertion), (char*) 0, "heapam.c", 1057)))), (bool)
(((lp)->lp_flags& 0x01) != 0) )):", File: "heapam.c", Line: 1057) 

    !(( (void)((bool) ((! assert_enabled) || (! !((bool)((void*)(lp) != 0))) ||
(ExceptionalCondition("!((bool)((void*)(lp)!= 0))", &( FailedAssertion), (char*) 0, "heapam.c", 1057)))), (bool)
(((lp)->lp_flags& 0x01) != 0) )) (0) [Permission denied] 


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] select * from ..;vacuum crashes

From
Bruce Momjian
Date:
> >You must enable Assert to see the crash.
>
> I saw the crash without assertion enabled? This is FreeBSD 2.2.6.
>
> >The cause may be because you are doing a vacuum INSIDE a transaction.  I
> >think that also explains the psql -e thing, because that does both
> >commands in the same transaction.
> >
> >Perhaps we need to disable vacuum inside transactions.  Vadim?
>
> FYI, it is reported that 6.3.2 does not have the crash.

I have a fix and will apply tomorrow with a posting.  Heading to bed.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] select * from ..;vacuum crashes

From
Bruce Momjian
Date:
> >You must enable Assert to see the crash.
>
> I saw the crash without assertion enabled? This is FreeBSD 2.2.6.
>
> >The cause may be because you are doing a vacuum INSIDE a transaction.  I
> >think that also explains the psql -e thing, because that does both
> >commands in the same transaction.
> >
> >Perhaps we need to disable vacuum inside transactions.  Vadim?
>
> FYI, it is reported that 6.3.2 does not have the crash.

OK, this is fixed now.  The problem is that my new cache-use code finds
tuples by looking in the cache, getting the t_ctid value, and using
heap_fetch to get the tuple, rather than the older sequential
scan/ScanKey method.

However, when you vacuum a table, as the rows are moved, the t_ctid
changes, but there was no call to invalidate the system cache for the
row, so when you vacuum pg_class, and later use the cache to look up
something, the cache points to an old tuple.

This is fixed now, along with a little cache code cleanup that removes
some unused code that was confusing things.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] select * from ..;vacuum crashes

From
Oleg Bartunov
Date:
Bruce,

I was hoping this will fix 'vacuum analyze' problem on
my Linux box when I run postmaster with -B 1024 option, but it doesn't :-(
After clean reinstalling I still get error message:
psg post
 9080  p1 S    0:00 ssh dv -l postgres
13916  ?  S    0:00 /usr/local/pgsql/bin/postmaster -i -B 1024 -S -D/usr/local/p
dv:~$ !psq

regression=> vacuum analyze;
NOTICE:  AbortTransaction and not in in-progress state
NOTICE:  AbortTransaction and not in in-progress state
regression=> \q


    Regards,

        Oleg


On Sun, 11 Oct 1998, Bruce Momjian wrote:

> Date: Sun, 11 Oct 1998 21:16:10 -0400 (EDT)
> From: Bruce Momjian <maillist@candle.pha.pa.us>
> To: t-ishii@sra.co.jp
> Cc: t-ishii@sra.co.jp, pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] select * from ..;vacuum crashes
>
> > >You must enable Assert to see the crash.
> >
> > I saw the crash without assertion enabled? This is FreeBSD 2.2.6.
> >
> > >The cause may be because you are doing a vacuum INSIDE a transaction.  I
> > >think that also explains the psql -e thing, because that does both
> > >commands in the same transaction.
> > >
> > >Perhaps we need to disable vacuum inside transactions.  Vadim?
> >
> > FYI, it is reported that 6.3.2 does not have the crash.
>
> OK, this is fixed now.  The problem is that my new cache-use code finds
> tuples by looking in the cache, getting the t_ctid value, and using
> heap_fetch to get the tuple, rather than the older sequential
> scan/ScanKey method.
>
> However, when you vacuum a table, as the rows are moved, the t_ctid
> changes, but there was no call to invalidate the system cache for the
> row, so when you vacuum pg_class, and later use the cache to look up
> something, the cache points to an old tuple.
>
> This is fixed now, along with a little cache code cleanup that removes
> some unused code that was confusing things.
>
> --
>   Bruce Momjian                        |  http://www.op.net/~candle
>   maillist@candle.pha.pa.us            |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>
>

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83