Thread: delete from ..;vacuum crashes
Here is a report regarding the backend-crash from a user in Japan. Included shell script should reproduce the phenomenon. Note that select * from getting; vacuum; does cause a crash, while select * from getting; vacuum; not. -- Tatsuo Ishii t-ishii@sra.co.jp ======================================================================== #!/bin/sh DBNAME=ptest destroydb $DBNAME createdb $DBNAME psql -e $DBNAME <<EOF create table header ( host text not null, port int not null, path text not null, file text not null, extra text not null, name text not null, value text not null ); create index header_url_idx on header (host, port, path, file, extra); create unique index header_uniq_idx on header (host, port, path, file, extra, name); create table reference ( f_url text not null, t_url text not null ); create index reference_from_idx on reference (f_url); create index reference_to_idx on reference (t_url); create unique index reference_uniq_idx on reference (f_url, t_url); create table extension ( ext text not null, note text ); create unique index extension_ext_idx on extension (ext); create table getting ( host text not null, port int not null, ip text not null, when datetime not null ); create unique index getting_ip_idx on getting (ip); EOF #psql -c "delete from getting; vacuum;" $DBNAME psql -c "select * from getting; vacuum;" $DBNAME #psql -c "delete from getting;" $DBNAME #psql -c "select * from getting;" $DBNAME #psql -c "vacuum;" $DBNAME #psql -c "vacuum; vacuum;" $DBNAME
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > Here is a report regarding the backend-crash from a user in Japan. > Included shell script should reproduce the phenomenon. On which postgres version(s)? > Note that > select * from getting; vacuum; > does cause a crash, while > select * from getting; > vacuum; > not. Specifically I see you are using > psql -c "select * from getting; vacuum;" $DBNAME rather than entering the commands at the psql prompt. The -c option works differently from entering multiple commands at psql's prompt. In ordinary interactive use, psql will break what you type at semicolon boundaries and send each SQL command to the backend separately, even if you typed several commands on one line. *But* the -c option doesn't work that way --- it just sends the whole given string to the backend as one query. The implication of this is that psql -c "select * from getting; vacuum;" executes the select and the vacuum as part of a single transaction, whereas any other way of doing it with psql will make the commands be two separate transactions. I speculate that this has something to do with the different behavior you see. Exactly what the bug is is beyond my abilities, but perhaps that tidbit will help someone more competent to find it. regards, tom lane
At 2:14 AM 98.10.2 -0400, Tom Lane wrote: >Tatsuo Ishii <t-ishii@sra.co.jp> writes: >> Here is a report regarding the backend-crash from a user in Japan. >> Included shell script should reproduce the phenomenon. > >On which postgres version(s)? The latest cvs source tree (I have not tried this for 6.3.2). >The implication of this is that psql -c "select * from getting; vacuum;" >executes the select and the vacuum as part of a single transaction, >whereas any other way of doing it with psql will make the commands be >two separate transactions. I speculate that this has something to do >with the different behavior you see. So basically the backend treats multiple SQL statements conjucted with ";" as a single transaction? If this is the cause of the problem, following SQLs should produce the backend death too. I will try this. begin; select * from getting; vacuum; end; -- Tatsuo Ishii t-ishii@sra.co.jp
t-ishii@sra.co.jp (Tatsuo Ishii) writes: > So basically the backend treats multiple SQL statements conjucted with ";" > as a single transaction? If they arrive in a single query string, as you can cause with psql -c or with a direct PQexec() call. Actually it's even more subtle than that: multiple statements in a single query string act like a *single statement* as far as the transaction mechanism is concerned. There's one StartTransactionCommand in postgres.c before the string starts to execute, and one CommitTransactionCommand after it's done. It is possible that that is a bug, and that we need to take the start/ commit calls out of the postgres.c main loop and put them somewhere down inside the parsing/execution code, at a point where the software has parsed off a single SQL statement. As it stands, if there are any statements in the system that assume there is a StartTransactionCommand just before they begin and a CommitTransactionCommand just after they finish, those statements will break when executed as part of a multi-statement query. Question for the gurus: would you consider this a bug in the particular statement (it shouldn't assume that), or a bug in the outer layers (they should make that be true)? In particular, since vacuum.c does some peculiar things with transaction boundaries, it seems to me that it might be an example of such a statement and that what I just described is the root cause of your bug. But someone who knows the system better than me will have to figure out just what's going on. > If this is the cause of the problem, following SQLs should produce > the backend death too. I will try this. > begin; > select * from getting; > vacuum; > end; You should try it and let us know. But that is a different test case, because there will be CommitTransactionCommand & StartTransactionCommand between the select and the vacuum. regards, tom lane
(I have changed the subject "delete from" to "select * from" ) As I reported, select * from getting; vacuum; does crash the backend with included test data. This time I have tried: begin; select * from getting; vacuum; end; and have a crash too. (using current source tree + FreeBSD) I think this should be added to the Open 6.4 items list. >Here is a report regarding the backend-crash from a user in Japan. >Included shell script should reproduce the phenomenon. >Note that > select * from getting; vacuum; >does cause a crash, while > select * from getting; > vacuum; >not. >-- >Tatsuo Ishii >t-ishii@sra.co.jp > >======================================================================== >#!/bin/sh > >DBNAME=ptest > >destroydb $DBNAME >createdb $DBNAME >psql -e $DBNAME <<EOF >create table header >( > host text not null, > port int not null, > path text not null, > file text not null, > extra text not null, > name text not null, > value text not null >); >create index header_url_idx on header (host, port, path, file, extra); >create unique index header_uniq_idx on header (host, port, path, file, extra, name); > >create table reference >( > f_url text not null, > t_url text not null >); >create index reference_from_idx on reference (f_url); >create index reference_to_idx on reference (t_url); >create unique index reference_uniq_idx on reference (f_url, t_url); > >create table extension >( > ext text not null, > note text >); >create unique index extension_ext_idx on extension (ext); > >create table getting >( > host text not null, > port int not null, > ip text not null, > when datetime not null >); >create unique index getting_ip_idx on getting (ip); >EOF >#psql -c "delete from getting; vacuum;" $DBNAME >psql -c "select * from getting; vacuum;" $DBNAME >#psql -c "delete from getting;" $DBNAME >#psql -c "select * from getting;" $DBNAME >#psql -c "vacuum;" $DBNAME >#psql -c "vacuum; vacuum;" $DBNAME >
> (I have changed the subject "delete from" to "select * from" ) > > As I reported, > > select * from getting; vacuum; > > does crash the backend with included test data. > > This time I have tried: > > begin; > select * from getting; > vacuum; > end; I am attaching the original test script that will crash the backend. The backtrace is: exceptionP=0x8152500, detail=0x0, fileName=0x8113761 "heapam.c", lineNumber=1055) at assert.c:74 #6 0x805a3ea in heap_fetch (relation=0x8187310, snapshot=0x0, tid=0x82f1128, userbuf=0x8045430) at heapam.c:1055 #7 0x8081986 in vc_updstats (relid=141974, num_pages=0, num_tuples=0, hasindex=1 '\001', vacrelstats=0x8186890) at vacuum.c:1767 #8 0x807ef8d in vc_vacone (relid=141974, analyze=0, va_cols=0x0) at vacuum.c:579 #9 0x807e6f1 in vc_vacuum (VacRelP=0x0, analyze=0 '\000', va_cols=0x0) at vacuum.c:257 #10 0x807e5ce in vacuum (vacrel=0x0, verbose=0, analyze=0 '\000', va_spec=0x0) at vacuum.c:160 #11 0x80e2d07 in ProcessUtility (parsetree=0x8185950, dest=Debug) at utility.c:644 #12 0x80e0745 in pg_exec_query_dest (query_string=0x80455f8 "vacuum;\n", dest=Debug, aclOverride=0) at postgres.c:758 #13 0x80e0664 in pg_exec_query (query_string=0x80455f8 "vacuum;\n") at postgres.c:699 #14 0x80e1708 in PostgresMain (argc=4, argv=0x8047644, real_argc=4, real_argv=0x8047644) at postgres.c:1622 #15 0x809ae39 in main (argc=4, argv=0x8047644) at main.c:103 #16 0x804a96c in __start () Something in the heap fetch it does not like. I am kind of lost in this part of the code. The Assert line is: Assert(ItemIdIsUsed(lp)); which is checking for: (bool) (((itemId)->lp_flags & LP_USED) != 0) which is saying the disk identifer should be in use, but is not during the vacuum, for some reason. You must enable Assert to see the crash. The cause may be because you are doing a vacuum INSIDE a transaction. I think that also explains the psql -e thing, because that does both commands in the same transaction. Perhaps we need to disable vacuum inside transactions. Vadim? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 #!/bin/sh DBNAME=ptest destroydb $DBNAME createdb $DBNAME psql -e $DBNAME <<EOF create table header ( host text not null, port int not null, path text not null, file text not null, extra text not null, name text not null, value text not null ); create index header_url_idx on header (host, port, path, file, extra); create unique index header_uniq_idx on header (host, port, path, file, extra, name); create table reference ( f_url text not null, t_url text not null ); create index reference_from_idx on reference (f_url); create index reference_to_idx on reference (t_url); create unique index reference_uniq_idx on reference (f_url, t_url); create table extension ( ext text not null, note text ); create unique index extension_ext_idx on extension (ext); create table getting ( host text not null, port int not null, ip text not null, when datetime not null ); create unique index getting_ip_idx on getting (ip); EOF #psql -c "delete from getting; vacuum;" $DBNAME psql -c "select * from getting; vacuum;" $DBNAME #psql -c "delete from getting;" $DBNAME #psql -c "select * from getting;" $DBNAME #psql -c "vacuum;" $DBNAME #psql -c "vacuum; vacuum;" $DBNAME
> begin; > select * from getting; > vacuum; > end; Question: Does the following really make sense? begin transaction; select * from table; vacuum; abort transaction; Taral
>You must enable Assert to see the crash. I saw the crash without assertion enabled? This is FreeBSD 2.2.6. >The cause may be because you are doing a vacuum INSIDE a transaction. I >think that also explains the psql -e thing, because that does both >commands in the same transaction. > >Perhaps we need to disable vacuum inside transactions. Vadim? FYI, it is reported that 6.3.2 does not have the crash. -- Tatsuo Ishii t-ishii@sra.co.jp
Taral wrote: > > > begin; > > select * from getting; > > vacuum; > > end; > > Question: Does the following really make sense? I'm glad somebody asked this, since I was beginning to wonder if I was missing something. My vote would be to disable vacuum in a transaction, since it doesn't make a whole lot of sense anyhow. -- Nick Bastin RBB Systems, Inc.
> >You must enable Assert to see the crash. > > I saw the crash without assertion enabled? This is FreeBSD 2.2.6. > > >The cause may be because you are doing a vacuum INSIDE a transaction. I > >think that also explains the psql -e thing, because that does both > >commands in the same transaction. > > > >Perhaps we need to disable vacuum inside transactions. Vadim? > > FYI, it is reported that 6.3.2 does not have the crash. Of course, you are right. I can reproduce it with the SELECT, then VACUUM, with no transactions at all. VACUUM alone works, but not with the SELECT before it. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> > select * from getting; vacuum; > > does crash the backend with included test data. > > This time I have tried: > > begin; > > select * from getting; > > vacuum; > > end; Oleg, can you try compiling with asserts enabled and see if you can learn anything new about your vacuum problems? It's probably not related to this report, but you never know for sure. I can send you this original message if you did not receive it... - Tom
> >You must enable Assert to see the crash. > > I saw the crash without assertion enabled? This is FreeBSD 2.2.6. > > >The cause may be because you are doing a vacuum INSIDE a transaction. I > >think that also explains the psql -e thing, because that does both > >commands in the same transaction. > > > >Perhaps we need to disable vacuum inside transactions. Vadim? > > FYI, it is reported that 6.3.2 does not have the crash. I think I will be able to fix this if no one gets to it first. Looks like a problem with the cache lookup and updating class statistics. Could take me a few days until I can get to it. If someone else wants to debug it, go ahead. I am on jury duty. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> >You must enable Assert to see the crash. > > I saw the crash without assertion enabled? This is FreeBSD 2.2.6. > > >The cause may be because you are doing a vacuum INSIDE a transaction. I > >think that also explains the psql -e thing, because that does both > >commands in the same transaction. > > > >Perhaps we need to disable vacuum inside transactions. Vadim? Turns out getting is being removed from pg_class. If you run the script, without the final psql command, and go into ptest, and look do: select oid, relname from pg_class you see the table getting. If you then run the 'select * from getting' and 'vacuum' you will see from the backtrace it is trying to update statistics on the 'getting' table, but it is gone. If you go back into ptest after the vacuum crash, 'getting' is gone from pg_class. Looks like I may need help on this one. How does that happen? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Taral wrote: > > > begin; > > select * from getting; > > vacuum; > > end; > > Question: Does the following really make sense? > > begin transaction; > select * from table; > vacuum; > abort transaction; Using vacuum inside BEGIN/END is Bad Idea!!! MUST be disabled: vacuum uses MANY transaction but CommitTransactionCommand() used by vacuum does nothing inside BEGIN/END (only increments command counter). Vadim
> >You must enable Assert to see the crash. > > I saw the crash without assertion enabled? This is FreeBSD 2.2.6. > > >The cause may be because you are doing a vacuum INSIDE a transaction. I > >think that also explains the psql -e thing, because that does both > >commands in the same transaction. > > > >Perhaps we need to disable vacuum inside transactions. Vadim? > > FYI, it is reported that 6.3.2 does not have the crash. I am still working on it, but it appears that the vacuuming of pg_class is causing the failure of the vacuum on 'getting'. Probably the vacuuming of pg_class it not invalidating the pg_class cache values. I added additional debugging statements to find out exactly when the cache lookups start to fail. UnUsed 0, MinLen 110, MaxLen 144; Re-using: Free/Avail. Space 1496/1496; EndEmpty/Avail. Pages 0/1. Elapsed 0/0 sec. DEBUG: Index pg_class_relname_index: Pages 2; Tuples 58: Deleted 12. Elapsed 0/0 sec. DEBUG: Index pg_class_oid_index: Pages 2; Tuples 58: Deleted 12. Elapsed 0/0 sec. DEBUG: Rel pg_class: Pages: 2 --> 1; Tuple(s) moved: 2. Elapsed 0/0 sec. DEBUG: Index pg_class_relname_index: Pages 2; Tuples 58: Deleted 2. Elapsed 0/0 sec. DEBUG: Index pg_class_oid_index: Pages 2; Tuples 58: Deleted 2. Elapsed 0/0 sec. TRAP: Failed Assertion("!(( (void)((bool) ((! assert_enabled) || (! !((bool)((void*)(lp) != 0))) || (ExceptionalCondition("!((bool)((void*)(lp)!= 0))", &( FailedAssertion), (char*) 0, "heapam.c", 1057)))), (bool) (((lp)->lp_flags& 0x01) != 0) )):", File: "heapam.c", Line: 1057) !(( (void)((bool) ((! assert_enabled) || (! !((bool)((void*)(lp) != 0))) || (ExceptionalCondition("!((bool)((void*)(lp)!= 0))", &( FailedAssertion), (char*) 0, "heapam.c", 1057)))), (bool) (((lp)->lp_flags& 0x01) != 0) )) (0) [Permission denied] -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> >You must enable Assert to see the crash. > > I saw the crash without assertion enabled? This is FreeBSD 2.2.6. > > >The cause may be because you are doing a vacuum INSIDE a transaction. I > >think that also explains the psql -e thing, because that does both > >commands in the same transaction. > > > >Perhaps we need to disable vacuum inside transactions. Vadim? > > FYI, it is reported that 6.3.2 does not have the crash. I have a fix and will apply tomorrow with a posting. Heading to bed. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> >You must enable Assert to see the crash. > > I saw the crash without assertion enabled? This is FreeBSD 2.2.6. > > >The cause may be because you are doing a vacuum INSIDE a transaction. I > >think that also explains the psql -e thing, because that does both > >commands in the same transaction. > > > >Perhaps we need to disable vacuum inside transactions. Vadim? > > FYI, it is reported that 6.3.2 does not have the crash. OK, this is fixed now. The problem is that my new cache-use code finds tuples by looking in the cache, getting the t_ctid value, and using heap_fetch to get the tuple, rather than the older sequential scan/ScanKey method. However, when you vacuum a table, as the rows are moved, the t_ctid changes, but there was no call to invalidate the system cache for the row, so when you vacuum pg_class, and later use the cache to look up something, the cache points to an old tuple. This is fixed now, along with a little cache code cleanup that removes some unused code that was confusing things. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce, I was hoping this will fix 'vacuum analyze' problem on my Linux box when I run postmaster with -B 1024 option, but it doesn't :-( After clean reinstalling I still get error message: psg post 9080 p1 S 0:00 ssh dv -l postgres 13916 ? S 0:00 /usr/local/pgsql/bin/postmaster -i -B 1024 -S -D/usr/local/p dv:~$ !psq regression=> vacuum analyze; NOTICE: AbortTransaction and not in in-progress state NOTICE: AbortTransaction and not in in-progress state regression=> \q Regards, Oleg On Sun, 11 Oct 1998, Bruce Momjian wrote: > Date: Sun, 11 Oct 1998 21:16:10 -0400 (EDT) > From: Bruce Momjian <maillist@candle.pha.pa.us> > To: t-ishii@sra.co.jp > Cc: t-ishii@sra.co.jp, pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] select * from ..;vacuum crashes > > > >You must enable Assert to see the crash. > > > > I saw the crash without assertion enabled? This is FreeBSD 2.2.6. > > > > >The cause may be because you are doing a vacuum INSIDE a transaction. I > > >think that also explains the psql -e thing, because that does both > > >commands in the same transaction. > > > > > >Perhaps we need to disable vacuum inside transactions. Vadim? > > > > FYI, it is reported that 6.3.2 does not have the crash. > > OK, this is fixed now. The problem is that my new cache-use code finds > tuples by looking in the cache, getting the t_ctid value, and using > heap_fetch to get the tuple, rather than the older sequential > scan/ScanKey method. > > However, when you vacuum a table, as the rows are moved, the t_ctid > changes, but there was no call to invalidate the system cache for the > row, so when you vacuum pg_class, and later use the cache to look up > something, the cache points to an old tuple. > > This is fixed now, along with a little cache code cleanup that removes > some unused code that was confusing things. > > -- > Bruce Momjian | http://www.op.net/~candle > maillist@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83