Vacuum analyze bug CAUGHT - Mailing list pgsql-hackers

From Michael Simms
Subject Vacuum analyze bug CAUGHT
Date
Msg-id 199909090143.CAA11936@argh.demon.co.uk
Whole thread Raw
Responses Re: [HACKERS] Vacuum analyze bug CAUGHT  (Tom Lane <tgl@sss.pgh.pa.us>)
RE: [HACKERS] Vacuum analyze bug CAUGHT  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
Hi

Okee, I have caught the vacuum analyse crash that was giving me a load of
grief

Firstly, I create the following database

$psql crashtest
Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.5.1 on i586-pc-linux-gnu, compiled by gcc -ggdb ]
  type \? for help on slash commands  type \q to quit  type \g or terminate with semicolon to execute queryYou are
currentlyconnected to the database: crashtest
 

crashtest=> select version();
version                                                      
-------------------------------------------------------------
PostgreSQL 6.5.1 on i586-pc-linux-gnu, compiled by gcc -ggdb 
(1 row)

crashtest=> \d
Database    = crashtest+------------------+----------------------------------+----------+|  Owner           |
 Relation             |   Type   |+------------------+----------------------------------+----------+| test
|testtable                        | table    |+------------------+----------------------------------+----------+
 

crashtest=> select distinct * from testtable;
vara|varb
----+----
abc |def 
(1 row)

crashtest=> select count(*) from testtable;count
------
319800
(1 row)

-----------------------------------

Now, I then create a text file with 780 (just a random large number) new
lines that are exactly the same as the ones already in the database, so they
can be added with the copy command.

(the 319,800 rows I have in my database were actually created using copy
of this file, but I think that that is irrelavent).

I then run the followiung script (which you will note just runs forever:

------------------------------------------------------
#!/bin/sh -

(   while [ "1" = "1" ]   do
   echo "create temp table temptest ( vara text, varb text );"   echo "copy temptest from '/tmp/copyeffort';"
   echo "insert into testtable select * from temptest;"      echo "drop table temptest;"
   sleep 1
   done

) | psql crashtest

-----------------------------------------------------

I attach gdb to the backend that is handling this task

I then start a psql session to the same database

Then I attach another GDB to this backend too, so I have GDB on both

Then, in psql, I run and get the following result:

-----------

crashtest=> vacuum analyze;
NOTICE:  Rel pg_type: TID 4/3: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  Rel pg_attribute: TID 23/5: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  Rel pg_attribute: TID 23/6: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  Rel pg_attribute: TID 23/7: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  Rel pg_attribute: TID 23/8: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  Rel pg_attribute: TID 23/9: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  Rel pg_attribute: TID 23/10: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  Rel pg_attribute: TID 23/11: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  Rel pg_attribute: TID 23/12: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  Rel pg_class: TID 3/22: InsertTransactionInProgress 129915 - can't shrink relation
NOTICE:  AbortTransaction and not in in-progress state 

------------

At this point, the backend attatched to the script doing the creates inserts
and drops has the following:


Program received signal SIGUSR2, User defined signal 2.
0x4018db4d in __libc_fsync ()
(gdb) where
#0  0x4018db4d in __libc_fsync ()
#1  0x80d5908 in pg_fsync (fd=3) at fd.c:202
#2  0x80d615a in FileSync (file=3) at fd.c:876
#3  0x80dc9ed in mdcommit () at md.c:796
#4  0x80dd1bc in smgrcommit () at smgr.c:375
#5  0x80d47f7 in FlushBufferPool (StableMainMemoryFlag=0) at bufmgr.c:1260
#6  0x8078755 in RecordTransactionCommit () at xact.c:636
#7  0x8078919 in CommitTransaction () at xact.c:940
#8  0x8078ac1 in CommitTransactionCommand () at xact.c:1177
#9  0x80df0cf in PostgresMain (argc=-1073742286, argv=0xbffff7a0, real_argc=7,    real_argv=0xbffffd04) at
postgres.c:1679
#10 0x80c91ba in DoBackend (port=0x81ea4a0) at postmaster.c:1628
#11 0x80c8cda in BackendStartup (port=0x81ea4a0) at postmaster.c:1373
#12 0x80c8429 in ServerLoop () at postmaster.c:823
#13 0x80c7f67 in PostmasterMain (argc=7, argv=0xbffffd04) at postmaster.c:616
#14 0x80a0986 in main (argc=7, argv=0xbffffd04) at main.c:97
#15 0x400fccb3 in __libc_start_main (main=0x80a0920 <main>, argc=7,    argv=0xbffffd04, init=0x8061360 <_init>,
fini=0x810d63c<_fini>,    rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffffcfc)   at
../sysdeps/generic/libc-start.c:78
(gdb) 
Program received signal SIGUSR2, User defined signal 2.
0x4018db4d in __libc_fsync ()
(gdb) where
#0  0x4018db4d in __libc_fsync ()
#1  0x80d5908 in pg_fsync (fd=3) at fd.c:202
#2  0x80d615a in FileSync (file=3) at fd.c:876
#3  0x80dc9ed in mdcommit () at md.c:796
#4  0x80dd1bc in smgrcommit () at smgr.c:375
#5  0x80d47f7 in FlushBufferPool (StableMainMemoryFlag=0) at bufmgr.c:1260
#6  0x8078755 in RecordTransactionCommit () at xact.c:636
#7  0x8078919 in CommitTransaction () at xact.c:940
#8  0x8078ac1 in CommitTransactionCommand () at xact.c:1177
#9  0x80df0cf in PostgresMain (argc=-1073742286, argv=0xbffff7a0, real_argc=7,    real_argv=0xbffffd04) at
postgres.c:1679
#10 0x80c91ba in DoBackend (port=0x81ea4a0) at postmaster.c:1628
#11 0x80c8cda in BackendStartup (port=0x81ea4a0) at postmaster.c:1373
#12 0x80c8429 in ServerLoop () at postmaster.c:823
#13 0x80c7f67 in PostmasterMain (argc=7, argv=0xbffffd04) at postmaster.c:616
#14 0x80a0986 in main (argc=7, argv=0xbffffd04) at main.c:97
#15 0x400fccb3 in __libc_start_main (main=0x80a0920 <main>, argc=7,    argv=0xbffffd04, init=0x8061360 <_init>,
fini=0x810d63c<_fini>,    rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffffcfc)   at
../sysdeps/generic/libc-start.c:78
(gdb) 

About 3 - 5 seconds later, the gdb attached to the vacuuming psql gets the
following - Bear in mind this is a DIFFERENT BACKEND

Program received signal SIGSEGV, Segmentation fault.
AllocSetReset (set=0x0) at aset.c:159
aset.c:159: No such file or directory.
(gdb) where
#0  AllocSetReset (set=0x0) at aset.c:159
#1  0x810ac12 in EndPortalAllocMode () at portalmem.c:938
#2  0x8078833 in AtAbort_Memory () at xact.c:800
#3  0x80789ff in AbortTransaction () at xact.c:1026
#4  0x8078aef in AbortCurrentTransaction () at xact.c:1243
#5  0x80deed6 in PostgresMain (argc=-1073742288, argv=0xbffff7a0, real_argc=7,    real_argv=0xbffffd04) at
postgres.c:1550
#6  0x80c91ba in DoBackend (port=0x81ea4a0) at postmaster.c:1628
#7  0x80c8cda in BackendStartup (port=0x81ea4a0) at postmaster.c:1373
#8  0x80c8429 in ServerLoop () at postmaster.c:823
#9  0x80c7f67 in PostmasterMain (argc=7, argv=0xbffffd04) at postmaster.c:616
#10 0x80a0986 in main (argc=7, argv=0xbffffd04) at main.c:97
#11 0x400fccb3 in __libc_start_main (main=0x80a0920 <main>, argc=7,    argv=0xbffffd04, init=0x8061360 <_init>,
fini=0x810d63c<_fini>,    rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffffcfc)   at
../sysdeps/generic/libc-start.c:78
(gdb) 


------------------------------------------------------------------------

This is REPRODUCABLE

I have run this through 5 times and it only managed to vacuum two times.

Now, I have the two GDBs still running, so if you need any values such
as variable states, let me know and I will supply them. (I'll be out of
the house till about 7.30pm UK time tomorrow, so it will have to be after
then).

Hopefully, this will be something can get fixed for 6.5.2 as it is a BIG
problem for me, as it happens when I am building my postgresql search
engine.

Thanks                    M Simms


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] PG_UPGRADE status
Next
From: "Damond Walker"
Date:
Subject: Sparc stuff again.