Thread: SIGSEGV taken on 8.1 during dump/reload

SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
Hey all,

I was doing a test run of a live dump from 8.0.2 to 8.1.0, and 8.1.0 took a
segmentation violation 1 hour into the operation.  My plan is to re-do the
dump/restore, and if it fails again, to re-compile with debug and cassert, and
try to get a core.

The command line was (8.1.0 is on port 5433):

time pg_dumpall -c -v | psql -p 5433 -d template1

template1=# select version();
                                                 version
----------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk)
(1 row)

Config is:

BINDIR = /usr/local/pgsql810/bin
DOCDIR = /usr/local/pgsql810/doc
INCLUDEDIR = /usr/local/pgsql810/include
PKGINCLUDEDIR = /usr/local/pgsql810/include
INCLUDEDIR-SERVER = /usr/local/pgsql810/include/server
LIBDIR = /usr/local/pgsql810/lib
PKGLIBDIR = /usr/local/pgsql810/lib
LOCALEDIR = 
MANDIR = /usr/local/pgsql810/man
SHAREDIR = /usr/local/pgsql810/share
SYSCONFDIR = /usr/local/pgsql810/etc
PGXS = /usr/local/pgsql810/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-syslog' '--prefix=/usr/local/pgsql810'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wendif-labels -fno-strict-aliasing
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,/usr/local/pgsql810/lib
LDFLAGS_SL = 
LIBS = -lpgport -lz -lreadline -lncurses -lcrypt -lresolv -lnsl -ldl -lm -lbsd 
VERSION = PostgreSQL 8.1.0

Log snippet as follows (serverlog is empty).  postgres810 is 8.1.0, postgres is
8.0.2.

Nov  6 16:02:09 thunder postgres810[5238]: [1-1] LOG:  autovacuum: processing
database "tassiv"
Nov  6 16:03:09 thunder postgres810[5306]: [1-1] LOG:  autovacuum: processing
database "bacula"
Nov  6 16:03:12 thunder postgres[1772]: [6-1] tassiv LOG:  duration: 1539387.072
ms  statement: COPY public.obs_v (x, y, imag, smag, sky, chi, sharp, iter, loc,
obs_id,
Nov  6 16:03:12 thunder postgres[1772]: [6-2]  file_id, use, solve, star_id,
mag) TO stdout;
Nov  6 16:04:09 thunder postgres810[5359]: [1-1] LOG:  autovacuum: processing
database "cpan"
Nov  6 16:05:09 thunder postgres[1772]: [7-1] tassiv LOG:  duration: 98330.722
ms  statement: COPY public.tycho2 (star_id, gsc, loc, bt, e_bt, vt, e_vt, prox)
TO stdout;
Nov  6 16:05:09 thunder postgres810[5418]: [1-1] LOG:  autovacuum: processing
database "dspam"
Nov  6 16:05:15 thunder postgres810[1773]: [20-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "catalog_pkey" for table "catalog"
Nov  6 16:05:32 thunder postgres810[1773]: [21-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "color_groups_pkey" for table
"color_groups"
Nov  6 16:05:32 thunder postgres810[1773]: [22-1] tassivNOTICE:  ALTER TABLE /
ADD UNIQUE will create implicit index "files_name_key" for table "files"
Nov  6 16:05:32 thunder postgres810[1773]: [23-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "files_pkey" for table "files"
Nov  6 16:05:32 thunder postgres810[1773]: [24-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "groups_pkey" for table "groups"
Nov  6 16:05:32 thunder postgres810[1773]: [25-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "new_reference_loc_pkey" for table
"new_reference_loc"
Nov  6 16:05:32 thunder postgres810[1773]: [26-1] tassivNOTICE:  ALTER TABLE /
ADD UNIQUE will create implicit index "nights_night_key" for table "nights"
Nov  6 16:05:32 thunder postgres810[1773]: [27-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "nights_pkey" for table "nights"
Nov  6 16:05:32 thunder postgres810[1773]: [28-1] tassivNOTICE:  ALTER TABLE /
ADD UNIQUE will create implicit index "obs_root_obs_id_key" for table "obs_root"
Nov  6 16:05:32 thunder postgres810[1773]: [29-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "pairs_pkey" for table "pairs"
Nov  6 16:05:32 thunder postgres810[1773]: [30-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "reference_ubvri_pkey" for table
"reference_ubvri"
Nov  6 16:05:34 thunder postgres810[1773]: [31-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "sites_pkey" for table "sites"
Nov  6 16:05:34 thunder postgres810[1773]: [32-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "tycho2_pkey" for table "tycho2"
Nov  6 16:05:55 thunder postgres810[1773]: [33-1] tassivNOTICE:  ALTER TABLE /
ADD PRIMARY KEY will create implicit index "zero_pair_pkey" for table
"zero_pair"
Nov  6 16:06:10 thunder postgres810[5489]: [1-1] LOG:  autovacuum: processing
database "template1"
Nov  6 16:06:27 thunder postgres810[32258]: [1-1] LOG:  server process (PID
1773) was terminated by signal 11
Nov  6 16:06:27 thunder postgres810[32258]: [2-1] LOG:  terminating any other
active server processes
Nov  6 16:06:27 thunder postgres810[32258]: [3-1] LOG:  all server processes
terminated; reinitializing
Nov  6 16:06:27 thunder postgres[1772]: [8-1] tassiv LOG:  unexpected EOF on
client connection
Nov  6 16:06:28 thunder postgres810[5508]: [4-1] LOG:  database system was
interrupted at 2005-11-06 16:05:15 MST
Nov  6 16:06:28 thunder postgres810[5508]: [5-1] LOG:  checkpoint record is at
1/BA12B8B4
Nov  6 16:06:28 thunder postgres810[5508]: [6-1] LOG:  redo record is at
1/BA020058; undo record is at 0/0; shutdown FALSE
Nov  6 16:06:28 thunder postgres810[5508]: [7-1] LOG:  next transaction ID:
625556; next OID: 33061
Nov  6 16:06:28 thunder postgres810[5508]: [8-1] LOG:  next MultiXactId: 1153;
next MultiXactOffset: 11782
Nov  6 16:06:28 thunder postgres810[5508]: [9-1] LOG:  database system was not
properly shut down; automatic recovery in progress
Nov  6 16:06:28 thunder postgres810[5508]: [10-1] LOG:  redo starts at
1/BA020058
Nov  6 16:06:28 thunder postgres[1373]: [4-1] template1 LOG:  unexpected EOF on
client connection
Nov  6 16:06:42 thunder postgres810[5508]: [11-1] LOG:  record with zero length
at 1/BF1DFB44
Nov  6 16:06:42 thunder postgres810[5508]: [12-1] LOG:  redo done at 1/BF1DFB1C
Nov  6 16:06:44 thunder postgres810[5508]: [13-1] LOG:  database system is ready
Nov  6 16:06:44 thunder postgres810[5508]: [14-1] LOG:  transaction ID wrap
limit is 2147484146, limited by database "template1"


-- 16:09:17 up 35 days,  8:43,  8 users,  load average: 4.56, 5.83, 6.47
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


Re: SIGSEGV taken on 8.1 during dump/reload

From
Andrew Dunstan
Date:
Which version is first in your path, 8.0 or 8.1? If 8.0, do you get a 
different result from the 8.1 binaries?

cheers

andrew

Robert Creager wrote:

>Hey all,
>
>I was doing a test run of a live dump from 8.0.2 to 8.1.0, and 8.1.0 took a
>segmentation violation 1 hour into the operation.  My plan is to re-do the
>dump/restore, and if it fails again, to re-compile with debug and cassert, and
>try to get a core.
>
>The command line was (8.1.0 is on port 5433):
>
>time pg_dumpall -c -v | psql -p 5433 -d template1
>
>
>  
>


Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
When grilled further on (Sun, 06 Nov 2005 18:52:40 -0500),
Andrew Dunstan <andrew@dunslane.net> confessed:

> 
> Which version is first in your path, 8.0 or 8.1? If 8.0, do you get a 
> different result from the 8.1 binaries?
> 

8.0 was first.  I've specified the correct full path now for the executables. 
Also, I've actually installed the shared libraries for the types and triggers
that I use on that DB.  I always seem to forget that :-(

But the table/index that it dies on is not using either the trigger or the
non-native types, unless PG isn't getting the chance to emit that it's working on
the next one before it goes out to lunch?  The second reload died also.  If the
third dies (now that the type is in place), I'll do the re-compile and core.

tassiv=# \d zero_pair
     Table "public.zero_pair"
    Column    |  Type   | Modifiers
--------------+---------+-----------
 pair_id      | integer | not null
 group_id     | integer |
 zero_v       | real    | default 0
 zero_v_sigma | real    | default 0
 zero_i       | real    | default 0
 zero_i_sigma | real    | default 0
Indexes:
    "zero_pair_pkey" PRIMARY KEY, btree (pair_id)
    "zero_pair_group_id" btree (group_id)
Foreign-key constraints:
    "zero_pair_group_id_fkey" FOREIGN KEY (group_id) REFERENCES color_groups(group_id) ON DELETE CASCADE
    "zero_pair_pair_id_fkey" FOREIGN KEY (pair_id) REFERENCES pairs(pair_id) ON DELETE CASCADE

tassiv=# \d zero_pair_pkey
Index "public.zero_pair_pkey"
 Column  |  Type
---------+---------
 pair_id | integer
primary key, btree, for table "public.zero_pair"

Cheers,
Rob

-- 19:49:33 up 35 days, 12:24,  8 users,  load average: 2.93, 2.51, 2.30
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
When grilled further on (Sun, 6 Nov 2005 20:00:38 -0700),
Robert Creager <Robert_Creager@logicalchaos.org> confessed:

Didn't set the core size limit big enough (1Mb).  It's now at 50Mb.

I am using PGSphere, which should be the only gist indexes in use.

gdb /usr/local/pgsql810/bin/postgres core.28053
...
warning: core file may not match specified executable file.
Core was generated by `postgres: robert tassiv [local] CREATE INDEX              '.
Program terminated with signal 11, Segmentation fault.

warning: current_sos: Can't read pathname for load map: Input/output error

Cannot access memory at address 0x400d8000
#0  0x08082057 in gistUserPicksplit (r=Cannot access memory at address
0xbfffcb28
) at gistutil.c:833
833             if (v->spl_right[v->spl_nright - 1] == InvalidOffsetNumber)
(gdb) bt
#0  0x08082057 in gistUserPicksplit (r=Cannot access memory at address
0xbfffcb28
) at gistutil.c:833
Cannot access memory at address 0xbfffcb3c


Unfortunately, I have to run shortly.  If someone wants a 1Mb core, I have one.  I'll have (presumably) more info this
evening with the bigger core.

Cheers,
Rob
-- 07:56:01 up 36 days, 30 min,  7 users,  load average: 2.25, 2.31, 2.23
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
When grilled further on (Mon, 7 Nov 2005 08:07:14 -0700),
Robert Creager <Robert_Creager@logicalchaos.org> confessed:

I'm currently attached to the dead (dying) process.  spl_nright seems pretty large...

(gdb) print v->spl_nright
$3 = 138311580

Program received signal SIGSEGV, Segmentation fault.
0x08082057 in gistUserPicksplit (r=0x48f3f1e4, entryvec=0x83e534c, v=0xbfffcbc0, itup=0x83e3454, len=227,
    giststate=0xbfffd120) at gistutil.c:833
833             if (v->spl_right[v->spl_nright - 1] == InvalidOffsetNumber)
(gdb) bt
#0  0x08082057 in gistUserPicksplit (r=0x48f3f1e4, entryvec=0x83e534c, v=0xbfffcbc0, itup=0x83e3454, len=227,
    giststate=0xbfffd120) at gistutil.c:833
#1  0x0807f249 in gistSplit (r=0x48f3f1e4, buffer=8917, itup=0x83e3454, len=0xbfffcea4, dist=0xbfffcea0,
    giststate=0xbfffd120) at gist.c:1083
#2  0x0807c8ab in gistplacetopage (state=0xbfffcf10, giststate=0xbfffd120) at gist.c:331
#3  0x0807e2cd in gistmakedeal (state=0xbfffcf10, giststate=0xbfffd120) at gist.c:878
#4  0x0807c7e1 in gistdoinsert (r=0x48f3f1e4, itup=0x83e339c, giststate=0xbfffd120) at gist.c:299
#5  0x0807c5a6 in gistbuildCallback (index=0x48f3f1e4, htup=0x83c3de8, values=0xbfffd020, isnull=0xbfffd000 "",
    tupleIsAlive=1 '\001', state=0xbfffd120) at gist.c:207
#6  0x080cbb14 in IndexBuildHeapScan (heapRelation=0x48f3e1cc, indexRelation=0x48f3f1e4, indexInfo=0x83c3b6c,
    callback=0x807c4f0 <gistbuildCallback>, callback_state=0xbfffd120) at index.c:1573
#7  0x0807c3b5 in gistbuild (fcinfo=0xbfffe670) at gist.c:145
#8  0x08234dfd in OidFunctionCall3 (functionId=782, arg1=1223942604, arg2=1223946724, arg3=138165100) at fmgr.c:1460
#9  0x080cb8d3 in index_build (heapRelation=0x48f3e1cc, indexRelation=0x48f3f1e4, indexInfo=0x83c3b6c) at index.c:1353
#10 0x080cacdc in index_create (heapRelationId=128249, indexRelationName=0x83a0b94 "catalog_ra_decl_index",
    indexRelationId=128443, indexInfo=0x83c3b6c, accessMethodObjectId=783, tableSpaceId=0, classObjectId=0x83c9cfc,
    primary=0 '\0', isconstraint=0 '\0', allow_system_table_mods=0 '\0', skip_build=0 '\0') at index.c:757
#11 0x08110671 in DefineIndex (heapRelation=0x30f, indexRelationName=0x83a0b94 "catalog_ra_decl_index",
    indexRelationId=0, accessMethodName=0x83a0c00 "gist", tableSpaceName=0x0, attributeList=0x83a0c58, predicate=0x0,
    rangetable=0x0, unique=0 '\0', primary=0 '\0', isconstraint=0 '\0', is_alter_table=0 '\0', check_rights=1 '\001',
    skip_build=0 '\0', quiet=0 '\0') at indexcmds.c:383
#12 0x081c409b in ProcessUtility (parsetree=0x83a0c74, params=0x0, dest=0x83a0cf0, completionTag=0xbfffec00 "") at
    utility.c:748
#13 0x081c2b84 in PortalRunUtility (portal=0x83aad14, query=0x83a0a7c, dest=0x83a0cf0, completionTag=0xbfffec00 "") at
    pquery.c:987
#14 0x081c2e0b in PortalRunMulti (portal=0x83aad14, dest=0x83a0cf0, altdest=0x83a0cf0, completionTag=0xbfffec00 "") at
    pquery.c:1054
#15 0x081c26a6 in PortalRun (portal=0x83aad14, count=2147483647, dest=0x83a0cf0, altdest=0x83a0cf0,
    completionTag=0xbfffec00 "") at pquery.c:665
#16 0x081be579 in exec_simple_query (query_string=0x83a0864 "CREATE INDEX catalog_ra_decl_index ON catalog USING gist
    (loc);") at postgres.c:1014
#17 0x081c1377 in PostgresMain (argc=4, argv=0x8345f3c, username=0x8345f14 "robert") at postgres.c:3168
#18 0x08198692 in BackendRun (port=0x835ea08) at postmaster.c:2854
#19 0x081980a5 in BackendStartup (port=0x835ea08) at postmaster.c:2498
#20 0x081963fe in ServerLoop () at postmaster.c:1231
#21 0x081957aa in PostmasterMain (argc=3, argv=0x8344788) at postmaster.c:943
#22 0x08158b49 in main (argc=3, argv=0x8344788) at main.c:256

-- 22:06:46 up 36 days, 14:41,  7 users,  load average: 2.22, 2.55, 3.26
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
When grilled further on (Mon, 7 Nov 2005 22:25:17 -0700),
Robert Creager <Robert_Creager@LogicalChaos.org> confessed:

Sorry, I'll just trickle out the information.

tassiv=# \d catalog_ra_decl_index
Index "public.catalog_ra_decl_index"
 Column |   Type
--------+-----------
 loc    | spherekey
gist, for table "public.catalog"

v->spl_right is address 0xdb - uninitialized?

(gdb) print *v
$2 = {spl_left = 0x83e1308, spl_nleft = 8, spl_ldatum = 138286880, spl_lattr = {3930298096, 3929693296, 1075344513,
3928483696, 3927878896, 50331648, 1076099872, 1076099872, 1076100640, 1076099944, 1076099872, 0, 0, 0, 1, 1076099872,
46088, 24, 138269392, 108, 8205, 1076099872, 1076097560, 1077018624, 1223005861, 2281761506, 1072462523, 8192,
1076979200, 1348122942, 3218058668, 3588489616}, spl_lattrsize = {1072628007, 1223130252, 0, -1073754968, 1223107331,
-1073755008, 1196715552, 4033364, 1076979200, 8132, 32, 138269400, 58657919, 717016950, 1071875034, 1883413536,
-1077677968, -817345387, 1072225709, 138175768, 138175768, 1223130252, 1223130252, -1073754936, 1223083881, 138269472,
1196715552, 138269472, 138269428, -1073754256, -1073754256, -1073754376}, spl_lisnull =
"ÍD#\bàÌÿ¿\000\000\000\000(Íÿ¿\2004;\b×ÿ¿\000\000\000\000\000\000\000", spl_leftvalid = 20 '\024', spl_right = 0xdb,
spl_nright = 138286924, spl_rdatum = 11, spl_rattr = {3463747944, 3883728496, 0, 3882518896, 3881914096, 1, 3221212568,
138097456, 138251092, 3878890096, 0, 0, 1222988060, 1222974760, 1222960776, 138097456, 3, 1075321604, 0, 1073825468,
1076097560, 3221212576, 3221212540, 1075326465, 3221212576, 909216680, 825503793, 0, 138251202, 1076097560, 136751593,
3221212860}, spl_rattrsize = {-1073754484, 1075303286, -1073754720, 136751593, -1073754428, 138251176, 0, -1073754560,
136027536, 1196670896, 138269580, 32, 1196670856, 138251176, 138251194, 138251202, 226, 138251008, 0, 0, 0, 7904, 1024,
138269400, 138269700, 138269688, 908, -1073754600, 136599995, 138175768, 138269700, 908}, spl_risnull =
"\030e<\b\000¼SG\001\000\000\000XÎÿ¿¤Îÿ¿\001\000\000\000Ñÿ¿\004Ô=\b", spl_rightvalid = 108 'l', spl_idgrp = 0x83dd78c,
spl_ngrp = 0x83dd378, spl_grpflag = 0x4 <Address 0x4 out of bounds>}



-- 23:44:24 up 36 days, 16:19,  7 users,  load average: 2.35, 2.43, 3.13
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Re: SIGSEGV taken on 8.1 during dump/reload

From
Teodor Sigaev
Date:
Hmm, did you recompile pg_sphere module for 8.1?


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
When grilled further on (Tue, 08 Nov 2005 15:13:32 +0300),
Teodor Sigaev <teodor@sigaev.ru> confessed:

> Hmm, did you recompile pg_sphere module for 8.1?

Yes I did.  Just did it again to make sure.  Is there any way I can do a <make installcheck> without a
reconfigure/make/install of postgresql?  The db is running on port 5433, not the default of 5432.

If this is a PGSphere problem, should this conversation be continued there?

Thanks,
Rob

-- 07:01:55 up 36 days, 23:36,  7 users,  load average: 3.80, 3.47, 3.17
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Re: [Pgsphere-dev] Re: SIGSEGV taken on 8.1 during dump/reload

From
Teodor Sigaev
Date:
Robert Creager wrote:
> Yes I did.  Just did it again to make sure.  Is there any way I can do a <make installcheck> without a
> reconfigure/make/install of postgresql?  The db is running on port 5433, not the default of 5432.


export PGPORT=5433

> If this is a PGSphere problem, should this conversation be continued there?

Whether it's a PGSphere problem or not is unknown for now.  Can you prepare a
minimal test case that reproduces the problem?





-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> v->spl_right is address 0xdb - uninitialized?

The whole struct looks pretty uninitialized, which immediately makes me
wonder whether gdb has picked up a wrong value for "v".  Try going down
to a lower stack frame and seeing if you can access the struct from
there.
        regards, tom lane


Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
When grilled further on (Tue, 08 Nov 2005 09:20:13 -0500),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> > v->spl_right is address 0xdb - uninitialized?
>
> The whole struct looks pretty uninitialized, which immediately makes me
> wonder whether gdb has picked up a wrong value for "v".  Try going down
> to a lower stack frame and seeing if you can access the struct from
> there.
>

Well, it's defined the next level up on the stack, and it's still garbage.  The way I read gist.c and how it's calling
gistUserPicksplit at line 1083, it's not initialized before that call either.  So, FunctionCall2 in gistutil.c is
supposed to fill it out?  Presumably a function supplied by PGSphere in this case?

(gdb) up
#1  0x0807f249 in gistSplit (r=0x48df1e6c, buffer=93, itup=0x83b8e94, len=0xbfffcea4, dist=0xbfffcea0,
    giststate=0xbfffd120) at gist.c:1083
(gdb) print v
$1 = {spl_left = 0x83bcd98, spl_nleft = 8, spl_ldatum = 138138032, spl_lattr = {138089040, 1, 1075344513, 3221212168,
134843567, 0, 1076099872, 1076099872, 1076100896, 1076099944, 1076099872, 138072532, 136595410, 138072532, 127, 64,
138072596, 137900116, 138120544, 108, 8205, 1076099872, 1076097560, 1077067776, 1222874789, 2281761506, 1072462523,
8192, 1076979200, 1348122942, 3218058668, 3588489616}, spl_lattrsize = {1072628007, 1222999180, 0, -1073754968,
1222976259, -1073755008, 1079103008, 3871912, 1076979200, 8132, 32, 138120552, 58657919, 717016950, 1071875034,
1883413536, -1077677968, -817345387, 1072225709, 138043264, 138043264, 1222999180, 1222999180, -1073754936, 1222952809,
138120624, 1079103008, 138120624, 138120580, -1073754256, -1073754256, -1073754376}, spl_lisnull =
"ÍD#\bàÌÿ¿\000\000\000\000(Íÿ¿0K;\b×ÿ¿\000\000\000\000\000\000\000", spl_leftvalid = -92 '¤', spl_right = 0xdb,
spl_nright = 138138076, spl_rdatum = 11, spl_rattr = {3463919764, 0, 0, 0, 0, 1, 3221212568, 138103264, 138089640,
434176, 0, 0, 1222856988, 1222843688, 1222829704, 138103264, 3, 1075321604, 0, 1073825468, 1076097560, 3221212576,
3221212540, 1075326465, 3221212576, 909186620, 825503793, 0, 138090070, 1076097560, 136751593, 3221212860},
spl_rattrsize = {-1073754484, 1075303286, -1073754720, 136751593, -1073754428, 138090044, 0, -1073754560, 136027536,
1079058352, 138120732, 32, 1079058312, 138090044, 138090062, 138090070, 226, 138089984, 0, 0, 0, 7904, 1024, 138120552,
138120852, 138120840, 908, -1073754600, 136599995, 138043264, 138120852, 908}, spl_risnull =
"\200_:\b\000\034Q@\001\000\000\000XÎÿ¿¤Îÿ¿\001\000\000\000Ñÿ¿\224\216;\b", spl_rightvalid = 108 'l', spl_idgrp =
0x83b921c, spl_ngrp = 0x83b8e08, spl_grpflag = 0x4 <Address 0x4 out of bounds>}
(gdb)
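
If that reading of gist.c is right, the calling pattern is: the caller declares the struct on its stack without
initializing it, and trusts a function supplied by the extension to fill it in.  A minimal sketch of that pattern in
plain C follows.  It is an illustration only: split_result, picksplit_fn, halve_picksplit and run_split are
hypothetical names, and a bare function pointer stands in for PostgreSQL's function-call machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-in for a split result (not GIST_SPLITVEC). */
typedef struct {
    int nleft;   /* number of entries assigned to the left page */
    int nright;  /* number of entries assigned to the right page */
} split_result;

/* Hypothetical callback type: the extension supplies a function of this shape. */
typedef void (*picksplit_fn)(const int *entries, int n, split_result *out);

/* Example extension-side callback: puts half of the entries on each side. */
static void halve_picksplit(const int *entries, int n, split_result *out)
{
    (void) entries;              /* a real picksplit would inspect the keys */
    out->nleft = n / 2;
    out->nright = n - n / 2;
}

/*
 * Core-side caller: "v" holds garbage until the callback has run, so every
 * field the caller reads afterwards must have been written by the callback,
 * at the offsets the caller expects.
 */
static split_result run_split(picksplit_fn fn, const int *entries, int n)
{
    split_result v;              /* deliberately not initialized here */

    assert(fn != NULL);
    fn(entries, n, &v);
    return v;
}
```

With five entries, run_split(halve_picksplit, entries, 5) returns nleft = 2 and nright = 3.  The point of the sketch
is that caller and callback must agree on the struct definition; nothing at run time checks that they do.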

-- 07:38:26 up 37 days, 13 min,  6 users,  load average: 3.28, 3.42, 3.43
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> Is there any way I can do a <make installcheck> without a
>  reconfigure/make/install of postgresql?  The db is running on port
>  5433, not the default of 5432.

Sure, just "export PGPORT=5433" before "make installcheck".  Doubt it
will prove much, though, because the regression tests contain only
minimal exercising of GIST.

Does PGSphere itself have any regression tests?

(Actually, running the contrib regression tests might be more relevant
than the main PG tests, since several contrib modules with GIST
opclasses have regression tests.)
        regards, tom lane


Re: SIGSEGV taken on 8.1 during dump/reload

From
Teodor Sigaev
Date:

Tom Lane wrote:
> Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> 
>>v->spl_right is address 0xdb - uninitialized?
> 
> 
> The whole struct looks pretty uninitialized, which immediately makes me
> wonder whether gdb has picked up a wrong value for "v".  Try going down
> to a lower stack frame and seeing if you can access the struct from
> there.
The layout of the GIST_SPLITVEC struct has changed since 8.0; I'm afraid that an old
.so is being used.  The spl_(right|left)valid fields were added to GIST_SPLITVEC.

Looking into

spl_leftvalid = 20 '\024', spl_right = 0xdb, spl_nright = 138286924, spl_rdatum = 11,

and GIST_SPLITVEC
        bool            spl_lisnull[INDEX_MAX_KEYS];
        bool            spl_leftvalid;
        OffsetNumber   *spl_right;      /* array of entries that go right */
        int             spl_nright;     /* size of the array */
        Datum           spl_rdatum;     /* Union of keys in spl_right */

It's very likely that spl_right contains the correct spl_nright value (0xdb = 219)
and spl_nright contains the correct spl_rdatum (pointer 138286924 = 0x83e174c).
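
The shift described here can be reproduced with a minimal sketch. It assumes
two cut-down structs (OldSplitTail and NewSplitTail, hypothetical names) that
keep only the tail of GIST_SPLITVEC, with uint16_t and uintptr_t standing in
for OffsetNumber and Datum and SKETCH_MAX_KEYS standing in for
INDEX_MAX_KEYS. It is not the real header, only an illustration of how one
added bool plus alignment padding moves every later field by one pointer
width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_KEYS 32      /* stand-in for INDEX_MAX_KEYS */

/* Tail of the struct as a module built against the old headers sees it. */
typedef struct OldSplitTail
{
    bool        spl_lisnull[SKETCH_MAX_KEYS];
    uint16_t   *spl_right;      /* stand-in for OffsetNumber * */
    int         spl_nright;
    uintptr_t   spl_rdatum;     /* stand-in for Datum */
} OldSplitTail;

/* Same tail with the extra flag that the newer headers added. */
typedef struct NewSplitTail
{
    bool        spl_lisnull[SKETCH_MAX_KEYS];
    bool        spl_leftvalid;  /* new field, followed by padding */
    uint16_t   *spl_right;
    int         spl_nright;
    uintptr_t   spl_rdatum;
} NewSplitTail;

/* Number of bytes by which the new field pushes spl_right back. */
size_t
split_tail_shift(void)
{
    return offsetof(NewSplitTail, spl_right) -
           offsetof(OldSplitTail, spl_right);
}

/*
 * A reader using the new layout finds the old writer's spl_nright where it
 * expects spl_right, and the old spl_rdatum where it expects spl_nright.
 */
void
check_layout_shift(void)
{
    assert(offsetof(NewSplitTail, spl_right) ==
           offsetof(OldSplitTail, spl_nright));
    assert(offsetof(NewSplitTail, spl_nright) ==
           offsetof(OldSplitTail, spl_rdatum));
}
```

On common 32-bit and 64-bit ABIs both assertions hold, which matches the gdb
output: the "pointer" 0xdb is really a count, and the "count" 138286924 is
really a Datum.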




-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [Pgsphere-dev] Re: SIGSEGV taken on 8.1 during

From
Robert Creager
Date:
When grilled further on (Tue, 08 Nov 2005 10:06:38 -0500),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> Does PGSphere itself have any regression tests?
>
> (Actually, running the contrib regression tests might be more relevant
> than the main PG tests, since several contrib modules with GIST
> opclasses have regression tests.)
>

That's what I was trying to do ;-)  <make installcheck> passes, as does <make crushtest> (within pg_sphere).

I'll work on trying to get a small test case tonight.  Otherwise, we can try SSH to my machine or a DVD.

Cheers,
Rob

-- 08:17:03 up 37 days, 51 min,  6 users,  load average: 3.70, 3.56, 3.41
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> The layout of the GIST_SPLITVEC struct has changed since 8.0; I'm afraid that an old
> .so is being used.  The spl_(right|left)valid fields were added to GIST_SPLITVEC.

Does look a bit suspicious ... Robert, are you *sure* you've got the
right version of pgsphere linked in?  Did you compile it against the
right set of Postgres header files?
        regards, tom lane


Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
On Tue, 08 Nov 2005 11:12:04 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Teodor Sigaev <teodor@sigaev.ru> writes:
> > The layout of the GIST_SPLITVEC struct has changed since 8.0; I'm afraid that an
> > old .so is being used.  The spl_(right|left)valid fields were added to GIST_SPLITVEC.
> 
> Does look a bit suspicious ... Robert, are you *sure* you've got the
> right version of pgsphere linked in?  Did you compile it against the
> right set of Postgres header files?
> 

I copied pg_sphere into the contrib directory in 8.1.0, which is where it was
built.  Last night, I executed a <make clean> from contrib/pg_sphere, re-built
<make> and re-installed.  I checked the pg_sphere Makefile, and it references
local, not absolute paths.

So, I'm as sure as I can be right now.  How can I check the .so files installed
by the build?  Do they reference an absolute path for their dependent .so files
(postgres), or will they use ld.so.conf, which might then explain the problem. 
My ld.so.conf still points to the 8.0.2 version, as I've not switched yet to
8.1.0.

In any case, why would the <make installcheck> work in the pg_sphere directory?
That would have to use the installed libraries.  I don't have the sources with
me, but I'd think an index would have been created on a spoint column, but maybe
not?

Cheers,
Rob


Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
When grilled further on (Tue, 08 Nov 2005 11:12:04 -0500),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> Teodor Sigaev <teodor@sigaev.ru> writes:
> > The layout of the GIST_SPLITVEC struct has changed since 8.0; I'm afraid that an old
> > .so is being used.  The spl_(right|left)valid fields were added to GIST_SPLITVEC.
>
> Does look a bit suspicious ... Robert, are you *sure* you've got the
> right version of pgsphere linked in?  Did you compile it against the
> right set of Postgres header files?
>

Strings on pg_sphere.so does contain /usr/local/pgsql810/lib.

I've attached a small dump file that when I create an index on the table, it fails.  It works on 225 entries, but
failed on 250.  Don't know if this is data dependent or size.  Is that a page boundary?  It seems to me that unless the
right/left stuff doesn't come into play for all indexes, that stuff is built correctly.

Dump command:
/usr/local/pgsql810/bin/pg_dump -F c -p 5433 -d tassiv -t test_data -f index_problem.dump

Created the table and index by:
tassiv=# SELECT loc into test_data from catalog limit 250;
tassiv=# create index test_data_index on test_data using gist( loc );
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

tassiv=# \d test_data
  Table "public.test_data"
 Column |  Type  | Modifiers
--------+--------+-----------
 loc    | spoint |

Cheers,
Rob

--
 19:51:58 up 37 days, 12:26,  6 users,  load average: 2.15, 2.39, 2.41
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: SIGSEGV taken on 8.1 during dump/reload

From
Teodor Sigaev
Date:
works fine....
contrib_regression=# select count(*) from test_data ;
 count
-------
   250
(1 row)

contrib_regression=# create index test_data_index on test_data using gist( loc );
CREATE INDEX


> I've attached a small dump file that when I create an index on the table, it fails.  It works on 225 entries, but
> failed on 250.  Don't know if this is data dependent or size.  Is that a page boundary?  It seems to me that unless the
> right/left stuff doesn't come into play for all indexes, that stuff is built correctly.
>
> Dump command:
> /usr/local/pgsql810/bin/pg_dump -F c -p 5433 -d tassiv -t test_data -f index_problem.dump

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: SIGSEGV taken on 8.1 during dump/reload

From
Teodor Sigaev
Date:
> So, I'm as sure as I can be right now.  How can I check the .so files installed
> by the build?  Do they reference an absolute path for their dependent .so files
> (postgres), or will they use ld.so.conf, which might then explain the problem. 
> My ld.so.conf still points to the 8.0.2 version, as I've not switched yet to
> 8.1.0.

The simplest way is just to remove pg_sphere.so from the 8.1 installation
(/usr/local/pgsql810/lib/pg_sphere.so) and try, for example, to create a gist
index on spoint. The response should be:
contrib_regression=# create index test_data_index on test_data using gist( loc );
ERROR:  could not access file "/usr/local/pgsql/lib/pg_sphere": No such file or directory


If not, 8.1 is using the 8.0 .so....





-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
When grilled further on (Wed, 09 Nov 2005 10:54:12 +0300),
Teodor Sigaev <teodor@sigaev.ru> confessed:

> > So, I'm as sure as I can be right now.  How can I check the .so files installed
> > by the build?  Do they reference an absolute path for their dependent .so files
> > (postgres), or will they use ld.so.conf, which might then explain the problem.
> > My ld.so.conf still points to the 8.0.2 version, as I've not switched yet to
> > 8.1.0.
>
> The simplest way is just to remove pg_sphere.so from the 8.1 installation
> (/usr/local/pgsql810/lib/pg_sphere.so) and try, for example, to create a gist
> index on spoint. The response should be:
> contrib_regression=# create index test_data_index on test_data using gist( loc );
> ERROR:  could not access file "/usr/local/pgsql/lib/pg_sphere": No such file or directory
>
>
> If not, 8.1 is using the 8.0 .so....

Yup.  You're right.  So, what is happening here?  It will be kind of hard to do a live dump/restore on 1 machine if I
cannot have two versions running.  Is something not set up correctly on my machine, or in the build (pg_sphere or
postgresql) that is preventing two copies from...  Sigh.  Never mind.  The dump is spitting out the absolute path for
the shared library (like it should):

CREATE FUNCTION sbox_in(cstring) RETURNS sbox
    AS '/usr/local/pgsql802/lib/pg_sphere', 'spherebox_in'
    LANGUAGE c IMMUTABLE STRICT;

Now if I can just figure out how to get this egg off my face...

Now I remember the problem I always have, and I have a new trick in my bag:

/usr/local/pgsql802/bin/pg_dumpall -c -v | sed 's/pgsql802/pgsql810/' | /usr/local/pgsql810/bin/psql -p 5433 -d template1

How do others handle dumping from one version to a new one?  Is there a less error prone way of doing this?  As long as
I don't have the string pgsql802 anywhere else...

Sorry for the bandwidth,
Rob

-- 07:14:34 up 37 days, 23:49,  6 users,  load average: 2.20, 2.17, 2.16
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Re: SIGSEGV taken on 8.1 during dump/reload

From
Andrew Dunstan
Date:

Robert Creager wrote:

>Yup.  You're right.  So, what is happening here?  It will be kind of hard to do a live dump/restore on 1 machine if I
>cannot have two versions running.  Is something not set up correctly on my machine, or in the build (pg_sphere or
>postgresql) that is preventing two copies from...  Sigh.  Never mind.  The dump is spitting out the absolute path for
>the shared library (like it should):
>
>CREATE FUNCTION sbox_in(cstring) RETURNS sbox
>    AS '/usr/local/pgsql802/lib/pg_sphere', 'spherebox_in'
>    LANGUAGE c IMMUTABLE STRICT;
>
>Now if I can just figure out how to get this egg off my face...
>
>Now I remember the problem I always have, and I have a new trick in my bag:
>
>/usr/local/pgsql802/bin/pg_dumpall -c -v | sed 's/pgsql802/pgsql810/' | /usr/local/pgsql810/bin/psql -p 5433 -d template1
>
>How do others handle dumping from one version to a new one?  Is there a less error prone way of doing this?  As long
>as I don't have the string pgsql802 anywhere else...
>

Why use an absolute path? Why not just give the name of the .so and let 
postgres find it in $libdir (i.e. sed -e 's,/usr/local/pgsql.*/lib/,,' 
on your dump) ?

cheers

andrew


Re: SIGSEGV taken on 8.1 during dump/reload

From
Gregory Maxwell
Date:
On 11/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
> > The layout of the GIST_SPLITVEC struct has changed since 8.0; I'm afraid that an old
> > .so is being used.  The spl_(right|left)valid fields were added to GIST_SPLITVEC.
>
> Does look a bit suspicious ... Robert, are you *sure* you've got the
> right version of pgsphere linked in?  Did you compile it against the
> right set of Postgres header files?

So it turned out that he didn't... Is this a sign that we need to
include a versioning symbol in SOs so we can give a nice clear error
message "module foo compiled for PostgreSQL 8.0.2 this is PostgreSQL
8.1." Is there ever a case where we want people using modules compiled
against an old version, are there cases where users can't recompile
their modules but the old ones would work?


Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> CREATE FUNCTION sbox_in(cstring) RETURNS sbox
>     AS '/usr/local/pgsql802/lib/pg_sphere', 'spherebox_in'
>     LANGUAGE c IMMUTABLE STRICT;
> Now if I can just figure out how to get this egg off my face...

You'd be a lot better off to define all your functions as relative to
$libdir, ie,
        AS '$libdir/pg_sphere', 'spherebox_in'
(note the lack of any .so extension, too)

If pg_sphere is supplying a setup procedure that gets this wrong,
yell at them.
        regards, tom lane
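
To illustrate why a $libdir-relative reference survives a move between
installations while an absolute path does not, here is a minimal sketch of
the expansion step. It is a simplified stand-in, not PostgreSQL's actual
loader code: expand_libdir is a hypothetical helper, the library directory
is passed in by the caller, and platform suffix handling and search-path
lookup are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a leading "$libdir" in a library reference (hypothetical helper).
 * Anything else, such as an absolute path, is copied through unchanged.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int
expand_libdir(const char *ref, const char *pkglibdir,
              char *out, size_t outlen)
{
    static const char macro[] = "$libdir";
    const size_t mlen = sizeof(macro) - 1;
    int         n;

    assert(ref != NULL && pkglibdir != NULL && out != NULL);

    if (strncmp(ref, macro, mlen) == 0 &&
        (ref[mlen] == '/' || ref[mlen] == '\0'))
        n = snprintf(out, outlen, "%s%s", pkglibdir, ref + mlen);
    else
        n = snprintf(out, outlen, "%s", ref);

    return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}
```

With this, '$libdir/pg_sphere' resolves under whichever server is doing the
loading (for example /usr/local/pgsql810/lib/pg_sphere), whereas
'/usr/local/pgsql802/lib/pg_sphere' keeps pointing at the 8.0.2 build no
matter which server loads it.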


Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Gregory Maxwell <gmaxwell@gmail.com> writes:
> On 11/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Does look a bit suspicious ... Robert, are you *sure* you've got the
>> right version of pgsphere linked in?

> So it turned out that he didn't... Is this a sign that we need to
> include a versioning symbol in SOs so we can give a nice clear error
> message "module foo compiled for PostgreSQL 8.0.2 this is PostgreSQL
> 8.1." Is there ever a case where we want people using modules compiled
> against an old version, are there cases where users can't recompile
> their modules but the old ones would work?

There are cases where it would work, and other cases where it wouldn't.
Given the pain involved in debugging when it's wrong, maybe we should
just endeavor to forbid loading of all wrong-version modules.

I'm not sure that there's any real easy way to detect this though.
For V1-style functions we could embed a version number in the
per-function info structs, but that doesn't help for old-style
functions.
        regards, tom lane


Re: SIGSEGV taken on 8.1 during dump/reload

From
Martijn van Oosterhout
Date:
On Wed, Nov 09, 2005 at 10:57:25AM -0500, Tom Lane wrote:
> There are cases where it would work, and other cases where it wouldn't.
> Given the pain involved in debugging when it's wrong, maybe we should
> just endeavor to forbid loading of all wrong-version modules.
>
> I'm not sure that there's any real easy way to detect this though.
> For V1-style functions we could embed a version number in the
> per-function info structs, but that doesn't help for old-style
> functions.

Given the lack of information you get for old style, I'm not sure we
should care. Do a lot of people still use it?

If we're going to expand the Pg_finfo_record struct, I
think it could also include (optionally):

- A length field (for future upwardly compatible changes).
- Allow the specification of flags like strict and volatile so the
coder doesn't have to worry about getting the SQL install script right.
- Indication of number of parameters/datatypes
- A description for pg_proc

Of course, then you're getting into the realm of [1]. Still, at least
flags like STRICT would be useful because then the source code can
assert that it can/cannot accept NULLs, so users can't screw it up.

[1] http://archives.postgresql.org/pgsql-hackers/2005-09/msg00476.php
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
On Wed, 09 Nov 2005 09:56:51 -0500
Andrew Dunstan <andrew@dunslane.net> wrote:

> 
> Why use an absolute path? Why not just give the name of the .so and let 
> postgres find it in $libdir (i.e. sed -e 's,/usr/local/pgsql.*/lib/,,' 
> on your dump) ?

'cause I didn't know I could?  I'll go and fix the Makefile in pg_sphere on
GBORG.  I might even have created this problem myself...

Cheers,
Rob



Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
On Wed, 09 Nov 2005 10:42:00 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> 
> If pg_sphere is supplying a setup procedure that gets this wrong,
> yell at them.

I'll just go fix it, now that I know what the right way is ;-)

Thanks,
Rob


Re: SIGSEGV taken on 8.1 during dump/reload

From
Teodor Sigaev
Date:
I fixed the path in pg_sphere (and did some more cleanup).

BTW, I usually install contrib modules before restoring the database (of course,
this requires dumping the db without the contents of the modules)...


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: SIGSEGV taken on 8.1 during dump/reload

From
Robert Creager
Date:
I've also modified the Makefile.  I removed the special .sql.in : .sql implicit
rule and re-organized the Makefile.  I didn't commit as it was after 12:00pm
when I finished...

I'll send you what I did when I return home.  If you just replaced the $libdir
with $$libdir, then a merge will be easy.

Cheers,
Rob

On Thu, 10 Nov 2005 14:43:30 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:

> I fixed the path in pg_sphere (and did some more cleanup).
> 
> BTW, I usually install contrib modules before restoring the database (of course,
> this requires dumping the db without the contents of the modules)...
> 
> 
> -- 
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
>                                                     WWW: http://www.sigaev.ru/
> 


Re: SIGSEGV taken on 8.1 during dump/reload

From
Peter Eisentraut
Date:
Gregory Maxwell wrote:
> So it turned out that he didn't... Is this a sign that we need to
> include a versioning symbol in SOs so we can give a nice clear error
> message "module foo compiled for PostgreSQL 8.0.2 this is PostgreSQL
> 8.1."

I think this would rarely work in practice.  For example, during the 
elog->ereport transition, any module compiled against the wrong server 
would immediately get an "unresolved symbol: elog/ereport" before you 
can run your nice version check.  I had thought about this issue back 
then because it was an extremely common occurrence; the only thing I 
could think of is that you trick the dynamic loader to first reference 
a symbol with an obvious name like 
"if_you_see_this_in_an_error_message_you_have_a_version_mismatch".  
However, this would likely be platform dependent and maybe confuse 
users even more.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: SIGSEGV taken on 8.1 during dump/reload

From
Martijn van Oosterhout
Date:
On Sat, Nov 12, 2005 at 12:28:48PM +0100, Peter Eisentraut wrote:
> I think this would rarely work in practice.  For example, during the
> elog->ereport transition, any module compiled against the wrong server
> would immediately get an "unresolved symbol: elog/ereport" before you
> can run your nice version check.

Actually, that doesn't worry me. What worries me is that people who
don't use ereport won't get any error messages at all yet have
completely different expectations as to the structure of various
internal structures.

So the idea is to force failure when it would otherwise succeed, not
just for the pretty error messages but for stability of the system. I
would be in favour of storing the CATALOG_VERSION in the pg_finfo
struct and rejecting anything that doesn't match.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> So the idea is to force failure when it would otherwise succeed, not
> just for the pretty error messages but for stability of the system.

Exactly.  Peter's right that we'd not always get a "nice" error message
--- but it's not hard to figure out "unresolved symbol" failures.
As we just were reminded, it can be really hard to figure out minor
incompatibilities with wrong-version libraries, and the real point of
the proposal is to save us from going through *that* again.

> I would be in favour of storing the CATALOG_VERSION in the pg_finfo
> struct and rejecting anything that doesn't match.

Not sure that CATALOG_VERSION is an amazingly useful thing to use.
I think the major version number (eg "8.1") would be sufficient,
and it'd certainly give error messages that meant more to the casual
user.

The problem with CATALOG_VERSION is that we bump it basically for
changes in the on-disk contents of a freshly initdb'd database, which
does not have all that much to do with the ABI seen by a shared library.
To have something useful that is finer-grain than major version number,
I think we'd need to invent a separate version number that could be
bumped whenever we made incompatible changes in in-memory structures
or function APIs.  Which'd be almost every day during development :-(
I don't think it's worth trying to do that.  People who work with
development tip should know to recompile their libraries whenever they
recompile the main system.
        regards, tom lane


Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
I thought of an alternative approach to the library version problem:
what about taking a leaf from the usual shared library versioning
approach, ie, put the version number into the library file name?
So instead of loading, say, "plpgsql.so" we'd insist on loading
"plpgsql.so.8.2".

This would avoid Peter's objection that the dynamic linker might give
a hard-to-interpret error message, and it'd not require assuming that
the library uses V1 function call convention either.

On the other hand, it'd be relatively easy for clueless lusers to
defeat; I can readily imagine someone copying foo.so.8.2 to foo.so.8.3
when the backend complained that it couldn't find the latter.  So
maybe it's not what we want.
        regards, tom lane


Re: SIGSEGV taken on 8.1 during dump/reload

From
Martijn van Oosterhout
Date:
On Sat, Nov 12, 2005 at 10:47:35AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I would be in favour of storing the CATALOG_VERSION in the pg_finfo
> > struct and rejecting anything that doesn't match.
>
> Not sure that CATALOG_VERSION is an amazingly useful thing to use.
> I think the major version number (eg "8.1") would be sufficient,
> and it'd certainly give error messages that meant more to the casual
> user.

Sure, CATALOG_VERSION isn't that useful, but it's the only thing in the
header files that gives any kind of indication what version you're
compiling against. PG_VERSION is a string, which diminishes its
usefulness considerably.

> The problem with CATALOG_VERSION is that we bump it basically for
> changes in the on-disk contents of a freshly initdb'd database, which
> does not have all that much to do with the ABI seen by a shared library.
> To have something useful that is finer-grain than major version number,
> I think we'd need to invent a separate version number that could be
> bumped whenever we made incompatible changes in in-memory structures
> or function APIs.  Which'd be almost every day during development :-(
> I don't think it's worth trying to do that.  People who work with
> development tip should know to recompile their libraries whenever they
> recompile the main system.

People working with development versions are more likely to get it
right, and more importantly, they're less likely to complain to the
list as they're likely to know what's happening. What we're dealing
with is laymen crossing completely different versions of postgres, and
the catalog version will catch them. Unless someone is actually willing
to maintain a separate ABI version, using the catalog version will at
least solve the major problem.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Sure, CATALOG_VERSION isn't that useful, but it's the only thing in the
> header files that gives any kind of indication what version you're
> compiling against. PG_VERSION is a string, which diminishes its
> usefulness considerably.

How so?  All we care about is being able to (1) compare for equality,
and (2) print out something useful in error messages.  I claim that
PG_VERSION does #1 equally well and #2 better.
        regards, tom lane


Re: SIGSEGV taken on 8.1 during dump/reload

From
Martijn van Oosterhout
Date:
On Sat, Nov 12, 2005 at 11:18:51AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Sure, CATALOG_VERSION isn't that useful, but it's the only thing in the
> > header files that gives any kind of indication what version you're
> > compiling against. PG_VERSION is a string, which diminishes its
> > usefulness considerably.
>
> How so?  All we care about is being able to (1) compare for equality,
> and (2) print out something useful in error messages.  I claim that
> PG_VERSION does #1 equally well and #2 better.

I was thinking of compile time. The compiler can compare
CATALOG_VERSION in #if statements, but it can't compare strings. Trying
to make a module that compiles against several different versions of
postgres requires testing against CATALOG_VERSION because there's
nothing else.

However, if we purely want to distinguish between major releases in the
loading of modules (thus implying no ABI changes between 8.1.0 and
8.1.7), then PG_VERSION will do fine.

Another way that doesn't require code changes would be to make a dummy symbol
containing the version and referring to it in pg_finfo. Then you'd get
error messages like: Couldn't find symbol 'PG_version_verify_8_1'. i.e.
let the dynamic linker do the work.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sat, Nov 12, 2005 at 11:18:51AM -0500, Tom Lane wrote:
>> How so?  All we care about is being able to (1) compare for equality,
>> and (2) print out something useful in error messages.  I claim that
>> PG_VERSION does #1 equally well and #2 better.

> I was thinking of compile time. The compiler can compare
> CATALOG_VERSION in #if statements, but it can't compare strings.

We aren't asking the compiler to compare anything, though.  I'm
imagining just that the PG_FUNCTION_INFO_V1 macro will insert the value
into the Pg_finfo_record struct, and the comparison will happen at run
time in dfmgr.c.

> Another way that doesn't require code changes would be to make a dummy symbol
> containing the version and referring to it in pg_finfo. Then you'd get
> error messages like: Couldn't find symbol 'PG_version_verify_8_1'. i.e.
> let the dynamic linker do the work.

That would be attractive if we could get it to happen without the
assumption that the library uses PG_FUNCTION_INFO_V1 ... but if it still
needs that assumption, it doesn't seem like much of an improvement.
It's not always easy for people to see dynamic-linker error messages,
so I'd rather the message were issued under our control when possible.
        regards, tom lane
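
A minimal sketch of the run-time check being discussed, under stated
assumptions: SketchFinfoRecord, SKETCH_FUNCTION_INFO, SKETCH_MAJOR_VERSION
and sketch_version_ok are all hypothetical names invented for illustration,
and the record layout is not the real Pg_finfo_record. The idea is only that
a macro in the module stamps in the version it was compiled against and the
loader compares that string when it fetches the record.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical build-time stamp; a server header would supply this. */
#define SKETCH_MAJOR_VERSION "8.1"

/* Hypothetical info record carrying the version the module was built for. */
typedef struct SketchFinfoRecord
{
    int         api_version;    /* calling-convention version */
    const char *built_for;      /* major version seen at compile time */
} SketchFinfoRecord;

/* What a per-function macro in the module source could expand to. */
#define SKETCH_FUNCTION_INFO(funcname) \
    const SketchFinfoRecord *sketch_finfo_##funcname(void) \
    { \
        static const SketchFinfoRecord rec = { 1, SKETCH_MAJOR_VERSION }; \
        return &rec; \
    }

SKETCH_FUNCTION_INFO(spherebox_in)

/* Loader side: returns 1 if the module matches the running server, else 0. */
int
sketch_version_ok(const SketchFinfoRecord *rec, const char *server_version)
{
    assert(rec != NULL && server_version != NULL);
    return strcmp(rec->built_for, server_version) == 0;
}
```

Because the comparison is a plain string equality test at load time, the
same string can be printed directly in the error message when it fails.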


Re: SIGSEGV taken on 8.1 during dump/reload

From
Martijn van Oosterhout
Date:
On Sat, Nov 12, 2005 at 12:03:00PM -0500, Tom Lane wrote:
> That would be attractive if we could get it to happen without the
> assumption that the library uses PG_FUNCTION_INFO_V1 ... but if it still
> needs that assumption, it doesn't seem like much of an improvement.
> It's not always easy for people to see dynamic-linker error messages,
> so I'd rather the message were issued under our control when possible.

If you want something that works even if people don't use
PG_FUNCTION_INFO_V1, you need something like what the Linux kernel source
does. During the main build the kernel generates a vmmagic.o object.
This defines a number of symbols including a block containing flags
about endianness, spinlocks, etc. Any module expecting to be loaded
needs to link it in. While loading you simply memcmp() the block with
what you're expecting and fail if it doesn't match.

Note, this is significantly more fine-grained, in that it can pick up
discrepancies in HAVE_INT64_TIMESTAMP, NAMEDATALEN, INDEX_MAX_KEYS, etc.
The kind of things that currently appear in pg_controldata. In the
future maybe a 32/64 bit flag.

If we don't like imposing link time constraints, we could require
people to include:

#ifdef PG_MAGIC_BLOCK
PG_MAGIC_BLOCK;
#endif

In any one of their source files and put the definition in a header
file somewhere. This may be even better.
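
As a rough sketch of how such a block and its memcmp() check might look:
everything below is hypothetical (SketchMagicBlock, SKETCH_MAGIC_BLOCK,
sketch_magic_func, sketch_magic_matches and the field values are invented
for illustration), and it assumes the header defining the macro bakes in the
build settings at compile time.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical magic block; all-int fields avoid padding surprises. */
typedef struct SketchMagicBlock
{
    int         len;             /* sizeof(SketchMagicBlock) */
    int         major_version;   /* illustrative encoding, e.g. 801 */
    int         namedatalen;
    int         index_max_keys;
    int         int64_timestamp; /* 1 if built with integer timestamps */
} SketchMagicBlock;

/* Values the header would record for this build (illustrative). */
#define SKETCH_MAGIC_DATA \
    { (int) sizeof(SketchMagicBlock), 801, 64, 32, 0 }

/* The one-line addition a module author makes in one source file. */
#define SKETCH_MAGIC_BLOCK \
    const SketchMagicBlock *sketch_magic_func(void) \
    { \
        static const SketchMagicBlock data = SKETCH_MAGIC_DATA; \
        return &data; \
    }

SKETCH_MAGIC_BLOCK

/* Loader side: accept the module only if its block matches ours exactly. */
int
sketch_magic_matches(const SketchMagicBlock *module_block)
{
    static const SketchMagicBlock expected = SKETCH_MAGIC_DATA;

    assert(module_block != NULL);
    return module_block->len == (int) sizeof(expected) &&
           memcmp(module_block, &expected, sizeof(expected)) == 0;
}
```

The length field is checked first so that a block from a build with a
different struct size is rejected before memcmp() reads past its end.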

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> If we don't like imposing link time constraints, we could require
> people to include:

> #ifdef PG_MAGIC_BLOCK
> PG_MAGIC_BLOCK;
> #endif

I was hoping to avoid forcing source-code changes, but something like
that might be the best solution.  Anyone think it's unreasonable?
        regards, tom lane


Re: SIGSEGV taken on 8.1 during dump/reload

From
Martijn van Oosterhout
Date:
On Sat, Nov 12, 2005 at 12:44:23PM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > If we don't like imposing link time constraints, we could require
> > people to include:
>
> > #ifdef PG_MAGIC_BLOCK
> > PG_MAGIC_BLOCK;
> > #endif
>
> I was hoping to avoid forcing source-code changes, but something like
> that might be the best solution.  Anyone think it's unreasonable?

Alternatively, you could make it optional for a release (print a warning
that the magic block wasn't found). Next release require it. It's a small
enough change that it wouldn't require huge amounts of effort on the
part of module writers.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: SIGSEGV taken on 8.1 during dump/reload

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
>> I was hoping to avoid forcing source-code changes, but something like
>> that might be the best solution.  Anyone think it's unreasonable?

> Alternatively, you could make it optional for a release (print a warning
> that the magic block wasn't found). Next release require it.

What's the point of waiting?  We'd be forcing people to add it sooner
or later, so why not sooner?
        regards, tom lane


Re: SIGSEGV taken on 8.1 during dump/reload

From
Kevin Brown
Date:
Tom Lane wrote:
> On the other hand, it'd be relatively easy for clueless lusers to
> defeat; I can readily imagine someone copying foo.so.8.2 to foo.so.8.3
> when the backend complained that it couldn't find the latter.  So
> maybe it's not what we want.

Hmm...but isn't the version number also something that can be stored
in the shared library itself during link time (e.g., via the -soname
option to the linker)?  The manpage for ld under Linux implies that
this will cause the executable that's linked against the shared object
to look explicitly for a library with the soname specified by the
shared object.  I don't know if that just causes the dynamic linker to
look for a file with the specified soname or if it will actually
examine the shared object under consideration to make sure it has the
DT_SONAME field in question, however.


-- 
Kevin Brown                          kevin@sysexperts.com


Re: SIGSEGV taken on 8.1 during dump/reload

From
Martijn van Oosterhout
Date:
On Sat, Nov 12, 2005 at 10:46:33PM -0800, Kevin Brown wrote:
> Hmm...but isn't the version number also something that can be stored
> in the shared library itself during link time (e.g., via the -soname
> option to the linker)?  The manpage for ld under Linux implies that
> this will cause the executable that's linked against the shared object
> to look explicitly for a library with the soname specified by the
> shared object.  I don't know if that just causes the dynamic linker to
> look for a file with the specified soname or if it will actually
> examine the shared object under consideration to make sure it has the
> DT_SONAME field in question, however.

No, that's completely unrelated. The soname is what gets put in the
DT_NEEDED field of programs that need it. Thus if you have
libtermcap.so symlinked to libncurses.so, when you link with -ltermcap,
the linker will include a reference to libncurses because that's what
the soname is. The only place version numbers come in is when a library
libfoo.8.2 has a soname libfoo.8 which means that at runtime it will
accept any lib with that soname.

None of this applies to PostgreSQL because we open the modules
directly, and don't rely on the linker loader.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: SIGSEGV taken on 8.1 during dump/reload

From
Kevin Brown
Date:
Martijn van Oosterhout wrote:
> None of this applies to PostgreSQL because we open the modules
> directly, and don't rely on the linker loader.

Ah, right.  I forgot the context was the server, not one of the
utilities...

Sorry for the waste of bandwidth...



-- 
Kevin Brown                          kevin@sysexperts.com