Thread: SIGSEGV taken on 8.1 during dump/reload
Hey all,

I was doing a test run of a live dump from 8.0.2 to 8.1.0, and 8.1.0 took a segmentation violation 1 hour into the operation. My plan is to re-do the dump/restore, and if it fails again, to re-compile with debug and cassert, and try to get a core.

The command line was (8.1.0 is on port 5433):

time pg_dumpall -c -v | psql -p 5433 -d template1

template1=# select version();
 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk)
(1 row)

Config is:

BINDIR = /usr/local/pgsql810/bin
DOCDIR = /usr/local/pgsql810/doc
INCLUDEDIR = /usr/local/pgsql810/include
PKGINCLUDEDIR = /usr/local/pgsql810/include
INCLUDEDIR-SERVER = /usr/local/pgsql810/include/server
LIBDIR = /usr/local/pgsql810/lib
PKGLIBDIR = /usr/local/pgsql810/lib
LOCALEDIR =
MANDIR = /usr/local/pgsql810/man
SHAREDIR = /usr/local/pgsql810/share
SYSCONFDIR = /usr/local/pgsql810/etc
PGXS = /usr/local/pgsql810/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-syslog' '--prefix=/usr/local/pgsql810'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wendif-labels -fno-strict-aliasing
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,/usr/local/pgsql810/lib
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lncurses -lcrypt -lresolv -lnsl -ldl -lm -lbsd
VERSION = PostgreSQL 8.1.0

Log snippet as follows (serverlog is empty). postgres810 is 8.1.0, postgres is 8.0.2.
Nov 6 16:02:09 thunder postgres810[5238]: [1-1] LOG: autovacuum: processing database "tassiv"
Nov 6 16:03:09 thunder postgres810[5306]: [1-1] LOG: autovacuum: processing database "bacula"
Nov 6 16:03:12 thunder postgres[1772]: [6-1] tassiv LOG: duration: 1539387.072 ms statement: COPY public.obs_v (x, y, imag, smag, sky, chi, sharp, iter, loc, obs_id,
Nov 6 16:03:12 thunder postgres[1772]: [6-2] file_id, use, solve, star_id, mag) TO stdout;
Nov 6 16:04:09 thunder postgres810[5359]: [1-1] LOG: autovacuum: processing database "cpan"
Nov 6 16:05:09 thunder postgres[1772]: [7-1] tassiv LOG: duration: 98330.722 ms statement: COPY public.tycho2 (star_id, gsc, loc, bt, e_bt, vt, e_vt, prox) TO stdout;
Nov 6 16:05:09 thunder postgres810[5418]: [1-1] LOG: autovacuum: processing database "dspam"
Nov 6 16:05:15 thunder postgres810[1773]: [20-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "catalog_pkey" for table "catalog"
Nov 6 16:05:32 thunder postgres810[1773]: [21-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "color_groups_pkey" for table "color_groups"
Nov 6 16:05:32 thunder postgres810[1773]: [22-1] tassivNOTICE: ALTER TABLE / ADD UNIQUE will create implicit index "files_name_key" for table "files"
Nov 6 16:05:32 thunder postgres810[1773]: [23-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "files_pkey" for table "files"
Nov 6 16:05:32 thunder postgres810[1773]: [24-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "groups_pkey" for table "groups"
Nov 6 16:05:32 thunder postgres810[1773]: [25-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "new_reference_loc_pkey" for table "new_reference_loc"
Nov 6 16:05:32 thunder postgres810[1773]: [26-1] tassivNOTICE: ALTER TABLE / ADD UNIQUE will create implicit index "nights_night_key" for table "nights"
Nov 6 16:05:32 thunder postgres810[1773]: [27-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "nights_pkey" for table "nights"
Nov 6 16:05:32 thunder postgres810[1773]: [28-1] tassivNOTICE: ALTER TABLE / ADD UNIQUE will create implicit index "obs_root_obs_id_key" for table "obs_root"
Nov 6 16:05:32 thunder postgres810[1773]: [29-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pairs_pkey" for table "pairs"
Nov 6 16:05:32 thunder postgres810[1773]: [30-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "reference_ubvri_pkey" for table "reference_ubvri"
Nov 6 16:05:34 thunder postgres810[1773]: [31-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "sites_pkey" for table "sites"
Nov 6 16:05:34 thunder postgres810[1773]: [32-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "tycho2_pkey" for table "tycho2"
Nov 6 16:05:55 thunder postgres810[1773]: [33-1] tassivNOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "zero_pair_pkey" for table "zero_pair"
Nov 6 16:06:10 thunder postgres810[5489]: [1-1] LOG: autovacuum: processing database "template1"
Nov 6 16:06:27 thunder postgres810[32258]: [1-1] LOG: server process (PID 1773) was terminated by signal 11
Nov 6 16:06:27 thunder postgres810[32258]: [2-1] LOG: terminating any other active server processes
Nov 6 16:06:27 thunder postgres810[32258]: [3-1] LOG: all server processes terminated; reinitializing
Nov 6 16:06:27 thunder postgres[1772]: [8-1] tassiv LOG: unexpected EOF on client connection
Nov 6 16:06:28 thunder postgres810[5508]: [4-1] LOG: database system was interrupted at 2005-11-06 16:05:15 MST
Nov 6 16:06:28 thunder postgres810[5508]: [5-1] LOG: checkpoint record is at 1/BA12B8B4
Nov 6 16:06:28 thunder postgres810[5508]: [6-1] LOG: redo record is at 1/BA020058; undo record is at 0/0; shutdown FALSE
Nov 6 16:06:28 thunder postgres810[5508]: [7-1] LOG: next transaction ID: 625556; next OID: 33061
Nov 6 16:06:28 thunder postgres810[5508]: [8-1] LOG: next MultiXactId: 1153; next MultiXactOffset: 11782
Nov 6 16:06:28 thunder postgres810[5508]: [9-1] LOG: database system was not properly shut down; automatic recovery in progress
Nov 6 16:06:28 thunder postgres810[5508]: [10-1] LOG: redo starts at 1/BA020058
Nov 6 16:06:28 thunder postgres[1373]: [4-1] template1 LOG: unexpected EOF on client connection
Nov 6 16:06:42 thunder postgres810[5508]: [11-1] LOG: record with zero length at 1/BF1DFB44
Nov 6 16:06:42 thunder postgres810[5508]: [12-1] LOG: redo done at 1/BF1DFB1C
Nov 6 16:06:44 thunder postgres810[5508]: [13-1] LOG: database system is ready
Nov 6 16:06:44 thunder postgres810[5508]: [14-1] LOG: transaction ID wrap limit is 2147484146, limited by database "template1"

-- 
 16:09:17 up 35 days, 8:43, 8 users, load average: 4.56, 5.83, 6.47
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
Which version is first in your path, 8.0 or 8.1? If 8.0, do you get a different result from the 8.1 binaries?

cheers

andrew

Robert Creager wrote:
>Hey all,
>
>I was doing a test run of a live dump from 8.0.2 to 8.1.0, and 8.1.0 took a
>segmentation violation 1 hour into the operation. My plan is to re-do the
>dump/restore, and if it fails again, to re-compile with debug and cassert, and
>try to get a core.
>
>The command line was (8.1.0 is on port 5433):
>
>time pg_dumpall -c -v | psql -p 5433 -d template1
When grilled further on (Sun, 06 Nov 2005 18:52:40 -0500),
Andrew Dunstan <andrew@dunslane.net> confessed:

>
> Which version is first in your path, 8.0 or 8.1? If 8.0, do you get a
> different result from the 8.1 binaries?
>

8.0 was first. I've specified the correct full path now for the executables. Also, I've actually installed the shared libraries for the types and triggers that I use on that DB. I always seem to forget that :-(

But, the table/index that it dies on is not using either the trigger or non native types, unless PG isn't getting the chance to emit that it's working on the next one before it goes out to lunch?

The second reload died also. If the third dies (now that the type is in place), I'll do the re-compile and core.

tassiv=# \d zero_pair
      Table "public.zero_pair"
    Column    |  Type   | Modifiers
--------------+---------+-----------
 pair_id      | integer | not null
 group_id     | integer |
 zero_v       | real    | default 0
 zero_v_sigma | real    | default 0
 zero_i       | real    | default 0
 zero_i_sigma | real    | default 0
Indexes:
    "zero_pair_pkey" PRIMARY KEY, btree (pair_id)
    "zero_pair_group_id" btree (group_id)
Foreign-key constraints:
    "zero_pair_group_id_fkey" FOREIGN KEY (group_id) REFERENCES color_groups(group_id) ON DELETE CASCADE
    "zero_pair_pair_id_fkey" FOREIGN KEY (pair_id) REFERENCES pairs(pair_id) ON DELETE CASCADE

tassiv=# \d zero_pair_pkey
Index "public.zero_pair_pkey"
 Column  |  Type
---------+---------
 pair_id | integer
primary key, btree, for table "public.zero_pair"

Cheers,
Rob

-- 
 19:49:33 up 35 days, 12:24, 8 users, load average: 2.93, 2.51, 2.30
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
When grilled further on (Sun, 6 Nov 2005 20:00:38 -0700),
Robert Creager <Robert_Creager@logicalchaos.org> confessed:

Didn't set the core size limit big enough (1Mb). It's now at 50Mb. I am using PGSphere, which should be the only source of gist indexes in use.

gdb /usr/local/pgsql810/bin/postgres core.28053
...
warning: core file may not match specified executable file.
Core was generated by `postgres: robert tassiv [local] CREATE INDEX '.
Program terminated with signal 11, Segmentation fault.
warning: current_sos: Can't read pathname for load map: Input/output error
Cannot access memory at address 0x400d8000
#0 0x08082057 in gistUserPicksplit (r=Cannot access memory at address 0xbfffcb28
) at gistutil.c:833
833 if (v->spl_right[v->spl_nright - 1] == InvalidOffsetNumber)
(gdb) bt
#0 0x08082057 in gistUserPicksplit (r=Cannot access memory at address 0xbfffcb28
) at gistutil.c:833
Cannot access memory at address 0xbfffcb3c

Unfortunately, I have to run shortly. If someone wants a 1Mb core, I have one. I'll have (presumably) more info this evening with the bigger core.

Cheers,
Rob

-- 
 07:56:01 up 36 days, 30 min, 7 users, load average: 2.25, 2.31, 2.23
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
When grilled further on (Mon, 7 Nov 2005 08:07:14 -0700),
Robert Creager <Robert_Creager@logicalchaos.org> confessed:

I'm currently attached to the dead (dying) process. spl_nright seems pretty large...

(gdb) print v->spl_nright
$3 = 138311580

Program received signal SIGSEGV, Segmentation fault.
0x08082057 in gistUserPicksplit (r=0x48f3f1e4, entryvec=0x83e534c, v=0xbfffcbc0, itup=0x83e3454, len=227, giststate=0xbfffd120) at gistutil.c:833
833 if (v->spl_right[v->spl_nright - 1] == InvalidOffsetNumber)
(gdb) bt
#0 0x08082057 in gistUserPicksplit (r=0x48f3f1e4, entryvec=0x83e534c, v=0xbfffcbc0, itup=0x83e3454, len=227, giststate=0xbfffd120) at gistutil.c:833
#1 0x0807f249 in gistSplit (r=0x48f3f1e4, buffer=8917, itup=0x83e3454, len=0xbfffcea4, dist=0xbfffcea0, giststate=0xbfffd120) at gist.c:1083
#2 0x0807c8ab in gistplacetopage (state=0xbfffcf10, giststate=0xbfffd120) at gist.c:331
#3 0x0807e2cd in gistmakedeal (state=0xbfffcf10, giststate=0xbfffd120) at gist.c:878
#4 0x0807c7e1 in gistdoinsert (r=0x48f3f1e4, itup=0x83e339c, giststate=0xbfffd120) at gist.c:299
#5 0x0807c5a6 in gistbuildCallback (index=0x48f3f1e4, htup=0x83c3de8, values=0xbfffd020, isnull=0xbfffd000 "", tupleIsAlive=1 '\001', state=0xbfffd120) at gist.c:207
#6 0x080cbb14 in IndexBuildHeapScan (heapRelation=0x48f3e1cc, indexRelation=0x48f3f1e4, indexInfo=0x83c3b6c, callback=0x807c4f0 <gistbuildCallback>, callback_state=0xbfffd120) at index.c:1573
#7 0x0807c3b5 in gistbuild (fcinfo=0xbfffe670) at gist.c:145
#8 0x08234dfd in OidFunctionCall3 (functionId=782, arg1=1223942604, arg2=1223946724, arg3=138165100) at fmgr.c:1460
#9 0x080cb8d3 in index_build (heapRelation=0x48f3e1cc, indexRelation=0x48f3f1e4, indexInfo=0x83c3b6c) at index.c:1353
#10 0x080cacdc in index_create (heapRelationId=128249, indexRelationName=0x83a0b94 "catalog_ra_decl_index", indexRelationId=128443, indexInfo=0x83c3b6c, accessMethodObjectId=783, tableSpaceId=0, classObjectId=0x83c9cfc, primary=0 '\0', isconstraint=0 '\0', allow_system_table_mods=0 '\0', skip_build=0 '\0') at index.c:757
#11 0x08110671 in DefineIndex (heapRelation=0x30f, indexRelationName=0x83a0b94 "catalog_ra_decl_index", indexRelationId=0, accessMethodName=0x83a0c00 "gist", tableSpaceName=0x0, attributeList=0x83a0c58, predicate=0x0, rangetable=0x0, unique=0 '\0', primary=0 '\0', isconstraint=0 '\0', is_alter_table=0 '\0', check_rights=1 '\001', skip_build=0 '\0', quiet=0 '\0') at indexcmds.c:383
#12 0x081c409b in ProcessUtility (parsetree=0x83a0c74, params=0x0, dest=0x83a0cf0, completionTag=0xbfffec00 "") at utility.c:748
#13 0x081c2b84 in PortalRunUtility (portal=0x83aad14, query=0x83a0a7c, dest=0x83a0cf0, completionTag=0xbfffec00 "") at pquery.c:987
#14 0x081c2e0b in PortalRunMulti (portal=0x83aad14, dest=0x83a0cf0, altdest=0x83a0cf0, completionTag=0xbfffec00 "") at pquery.c:1054
#15 0x081c26a6 in PortalRun (portal=0x83aad14, count=2147483647, dest=0x83a0cf0, altdest=0x83a0cf0, completionTag=0xbfffec00 "") at pquery.c:665
#16 0x081be579 in exec_simple_query (query_string=0x83a0864 "CREATE INDEX catalog_ra_decl_index ON catalog USING gist (loc);") at postgres.c:1014
#17 0x081c1377 in PostgresMain (argc=4, argv=0x8345f3c, username=0x8345f14 "robert") at postgres.c:3168
#18 0x08198692 in BackendRun (port=0x835ea08) at postmaster.c:2854
#19 0x081980a5 in BackendStartup (port=0x835ea08) at postmaster.c:2498
#20 0x081963fe in ServerLoop () at postmaster.c:1231
#21 0x081957aa in PostmasterMain (argc=3, argv=0x8344788) at postmaster.c:943
#22 0x08158b49 in main (argc=3, argv=0x8344788) at main.c:256

-- 
 22:06:46 up 36 days, 14:41, 7 users, load average: 2.22, 2.55, 3.26
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
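A note on why frame 0 faults: line 833 indexes spl_right with spl_nright - 1, so a garbage count turns directly into a wild read. The sketch below is not PostgreSQL code. It only works out how far past the array the reported count would reach, and shows a hypothetical plausibility check against the number of entries being split (len=227 in frame 0). The helper names are invented, and the bound "count cannot exceed the number of entries" is an assumption about the picksplit contract rather than something the thread confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PostgreSQL defines OffsetNumber as a 16-bit unsigned integer. */
typedef uint16_t OffsetNumber;

/*
 * Hypothetical helper: could a count reported by a split function index
 * an array holding at most n_entries items?
 */
static bool split_count_plausible(int nright, int n_entries)
{
    return nright >= 1 && nright <= n_entries;
}

/*
 * Hypothetical helper: byte offset from the start of the array that
 * spl_right[nright - 1] would touch.
 */
static size_t last_element_byte_offset(int nright)
{
    assert(nright >= 1);
    return ((size_t) nright - 1) * sizeof(OffsetNumber);
}
```

With the value printed above, last_element_byte_offset(138311580) comes to 276623158 bytes, far beyond anything a 227-entry split could have allocated, and split_count_plausible(138311580, 227) is false.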
When grilled further on (Mon, 7 Nov 2005 22:25:17 -0700),
Robert Creager <Robert_Creager@LogicalChaos.org> confessed:

Sorry, I'll just trickle out the information.

tassiv=# \d catalog_ra_decl_index
Index "public.catalog_ra_decl_index"
 Column |   Type
--------+-----------
 loc    | spherekey
gist, for table "public.catalog"

v->spl_right is address 0xbp - uninitialized?

(gdb) print *v
$2 = {spl_left = 0x83e1308, spl_nleft = 8, spl_ldatum = 138286880,
 spl_lattr = {3930298096, 3929693296, 1075344513, 3928483696, 3927878896, 50331648, 1076099872, 1076099872, 1076100640, 1076099944, 1076099872, 0, 0, 0, 1, 1076099872, 46088, 24, 138269392, 108, 8205, 1076099872, 1076097560, 1077018624, 1223005861, 2281761506, 1072462523, 8192, 1076979200, 1348122942, 3218058668, 3588489616},
 spl_lattrsize = {1072628007, 1223130252, 0, -1073754968, 1223107331, -1073755008, 1196715552, 4033364, 1076979200, 8132, 32, 138269400, 58657919, 717016950, 1071875034, 1883413536, -1077677968, -817345387, 1072225709, 138175768, 138175768, 1223130252, 1223130252, -1073754936, 1223083881, 138269472, 1196715552, 138269472, 138269428, -1073754256, -1073754256, -1073754376},
 spl_lisnull = "ÍD#\bàÌÿ¿\000\000\000\000(Íÿ¿\2004;\b×ÿ¿\000\000\000\000\000\000\000",
 spl_leftvalid = 20 '\024', spl_right = 0xdb, spl_nright = 138286924, spl_rdatum = 11,
 spl_rattr = {3463747944, 3883728496, 0, 3882518896, 3881914096, 1, 3221212568, 138097456, 138251092, 3878890096, 0, 0, 1222988060, 1222974760, 1222960776, 138097456, 3, 1075321604, 0, 1073825468, 1076097560, 3221212576, 3221212540, 1075326465, 3221212576, 909216680, 825503793, 0, 138251202, 1076097560, 136751593, 3221212860},
 spl_rattrsize = {-1073754484, 1075303286, -1073754720, 136751593, -1073754428, 138251176, 0, -1073754560, 136027536, 1196670896, 138269580, 32, 1196670856, 138251176, 138251194, 138251202, 226, 138251008, 0, 0, 0, 7904, 1024, 138269400, 138269700, 138269688, 908, -1073754600, 136599995, 138175768, 138269700, 908},
 spl_risnull = "\030e<\b\000¼SG\001\000\000\000XÎÿ¿¤Îÿ¿\001\000\000\000Ñÿ¿\004Ô=\b",
 spl_rightvalid = 108 'l', spl_idgrp = 0x83dd78c, spl_ngrp = 0x83dd378, spl_grpflag = 0x4 <Address 0x4 out of bounds>}

> When grilled further on (Mon, 7 Nov 2005 08:07:14 -0700),
> Robert Creager <Robert_Creager@logicalchaos.org> confessed:
>
> I'm currently attached to the dead (dying) process. spl_nright seems pretty large...
>
> (gdb) print v->spl_nright
> $3 = 138311580

-- 
 23:44:24 up 36 days, 16:19, 7 users, load average: 2.35, 2.43, 3.13
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
Hmm, did you recompile pg_sphere module for 8.1?

Robert Creager wrote:
> When grilled further on (Mon, 7 Nov 2005 22:25:17 -0700),
> Robert Creager <Robert_Creager@LogicalChaos.org> confessed:
>
> Sorry, I'll just trickle out the information.
>
> tassiv=# \d catalog_ra_decl_index
> Index "public.catalog_ra_decl_index"
>  Column |   Type
> --------+-----------
>  loc    | spherekey
> gist, for table "public.catalog"
>
> v->spl_right is address 0xbp - uninitialized?

-- 
Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
When grilled further on (Tue, 08 Nov 2005 15:13:32 +0300),
Teodor Sigaev <teodor@sigaev.ru> confessed:

> Hmm, did you recompile pg_sphere module for 8.1?

Yes I did. Just did it again to make sure.

Is there any way I can do a <make installcheck> without a reconfigure/make/install of postgresql? The db is running on port 5433, not the default of 5432.

If this is a PGSphere problem, should this conversation be continued there?

Thanks,
Rob

-- 
 07:01:55 up 36 days, 23:36, 7 users, load average: 3.80, 3.47, 3.17
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
Robert Creager wrote:
> Yes I did. Just did it again to make sure. Is there any way I can do a <make installcheck> without a reconfigure/make/install of postgresql? The db is running on port 5433, not the default of 5432.

export PGPORT=5433

> If this is a PGSphere problem, should this conversation be continued there?

Whether it's PGSphere or not is unknown for now. Can you prepare a minimal test suite that reproduces the problem?

-- 
Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> v->spl_right is address 0xbp - uninitialized?

The whole struct looks pretty uninitialized, which immediately makes me wonder whether gdb has picked up a wrong value for "v". Try going down to a lower stack frame and seeing if you can access the struct from there.

regards, tom lane
When grilled further on (Tue, 08 Nov 2005 09:20:13 -0500),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> > v->spl_right is address 0xbp - uninitialized?
>
> The whole struct looks pretty uninitialized, which immediately makes me
> wonder whether gdb has picked up a wrong value for "v". Try going down
> to a lower stack frame and seeing if you can access the struct from
> there.
>

Well, it's defined the next level up on the stack, and it's still garbage. The way I read gist.c and how it's calling gistUserPicksplit at line 1083, it's not initialized prior to that else. So, FunctionCall2 in gistutil.c is supposed to fill it out? Presumably a function supplied by PGSphere in this case?

(gdb) up
#1 0x0807f249 in gistSplit (r=0x48df1e6c, buffer=93, itup=0x83b8e94, len=0xbfffcea4, dist=0xbfffcea0, giststate=0xbfffd120) at gist.c:1083
(gdb) print v
$1 = {spl_left = 0x83bcd98, spl_nleft = 8, spl_ldatum = 138138032,
 spl_lattr = {138089040, 1, 1075344513, 3221212168, 134843567, 0, 1076099872, 1076099872, 1076100896, 1076099944, 1076099872, 138072532, 136595410, 138072532, 127, 64, 138072596, 137900116, 138120544, 108, 8205, 1076099872, 1076097560, 1077067776, 1222874789, 2281761506, 1072462523, 8192, 1076979200, 1348122942, 3218058668, 3588489616},
 spl_lattrsize = {1072628007, 1222999180, 0, -1073754968, 1222976259, -1073755008, 1079103008, 3871912, 1076979200, 8132, 32, 138120552, 58657919, 717016950, 1071875034, 1883413536, -1077677968, -817345387, 1072225709, 138043264, 138043264, 1222999180, 1222999180, -1073754936, 1222952809, 138120624, 1079103008, 138120624, 138120580, -1073754256, -1073754256, -1073754376},
 spl_lisnull = "ÍD#\bàÌÿ¿\000\000\000\000(Íÿ¿0K;\b×ÿ¿\000\000\000\000\000\000\000",
 spl_leftvalid = -92 '¤', spl_right = 0xdb, spl_nright = 138138076, spl_rdatum = 11,
 spl_rattr = {3463919764, 0, 0, 0, 0, 1, 3221212568, 138103264, 138089640, 434176, 0, 0, 1222856988, 1222843688, 1222829704, 138103264, 3, 1075321604, 0, 1073825468, 1076097560, 3221212576, 3221212540, 1075326465, 3221212576, 909186620, 825503793, 0, 138090070, 1076097560, 136751593, 3221212860},
 spl_rattrsize = {-1073754484, 1075303286, -1073754720, 136751593, -1073754428, 138090044, 0, -1073754560, 136027536, 1079058352, 138120732, 32, 1079058312, 138090044, 138090062, 138090070, 226, 138089984, 0, 0, 0, 7904, 1024, 138120552, 138120852, 138120840, 908, -1073754600, 136599995, 138043264, 138120852, 908},
 spl_risnull = "\200_:\b\000\034Q@\001\000\000\000XÎÿ¿¤Îÿ¿\001\000\000\000Ñÿ¿\224\216;\b",
 spl_rightvalid = 108 'l', spl_idgrp = 0x83b921c, spl_ngrp = 0x83b8e08, spl_grpflag = 0x4 <Address 0x4 out of bounds>}
(gdb)

-- 
 07:38:26 up 37 days, 13 min, 6 users, load average: 3.28, 3.42, 3.43
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
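The pattern Robert describes, where the caller declares the struct without initializing it and a user-supplied function is expected to fill it in, can be sketched in a few lines. This is an illustration only: SplitResult, split_fn and both sample callees are hypothetical stand-ins, not the real GIST_SPLITVEC or any pg_sphere function. It also shows one possible debugging trick, which nobody in the thread used: poison the struct before the call so that a field the callee never wrote stands out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POISON 0x7f   /* arbitrary fill byte chosen for this sketch */

/* Hypothetical, much smaller stand-in for GIST_SPLITVEC. */
typedef struct {
    int *left;
    int  nleft;
    int *right;
    int  nright;
} SplitResult;

/* Hypothetical signature for a user-supplied split function. */
typedef void (*split_fn)(int *entries, int n, SplitResult *out);

/* Sample callee that fills in every field. */
static void split_in_half(int *entries, int n, SplitResult *out)
{
    out->left = entries;
    out->nleft = n / 2;
    out->right = entries + n / 2;
    out->nright = n - n / 2;
}

/* Sample buggy callee that never writes the right-hand fields. */
static void split_left_only(int *entries, int n, SplitResult *out)
{
    out->left = entries;
    out->nleft = n;
}

/* True when every byte of the field still holds the poison pattern. */
static bool still_poisoned(const void *field, size_t size)
{
    const unsigned char *p = field;

    for (size_t i = 0; i < size; i++)
        if (p[i] != POISON)
            return false;
    return true;
}

/* Caller side: poison the out-parameter, call the callee, inspect nright. */
static bool callee_wrote_nright(split_fn fn, int *entries, int n)
{
    SplitResult v;

    assert(fn != NULL && n > 0);
    memset(&v, POISON, sizeof(v));
    fn(entries, n, &v);
    return !still_poisoned(&v.nright, sizeof(v.nright));
}
```

A callee that legitimately stores 0x7f7f7f7f would be misreported, so this is a heuristic at best. It also cannot by itself tell "never written" apart from "written at a different offset".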
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> Is there any way I can do a <make installcheck> without a
> reconfigure/make/install of postgresql? The db is running on port
> 5433, not the default of 5432.

Sure, just "export PGPORT=5433" before "make installcheck". Doubt it will prove much, though, because the regression tests contain only minimal exercising of GIST.

Does PGSphere itself have any regression tests?

(Actually, running the contrib regression tests might be more relevant than the main PG tests, since several contrib modules with GIST opclasses have regression tests.)

regards, tom lane
Tom Lane wrote:
> Robert Creager <Robert_Creager@LogicalChaos.org> writes:
>
>>v->spl_right is address 0xbp - uninitialized?
>
> The whole struct looks pretty uninitialized, which immediately makes me
> wonder whether gdb has picked up a wrong value for "v". Try going down
> to a lower stack frame and seeing if you can access the struct from
> there.

The layout of the GIST_SPLITVEC struct has changed since 8.0, and I'm afraid that an old .so is being used. The spl_(right|left)valid fields were added to GIST_SPLITVEC.

Looking at

spl_leftvalid = 20 '\024', spl_right = 0xdb, spl_nright = 138286924, spl_rdatum = 11,

and at GIST_SPLITVEC

    bool spl_lisnull[INDEX_MAX_KEYS];
    bool spl_leftvalid;
    OffsetNumber *spl_right; /* array of entries that go right */
    int spl_nright; /* size of the array */
    Datum spl_rdatum; /* Union of keys in spl_right */

it's very likely that spl_right contains the correct spl_nright value (0xdb = 219) and spl_nright contains the correct spl_rdatum (pointer 138286924 = 0x83e174c).

-- 
Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
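The shift Teodor describes can be reproduced with two cut-down structs. This is a sketch under stated assumptions, not the real headers: OldTail and NewTail are hypothetical and model only the tail of the struct, MAX_KEYS = 32 stands in for INDEX_MAX_KEYS, 32-bit integers stand in for the i686 pointer and Datum, and the fields ahead of spl_right are assumed to be otherwise unchanged between 8.0 and 8.1. The pointer value passed for the old spl_right in the usage note is invented. The dump also shows spl_nleft = 8, and 8 + 219 = 227 matches len=227 in frame 0, which seems consistent with 219 being the real right-hand count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 32   /* stand-in for INDEX_MAX_KEYS (assumed value) */

/* Hypothetical tail of the struct as a module built against 8.0 sees it. */
typedef struct {
    char     lisnull[MAX_KEYS];
    uint32_t right;     /* 32-bit stand-in for OffsetNumber *spl_right */
    int32_t  nright;
    uint32_t rdatum;    /* 32-bit stand-in for Datum */
} OldTail;

/* Same tail with the one-byte flag that 8.1 places before spl_right. */
typedef struct {
    char     lisnull[MAX_KEYS];
    char     leftvalid;
    uint32_t right;
    int32_t  nright;
    uint32_t rdatum;
} NewTail;

/*
 * Store the fields through the old layout, then read the same bytes back
 * through the new layout, the way a stale module and a new server would
 * disagree about one block of memory.
 */
static NewTail read_old_as_new(uint32_t right, int32_t nright, uint32_t rdatum)
{
    unsigned char buf[sizeof(NewTail)] = {0};
    OldTail written;
    NewTail seen;

    assert(sizeof(OldTail) <= sizeof(buf));
    memset(&written, 0, sizeof(written));
    written.right = right;
    written.nright = nright;
    written.rdatum = rdatum;
    memcpy(buf, &written, sizeof(written));
    memcpy(&seen, buf, sizeof(seen));
    return seen;
}
```

Calling read_old_as_new(0x083e1714, 219, 138286924) gives a NewTail whose right field is 219 (0xdb) and whose nright field is 138286924, the same pattern as in the gdb dump. The flag byte plus alignment padding moves every later field by four bytes, so offsetof(NewTail, right) equals offsetof(OldTail, nright).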
When grilled further on (Tue, 08 Nov 2005 10:06:38 -0500),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> Does PGSphere itself have any regression tests?
>
> (Actually, running the contrib regression tests might be more relevant
> than the main PG tests, since several contrib modules with GIST
> opclasses have regression tests.)
>

That's what I was trying to do ;-) <make installcheck> passes, as does <make crushtest> (within pg_sphere).

I'll work on trying to get a small test case tonight. Otherwise, we can try SSH to my machine or a DVD.

Cheers,
Rob

-- 
 08:17:03 up 37 days, 51 min, 6 users, load average: 3.70, 3.56, 3.41
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
Teodor Sigaev <teodor@sigaev.ru> writes: > Layout of GIST_SPLITVEC struct has been changed from 8.0, I'm afraid that old > .so is used. spl_(right|left)valid fields was added to GIST_SPLITVEC. Does look a bit suspicious ... Robert, are you *sure* you've got the right version of pgsphere linked in? Did you compile it against the right set of Postgres header files? regards, tom lane
On Tue, 08 Nov 2005 11:12:04 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: > > Layout of GIST_SPLITVEC struct has been changed from 8.0, I'm afraid that > > old .so is used. spl_(right|left)valid fields was added to GIST_SPLITVEC. > > Does look a bit suspicious ... Robert, are you *sure* you've got the > right version of pgsphere linked in? Did you compile it against the > right set of Postgres header files? > I copied pg_sphere into the contrib directory in 8.1.0, which is where it was built. Last night, I executed a <make clean> from contrib/pg_sphere, re-built <make> and re-installed. I checked the pg_sphere Makefile, and it references local, not absolute paths. So, I'm as sure as I can be right now. How can I check the .so files installed by the build? Do they reference an absolute path for their dependent .so files (postgres), or will they use ld.so.conf, which might then explain the problem? My ld.so.conf still points to the 8.0.2 version, as I've not switched yet to 8.1.0. In any case, why would the <make installcheck> work in the pg_sphere directory? That would have to use the installed libraries. I don't have the sources with me, but I'd think an index would have been created on a spoint column, but maybe not? Cheers, Rob
When grilled further on (Tue, 08 Nov 2005 11:12:04 -0500), Tom Lane <tgl@sss.pgh.pa.us> confessed: > Teodor Sigaev <teodor@sigaev.ru> writes: > > Layout of GIST_SPLITVEC struct has been changed from 8.0, I'm afraid that old > > .so is used. spl_(right|left)valid fields was added to GIST_SPLITVEC. > > Does look a bit suspicious ... Robert, are you *sure* you've got the > right version of pgsphere linked in? Did you compile it against the > right set of Postgres header files? > Running strings on pg_sphere.so does show /usr/local/pgsql810/lib. I've attached a small dump file; when I create an index on the table, it fails. It works on 225 entries, but failed on 250. Don't know if this is data dependent or size. Is that a page boundary? It seems to me that unless the right/left stuff doesn't come into play for all indexes, that stuff is built correctly. Dump command: /usr/local/pgsql810/bin/pg_dump -F c -p 5433 -d tassiv -t test_data -f index_problem.dump Created the table and index by: tassiv=# SELECT loc into test_data from catalog limit 250; tassiv=# create index test_data_index on test_data using gist( loc ); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> tassiv=# \d test_data Table "public.test_data" Column | Type | Modifiers --------+--------+----------- loc | spoint | Cheers, Rob -- 19:51:58 up 37 days, 12:26, 6 users, load average: 2.15, 2.39, 2.41 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
Works fine.... contrib_regression=# select count(*) from test_data ; count ------- 250 (1 row) contrib_regression=# create index test_data_index on test_data using gist( loc ); CREATE INDEX > I've attached a small dump file that when I create an index on the table, it fails. It works on 225 entries, but failed on 250. Don't know if this is data dependent or size. Is that a page boundary? It seems to me that unless the right/left stuff doesn't come into play for all indexes, that stuff is built correctly. > > Dump command: > /usr/local/pgsql810/bin/pg_dump -F c -p 5433 -d tassiv -t test_data -f index_problem.dump -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> So, I'm as sure as I can be right now. How can I check the .so files installed > by the build? Do they reference an absolute path for their dependent .so files > (postgres), or will they use ld.so.conf, which might then explain the problem. > My ld.so.conf still points to the 8.0.2 version, as I've not switched yet to > 8.1.0. The simplest way is just to remove pg_sphere.so from the 8.1 installation (/usr/local/pgsql810/lib/pg_sphere.so) and try, for example, to create a gist index on spoint. The response should be: contrib_regression=# create index test_data_index on test_data using gist( loc ); ERROR: could not access file "/usr/local/pgsql/lib/pg_sphere": No such file or directory If not, 8.1 is using the 8.0 .so.... -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
When grilled further on (Wed, 09 Nov 2005 10:54:12 +0300), Teodor Sigaev <teodor@sigaev.ru> confessed: > > So, I'm as sure as I can be right now. How can I check the .so files installed > > by the build? Do they reference an absolute path for their dependent .so files > > (postgres), or will they use ld.so.conf, which might then explain the problem. > > My ld.so.conf still points to the 8.0.2 version, as I've not switched yet to > > 8.1.0. > > The simplest way is just remove pg_sphere.so in 8.1 installation > (/usr/local/pgsql810/lib/pg_sphere.so) and try, for example, to create gist > index on spoint. Response should be: > contrib_regression=# create index test_data_index on test_data using gist( loc ); > ERROR: could not access file "/usr/local/pgsql/lib/pg_sphere": No such file or > directory > > > If not - 8.1 use 8.0 .so.... Yup. You're right. So, what is happening here? It will be kind of hard to do a live dump/restore on 1 machine if I cannot have two versions running. Is something not set up correctly on my machine, or in the build (pg_sphere or postgresql) that is preventing two copies from... Sigh. Never mind. The dump is spitting out the absolute path for the shared library (like it should): CREATE FUNCTION sbox_in(cstring) RETURNS sbox AS '/usr/local/pgsql802/lib/pg_sphere', 'spherebox_in' LANGUAGE c IMMUTABLE STRICT; Now if I can just figure out how to get this egg off my face... Now I remember the problem I always have, and I have a new trick in my bag: /usr/local/pgsql802/bin/pg_dumpall -c -v | sed 's/pgsql802/pgsql810/' | /usr/local/pgsql810/bin/psql -p 5433 -d template1 How do others handle dumping from one version to a new one? Is there a less error-prone way of doing this? As long as I don't have the string pgsql802 anywhere else... Sorry for the bandwidth, Rob -- 07:14:34 up 37 days, 23:49, 6 users, load average: 2.20, 2.17, 2.16 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
Robert Creager wrote: >Yup. You're right. So, what is happening here? It will be kind of hard to do a live dump/restore on 1 machine if I cannot have two versions running. Is something not set up correctly on my machine, or in the build (pg_sphere or postgresql) that is preventing two copies from... Sigh. Never mind. The dump is spitting out the absolute path for the shared library (like it should): > >CREATE FUNCTION sbox_in(cstring) RETURNS sbox > AS '/usr/local/pgsql802/lib/pg_sphere', 'spherebox_in' > LANGUAGE c IMMUTABLE STRICT; > >Now if I can just figure out how to get this egg off my face... > >Now I remember the problem I always have, and I have a new trick in my bag: > >/usr/local/pgsql802/bin/pg_dumpall -c -v | sed 's/pgsql802/pgsql810/' | /usr/local/pgsql810/bin/psql -p 5433 -d template1 > >How do others handle dumping from one version to a new one? Is there a less error prone way of doing this? As long as I don't have the string pgsql802 anywhere else... > > > > Why use an absolute path? Why not just give the name of the .so and let postgres find it in $libdir (i.e. sed -e 's,/usr/local/pgsql.*/lib/,,' on your dump) ? cheers andrew
On 11/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: > > Layout of GIST_SPLITVEC struct has been changed from 8.0, I'm afraid that old > > .so is used. spl_(right|left)valid fields was added to GIST_SPLITVEC. > > Does look a bit suspicious ... Robert, are you *sure* you've got the > right version of pgsphere linked in? Did you compile it against the > right set of Postgres header files? So it turned out that he didn't... Is this a sign that we need to include a versioning symbol in SOs so we can give a nice clear error message "module foo compiled for PostgreSQL 8.0.2 this is PostgreSQL 8.1." Is there ever a case where we want people using modules compiled against an old version, are there cases where users can't recompile their modules but the old ones would work?
Robert Creager <Robert_Creager@LogicalChaos.org> writes: > CREATE FUNCTION sbox_in(cstring) RETURNS sbox > AS '/usr/local/pgsql802/lib/pg_sphere', 'spherebox_in' > LANGUAGE c IMMUTABLE STRICT; > Now if I can just figure out how to get this egg off my face... You'd be a lot better off to define all your functions as relative to $libdir, ie, AS '$libdir/pg_sphere', 'spherebox_in' (note the lack of any .so extension, too) If pg_sphere is supplying a setup procedure that gets this wrong, yell at them. regards, tom lane
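The rewrite Tom recommends (drop the installation directory and any .so suffix, keep only the module name under $libdir) is mechanical enough to sketch. The helper below is hypothetical and not part of PostgreSQL or pg_sphere; it only illustrates the string transformation that an install script or a dump filter would need to perform.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn an absolute library path such as
 * "/usr/local/pgsql802/lib/pg_sphere" (with or without ".so") into
 * "$libdir/pg_sphere". Returns 0 on success, -1 if `out` is too small.
 */
static int to_libdir_path(const char *path, char *out, size_t outlen)
{
    /* Keep only the final path component. */
    const char *base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;

    /* Drop a trailing ".so", since the extension is meant to be omitted. */
    size_t len = strlen(base);
    if (len > 3 && strcmp(base + len - 3, ".so") == 0)
        len -= 3;

    int n = snprintf(out, outlen, "$libdir/%.*s", (int) len, base);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

With a form like this in the CREATE FUNCTION statements, the same dump can be restored into either installation, because each server resolves $libdir to its own library directory.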
Gregory Maxwell <gmaxwell@gmail.com> writes: > On 11/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Does look a bit suspicious ... Robert, are you *sure* you've got the >> right version of pgsphere linked in? > So it turned out that he didn't... Is this a sign that we need to > include a versioning symbol in SOs so we can give a nice clear error > message "module foo compiled for PostgreSQL 8.0.2 this is PostgreSQL > 8.1." Is there ever a case where we want people using modules compiled > against an old version, are there cases where users can't recompile > their modules but the old ones would work? There are cases where it would work, and other cases where it wouldn't. Given the pain involved in debugging when it's wrong, maybe we should just endeavor to forbid loading of all wrong-version modules. I'm not sure that there's any real easy way to detect this though. For V1-style functions we could embed a version number in the per-function info structs, but that doesn't help for old-style functions. regards, tom lane
On Wed, Nov 09, 2005 at 10:57:25AM -0500, Tom Lane wrote: > There are cases where it would work, and other cases where it wouldn't. > Given the pain involved in debugging when it's wrong, maybe we should > just endeavor to forbid loading of all wrong-version modules. > > I'm not sure that there's any real easy way to detect this though. > For V1-style functions we could embed a version number in the > per-function info structs, but that doesn't help for old-style > functions. Given the lack of information you get for old style I'm not sure we should care. Do a lot of people use it still? If we're going to expand the Pg_finfo_record struct, I think it could also include (optionally): - A length field (for future upwardly compatible changes). - Allow the specification of flags like strict and volatile so the coder doesn't have to worry about getting the SQL install script right. - Indication of number of parameters/datatypes - A description for pg_proc Of course, then you're getting into the realm of [1]. Still, at least flags like STRICT would be useful because then the source code can assert that it can/cannot accept NULLs, so users can't screw it up. [1] http://archives.postgresql.org/pgsql-hackers/2005-09/msg00476.php -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
On Wed, 09 Nov 2005 09:56:51 -0500 Andrew Dunstan <andrew@dunslane.net> wrote: > > Why use an absolute path? Why not just give the name of the .so and let > postgres find it in $libdir (i.e. sed -e 's,/usr/local/pgsql.*/lib/,,' > on your dump) ? 'cause I didn't know I could? I'll go and fix the Makefile in pg_sphere on GBORG. I might have even created this problem myself... Cheers, Rob
On Wed, 09 Nov 2005 10:42:00 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > > If pg_sphere is supplying a setup procedure that gets this wrong, > yell at them. I'll just go fix it, now that I know what the right way is ;-) Thanks, Rob
I fixed the path in pg_sphere (and did some more clean up). BTW, I usually install contrib modules before restoring the database (of course, that requires dumping the db without the content of the modules)... -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
I've also modified the Makefile. I removed the special .sql.in : .sql implicit rule and re-organized the Makefile. I didn't commit as it was after 12:00pm when I finished... I'll send you what I did when I return home. If you just replaced the $libdir with $$libdir, then a merge will be easy. Cheers, Rob On Thu, 10 Nov 2005 14:43:30 +0300 Teodor Sigaev <teodor@sigaev.ru> wrote: > I fixed path in pg_sphere (and done some more clean up). > > BTW, I usually install contrib modules before restoring database (of course, > it need to dump db without content of modules)... > > > -- > Teodor Sigaev E-mail: teodor@sigaev.ru > WWW: http://www.sigaev.ru/
Gregory Maxwell wrote: > So it turned out that he didn't... Is this a sign that we need to > include a versioning symbol in SOs so we can give a nice clear error > message "module foo compiled for PostgreSQL 8.0.2 this is PostgreSQL > 8.1." I think this would rarely work in practice. For example, during the elog->ereport transition, any module compiled against the wrong server would immediately get an "unresolved symbol: elog/ereport" before you can run your nice version check. I had thought about this issue back then because it was an extremely common occurrence; the only thing I could think of is that you trick the dynamic loader to first reference a symbol with an obvious name like "if_you_see_this_in_an_error_message_you_have_a_version_mismatch". However, this would likely be platform dependent and maybe confuse users even more. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Sat, Nov 12, 2005 at 12:28:48PM +0100, Peter Eisentraut wrote: > I think this would rarely work in practice. For example, during the > elog->ereport transition, any module compiled against the wrong server > would immediately get an "unresolved symbol: elog/ereport" before you > can run your nice version check. Actually, that doesn't worry me. What worries me is that people who don't use ereport won't get any error messages at all yet have completely different expectations as to the structure of various internal structures. So the idea is to force failure when it would otherwise succeed, not just for the pretty error messages but for stability of the system. I would be in favour of storing the CATALOG_VERSION in the pg_finfo struct and rejecting anything that doesn't match. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout <kleptog@svana.org> writes: > So the idea is to force failure when it would otherwise succeed, not > just for the pretty error messages but for stability of the system. Exactly. Peter's right that we'd not always get a "nice" error message --- but it's not hard to figure out "unresolved symbol" failures. As we just were reminded, it can be really hard to figure out minor incompatibilities with wrong-version libraries, and the real point of the proposal is to save us from going through *that* again. > I would be in favour if storing the CATALOG_VERSION in the pg_finfo > struct and rejecting anything that doesn't match. Not sure that CATALOG_VERSION is an amazingly useful thing to use. I think the major version number (eg "8.1") would be sufficient, and it'd certainly give error messages that meant more to the casual user. The problem with CATALOG_VERSION is that we bump it basically for changes in the on-disk contents of a freshly initdb'd database, which does not have all that much to do with the ABI seen by a shared library. To have something useful that is finer-grain than major version number, I think we'd need to invent a separate version number that could be bumped whenever we made incompatible changes in in-memory structures or function APIs. Which'd be almost every day during development :-( I don't think it's worth trying to do that. People who work with development tip should know to recompile their libraries whenever they recompile the main system. regards, tom lane
I thought of an alternative approach to the library version problem: what about taking a leaf from the usual shared library versioning approach, ie, put the version number into the library file name? So instead of loading, say, "plpgsql.so" we'd insist on loading "plpgsql.so.8.2". This would avoid Peter's objection that the dynamic linker might give a hard-to-interpret error message, and it'd not require assuming that the library uses V1 function call convention either. On the other hand, it'd be relatively easy for clueless lusers to defeat; I can readily imagine someone copying foo.so.8.2 to foo.so.8.3 when the backend complained that it couldn't find the latter. So maybe it's not what we want. regards, tom lane
On Sat, Nov 12, 2005 at 10:47:35AM -0500, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > I would be in favour if storing the CATALOG_VERSION in the pg_finfo > > struct and rejecting anything that doesn't match. > > Not sure that CATALOG_VERSION is an amazingly useful thing to use. > I think the major version number (eg "8.1") would be sufficient, > and it'd certainly give error messages that meant more to the casual > user. Sure, CATALOG_VERSION isn't that useful, but it's the only thing in the header files that gives any kind of indication what version you're compiling against. PG_VERSION is a string, which diminishes its usefulness considerably. > The problem with CATALOG_VERSION is that we bump it basically for > changes in the on-disk contents of a freshly initdb'd database, which > does not have all that much to do with the ABI seen by a shared library. > To have something useful that is finer-grain than major version number, > I think we'd need to invent a separate version number that could be > bumped whenever we made incompatible changes in in-memory structures > or function APIs. Which'd be almost every day during development :-( > I don't think it's worth trying to do that. People who work with > development tip should know to recompile their libraries whenever they > recompile the main system. People working with development versions are more likely to get it right, and more importantly, they're less likely to complain to the list as they're likely to know what's happening. What we're dealing with is laymen crossing completely different versions of postgres, and the catalog version will catch them. Unless someone is actually willing to maintain a separate ABI version, using the catalog version will at least solve the major problem. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout <kleptog@svana.org> writes: > Sure, CATALOG_VERSION isn't that useful, but it's the only thing in the > header files that gives any kind of indication what version you're > compiling against. PG_VERSION is a string, which diminishes its > usefulness considerably. How so? All we care about is being able to (1) compare for equality, and (2) print out something useful in error messages. I claim that PG_VERSION does #1 equally well and #2 better. regards, tom lane
On Sat, Nov 12, 2005 at 11:18:51AM -0500, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > Sure, CATALOG_VERSION isn't that useful, but it's the only thing in the > > header files that gives any kind of indication what version you're > > compiling against. PG_VERSION is a string, which diminishes its > > usefulness considerably. > > How so? All we care about is being able to (1) compare for equality, > and (2) print out something useful in error messages. I claim that > PG_VERSION does #1 equally well and #2 better. I was thinking of compile time. The compiler can compare CATALOG_VERSION in #if statements, but it can't compare strings. Trying to make a module that compiles against several different versions of postgres requires testing against CATALOG_VERSION because there's nothing else. However, if we purely want to distinguish between major releases in the loading of modules (thus implying no ABI changes between 8.1.0 and 8.1.7), then PG_VERSION will do fine. Another way that doesn't require code changes would be to make a dummy symbol containing the version and referring to it in pg_finfo. Then you'd get error messages like: Couldn't find symbol 'PG_version_verify_8_1'. i.e. let the dynamic linker do the work. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
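Martijn's compile-time point can be shown with a few lines of preprocessor use. This is only a sketch: the SKETCH_ macro names, the numeric values, and the threshold are placeholders invented for illustration, not values taken from any PostgreSQL header.

```c
/* Placeholder values; in a real module these would come from server headers. */
#define SKETCH_CATALOG_VERSION 200511010
#define SKETCH_PG_VERSION "8.1"

/* An integer macro can be tested by the preprocessor, so a module can
 * select code paths per server version at compile time ... */
#if SKETCH_CATALOG_VERSION >= 200500000
#define SKETCH_HAS_VALID_FLAGS 1
#else
#define SKETCH_HAS_VALID_FLAGS 0
#endif

/* ... whereas a directive such as
 *     #if SKETCH_PG_VERSION == "8.1"
 * is not valid C, because #if only evaluates integer constant expressions.
 * A string version can still be compared at run time. */

static int sketch_has_valid_flags(void)
{
    return SKETCH_HAS_VALID_FLAGS;
}

static const char *sketch_built_for(void)
{
    return SKETCH_PG_VERSION;
}
```

The string remains perfectly usable for the two purposes Tom lists (equality comparison and error messages); it just cannot drive conditional compilation.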
Martijn van Oosterhout <kleptog@svana.org> writes: > On Sat, Nov 12, 2005 at 11:18:51AM -0500, Tom Lane wrote: >> How so? All we care about is being able to (1) compare for equality, >> and (2) print out something useful in error messages. I claim that >> PG_VERSION does #1 equally well and #2 better. > I was thinking of compile time. The compiler can compare > CATALOG_VERSION in #if statements, but it can't compare strings. We aren't asking the compiler to compare anything, though. I'm imagining just that the PG_FUNCTION_INFO_V1 macro will insert the value into the Pg_finfo_record struct, and the comparison will happen at run time in dfmgr.c. > Another way that doesn't require code changes would be to make a dummy symbol > containing the version and referring to it in pg_finfo. Then you'd get > error messages like: Couldn't find symbol 'PG_version_verify_8_1'. i.e. > let the dynamic linker do the work. That would be attractive if we could get it to happen without the assumption that the library uses PG_FUNCTION_INFO_V1 ... but if it still needs that assumption, it doesn't seem like much of an improvement. It's not always easy for people to see dynamic-linker error messages, so I'd rather the message were issued under our control when possible. regards, tom lane
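What Tom imagines here (a version value placed in the per-function info record by the macro, compared at load time) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the struct, the initializer macro, and the check function are hypothetical and are not the actual Pg_finfo_record, PG_FUNCTION_INFO_V1, or dfmgr.c code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-function info record extended with a version string. */
typedef struct {
    int         api_version;   /* calling-convention version */
    const char *built_for;     /* major version of the headers used */
} SketchFinfoRecord;

/* What a module-side macro could expand to when built against 8.1 headers. */
#define SKETCH_SERVER_VERSION "8.1"
#define SKETCH_FINFO_INIT { 1, SKETCH_SERVER_VERSION }

/*
 * Hypothetical loader-side check: returns 1 if the module matches the
 * running server, otherwise 0 with a readable message in `err`.
 */
static int sketch_check_module(const char *module,
                               const SketchFinfoRecord *rec,
                               const char *server_version,
                               char *err, size_t errlen)
{
    if (strcmp(rec->built_for, server_version) == 0)
        return 1;

    snprintf(err, errlen,
             "module \"%s\" compiled for PostgreSQL %s, this is PostgreSQL %s",
             module, rec->built_for, server_version);
    return 0;
}
```

Because the comparison happens in the server rather than in the dynamic linker, the message wording stays under the server's control, which is the property Tom asks for.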
On Sat, Nov 12, 2005 at 12:03:00PM -0500, Tom Lane wrote: > That would be attractive if we could get it to happen without the > assumption that the library uses PG_FUNCTION_INFO_V1 ... but if it still > needs that assumption, it doesn't seem like much of an improvement. > It's not always easy for people to see dynamic-linker error messages, > so I'd rather the message were issued under our control when possible. If you want something that works even if people don't use PG_FUNCTION_INFO_V1, you need something like what the linux kernel source does. During the main build the kernel generates a vmmagic.o object. This defines a number of symbols including a block containing flags about endianness, spinlocks, etc. Any module expecting to be loaded needs to link it in. While loading you simply memcmp() the block with what you're expecting and fail if it doesn't match. Note, this is significantly more fine-grained, in that it can pick up discrepancies in HAVE_INT64_TIMESTAMP, NAMEDATALEN, INDEX_MAX_KEYS, etc. The kind of things that currently appear in pg_controldata. In the future maybe a 32/64 bit flag. If we don't like imposing link time constraints, we could require people to include: #ifdef PG_MAGIC_BLOCK PG_MAGIC_BLOCK; #endif In any one of their source files and put the definition in a header file somewhere. This may be even better. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
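A rough sketch of the magic-block idea described above follows. The struct, its fields, the initializer macro, and the comparison function are hypothetical names chosen for illustration, and the field values used in the usage note are examples only. All fields are plain ints so the struct has no internal padding, which keeps a memcmp() of two blocks meaningful.

```c
#include <string.h>

/* Hypothetical block of build-time settings that module and server must share. */
typedef struct {
    int len;               /* sizeof this struct, to catch layout changes */
    int major_version;     /* e.g. 801 for an 8.1 build (encoding assumed) */
    int namedatalen;       /* NAMEDATALEN seen at compile time */
    int index_max_keys;    /* INDEX_MAX_KEYS seen at compile time */
    int int64_timestamps;  /* 1 if built with HAVE_INT64_TIMESTAMP */
} SketchMagicBlock;

/* What a header-provided macro could expand to inside a module source file. */
#define SKETCH_MAGIC_DATA(ver, namelen, maxkeys, int64ts) \
    { (int) sizeof(SketchMagicBlock), (ver), (namelen), (maxkeys), (int64ts) }

/* Loader-side check: the block must be the same size and byte-identical. */
static int sketch_magic_matches(const SketchMagicBlock *module,
                                const SketchMagicBlock *server)
{
    return module->len == server->len &&
           memcmp(module, server, sizeof *server) == 0;
}
```

For example, a server block built with SKETCH_MAGIC_DATA(801, 64, 32, 0) matches a module block built with the same arguments and rejects one built with SKETCH_MAGIC_DATA(800, 64, 32, 0). A single byte-wise comparison covers every recorded setting at once, so adding a new setting to the block does not require new comparison code.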
Martijn van Oosterhout <kleptog@svana.org> writes: > If we don't like imposing link time constraints, we could require > people to include: > #ifdef PG_MAGIC_BLOCK > PG_MAGIC_BLOCK; > #endif I was hoping to avoid forcing source-code changes, but something like that might be the best solution. Anyone think it's unreasonable? regards, tom lane
On Sat, Nov 12, 2005 at 12:44:23PM -0500, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > If we don't like imposing link time constraints, we could require > > people to include: > > > #ifdef PG_MAGIC_BLOCK > > PG_MAGIC_BLOCK; > > #endif > > I was hoping to avoid forcing source-code changes, but something like > that might be the best solution. Anyone think it's unreasonable? Alternatively, you could make it optional for a release (print a warning that the magic block wasn't found). Next release require it. It's a small enough change that it wouldn't require huge amounts of effort on the part of module writers. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout <kleptog@svana.org> writes: >> I was hoping to avoid forcing source-code changes, but something like >> that might be the best solution. Anyone think it's unreasonable? > Alternativly, you could make it optional for a release (print warning > that magic block wasn't found). Next release require it. What's the point of waiting? We'd be forcing people to add it sooner or later, so why not sooner? regards, tom lane
Tom Lane wrote: > On the other hand, it'd be relatively easy for clueless lusers to > defeat; I can readily imagine someone copying foo.so.8.2 to foo.so.8.3 > when the backend complained that it couldn't find the latter. So > maybe it's not what we want. Hmm...but isn't the version number also something that can be stored in the shared library itself during link time (e.g., via the -soname option to the linker)? The manpage for ld under Linux implies that this will cause the executable that's linked against the shared object to look explicitly for a library with the soname specified by the shared object. I don't know if that just causes the dynamic linker to look for a file with the specified soname or if it will actually examine the shared object under consideration to make sure it has the DT_SONAME field in question, however. -- Kevin Brown kevin@sysexperts.com
On Sat, Nov 12, 2005 at 10:46:33PM -0800, Kevin Brown wrote: > Hmm...but isn't the version number also something that can be stored > in the shared library itself during link time (e.g., via the -soname > option to the linker)? The manpage for ld under Linux implies that > this will cause the executable that's linked against the shared object > to look explicitly for a library with the soname specified by the > shared object. I don't know if that just causes the dynamic linker to > look for a file with the specified soname or if it will actually > examine the shared object under consideration to make sure it has the > DT_SONAME field in question, however. No, that's completely unrelated. The soname is what gets put in the DT_NEEDED field of programs that need it. Thus if you have libtermcap.so symlinked to libncurses.so, when you link with -ltermcap, the linker will include a reference to libncurses because that's what the soname is. The only place version numbers come in is when a library libfoo.8.2 has a soname libfoo.8 which means that at runtime it will accept any lib with that soname. None of this applies to PostgreSQL because we open the modules directly, and don't rely on the linker loader. Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout wrote: > None of this applies to PostgreSQL because we open the modules > directly, and don't rely on the linker loader. Ah, right. I forgot the context was the server, not one of the utilities... Sorry for the waste of bandwidth... -- Kevin Brown kevin@sysexperts.com