Thread: 8.2beta1 crash possibly in libpq
Hi everyone,

I'm in the process of generating the Windows installer for the latest PostGIS 1.1.4 release and I'm getting a regression failure in one of the libpq applications - the application in question is generating a segfault. Testing so far shows that the regression tests pass without segfaulting in the following scenarios:

PostgreSQL 8.2beta1 / PostGIS 1.1.4 / Linux
PostgreSQL 8.1 / PostGIS 1.1.4 / Win32

So it appears it is something to do with 8.2beta1 and Win32. I've compiled the application with debugging symbols enabled and get the following backtrace from gdb in MingW:

(gdb) set args -f /tmp/pgis_reg_4060/dumper postgis_reg loadedshp
(gdb) run
Starting program: C:\msys\1.0\home\mca\postgis\pg82\postgis-1.1.4\regress/../loader/pgsql2shp.exe -f /tmp/pgis_reg_4060/dumper postgis_reg loadedshp
Initializing...
Program received signal SIGSEGV, Segmentation fault.
0x63512c1c in ?? ()
(gdb) bt
#0 0x63512c1c in ?? ()
#1 0x0040c69c in _fu8__PQntuples () at pgsql2shp.c:2502
#2 0x00408481 in main (ARGC=5, ARGV=0x3d2750) at pgsql2shp.c:243
(gdb)

I also turned on the logging in the server and get the following in the server log:

2006-10-08 12:01:15 LOG: statement: BEGIN;
2006-10-08 12:01:15 LOG: statement: CREATE TABLE "loadedshp" (gid serial PRIMARY KEY);
2006-10-08 12:01:15 NOTICE: CREATE TABLE will create implicit sequence "loadedshp_gid_seq" for serial column "loadedshp.gid"
2006-10-08 12:01:15 NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "loadedshp_pkey" for table "loadedshp"
2006-10-08 12:01:15 LOG: statement: SELECT AddGeometryColumn('','loadedshp','the_geom','-1','POINT',2);
2006-10-08 12:01:17 LOG: statement: INSERT INTO "loadedshp" (the_geom) VALUES ('01010000000000000000000000000000000000F03F');
2006-10-08 12:01:18 LOG: statement: INSERT INTO "loadedshp" (the_geom) VALUES ('01010000000000000000002240000000000000F0BF');
2006-10-08 12:01:18 LOG: statement: INSERT INTO "loadedshp" (the_geom) VALUES ('01010000000000000000002240000000000000F0BF');
2006-10-08 12:01:18 LOG: statement: END;
2006-10-08 12:01:21 LOG: statement: select asewkt(the_geom) from loadedshp;
2006-10-08 12:01:36 LOG: statement: DROP table loadedshp
2006-10-08 12:01:39 LOG: statement: BEGIN;
2006-10-08 12:01:39 LOG: statement: CREATE TABLE "loadedshp" (gid serial PRIMARY KEY);
2006-10-08 12:01:39 NOTICE: CREATE TABLE will create implicit sequence "loadedshp_gid_seq" for serial column "loadedshp.gid"
2006-10-08 12:01:39 NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "loadedshp_pkey" for table "loadedshp"
2006-10-08 12:01:39 LOG: statement: SELECT AddGeometryColumn('','loadedshp','the_geom','-1','POINT',2);
2006-10-08 12:01:41 LOG: statement: COPY "loadedshp" (the_geom) FROM stdin;
2006-10-08 12:01:41 LOG: statement: END;
2006-10-08 12:01:43 LOG: statement: select asewkt(the_geom) from loadedshp;
2006-10-08 12:02:34 LOG: statement: SELECT postgis_version()
2006-10-08 12:02:34 LOG: statement: SELECT a.attname, a.atttypid, a.attlen, a.atttypmod FROM pg_attribute a, pg_class c WHERE a.attrelid = c.oid and a.attnum > 0 AND a.atttypid != 0 AND c.relname = 'loadedshp'
2006-10-08 12:02:48 LOG: could not receive data from client: No connection could be made because the target machine actively refused it.
2006-10-08 12:02:48 LOG: unexpected EOF on client connection

AFAICT the backtrace and server log are indicating that the crash is happening somewhere in libpq. If someone can help me figure out how to load the libpq symbols into MingW's gdb then I can get a better backtrace if required, as I can reproduce this 100% of the time. For reference, the source for the application in question can be found at http://svn.refractions.net/postgis/tags/1.1.4/loader/pgsql2shp.c.

Many thanks,
Mark.
> AFAICT the backtrace and server log are indicating that the
> crash is happening somewhere in libpq. If someone can help me
> figure out how to load the libpq symbols into MingW's gdb
> then I can get a better backtrace if required, as I can
> reproduce this 100% of the time. For reference, the source
> for the application in question can be found at
> http://svn.refractions.net/postgis/tags/1.1.4/loader/pgsql2shp.c.

If you figure out how to make gdb actually work on mingw, let us know - not many have ever managed to get it working, and I don't know of anybody who can make it work repeatedly.

That said, libpq builds with Visual C++. Could you try building your pgsql2shp with Visual C++ as well, and then use the Visual C++ debugger (or windbg, really)? They should give working backtraces.

//Magnus
On Sun, 2006-10-08 at 17:53 +0200, Magnus Hagander wrote:
> > AFAICT the backtrace and server log are indicating that the
> > crash is happening somewhere in libpq. If someone can help me
> > figure out how to load the libpq symbols into MingW's gdb
> > then I can get a better backtrace if required, as I can
> > reproduce this 100% of the time. For reference, the source
> > for the application in question can be found at
> > http://svn.refractions.net/postgis/tags/1.1.4/loader/pgsql2shp.c.
>
> If you figure out how to make gdb actually work on mingw, let us know -
> not many have ever managed to get it working, and I don't know of anybody
> who can make it work repeatedly.
>
> That said, libpq builds with Visual C++. Could you try building your
> pgsql2shp with Visual C++ as well, and then use the Visual C++ debugger
> (or windbg, really)? They should give working backtraces.
>
> //Magnus

Hi Magnus,

Getting closer I think. I managed to compile a MSVC libpq but it agreed with the MingW backtrace in that it was jumping into the middle of nowhere :(

I think I may be getting closer though: I've just done a comparison build with PostgreSQL 8.1 and noticed that an error message is being emitted regarding PQntuples (which is where the crash is occurring):

PG 8.1:

mca@MCAWINXP ~/postgis/pg81/postgis-1.1.4/loader
$ make
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o shpopen.o shpopen.c
shpopen.c:176: warning: 'rcsid' defined but not used
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o dbfopen.o dbfopen.c
dbfopen.c:206: warning: 'rcsid' defined but not used
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o getopt.o getopt.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o shp2pgsql.o shp2pgsql.c
shp2pgsql.c: In function `utf8':
shp2pgsql.c:1686: warning: passing arg 2 of `libiconv' from incompatible pointer type
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 shpopen.o dbfopen.o getopt.o shp2pgsql.o -liconv -o shp2pgsql.exe
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -IC:/msys/1.0/home/mca/pg81/REL-81~1.4/include -c pgsql2shp.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o PQunescapeBytea.o PQunescapeBytea.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 shpopen.o dbfopen.o getopt.o PQunescapeBytea.o pgsql2shp.o -liconv C:/msys/1.0/home/mca/pg81/REL-81~1.4/lib/libpq.dll -o pgsql2shp.exe

PG 8.2:

mca@MCAWINXP ~/postgis/pg82/postgis-1.1.4/loader
$ make
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o shpopen.o shpopen.c
shpopen.c:176: warning: 'rcsid' defined but not used
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o dbfopen.o dbfopen.c
dbfopen.c:206: warning: 'rcsid' defined but not used
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o getopt.o getopt.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o shp2pgsql.o shp2pgsql.c
shp2pgsql.c: In function `utf8':
shp2pgsql.c:1686: warning: passing arg 2 of `libiconv' from incompatible pointer type
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o getopt.o shp2pgsql.o -liconv -o shp2pgsql.exe
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -IC:/msys/1.0/home/mca/pg82/REL-8~1.2BE/include -c pgsql2shp.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o PQunescapeBytea.o PQunescapeBytea.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o getopt.o PQunescapeBytea.o pgsql2shp.o -liconv C:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib/libpq.dll -o pgsql2shp.exe
Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import)

I think the key part is this line: "Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import)". Could it be that the linker cannot find a reference to PQntuples and hence is jumping into random code? I have verified that PQntuples does exist within libpq.dll using the Microsoft Dependency Walker though.

Kind regards,
Mark.
> > > AFAICT the backtrace and server log are indicating that the
> > > crash is happening somewhere in libpq. If someone can help me
> > > figure out how to load the libpq symbols into MingW's gdb
> > > then I can get a better backtrace if required, as I can
> > > reproduce this 100% of the time. For reference, the source
> > > for the application in question can be found at
> > > http://svn.refractions.net/postgis/tags/1.1.4/loader/pgsql2shp.c.
> >
> > If you figure out how to make gdb actually work on mingw, let us know -
> > not many have ever managed to get it working, and I don't know of anybody
> > who can make it work repeatedly.
> >
> > That said, libpq builds with Visual C++. Could you try building your
> > pgsql2shp with Visual C++ as well, and then use the Visual C++ debugger
> > (or windbg, really)? They should give working backtraces.
> >
> > //Magnus
>
> Hi Magnus,
>
> Getting closer I think. I managed to compile a MSVC libpq but it agreed
> with the MingW backtrace in that it was jumping into the middle of
> nowhere :(

Oops. Sounds like a generic memory corruption then, overwriting the return stack so the backtrace doesn't work.

> I think I may be getting closer though: I've just done a comparison
> build with PostgreSQL 8.1 and noticed that an error message is
> being emitted regarding PQntuples (which is where the crash is
> occurring):

> mca@MCAWINXP ~/postgis/pg81/postgis-1.1.4/loader
> $ make
> gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o shpopen.o shpopen.c

A question based on that - are you using gettext? I know gettext, and possibly iconv, breaks if gettext is compiled with one version of VC++ and the program using it with a different version. If you are building with it, try to disable it and see if that's where the problem comes from.
<snip>

> C:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib/libpq.dll -o pgsql2shp.exe
> Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import)
>
> I think the key part is this line: "Info: resolving _PQntuples by
> linking to __imp__PQntuples (auto-import)". Could it be that the linker
> cannot find a reference to PQntuples and hence is jumping into random
> code? I have verified that PQntuples does exist within libpq.dll using
> the Microsoft Dependency Walker though.

This is fairly normal, and it's just info - not even a warning. If it couldn't find the reference, you'd get one of those "could not find entrypoint in DLL" error boxes when you tried to start the program. It absolutely will not just pick a random memory address and jump to it. You could possibly do that yourself if you were loading the DLL manually, but since you're not doing that...

//Magnus
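As background on the Info line discussed above: the __imp__ prefix conventionally names a pointer-sized slot in the program's import table, which the Windows loader fills with the address of the DLL function at startup. The following is only a minimal portable model of that indirection in plain C, not real linker output or libpq code; fake_result, fake_PQntuples, imp_slot_PQntuples, bind_imports and call_via_slot are all hypothetical names invented for the illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a libpq result; the real PGresult is opaque. */
typedef struct {
    int ntuples;
} fake_result;

/* Stand-in for the function that would live inside the DLL. */
int fake_PQntuples(const fake_result *res)
{
    return res ? res->ntuples : 0;
}

/* Model of the import slot: a pointer that is empty until the loader binds it. */
typedef int (*ntuples_fn)(const fake_result *);
ntuples_fn imp_slot_PQntuples = NULL;

/* Model of what the loader does at program start: store the real address. */
void bind_imports(void)
{
    imp_slot_PQntuples = fake_PQntuples;
}

/* A correct call first loads the address out of the slot, then calls it. */
int call_via_slot(const fake_result *res)
{
    if (imp_slot_PQntuples == NULL)
        return -1;              /* import was never bound */
    return imp_slot_PQntuples(res);
}
```

A call has to load the address out of the slot before jumping to it. The thread does not establish whether a mishandled indirection of this kind is what produced the crash.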
"Magnus Hagander" <mha@sollentuna.net> writes: >> C:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib/libpq.dll -o pgsql2shp.exe >> Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import) > This is fairly normal, and it's just info - not even a warning. It seems pretty odd that it would only be whinging about PQntuples and not any of the other libpq entry points, though. I think Mark should try to figure out why that is. regards, tom lane
Hi Magnus,

I finally got to the bottom of this - it seems that the flags being passed to MingW's linker were incorrect, but instead of erroring out it decided to create a corrupt executable. Here is the command line that was being used to link the pgsql2shp.exe executable, along with the associated auto-import warning:

gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o getopt.o PQunescapeBytea.o pgsql2shp.o -liconv C:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib/libpq.dll -o pgsql2shp.exe
Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import)

Note that libpq.dll is referenced directly with -l, which I believe should be an invalid syntax. This produces a corrupt executable that crashes whenever PQntuples is accessed. On the other hand, a correct executable can be realised by linking like this:

gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o getopt.o PQunescapeBytea.o pgsql2shp.o -liconv -LC:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib -lpq -o pgsql2shp.exe

Note there is no auto-import warning, and the use of -L and -l is how I would expect. In actual fact, the incorrect link line was being produced by an error in the configure.in script, so this won't be a scenario that most people will experience. The executables linked using the second method now work properly without crashing during regression.

The big mystery is that the command line used to link the executables has been like that for several versions now, and I have absolutely no idea why it only triggered this failure when linked against 8.2beta1 when it works perfectly on 8.1 and 8.0, or why only PQntuples was affected.

Kind regards,
Mark.
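The difference between the two link lines above can be sketched as a small transformation from a DLL path to the -L/-l form. This is a hypothetical helper written only for illustration: it is not taken from the PostGIS configure.in, the name dll_path_to_link_flags is an assumption, and it assumes forward-slash paths and the lib<name>.<ext> naming convention.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite "<dir>/lib<name>.<ext>" as "-L<dir> -l<name>".
 * Returns 0 on success, or -1 if the path has no directory part, does not
 * follow the lib<name>.<ext> pattern, or the output buffer is too small.
 */
int dll_path_to_link_flags(const char *path, char *out, size_t outlen)
{
    const char *slash = strrchr(path, '/');
    const char *base;
    const char *dot;
    int n;

    if (slash == NULL)
        return -1;                      /* no directory component */

    base = slash + 1;
    if (strncmp(base, "lib", 3) != 0)
        return -1;                      /* file name lacks the "lib" prefix */
    base += 3;

    dot = strrchr(base, '.');
    if (dot == NULL || dot == base)
        return -1;                      /* no extension, or empty name */

    /* %.*s prints only the directory part and only the bare library name. */
    n = snprintf(out, outlen, "-L%.*s -l%.*s",
                 (int)(slash - path), path,
                 (int)(dot - base), base);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Given the libpq.dll path from the failing link line, this yields the -L and -l arguments used in the working one.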