Thread: SPI_getvalue problem
Hi,

I have the following problem: the backend crashes on Solaris while executing the function SPI_getvalue.

I have a small database on Intel/Linux that works fine. Now I need to move it to Sparc/Solaris. The database contains a small trigger written in C++. After installing postgresql-7.0.3 on Solaris and compiling the trigger function, inserting a new row into the table gives me the following:

    pqReadData() -- backend closed the channel unexpectedly.
    This probably means the backend terminated abnormally
    before or while processing the request.
    connection to server was lost

And in the postgres log I get:

    DEBUG: start update_prednet_words function
    DEBUG: before 1 SPI
    DEBUG: triggered for relation magazine
    Server process (pid 1397) exited with status 139 at Sun Jan 28 17:23:48 2001
    Terminating any active server processes...
    Server processes were terminated at Sun Jan 28 17:23:48 2001
    Reinitializing shared memory and semaphores
    DEBUG: Data Base System is starting up at Sun Jan 28 17:23:48 2001
    DEBUG: Data Base System was interrupted being in production at Sun Jan 28 17:23:30 2001
    DEBUG: Data Base System is in production state at Sun Jan 28 17:23:48 2001

The first 3 lines are from my C function. After logging the string "triggered for relation ..." it calls SPI_getvalue, and the backend crashes. This happens on Solaris 2.6 and Solaris 2.8, for both postgresql-7.0.2 and 7.0.3. On Linux the same trigger works fine.

Is this a bug or my misconfiguration?

My setup is

    Linux:   RedHat 6.2, Intel Pentium II-400, 128 MB
    Solaris: SunOS 5.8 Generic_108528-02 sun4u sparc SUNW,Ultra-60,
             SunOS 5.6 Generic_105181-19 sun4u sparc SUNW,Ultra-1

postgres is configured in the following way:

    ./configure --enable-locale --enable-multibyte=UNICODE

Best regards,
Alex Guryanow
Alex Guryanow <gav@nlr.ru> writes:
> I have the following problem: the backend crashes on Solaris while
> executing the function SPI_getvalue.
> At the same time on Linux this trigger works fine.
> Is this a bug or my misconfiguration?

Sounds like a bug to me, but you haven't demonstrated that the bug is in SPI_getvalue and not in your own code. The first thing I'd wonder about is if your trigger function is checking for NULL value before calling SPI_getvalue (or at least before trying to do anything useful with the result).

The platform dependency of the failure might just be due to a configuration difference, for example whether the program is set up to force SIGSEGV on a null-pointer dereference or not.

regards, tom lane
Sunday, January 28, 2001, 8:24:51 PM Tom wrote:

TL> Alex Guryanow <gav@nlr.ru> writes:
>> I have the following problem: the backend crashes on Solaris while
>> executing the function SPI_getvalue.
>> At the same time on Linux this trigger works fine.
>> Is this a bug or my misconfiguration?

TL> Sounds like a bug to me, but you haven't demonstrated that the bug is in
TL> SPI_getvalue and not in your own code. The first thing I'd wonder about
TL> is if your trigger function is checking for NULL value before calling
TL> SPI_getvalue (or at least before trying to do anything useful with the
TL> result).

You are right. One of the values passed to SPI_getvalue is NULL. This is the second parameter (tupdesc). But why is it not NULL on Linux, and NULL on Solaris?

Here is part of my trigger code:

    rel = CurrentTriggerData->tg_relation;
    trigtuple = CurrentTriggerData->tg_trigtuple;
    newtuple = CurrentTriggerData->tg_newtuple;
    tupdesc = rel->rd_att;
    elog(DEBUG, "before 1 SPI");
    elog(DEBUG, "triggered for relation %s", SPI_getrelname(CurrentTriggerData->tg_relation) );
    id = atoi( SPI_getvalue( trigtuple, tupdesc, 1 ) );
    elog( DEBUG, "before 1.5 SPI" ); // !!! this isn't called in Solaris

Best regards,
Alex
Alex Guryanow <gav@nlr.ru> writes:
> You are right. One of the values passed to SPI_getvalue is NULL. This is the second parameter
> (tupdesc). But why is it not NULL on Linux, and NULL on Solaris?
> Here is part of my trigger code:
> rel = CurrentTriggerData->tg_relation;
> trigtuple = CurrentTriggerData->tg_trigtuple;
> newtuple = CurrentTriggerData->tg_newtuple;
> tupdesc = rel->rd_att;
> elog(DEBUG, "before 1 SPI");
> elog(DEBUG, "triggered for relation %s", SPI_getrelname(CurrentTriggerData->tg_relation) );
> id = atoi( SPI_getvalue( trigtuple, tupdesc, 1 ) );
> elog( DEBUG, "before 1.5 SPI" ); // !!! this isn't called in Solaris

Are you sure that rel->rd_att is null? That seems extremely improbable.

What I think is more likely is that the first column of the tuple contains a SQL NULL, and consequently SPI_getvalue returns a NULL pointer. You are passing that NULL to atoi() without any check. I'm not sure what Linux' atoi() does on NULL input, but a coredump on Solaris is very believable ...

regards, tom lane
Monday, January 29, 2001, 9:33:06 AM Tom wrote:

TL> Alex Guryanow <gav@nlr.ru> writes:
>> You are right. One of the values passed to SPI_getvalue is NULL. This is the second parameter
>> (tupdesc). But why is it not NULL on Linux, and NULL on Solaris?
>> Here is part of my trigger code:
>> rel = CurrentTriggerData->tg_relation;
>> trigtuple = CurrentTriggerData->tg_trigtuple;
>> newtuple = CurrentTriggerData->tg_newtuple;
>> tupdesc = rel->rd_att;
>> elog(DEBUG, "before 1 SPI");
>> elog(DEBUG, "triggered for relation %s", SPI_getrelname(CurrentTriggerData->tg_relation) );
>> id = atoi( SPI_getvalue( trigtuple, tupdesc, 1 ) );
>> elog( DEBUG, "before 1.5 SPI" ); // !!! this isn't called in Solaris

TL> Are you sure that rel->rd_att is null? That seems extremely improbable.

I have added the following code after

    tupdesc = rel->rd_att; // (see above)

    if( tupdesc == NULL || trigtuple == NULL ){
        elog( DEBUG, "tupdesc OR trigtuple == NULL" );
        if( tupdesc == NULL )
            elog( DEBUG, "tupdesc is NULL" );
        if( trigtuple == NULL )
            elog( DEBUG, "trigtuple is NULL" );
    }
    else
        elog( DEBUG, "tupdesc && trigtuple ARE NOT NULL" );

And here is what I see in my postmaster.log:

    DEBUG: tupdesc OR trigtuple == NULL
    DEBUG: tupdesc is NULL
    DEBUG: before 1 SPI
    DEBUG: triggered for relation magazine
    Server process (pid 925) exited with status 139 at Mon Jan 29 08:59:23 2001
    Terminating any active server processes...
    Server processes were terminated at Mon Jan 29 08:59:23 2001
    Reinitializing shared memory and semaphores

On Linux, instead of the first two lines, the log contains

    DEBUG: tupdesc && trigtuple ARE NOT NULL

Best regards,
Alex
Alex Guryanow <gav@nlr.ru> writes:
> DEBUG: tupdesc is NULL

Hm. Well, I can assure you that rd_att will *never* be null in a valid relation cache entry. So there is something wrong with either CurrentTriggerData, the relation pointer, or your function's interpretation of the structures.

A thought that comes to mind here is that perhaps your function was compiled against the wrong set of header files, causing it to think that rd_att is at a different offset in the relation struct than what the backend thinks. Do you have more than one version of Postgres installed on the Solaris machine, and if so where are you picking up the header files while building the library?

regards, tom lane
Monday, January 29, 2001, 9:49:32 AM Tom wrote:

TL> Alex Guryanow <gav@nlr.ru> writes:
>> DEBUG: tupdesc is NULL

TL> Hm. Well, I can assure you that rd_att will *never* be null in a valid
TL> relation cache entry. So there is something wrong with either
TL> CurrentTriggerData, the relation pointer, or your function's
TL> interpretation of the structures.

TL> A thought that comes to mind here is that perhaps your function was
TL> compiled against the wrong set of header files, causing it to think
TL> that rd_att is at a different offset in the relation struct than what
TL> the backend thinks. Do you have more than one version of Postgres
TL> installed on the Solaris machine,

Yes, but they are 7.0.2 and 7.0.3.

TL> and if so where are you picking up
TL> the header files while building the library?

Here is my Makefile for the trigger:

    install: test4.so
            cp -f test4.so /home/gav/pgsql/lib/trigger.so

    test4.so: test4.o
            ld -G -Bdynamic -o test4.so test4.o /usr/local/lib/libicuuc.so

    test4.o: test4.c
            g++ -c -o test4.o -I /home/gav/pgsql/include -I /home/gav/postgresql-7.0.3/src/include test4.c

/home/gav/postgresql-7.0.3 is the directory with the sources.
/home/gav/pgsql is the installation target (./configure --prefix=/home/gav/pgsql ...)

By the way, to compile using g++ I have added two lines to src/include/nodes/parsenodes.h:

    #ifndef PARSENODES_H
    #define PARSENODES_H
    + #define typename gav_typename
    + #define class gav_class
    #include "nodes/primnodes.h"

because without them g++ reports errors. Could this be the cause?

Best regards,
Alex
Alex Guryanow <gav@nlr.ru> writes:
> By the way. To compile using g++ I have added two lines to
> src/include/nodes/parsenodes.h:

How big does g++ think type bool is? There are several bool fields in struct RelationData ...

regards, tom lane
Monday, January 29, 2001, 5:40:46 PM, you wrote:

TL> Alex Guryanow <gav@nlr.ru> writes:
>> By the way. To compile using g++ I have added two lines to
>> src/include/nodes/parsenodes.h:

TL> How big does g++ think type bool is? There are several bool fields
TL> in struct RelationData ...

It thinks bool is 4 bytes long:

--- a.cpp ---
#include <stdio.h>

int main(void)
{
    printf("sizeof(bool) = %d\n", (int) sizeof(bool));
    return 0;
}

--- make command ---
g++ a.cpp

--- results ---
sizeof(bool) = 4

And src/include/c.h from the postgres distribution contains

    #ifndef __cplusplus
    #ifndef bool
    typedef char bool;
    #endif
    #endif

What should I do to be able to compile my function using g++? Should I change c.h to define bool as int and then rebuild the backend? Or is this a bad idea?

Best regards,
Alex
Alex Guryanow <gav@nlr.ru> writes:
> What should I do to be able to compile my function using g++?

I think you should forget about it and use gcc :-(

> Should I change c.h to define bool as int and then rebuild the
> backend? Or is this a bad idea?

Very bad. bool has to be 1 byte, unless you care to indulge in delicate surgery on a bunch of system catalogs.

regards, tom lane