Thread: SPI_getvalue problem

SPI_getvalue problem

From
Alex Guryanow
Date:
Hi,

I have the following problem: the backend crashes on Solaris while executing the function
SPI_getvalue.

I have a small database on Intel/Linux that works fine. Now I need to transfer it to
Sparc/Solaris. This database contains a small trigger written in C++. After installing
postgresql-7.0.3 on Solaris and compiling the trigger function, inserting a new row into the
table produces the following:

pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
connection to server was lost

And in the postgres log I see:

DEBUG:  start update_prednet_words function
DEBUG:  before 1 SPI
DEBUG:  triggered for relation magazine
Server process (pid 1397) exited with status 139 at Sun Jan 28 17:23:48 2001
Terminating any active server processes...
Server processes were terminated at Sun Jan 28 17:23:48 2001
Reinitializing shared memory and semaphores
DEBUG:  Data Base System is starting up at Sun Jan 28 17:23:48 2001
DEBUG:  Data Base System was interrupted being in production at Sun Jan 28 17:23:30 2001
DEBUG:  Data Base System is in production state at Sun Jan 28 17:23:48 2001

The first 3 lines are from my C function. After logging the string "triggered for relation ..." it
calls SPI_getvalue, and the backend crashes. This happens on Solaris 2.6 and Solaris 2.8 for both
postgresql-7.0.2 and 7.0.3. On Linux the same trigger works fine.

Is this a bug or my misconfiguration?

My setup is
   Linux: RedHat 6.2, Intel Pentium II-400, 128 Mb
   Solaris: SunOS 5.8 Generic_108528-02 sun4u sparc SUNW,Ultra-60,
           SunOS 5.6 Generic_105181-19 sun4u sparc SUNW,Ultra-1
postgres is configured in the following way:
         ./configure --enable-locale --enable-multibyte=UNICODE

Best regards,
Alex Guryanow



Re: SPI_getvalue problem

From
Tom Lane
Date:
Alex Guryanow <gav@nlr.ru> writes:
> I have the following problem: the backend crashes on Solaris while
> executing the function SPI_getvalue.
> At the same time on Linux this trigger works fine.
> Is this a bug or my misconfiguration?

Sounds like a bug to me, but you haven't demonstrated that the bug is in
SPI_getvalue and not in your own code.  The first thing I'd wonder about
is if your trigger function is checking for NULL value before calling
SPI_getvalue (or at least before trying to do anything useful with the
result).

The platform dependency of the failure might just be due to a
configuration difference, for example whether the program is set up
to force SIGSEGV on a null-pointer dereference or not.

            regards, tom lane

Re[2]: SPI_getvalue problem

From
Alex Guryanow
Date:
Sunday, January 28, 2001, 8:24:51 PM Tom wrote:

TL> Alex Guryanow <gav@nlr.ru> writes:
>> I have the following problem: the backend crashes on Solaris while
>> executing the function SPI_getvalue.
>> At the same time on Linux this trigger works fine.
>> Is this a bug or my misconfiguration?

TL> Sounds like a bug to me, but you haven't demonstrated that the bug is in
TL> SPI_getvalue and not in your own code.  The first thing I'd wonder about
TL> is if your trigger function is checking for NULL value before calling
TL> SPI_getvalue (or at least before trying to do anything useful with the
TL> result).

You are right. One of the values passed to SPI_getvalue is NULL: the second parameter
(tupdesc). But why is it not NULL on Linux, yet NULL on Solaris?

Here is part of my trigger-code:

    rel = CurrentTriggerData->tg_relation;
    trigtuple = CurrentTriggerData->tg_trigtuple;
    newtuple = CurrentTriggerData->tg_newtuple;
    tupdesc = rel->rd_att;

    elog(DEBUG, "before 1 SPI");
    elog(DEBUG, "triggered for relation %s", SPI_getrelname(CurrentTriggerData->tg_relation) );
    id = atoi( SPI_getvalue( trigtuple, tupdesc, 1 ) );
    elog( DEBUG, "before 1.5 SPI" );    // !!! this isn't called in Solaris


Best regards,
Alex



Re: Re[2]: SPI_getvalue problem

From
Tom Lane
Date:
Alex Guryanow <gav@nlr.ru> writes:
> You are right. One of the values passed to SPI_getvalue is NULL: the second parameter
> (tupdesc). But why is it not NULL on Linux, yet NULL on Solaris?

> Here is part of my trigger-code:

>     rel = CurrentTriggerData->tg_relation;
>     trigtuple = CurrentTriggerData->tg_trigtuple;
>     newtuple = CurrentTriggerData->tg_newtuple;
>     tupdesc = rel->rd_att;

>     elog(DEBUG, "before 1 SPI");
>     elog(DEBUG, "triggered for relation %s", SPI_getrelname(CurrentTriggerData->tg_relation) );
>     id = atoi( SPI_getvalue( trigtuple, tupdesc, 1 ) );
>     elog( DEBUG, "before 1.5 SPI" );    // !!! this isn't called in Solaris


Are you sure that rel->rd_att is null?  That seems extremely improbable.
What I think is more likely is that the first column of the tuple
contains a SQL NULL, and consequently SPI_getvalue returns a NULL
pointer.  You are passing that NULL to atoi() without any check.
I'm not sure what Linux' atoi() does on NULL input, but a coredump
on Solaris is very believable ...

            regards, tom lane

Re[4]: SPI_getvalue problem

From
Alex Guryanow
Date:
Monday, January 29, 2001, 9:33:06 AM Tom wrote:

TL> Alex Guryanow <gav@nlr.ru> writes:
>> You are right. One of the values passed to SPI_getvalue is NULL: the second parameter
>> (tupdesc). But why is it not NULL on Linux, yet NULL on Solaris?

>> Here is part of my trigger-code:

>>     rel = CurrentTriggerData->tg_relation;
>>     trigtuple = CurrentTriggerData->tg_trigtuple;
>>     newtuple = CurrentTriggerData->tg_newtuple;
>>     tupdesc = rel->rd_att;

>>     elog(DEBUG, "before 1 SPI");
>>     elog(DEBUG, "triggered for relation %s", SPI_getrelname(CurrentTriggerData->tg_relation) );
>>     id = atoi( SPI_getvalue( trigtuple, tupdesc, 1 ) );
>>     elog( DEBUG, "before 1.5 SPI" );    // !!! this isn't called in Solaris


TL> Are you sure that rel->rd_att is null?  That seems extremely improbable.

I have added the following code after

 tupdesc = rel->rd_att; // (see above)

        if( tupdesc == NULL || trigtuple == NULL ){
                elog( DEBUG, "tupdesc OR trigtuple == NULL" );
                if( tupdesc == NULL )
                        elog( DEBUG, "tupdesc is NULL" );
                if( trigtuple == NULL )
                        elog( DEBUG, "trigtuple is NULL" );
        }
        else
                elog( DEBUG, "tupdesc && trigtuple ARE NOT NULL" );

And here is what I see in my postmaster.log:


DEBUG:  tupdesc OR trigtuple == NULL
DEBUG:  tupdesc is NULL
DEBUG:  before 1 SPI
DEBUG:  triggered for relation magazine
Server process (pid 925) exited with status 139 at Mon Jan 29 08:59:23 2001
Terminating any active server processes...
Server processes were terminated at Mon Jan 29 08:59:23 2001
Reinitializing shared memory and semaphores

On Linux, instead of the first two lines, the log contains

DEBUG:  tupdesc && trigtuple ARE NOT NULL


Best regards,
Alex



Re: Re[4]: SPI_getvalue problem

From
Tom Lane
Date:
Alex Guryanow <gav@nlr.ru> writes:
> DEBUG:  tupdesc is NULL

Hm.  Well, I can assure you that rd_att will *never* be null in a valid
relation cache entry.  So there is something wrong with either
CurrentTriggerData, the relation pointer, or your function's
interpretation of the structures.

A thought that comes to mind here is that perhaps your function was
compiled against the wrong set of header files, causing it to think
that rd_att is at a different offset in the relation struct than what
the backend thinks.  Do you have more than one version of Postgres
installed on the Solaris machine, and if so where are you picking up
the header files while building the library?

            regards, tom lane

Re[6]: SPI_getvalue problem

From
Alex Guryanow
Date:
Monday, January 29, 2001, 9:49:32 AM Tom wrote:

TL> Alex Guryanow <gav@nlr.ru> writes:
>> DEBUG:  tupdesc is NULL

TL> Hm.  Well, I can assure you that rd_att will *never* be null in a valid
TL> relation cache entry.  So there is something wrong with either
TL> CurrentTriggerData, the relation pointer, or your function's
TL> interpretation of the structures.

TL> A thought that comes to mind here is that perhaps your function was
TL> compiled against the wrong set of header files, causing it to think
TL> that rd_att is at a different offset in the relation struct than what
TL> the backend thinks.  Do you have more than one version of Postgres
TL> installed on the Solaris machine,

Yes, but they are 7.0.2 and 7.0.3

TL> and if so where are you picking up
TL> the header files while building the library?

Here is my Makefile for the trigger:

install: test4.so
        cp -f test4.so /home/gav/pgsql/lib/trigger.so

test4.so: test4.o
        ld -G -Bdynamic -o test4.so test4.o /usr/local/lib/libicuuc.so

test4.o: test4.c
        g++ -c -o test4.o -I /home/gav/pgsql/include -I /home/gav/postgresql-7.0.3/src/include test4.c

/home/gav/postgresql-7.0.3 is the directory with the sources
/home/gav/pgsql is the installation target (./configure --prefix=/home/gav/pgsql ...)

By the way, to compile using g++ I have added two lines to src/include/nodes/parsenodes.h:

   #ifndef PARSENODES_H
   #define PARSENODES_H

+  #define typename gav_typename
+  #define class gav_class
   #include "nodes/primnodes.h"

because without them g++ reports some errors. Could this be the cause?

Best regards,
Alex



Re: Re[6]: SPI_getvalue problem

From
Tom Lane
Date:
Alex Guryanow <gav@nlr.ru> writes:
> By the way. To compile using g++ I have added two lines to
> src/include/nodes/parsenodes.h:

How big does g++ think type bool is?  There are several bool fields
in struct RelationData ...

            regards, tom lane

Re[8]: SPI_getvalue problem

From
Alex Guryanow
Date:
Monday, January 29, 2001, 5:40:46 PM, you wrote:

TL> Alex Guryanow <gav@nlr.ru> writes:
>> By the way. To compile using g++ I have added two lines to
>> src/include/nodes/parsenodes.h:

TL> How big does g++ think type bool is?  There are several bool fields
TL> in struct RelationData ...

It thinks bool is 4 bytes long:

--- a.cpp ---
#include <stdio.h>
int main(void)
{
    printf("sizeof(bool) = %d\n", (int) sizeof(bool));
    return 0;
}
--- make command ---
g++ a.cpp
--- results ---
sizeof(bool) = 4

And src/include/c.h from the postgres distribution contains

#ifndef __cplusplus
#ifndef bool
typedef char bool;
#endif
#endif

What should I do to be able to compile my function using g++?
Should I make changes in c.h to define bool as int and then rebuild
the backend? Or is this a bad idea?

Best regards,
Alex



Re: Re[8]: SPI_getvalue problem

From
Tom Lane
Date:
Alex Guryanow <gav@nlr.ru> writes:
> What should I do to be able to compile my function using g++?

I think you should forget about it and use gcc :-(

> Should I make changes in c.h to define bool as int and then rebuild
> backend? Or is this a bad idea?

Very bad.  bool has to be 1 byte, unless you care to indulge in
delicate surgery on a bunch of system catalogs.

            regards, tom lane