Thread: consistency check on SPI tuple count failed

consistency check on SPI tuple count failed

From
"Gaetano Mendola"
Date:
Hi all,
the following code was working properly under Postgres 7.3.X
I'm now running my regression test with Postgres 7.4beta1 and I'm
having the error in subj.

CREATE TABLE test ( a integer, b integer );

INSERT INTO test VALUES ( 1 );

CREATE OR REPLACE FUNCTION foo(INTEGER)
RETURNS INTEGER AS'
BEGIN
    RETURN $1 + 1;
END;
' LANGUAGE 'plpgsql';


CREATE OR REPLACE FUNCTION bar()
RETURNS INTEGER AS'
DECLARE
    my_ret RECORD;
BEGIN
    FOR my_ret IN
        SELECT foo(a) AS ret
        FROM test
    LOOP
        IF my_ret.ret = 3 THEN
            RETURN -1;
        END IF;
    END LOOP;

    RETURN 0;
END;
' LANGUAGE 'plpgsql';



Regards
Gaetano Mendola




Re: consistency check on SPI tuple count failed

From
"Gaetano Mendola"
Date:
I forgot to say to do a:

select bar()

at the end!


Gaetano



Re: consistency check on SPI tuple count failed

From
Tom Lane
Date:
"Gaetano Mendola" <mendola@bigfoot.com> writes:
> the following code was working properly under Postgres 7.3.X
> I'm now running my regression test with Postgres 7.4beta1 and I'm
> having the error in subj.

I tried this and got

regression=# select bar();
 bar
-----
   0
(1 row)

regression=#

Anyone else see the problem?
        regards, tom lane


Re: consistency check on SPI tuple count failed

From
Stephan Szabo
Date:
On Fri, 8 Aug 2003, Tom Lane wrote:

> "Gaetano Mendola" <mendola@bigfoot.com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
>  bar
> -----
>    0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?

I got the same thing as Gaetano on my just prior to beta1 system.



Re: consistency check on SPI tuple count failed

From
Rod Taylor
Date:
On Fri, 2003-08-08 at 11:55, Tom Lane wrote:
> "Gaetano Mendola" <mendola@bigfoot.com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
>  bar
> -----
>    0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?

Bar gives 0 for me as well.

Re: consistency check on SPI tuple count failed

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> I got the same thing as Gaetano on my just prior to beta1 system.

Well, we couldn't have fixed it since beta1 --- there's been no changes
anywhere near SPI.  I'm thinking it must be platform-dependent.  What
are you guys using, exactly?
        regards, tom lane


Re: consistency check on SPI tuple count failed

From
Stephan Szabo
Date:
On Fri, 8 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > I got the same thing as Gaetano on my just prior to beta1 system.
>
> Well, we couldn't have fixed it since beta1 --- there's been no changes
> anywhere near SPI.  I'm thinking it must be platform-dependent.  What
> are you guys using, exactly?

I'm using RedHat 9.




Re: consistency check on SPI tuple count failed

From
"Mendola Gaetano"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us>
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > I got the same thing as Gaetano on my just prior to beta1 system.
>
> Well, we couldn't have fixed it since beta1 --- there's been no changes
> anywhere near SPI.  I'm thinking it must be platform-dependent.  What
> are you guys using, exactly?
>
> regards, tom lane

kalman=# select version();
                                                  version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 7.4beta1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
(1 row)



Regards
Gaetano Mendola




Re: consistency check on SPI tuple count failed

From
"Mendola Gaetano"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> "Gaetano Mendola" <mendola@bigfoot.com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
> 
> I tried this and got
> 
> regression=# select bar();
>  bar
> -----
>    0
> (1 row)
> 
> regression=#
> 
> Anyone else see the problem?
> 
> regards, tom lane

Incredible to believe, but after playing around that function started
to work. I'm not crazy.

I deleted the DB. 
Stopped postgres. 
Restart postgres.
Create the DB.
Create the language. 
Inserted my example.

Again the error:

kalman=# select bar();
ERROR:  consistency check on SPI tuple count failed
CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
kalman=# select bar();
ERROR:  consistency check on SPI tuple count failed
CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 
The connection to the server was lost. Attempting reset: Failed.

Gaetano







Re: consistency check on SPI tuple count failed

From
Tom Lane
Date:
"Mendola Gaetano" <mendola@bigfoot.com> writes:
> Again the error:

> kalman=# select bar();
> ERROR:  consistency check on SPI tuple count failed
> CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
> kalman=# select bar();
> ERROR:  consistency check on SPI tuple count failed
> CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

After adding a second row to the test table, I am able to reproduce
the above (including the core dump after second try) on an intel/linux
box, but *not* on HPUX.

I now suspect a memory-stomp kind of problem, like someone writing one
too many bytes in a struct.  HPUX tends to mask these in situations
where intel will not, because it uses MAXALIGN 8 rather than 4.

I have also just traced through _SPI_cursor_operation() in spi.c,
watched PortalRunFetch return 2, and then watched _SPI_checktuples read
zero from _SPI_current->processed.  How the heck could that happen?
Compiler bug, or am I just crazy?
        regards, tom lane
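
The MAXALIGN point above can be illustrated with a little arithmetic. What follows is only a sketch: `align_up` and `slack_bytes` are hypothetical helper names, not PostgreSQL's macros, and the only assumption is that sizes get rounded up to a power-of-two alignment. With 8-byte rounding a 12-byte object is followed by 4 bytes of padding, so a one-byte overrun can land in unused space, whereas with 4-byte rounding there is no padding to absorb it.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align.
 * Assumption: align is a power of two (4 or 8 in this discussion). */
size_t align_up(size_t len, size_t align)
{
    return (len + (align - 1)) & ~(align - 1);
}

/* Padding bytes between the end of a len-byte object and the next
 * aligned boundary: the room in which a small overrun goes unnoticed. */
size_t slack_bytes(size_t len, size_t align)
{
    return align_up(len, align) - len;
}
```

For example, `slack_bytes(12, 4)` is 0 while `slack_bytes(12, 8)` is 4, which is one way a platform with the larger alignment could hide a write of one byte too many.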


Re: consistency check on SPI tuple count failed

From
Stephan Szabo
Date:
On Fri, 8 Aug 2003, Tom Lane wrote:

> "Mendola Gaetano" <mendola@bigfoot.com> writes:
> > Again the error:
>
> > kalman=# select bar();
> > ERROR:  consistency check on SPI tuple count failed
> > CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
> > kalman=# select bar();
> > ERROR:  consistency check on SPI tuple count failed
> > CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
> > server closed the connection unexpectedly
> >         This probably means the server terminated abnormally
> >         before or while processing the request.
> > The connection to the server was lost. Attempting reset: Failed.
>
> After adding a second row to the test table, I am able to reproduce
> the above (including the core dump after second try) on an intel/linux
> box, but *not* on HPUX.
>
> I now suspect a memory-stomp kind of problem, like someone writing one
> too many bytes in a struct.  HPUX tends to mask these in situations
> where intel will not, because it uses MAXALIGN 8 rather than 4.
>
> I have also just traced through _SPI_cursor_operation() in spi.c,
> watched PortalRunFetch return 2, and then watched _SPI_checktuples read
> zero from _SPI_current->processed.  How the heck could that happen?
> Compiler bug, or am I just crazy?

Not sure, but I got the same thing.  When I changed it to put the
result in a temporary int variable and then store that, it started
working for me (returning 0), reverting to the original made it fail
again.  I'm going to try -O0 and see what happens there.




Re: consistency check on SPI tuple count failed

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Fri, 8 Aug 2003, Tom Lane wrote:
>> I have also just traced through _SPI_cursor_operation() in spi.c,
>> watched PortalRunFetch return 2, and then watched _SPI_checktuples read
>> zero from _SPI_current->processed.  How the heck could that happen?
>> Compiler bug, or am I just crazy?

> Not sure, but I got the same thing.  When I changed it to put the
> result in a temporary int variable and then store that, it started
> working for me (returning 0), reverting to the original made it fail
> again.  I'm going to try -O0 and see what happens there.

Oooohhhh ...

<lightbulb>
SPI_stack can move around as functions are entered/exited.
</lightbulb>

Wonder why we've not seen that kind of failure happen before?  Someone
(doubtless me) must have changed the coding of this routine since 7.3.
        regards, tom lane


Re: consistency check on SPI tuple count failed

From
Tom Lane
Date:
"Mendola Gaetano" <mendola@bigfoot.com> writes:
> Incredible to believe, but after playing around that function started
> to work. I'm not crazy.

Yeah, it was a problem with storing into a possibly-obsolete pointer ---
the visible effects could range from nothing to a core dump depending on
whether the pointer was really out-of-date and what got clobbered if it
was.

Fix is in CVS.
        regards, tom lane
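
For readers following along, the failure mode described above (a growable stack that can move while a pointer into it is still in use) can be sketched in a few lines of C. This is not the spi.c code and not the actual fix; every name here (`DemoConn`, `demo_stack`, `demo_current`, `demo_fetch`, and so on) is hypothetical. It assumes only what the thread states: the stack can be reallocated when a nested function is entered, and storing the result through a temporary variable made the symptom go away.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one connection record on the stack. */
typedef struct {
    unsigned processed;
} DemoConn;

static DemoConn *demo_stack = NULL;    /* growable array, in the role of SPI_stack */
static int       demo_depth = 0;       /* entries currently in use */
static DemoConn *demo_current = NULL;  /* points into demo_stack, like _SPI_current */

/* Enter a nested level. realloc may move the whole array, which
 * invalidates any pointer into it that a caller still holds. */
void demo_push(void)
{
    DemoConn *p = realloc(demo_stack, (demo_depth + 1) * sizeof(DemoConn));
    if (p == NULL)
        abort();
    demo_stack = p;
    demo_stack[demo_depth].processed = 0;
    demo_current = &demo_stack[demo_depth];
    demo_depth++;
}

/* Leave a nested level and re-derive the current pointer from the
 * (possibly moved) array. */
void demo_pop(void)
{
    assert(demo_depth > 0);
    demo_depth--;
    demo_current = (demo_depth > 0) ? &demo_stack[demo_depth - 1] : NULL;
}

/* Stand-in for the fetch: it runs a nested function (as foo() runs
 * inside bar()'s loop) and reports two rows. */
unsigned demo_fetch(void)
{
    demo_push();
    demo_pop();
    return 2;
}

/*
 * Risky form, deliberately not executed here:
 *     demo_current->processed = demo_fetch();
 * C does not fix whether the left-hand address is computed before or
 * after the call, so a compiler may load demo_current first and then
 * store through a pointer to memory that realloc has already released.
 *
 * Safer form: finish the call, then read demo_current.
 */
unsigned demo_fetch_and_record(void)
{
    unsigned n = demo_fetch();      /* the call completes first */
    demo_current->processed = n;    /* demo_current is read only now */
    return demo_current->processed;
}
```

Calling `demo_push()` once and then `demo_fetch_and_record()` leaves 2 in the outer record's `processed` field. Whether the risky form misbehaves depends on the compiler and on whether realloc actually moved the block, which fits the range of symptoms reported in this thread, from nothing at all to a core dump.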