Thread: Backend dies creating plpgsql procedures (with reproducible example!)

From: Wayne Piekarski
============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================


Your name               : Wayne Piekarski / Matt Altus
Your email address      : wayne@senet.com.au

Category                : runtime: back-end
Severity                : critical

Summary: Backend dies creating plpgsql procedures (with reproducible example!)

System Configuration
--------------------
  Operating System   : FreeBSD 2.2.7 A.OUT

  PostgreSQL version : 6.5 and 6.4.2

  Compiler used      : gcc 2.7.2.1

Hardware:
---------
Pentium-II 450, 384 MB RAM, big disk

Versions of other tools:
------------------------
All GNU tools

--------------------------------------------------------------------------

Problem Description:
--------------------

In a previous email I replied to someone who reported that the backend
would die while inserting functions, or something along those lines (I
can't remember the exact details). I've seen this a few times myself, but
was never able to put together a test case that reproduced it. Well, the
other day I did a pg_dump of our 6.4.2 database and tried to load it back
into 6.5, and it failed with the error message:

FATAL 1:  btree: failed to add item to the page

I've also seen other errors, such as "all my bits have fallen off the
earth" or words to that effect.

This is pretty freaky stuff, as you'd imagine. I tried to load the same
dump file into 6.4.2 and hit the same problem there too! This is bad news,
because it means we have to hand-edit the dump file to get it to reload
properly; our pg_dump output is not really usable without some hacking.

So we did some testing and managed to produce a simple (though somewhat
long) test case that triggers the failure every time, so it should be
possible to attach gdb to the postgres process and find out what is
happening.

The test script is included below:

The postmaster runs with the -o -F switches, i.e. with fsync disabled.
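
For reference, the invocation looks something like this (the data
directory path here is made up):

    postmaster -o -F -D /usr/local/pgsql/data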

If you need anything such as a core dump, I will be happy to supply it
(I don't have a 6.5 build compiled with -g, but I can do that if required).


--------------------------------------------------------------------------

Test Case:
----------
I had to paste this into Netscape for the form here, so hopefully it made
it into the email correctly:

Do a:

destroydb tester
createdb tester

and then send this into psql tester

[see the attached file, the mailer daemon wouldn't let me submit this email
unless I compressed it and sent it as an attachment]
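
(The attachment itself is not reproduced here; for illustration only, a
script of roughly this shape - plpgsql functions with multi-kilobyte
bodies - exercises the same failure. The function name and padding are
hypothetical.)

    CREATE FUNCTION big_func (int4) RETURNS int4 AS '
    BEGIN
        -- imagine several hundred near-identical statements here, enough
        -- to push the source text of the function past a few kilobytes
        RETURN $1;
    END;'
    LANGUAGE 'plpgsql';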

--------------------------------------------------------------------------

Solution:
---------
Sorry, no patch for this. I couldn't understand the code on initial inspection.

--------------------------------------------------------------------------


thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au



[Attachment: compressed test-case script]
Wayne Piekarski <wayne@senet.com.au> writes:
> the other day I did a pg_dump of our 6.4.2 database and tried to load it
> back into 6.5 - it failed with the error message:

> FATAL 1:  btree: failed to add item to the page

IIRC this just means the tuple is too long ... btrees want to be able to
fit at least two tuples per disk page, so indexed fields can't exceed
4k bytes in a stock installation.  Sometimes you'll get away with more,
but not if two such keys end up on the same btree page.
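
A minimal sketch of that failure mode on an ordinary table, assuming a
stock 8k page size (table and index names are made up):

    CREATE TABLE t (src text);
    CREATE INDEX t_src_idx ON t (src);
    -- two keys of ~6000 bytes cannot share one btree page, so the
    -- second insert can die with the error quoted above
    INSERT INTO t VALUES ('...a literal of roughly 6000 characters...');
    INSERT INTO t VALUES ('...another such literal...');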

It's not real clear to me *why* we are keeping an index on the prosrc
field of pg_proc, but we evidently are, so plpgsql source code can't
safely exceed 4k per proc as things stand.

In short, it was only by chance that you were able to put this set of
procs into 6.4 in the first place :-(

Can any hackers comment on whether pg_proc_prosrc_index is really
necessary??  Just dropping it would allow plpgsql sources to approach 8k,
and I can't think of any scenario where it's needed...
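
One way to check which indexes exist on pg_proc (a sketch using the
system catalogs):

    SELECT c.relname
    FROM pg_class c, pg_index i
    WHERE c.oid = i.indexrelid
      AND i.indrelid = (SELECT oid FROM pg_class WHERE relname = 'pg_proc');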

BTW, Jan has been muttering about compressing plpgsql source, which
would provide some more breathing room for big procs, but not before 6.6.

            regards, tom lane

> It's not real clear to me *why* we are keeping an index on the prosrc
> field of pg_proc, but we evidently are, so plpgsql source code can't
> safely exceed 4k per proc as things stand.
>
> In short, it was only by chance that you were able to put this set of
> procs into 6.4 in the first place :-(
>
> Can any hackers comment on whether pg_proc_prosrc_index is really
> necessary??  Just dropping it would allow plpgsql sources to approach 8k,
> and I can't think of any scenario where it's needed...
>
> BTW, Jan has been muttering about compressing plpgsql source, which
> would provide some more breathing room for big procs, but not before 6.6.

Good question.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

> Wayne Piekarski <wayne@senet.com.au> writes:
> > the other day I did a pg_dump of our 6.4.2 database and tried to load it
> > back into 6.5 - it failed with the error message:
>
> > FATAL 1:  btree: failed to add item to the page
>
> IIRC this just means the tuple is too long ... btrees want to be able to
> fit at least two tuples per disk page, so indexed fields can't exceed
> 4k bytes in a stock installation.  Sometimes you'll get away with more,
> but not if two such keys end up on the same btree page.

Ok, well this is quite interesting. The test example I sent had very large
procedures, but my actual real-life case contains functions with
length(prosrc) = 2082, 2059, 1888, 1841, 1525, etc. bytes. So I am nowhere
near 4096 bytes, but I have crossed the 2048-byte boundary.

The error message is the same for both my test case and the real-life
pg_dump, so I'm not sure what this indicates. Is the limit actually at
2048 bytes?
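
(A check along these lines gives the numbers above; proname and prosrc
are the actual pg_proc columns, and the 1024-byte threshold is arbitrary:)

    SELECT proname, length(prosrc)
    FROM pg_proc
    WHERE length(prosrc) > 1024
    ORDER BY 2 DESC;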

> It's not real clear to me *why* we are keeping an index on the prosrc
> field of pg_proc, but we evidently are, so plpgsql source code can't
> safely exceed 4k per proc as things stand.
>
> In short, it was only by chance that you were able to put this set of
> procs into 6.4 in the first place :-(

Yeah, this makes sense now. When we used to reload our procedures, I
always did a vacuum beforehand, which seemed to make it more reliable, and
we would only replace one function at a time (i.e. never a bulk reload of
all our functions).

Every so often we'd have a problem when playing with test databases, but
we were always careful with our real one, so we managed to avoid it there.

> Can any hackers comment on whether pg_proc_prosrc_index is really
> necessary??  Just dropping it would allow plpgsql sources to approach 8k,
> and I can't think of any scenario where it's needed...

Eeeep! I went and tried this and got some really bizarre behaviour:

psql>  UPDATE pg_class SET relname = 'dog' WHERE relname = 'pg_proc_prosrc_index';
shell> mv pg_proc_prosrc_index dog     (renaming the index file in the database directory)
psql>  DROP INDEX pg_proc_prosrc_index;

Then, whenever I try to insert a function into pg_proc:

create function "test" (int4, text) RETURNS int4 AS
'/home/postgres/functions.so' LANGUAGE 'c';

The backend dies, but the errlog contains no error message at all.
/var/log/messages says the backend died with a segmentation fault. Eeep!


So I don't know why this is dying. Is the way I dropped the index OK? I
couldn't think of any other way to do it, because the backend won't let me
drop or work on any pg_* tables directly.

> BTW, Jan has been muttering about compressing plpgsql source, which
> would provide some more breathing room for big procs, but not before 6.6.

I would be happy to drop pg_proc_prosrc_index - now that I know the limits
on plpgsql functions, I can rewrite them to call other functions, or
something like that, to make sure they fit within 4k. But mine are failing
at around 2k as well, which is bad :(
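
(A sketch of that rewrite, with entirely made-up names: move part of the
body into a helper so each function's source text stays small.)

    CREATE FUNCTION helper_step (int4) RETURNS int4 AS '
    BEGIN
        -- the first chunk of the original logic lives here
        RETURN $1 + 1;
    END;'
    LANGUAGE 'plpgsql';

    CREATE FUNCTION main_func (int4) RETURNS int4 AS '
    BEGIN
        -- call the helper instead of inlining everything
        RETURN helper_step($1);
    END;'
    LANGUAGE 'plpgsql';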

Personally, I think the prosrc index could go; what kind of query could
possibly use it?
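
About the only query shape that could, for illustration, would be an
exact-match lookup of a function by its complete source text, which no
ordinary client has any reason to issue:

    SELECT proname FROM pg_proc
    WHERE prosrc = '...the entire source text of some function...';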


thanks for your help,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au