Thread: BUG #4208: Server crashes on insert into gist index

BUG #4208: Server crashes on insert into gist index

From
"Ron Mackley"
Date:
The following bug has been logged online:

Bug reference:      4208
Logged by:          Ron Mackley
Email address:      ronm@signalpatterns.com
PostgreSQL version: 8.2.6
Operating system:   Linux {hostname removed} 2.6.18-53.1.13.el5 #1 SMP Mon
Feb 11 13:27:27 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Description:        Server crashes on insert into gist index
Details:

We've been seeing infrequent crashes of the postgres. This is version 8.2.6
running on Red Hat Enterprise 5.1 on x64 using redhat issued RPMs.

We captured a core file and here is the stack trace:

(gdb) bt
#0  0x0000000000615b5a in pfree (pointer=0x2aaaccaff318) at mcxt.c:585
#1  0x00002aaace72ab59 in cube_inter (fcinfo=0x7fff102205f0) at cube.c:898
#2  0x0000000000602b14 in DirectFunctionCall2 (func=0x2aaace72aa00
<cube_inter>, arg1=46913066890008, arg2=5) at fmgr.c:888
#3  0x00002aaace72a73b in g_cube_picksplit (fcinfo=<value optimized out>) at
cube.c:571
#4  0x000000000060224f in FunctionCall2 (flinfo=0x2aaaccaff318,
arg1=46913066890008, arg2=5) at fmgr.c:1162
#5  0x000000000044d6e6 in gistSplitByKey (r=0x2aaace6b6d40,
page=0x2aaaccafe8a0 "\017", itup=0x3628200, len=83,
giststate=0x7fff10221800, v=0x7fff10221420,
    entryvec=0x36269e0, attno=0) at gistsplit.c:306
#6  0x0000000000444b15 in gistSplit (r=0x2aaace6b6d40, page=0x2aaaccafe8a0
"\017", itup=0x3628200, len=83, giststate=0x7fff10221800) at gist.c:943
#7  0x00000000004454ff in gistmakedeal (state=0x7fff10221780,
giststate=0x7fff10221800) at gist.c:329
#8  0x0000000000445d4b in gistdoinsert (r=0x2aaace6b6d40, itup=0x362aa90,
freespace=0, giststate=0x7fff10221800) at gist.c:278
#9  0x0000000000446115 in gistinsert (fcinfo=<value optimized out>) at
gist.c:242
#10 0x0000000000601eec in FunctionCall6 (flinfo=0x75000,
arg1=46913066890008, arg2=5, arg3=5, arg4=0, arg5=4, arg6=0) at fmgr.c:1275
#11 0x000000000045a03f in index_insert (indexRelation=0x2aaace6b6d40,
values=0x7fff10223f90, isnull=0x7fff10224090 "", heap_t_ctid=0x360256c,
    heapRelation=0x2aaace6af020, check_uniqueness=0 '\0') at indexam.c:196
#12 0x00000000005082bf in ExecInsertIndexTuples (slot=0x36011b8,
tupleid=0x360256c, estate=0x3600d10, is_vacuum=0 '\0') at execUtils.c:1116
#13 0x00000000005006f1 in ExecutorRun (queryDesc=<value optimized out>,
direction=<value optimized out>, count=0) at execMain.c:1401
#14 0x00000000005880d7 in ProcessQuery (parsetree=<value optimized out>,
plan=0x35b2ff8, params=0x0, dest=0x35b36d0, completionTag=0x7fff10224440 "")
at pquery.c:157
#15 0x0000000000588311 in PortalRunMulti (portal=0x35f63e0, dest=0x35b36d0,
altdest=0x35b36d0, completionTag=0x7fff10224440 "") at pquery.c:1150
#16 0x0000000000588b74 in PortalRun (portal=0x35f63e0,
count=9223372036854775807, dest=0x35b36d0, altdest=0x35b36d0,
completionTag=0x7fff10224440 "") at pquery.c:700
#17 0x0000000000584a9f in exec_simple_query (
    query_string=0x35b2650 "INSERT INTO user_attribute_vectors
(\"client_id\", \"survey_id\", \"user_id\", \"attribute_vector\") VALUES(1,
1, 9180,
E'(0.924332429795696,0.568812924744321,0.568812924744321,-0.599309532325997,
-1.462648746"...) at postgres.c:939
#18 0x0000000000585f22 in PostgresMain (argc=4, argv=<value optimized out>,
username=0x35396f0 "signalmatch-production") at postgres.c:3424
#19 0x000000000055f1c7 in ServerLoop () at postmaster.c:2934
#20 0x000000000055fdda in PostmasterMain (argc=5, argv=0x3519bb0) at
postmaster.c:966
#21 0x00000000005208e3 in main (argc=5, argv=<value optimized out>) at
main.c:188
(gdb)

And more information about the table:
sp_hub_production=# select count(*) from user_attribute_vectors;
 count
-------
  6057
(1 row)

sp_hub_production=# \d user_attribute_vectors
                              Table "public.user_attribute_vectors"
      Column      |  Type   |                              Modifiers

------------------+---------+-----------------------------------------------
----------------------
 id               | integer | not null default
nextval('user_attribute_vectors_id_seq'::regclass)
 user_id          | integer |
 client_id        | integer |
 survey_id        | integer |
 attribute_vector | cube    |
Indexes:
    "user_attribute_vectors_pkey" PRIMARY KEY, btree (id)
    "index_user_attribute_vectors_on_attribute_vector_gist" gist
(attribute_vector)
    "index_user_attribute_vectors_on_user_id" btree (user_id)

Re: BUG #4208: Server crashes on insert into gist index

From
Tom Lane
Date:
"Ron Mackley" <ronm@signalpatterns.com> writes:
> We've been seeing infrequent crashes of the postgres. This is version 8.2.6
> running on Red Hat Enterprise 5.1 on x64 using redhat issued RPMs.

> We captured a core file and here is the stack trace:

> (gdb) bt
> #0  0x0000000000615b5a in pfree (pointer=0x2aaaccaff318) at mcxt.c:585
> #1  0x00002aaace72ab59 in cube_inter (fcinfo=0x7fff102205f0) at cube.c:898
> #2  0x0000000000602b14 in DirectFunctionCall2 (func=0x2aaace72aa00
> <cube_inter>, arg1=46913066890008, arg2=5) at fmgr.c:888
> #3  0x00002aaace72a73b in g_cube_picksplit (fcinfo=<value optimized out>) at
> cube.c:571

Hmmm ... just looking at the code, I bet what is happening is that the
"swap" path in cube_inter is taken, and then the comparisons in
PG_FREE_IF_COPY get confused and try to pfree values that were not
separately allocated.  But if that's the story, why does cube_inter not
show a crash rate approaching 50%?  Maybe most people only use it on
cubes of the same dimensionality.  Does your gist index contain cubes
of varying numbers of dimensions?

            regards, tom lane

Re: BUG #4208: Server crashes on insert into gist index

From
Ron Mackley
Date:
Tom,

Thanks for the quick response. No:

sp_hub_production=# select distinct cube_dim(attribute_vector) from
user_attribute_vectors;
  cube_dim
----------
         5
(1 row)

In the past we had a problem where runt vectors found their way into
the table, but we deleted them and try to detect them on their way in.

This happened once before and then stopped. That's when we started
capturing core files. I suspect that it happens when our user base
grows enough to force a split on insert with just the right record. We
will probably see a fallow period and it will happen again after we
get another couple thousand users.

Ron

Ron Mackley
Manager, Software Development
Signal Patterns
ronm@signalpatterns.com

==========================================================
The information contained in this email message may be privileged,
confidential and protected from disclosure. If you are not the
intended recipient, any distribution or copying is strictly
prohibited. If you think that you have received this email message in
error, please notify the sender by reply email and delete the message
and any attachments.




On 29 May 2008, at 10:54 AM, Tom Lane wrote:

> "Ron Mackley" <ronm@signalpatterns.com> writes:
>> We've been seeing infrequent crashes of the postgres. This is
>> version 8.2.6
>> running on Red Hat Enterprise 5.1 on x64 using redhat issued RPMs.
>
>> We captured a core file and here is the stack trace:
>
>> (gdb) bt
>> #0  0x0000000000615b5a in pfree (pointer=0x2aaaccaff318) at mcxt.c:
>> 585
>> #1  0x00002aaace72ab59 in cube_inter (fcinfo=0x7fff102205f0) at
>> cube.c:898
>> #2  0x0000000000602b14 in DirectFunctionCall2 (func=0x2aaace72aa00
>> <cube_inter>, arg1=46913066890008, arg2=5) at fmgr.c:888
>> #3  0x00002aaace72a73b in g_cube_picksplit (fcinfo=<value optimized
>> out>) at
>> cube.c:571
>
> Hmmm ... just looking at the code, I bet what is happening is that the
> "swap" path in cube_inter is taken, and then the comparisons in
> PG_FREE_IF_COPY get confused and try to pfree values that were not
> separately allocated.  But if that's the story, why does cube_inter
> not
> show a crash rate approaching 50%?  Maybe most people only use it on
> cubes of the same dimensionality.  Does your gist index contain cubes
> of varying numbers of dimensions?
>
>             regards, tom lane

Re: BUG #4208: Server crashes on insert into gist index

From
Tom Lane
Date:
Ron Mackley <ronm@signalpatterns.com> writes:
> Thanks for the quick response. No:

> sp_hub_production=# select distinct cube_dim(attribute_vector) from
> user_attribute_vectors;
>   cube_dim
> ----------
>          5
> (1 row)

> In the past we had a problem where runt vectors found their way into
> the table, but we deleted them and try to detect them on their way in.

Well, if there were such entries in the past then they could still exist
in the index, I believe.  One or two such keys lurking in dusty corners
of the index would fit with the observed fact that you don't see the
crash often.  As a workaround, it'd probably be worth your trouble to
REINDEX that index to get rid of any such entries.

In any case, the swap bug is definitely real; for instance in cube's
regression database try this:

contrib_regression=# select cube_inter(c,'(1,2,3,4,5,6)') from test_cube;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

I will go fix that ...

            regards, tom lane