Thread: BUG #4208: Server crashes on insert into gist index
The following bug has been logged online:

Bug reference:      4208
Logged by:          Ron Mackley
Email address:      ronm@signalpatterns.com
PostgreSQL version: 8.2.6
Operating system:   Linux {hostname removed} 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Description:        Server crashes on insert into gist index
Details:

We've been seeing infrequent crashes of the postgres backend. This is version 8.2.6
running on Red Hat Enterprise 5.1 on x64 using Red Hat-issued RPMs.

We captured a core file and here is the stack trace:

(gdb) bt
#0  0x0000000000615b5a in pfree (pointer=0x2aaaccaff318) at mcxt.c:585
#1  0x00002aaace72ab59 in cube_inter (fcinfo=0x7fff102205f0) at cube.c:898
#2  0x0000000000602b14 in DirectFunctionCall2 (func=0x2aaace72aa00 <cube_inter>, arg1=46913066890008, arg2=5) at fmgr.c:888
#3  0x00002aaace72a73b in g_cube_picksplit (fcinfo=<value optimized out>) at cube.c:571
#4  0x000000000060224f in FunctionCall2 (flinfo=0x2aaaccaff318, arg1=46913066890008, arg2=5) at fmgr.c:1162
#5  0x000000000044d6e6 in gistSplitByKey (r=0x2aaace6b6d40, page=0x2aaaccafe8a0 "\017", itup=0x3628200, len=83, giststate=0x7fff10221800, v=0x7fff10221420, entryvec=0x36269e0, attno=0) at gistsplit.c:306
#6  0x0000000000444b15 in gistSplit (r=0x2aaace6b6d40, page=0x2aaaccafe8a0 "\017", itup=0x3628200, len=83, giststate=0x7fff10221800) at gist.c:943
#7  0x00000000004454ff in gistmakedeal (state=0x7fff10221780, giststate=0x7fff10221800) at gist.c:329
#8  0x0000000000445d4b in gistdoinsert (r=0x2aaace6b6d40, itup=0x362aa90, freespace=0, giststate=0x7fff10221800) at gist.c:278
#9  0x0000000000446115 in gistinsert (fcinfo=<value optimized out>) at gist.c:242
#10 0x0000000000601eec in FunctionCall6 (flinfo=0x75000, arg1=46913066890008, arg2=5, arg3=5, arg4=0, arg5=4, arg6=0) at fmgr.c:1275
#11 0x000000000045a03f in index_insert (indexRelation=0x2aaace6b6d40, values=0x7fff10223f90, isnull=0x7fff10224090 "", heap_t_ctid=0x360256c, heapRelation=0x2aaace6af020, check_uniqueness=0 '\0') at indexam.c:196
#12 0x00000000005082bf in ExecInsertIndexTuples (slot=0x36011b8, tupleid=0x360256c, estate=0x3600d10, is_vacuum=0 '\0') at execUtils.c:1116
#13 0x00000000005006f1 in ExecutorRun (queryDesc=<value optimized out>, direction=<value optimized out>, count=0) at execMain.c:1401
#14 0x00000000005880d7 in ProcessQuery (parsetree=<value optimized out>, plan=0x35b2ff8, params=0x0, dest=0x35b36d0, completionTag=0x7fff10224440 "") at pquery.c:157
#15 0x0000000000588311 in PortalRunMulti (portal=0x35f63e0, dest=0x35b36d0, altdest=0x35b36d0, completionTag=0x7fff10224440 "") at pquery.c:1150
#16 0x0000000000588b74 in PortalRun (portal=0x35f63e0, count=9223372036854775807, dest=0x35b36d0, altdest=0x35b36d0, completionTag=0x7fff10224440 "") at pquery.c:700
#17 0x0000000000584a9f in exec_simple_query (query_string=0x35b2650 "INSERT INTO user_attribute_vectors (\"client_id\", \"survey_id\", \"user_id\", \"attribute_vector\") VALUES(1, 1, 9180, E'(0.924332429795696,0.568812924744321,0.568812924744321,-0.599309532325997, -1.462648746"...) at postgres.c:939
#18 0x0000000000585f22 in PostgresMain (argc=4, argv=<value optimized out>, username=0x35396f0 "signalmatch-production") at postgres.c:3424
#19 0x000000000055f1c7 in ServerLoop () at postmaster.c:2934
#20 0x000000000055fdda in PostmasterMain (argc=5, argv=0x3519bb0) at postmaster.c:966
#21 0x00000000005208e3 in main (argc=5, argv=<value optimized out>) at main.c:188
(gdb)

And more information about the table:

sp_hub_production=# select count(*) from user_attribute_vectors;
 count
-------
  6057
(1 row)

sp_hub_production=# \d user_attribute_vectors
                        Table "public.user_attribute_vectors"
      Column      |  Type   |                              Modifiers
------------------+---------+----------------------------------------------------------------------
 id               | integer | not null default nextval('user_attribute_vectors_id_seq'::regclass)
 user_id          | integer |
 client_id        | integer |
 survey_id        | integer |
 attribute_vector | cube    |
Indexes:
    "user_attribute_vectors_pkey" PRIMARY KEY, btree (id)
    "index_user_attribute_vectors_on_attribute_vector_gist" gist (attribute_vector)
    "index_user_attribute_vectors_on_user_id" btree (user_id)
"Ron Mackley" <ronm@signalpatterns.com> writes: > We've been seeing infrequent crashes of the postgres. This is version 8.2.6 > running on Red Hat Enterprise 5.1 on x64 using redhat issued RPMs. > We captured a core file and here is the stack trace: > (gdb) bt > #0 0x0000000000615b5a in pfree (pointer=0x2aaaccaff318) at mcxt.c:585 > #1 0x00002aaace72ab59 in cube_inter (fcinfo=0x7fff102205f0) at cube.c:898 > #2 0x0000000000602b14 in DirectFunctionCall2 (func=0x2aaace72aa00 > <cube_inter>, arg1=46913066890008, arg2=5) at fmgr.c:888 > #3 0x00002aaace72a73b in g_cube_picksplit (fcinfo=<value optimized out>) at > cube.c:571 Hmmm ... just looking at the code, I bet what is happening is that the "swap" path in cube_inter is taken, and then the comparisons in PG_FREE_IF_COPY get confused and try to pfree values that were not separately allocated. But if that's the story, why does cube_inter not show a crash rate approaching 50%? Maybe most people only use it on cubes of the same dimensionality. Does your gist index contain cubes of varying numbers of dimensions? regards, tom lane
Tom,

Thanks for the quick response.  No:

sp_hub_production=# select distinct cube_dim(attribute_vector) from user_attribute_vectors;
 cube_dim
----------
        5
(1 row)

In the past we had a problem where runt vectors found their way into
the table, but we deleted them and now try to detect them on their way
in.  This happened once before and then stopped.  That's when we started
capturing core files.  I suspect that it happens when our user base grows
enough to force a split on insert with just the right record.  We will
probably see a fallow period and it will happen again after we get
another couple thousand users.

Ron

Ron Mackley
Manager, Software Development
Signal Patterns
ronm@signalpatterns.com

==========================================================
The information contained in this email message may be privileged,
confidential and protected from disclosure.  If you are not the intended
recipient, any distribution or copying is strictly prohibited.  If you
think that you have received this email message in error, please notify
the sender by reply email and delete the message and any attachments.

On 29 May 2008, at 10:54 AM, Tom Lane wrote:

> "Ron Mackley" <ronm@signalpatterns.com> writes:
>> We've been seeing infrequent crashes of the postgres backend. This is
>> version 8.2.6 running on Red Hat Enterprise 5.1 on x64 using
>> Red Hat-issued RPMs.
>
>> We captured a core file and here is the stack trace:
>
>> (gdb) bt
>> #0  0x0000000000615b5a in pfree (pointer=0x2aaaccaff318) at mcxt.c:585
>> #1  0x00002aaace72ab59 in cube_inter (fcinfo=0x7fff102205f0) at cube.c:898
>> #2  0x0000000000602b14 in DirectFunctionCall2 (func=0x2aaace72aa00 <cube_inter>, arg1=46913066890008, arg2=5) at fmgr.c:888
>> #3  0x00002aaace72a73b in g_cube_picksplit (fcinfo=<value optimized out>) at cube.c:571
>
> Hmmm ... just looking at the code, I bet what is happening is that the
> "swap" path in cube_inter is taken, and then the comparisons in
> PG_FREE_IF_COPY get confused and try to pfree values that were not
> separately allocated.  But if that's the story, why does cube_inter not
> show a crash rate approaching 50%?  Maybe most people only use it on
> cubes of the same dimensionality.  Does your gist index contain cubes
> of varying numbers of dimensions?
>
>			regards, tom lane
Ron Mackley <ronm@signalpatterns.com> writes:
> Thanks for the quick response.  No:

> sp_hub_production=# select distinct cube_dim(attribute_vector) from user_attribute_vectors;
>  cube_dim
> ----------
>         5
> (1 row)

> In the past we had a problem where runt vectors found their way into
> the table, but we deleted them and now try to detect them on their way in.

Well, if there were such entries in the past then they could still
exist in the index, I believe.  One or two such keys lurking in dusty
corners of the index would fit with the observed fact that you don't
see the crash often.  As a workaround, it'd probably be worth your
trouble to REINDEX that index to get rid of any such entries.

In any case, the swap bug is definitely real; for instance in cube's
regression database try this:

contrib_regression=# select cube_inter(c,'(1,2,3,4,5,6)') from test_cube;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

I will go fix that ...

			regards, tom lane
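The fix itself isn't shown in this thread, but given the structure sketched
earlier, one plausible shape for it is to remember whether the swap happened
and to run PG_FREE_IF_COPY against the argument slot each pointer actually
came from.  Again, this is a sketch replacing the swap-and-cleanup portion of
the earlier illustration, not necessarily the change Tom committed:

	bool		swapped = false;

	/* The "swap" path, now remembered. */
	if (a->dim < b->dim)
	{
		NDBOX	   *tmp = b;

		b = a;
		a = tmp;
		swapped = true;
	}

	result = compute_intersection(a, b);	/* as in the sketch above */

	/* Free each possibly-detoasted copy against the argument it came from. */
	if (swapped)
	{
		PG_FREE_IF_COPY(b, 0);	/* 'b' now holds what arrived as argument 0 */
		PG_FREE_IF_COPY(a, 1);	/* 'a' now holds what arrived as argument 1 */
	}
	else
	{
		PG_FREE_IF_COPY(a, 0);
		PG_FREE_IF_COPY(b, 1);
	}

The essential point is that the argument number passed to PG_FREE_IF_COPY,
not the local variable name, has to track where the bytes actually came from.
Either way, Tom's REINDEX suggestion remains the right immediate workaround,
since it rebuilds the index without any runt keys left over from the earlier
bad data.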