Thread: segfault at aset.c:539

segfault at aset.c:539

From: Tomas Szepe
Apparently my mail hasn't made it to the list; here's a bounce.

fidx is always 1, set->freelist[] is full of NULLs except at index 1
where the value is either complete garbage or something that looks
like poisoned memory (0x7f7f7f7e).

The resolution of this bug is of critical importance to me, will
somebody help?

Thanks, T.

----- Forwarded message from Tomas Szepe <szepe@pinerecords.com> -----

Date: Thu, 10 Jul 2003 11:54:14 +0200
From: Tomas Szepe <szepe@pinerecords.com>
To: pgsql-bugs@postgresql.org
Subject: segfault at aset.c:539

Hi list,

I'm getting an ugly non-deterministic segfault in postmaster
at aset.c:539.

/*
 * Request is small enough to be treated as a chunk.  Look in the
 * corresponding free list to see if there is a free chunk we could
 * reuse.
 */
fidx = AllocSetFreeIndex(size);
priorfree = NULL;
for (chunk = set->freelist[fidx]; chunk; chunk = (AllocChunk) chunk->aset)
{
    if (chunk->size >= size)    /*  <--- line 539  */
        break;
    priorfree = chunk;
}
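
As a reading aid for the loop quoted above, here is a minimal, self-contained sketch of the same kind of free-list walk. MockChunk, its field names and mock_find_free() are hypothetical stand-ins, not the real aset.c declarations; the only thing taken from the report is that each chunk header carries a link followed by a size, and that the loop reads chunk->size before anything else. If a stray write has replaced a link with garbage, that read is the first place the bad pointer gets dereferenced, which would be consistent with a crash on the marked line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk header: a link word followed by the chunk size. */
typedef struct MockChunk
{
    struct MockChunk *next;     /* link to the next free chunk, or NULL */
    size_t            size;     /* usable size of this chunk in bytes */
} MockChunk;

/*
 * Walk one free list and unlink the first chunk that is large enough.
 * Returns NULL when no chunk fits.  Every iteration dereferences the
 * current link, so a corrupted link faults here and not at the point
 * where the corruption actually happened.
 */
static MockChunk *
mock_find_free(MockChunk **head, size_t size)
{
    MockChunk *prior = NULL;
    MockChunk *chunk;

    assert(head != NULL);

    for (chunk = *head; chunk != NULL; chunk = chunk->next)
    {
        if (chunk->size >= size)    /* first read through the link */
            break;
        prior = chunk;
    }

    if (chunk != NULL)
    {
        /* Unlink the chunk from the list before handing it out. */
        if (prior == NULL)
            *head = chunk->next;
        else
            prior->next = chunk->next;
        chunk->next = NULL;
    }
    return chunk;
}
```

With a list of chunks sized 8, 16 and 32, a request for 12 bytes returns the 16-byte chunk and leaves the other two linked together.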

bt:
#0  0x081e2803 in AllocSetAlloc (context=0x82c6300, size=32) at aset.c:539
#1  0x081e32d4 in MemoryContextAlloc (context=0x82c6300, size=32) at mcxt.c:452
#2  0x0808b02c in btrescan (fcinfo=0xbfffe4f0) at nbtree.c:417
#3  0x081d89e4 in OidFunctionCall2 (functionId=334, arg1=137435560,
    arg2=3221219328) at fmgr.c:1248
#4  0x08085acd in index_rescan (scan=0x83119a8, key=0xbfffe800)
    at indexam.c:314
#5  0x080855c3 in RelationGetIndexScan (indexRelation=0x82d5670, nkeys=1,
    key=0xbfffe800) at genam.c:118
#6  0x0808af61 in btbeginscan (fcinfo=0xbfffe670) at nbtree.c:392
#7  0x081d8b1b in OidFunctionCall3 (functionId=333, arg1=137188976, arg2=1,
    arg3=3221219328) at fmgr.c:1275
#8  0x080859e5 in index_beginscan (heapRelation=0x82c6b98,
    indexRelation=0x82d5670, snapshot=0x0, nkeys=1, key=0xbfffe800)
    at indexam.c:268
#9  0x080856cd in systable_beginscan (heapRelation=0x82c6b98,
    indexRelname=0x8260a3f "pg_proc_oid_index", indexOK=1 '\001',
    snapshot=0x0, nkeys=1, key=0xbfffe800) at genam.c:219
#10 0x081cb8f5 in SearchCatCache (cache=0x406da088, v1=2034, v2=0, v3=0, v4=0)
    at catcache.c:1193
#11 0x081d28e8 in SearchSysCache (cacheId=25, key1=2034, key2=0, key3=0,
    key4=0) at syscache.c:536
#12 0x080ad193 in pg_proc_aclcheck (proc_oid=2034, userid=100, mode=128)
    at aclchk.c:1072
#13 0x081d49d2 in init_fcache (foid=2034, nargs=1, fcacheCxt=0x82c6278)
    at fcache.c:33
#14 0x08103021 in ExecEvalFunc (funcClause=0x8301700, econtext=0x830dc18,
    isNull=0xbfffeaa0 "", isDone=0xbfffe990) at execQual.c:1162
#15 0x08103c21 in ExecEvalExpr (expression=0x8301700, econtext=0x830dc18,
    isNull=0xbfffeaa0 "", isDone=0xbfffe990) at execQual.c:1715
#16 0x08102597 in ExecEvalFuncArgs (fcinfo=0xbfffea10, argList=0x8301728,
    econtext=0x830dc18) at execQual.c:624
#17 0x081026ca in ExecMakeFunctionResult (fcache=0x8310508,
    arguments=0x8301728, econtext=0x830dc18, isNull=0xbfffec60 "",
    isDone=0xbfffeb50) at execQual.c:680
#18 0x08103054 in ExecEvalFunc (funcClause=0x8301740, econtext=0x830dc18,
    isNull=0xbfffec60 "", isDone=0xbfffeb50) at execQual.c:1167
#19 0x08103c21 in ExecEvalExpr (expression=0x8301740, econtext=0x830dc18,
    isNull=0xbfffec60 "", isDone=0xbfffeb50) at execQual.c:1715
#20 0x08102597 in ExecEvalFuncArgs (fcinfo=0xbfffebd0, argList=0x8301768,
    econtext=0x830dc18) at execQual.c:624
#21 0x081026ca in ExecMakeFunctionResult (fcache=0x8310400,
    arguments=0x8301768, econtext=0x830dc18, isNull=0xbfffee20 "",
    isDone=0xbfffed10) at execQual.c:680
#22 0x08103054 in ExecEvalFunc (funcClause=0x8301780, econtext=0x830dc18,
    isNull=0xbfffee20 "", isDone=0xbfffed10) at execQual.c:1167
#23 0x08103c21 in ExecEvalExpr (expression=0x8301780, econtext=0x830dc18,
    isNull=0xbfffee20 "", isDone=0xbfffed10) at execQual.c:1715
#24 0x08102597 in ExecEvalFuncArgs (fcinfo=0xbfffed90, argList=0x8301b00,
    econtext=0x830dc18) at execQual.c:624
#25 0x081026ca in ExecMakeFunctionResult (fcache=0x83102f8,
    arguments=0x8301b00, econtext=0x830dc18, isNull=0xbfffef33 "", isDone=0x0)
    at execQual.c:680
#26 0x08102fcf in ExecEvalOper (opClause=0x8301b18, econtext=0x830dc18,
    isNull=0xbfffef33 "", isDone=0x0) at execQual.c:1125
#27 0x08103bf9 in ExecEvalExpr (expression=0x8301b18, econtext=0x830dc18,
    isNull=0xbfffef33 "", isDone=0x0) at execQual.c:1711
#28 0x081032a1 in ExecEvalOr (orExpr=0x8301f18, econtext=0x830dc18,
    isNull=0xbfffef33 "") at execQual.c:1316
#29 0x08103c42 in ExecEvalExpr (expression=0x8301f18, econtext=0x830dc18,
    isNull=0xbfffef33 "", isDone=0x0) at execQual.c:1719
#30 0x08103ec8 in ExecQual (qual=0x8302ad0, econtext=0x830dc18,
    resultForNull=0 '\0') at execQual.c:1885
#31 0x08104619 in ExecScan (node=0x83028e8, accessMtd=0x810cd64 <SeqNext>)
    at execScan.c:124
#32 0x0810ce8e in ExecSeqScan (node=0x83028e8) at nodeSeqscan.c:133
#33 0x08100e4d in ExecProcNode (node=0x83028e8, parent=0x830d988)
    at execProcnode.c:291
#34 0x0810db29 in ExecSort (node=0x830d988) at nodeSort.c:162
#35 0x08100f0b in ExecProcNode (node=0x830d988, parent=0x0)
    at execProcnode.c:337
#36 0x080ff6ba in ExecutePlan (estate=0x830da38, plan=0x830d988,
    operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection,
    destfunc=0x830dfe0) at execMain.c:958
#37 0x080fea01 in ExecutorRun (queryDesc=0x830da10, estate=0x830da38,
    direction=ForwardScanDirection, count=0) at execMain.c:195
#38 0x0816fa5a in ProcessQuery (parsetree=0x82fcbb8, plan=0x830d988,
    dest=Remote, completionTag=0xbffff1b0 "") at pquery.c:242
#39 0x0816db96 in pg_exec_query_string (query_string=0x82fbcb8, dest=Remote,
    parse_context=0x82c6168) at postgres.c:838
#40 0x0816eecc in PostgresMain (argc=4, argv=0xbffff470,
    username=0x82c1741 "kala") at postgres.c:2013
#41 0x0814c34e in DoBackend (port=0x82c1610) at postmaster.c:2310
#42 0x0814ba06 in BackendStartup (port=0x82c1610) at postmaster.c:1932
#43 0x0814a6ba in ServerLoop () at postmaster.c:1009
#44 0x0814a1b5 in PostmasterMain (argc=2, argv=0x82a8ae8) at postmaster.c:788
#45 0x08119fe4 in main (argc=2, argv=0xbffffe04) at main.c:210
#46 0x40209757 in __libc_start_main () from /lib/libc.so.6

0x81e27fc <AllocSetAlloc+270>:  jne    0x81e2800 <AllocSetAlloc+274>
0x81e27fe <AllocSetAlloc+272>:  jmp    0x81e281d <AllocSetAlloc+303>
0x81e2800 <AllocSetAlloc+274>:  mov    0xfffffff4(%ebp),%eax
0x81e2803 <AllocSetAlloc+277>:  mov    0x4(%eax),%eax
0x81e2806 <AllocSetAlloc+280>:  cmp    0xc(%ebp),%eax
0x81e2809 <AllocSetAlloc+283>:  jb     0x81e280d <AllocSetAlloc+287>
0x81e280b <AllocSetAlloc+285>:  jmp    0x81e281d <AllocSetAlloc+303>

Sometimes postmaster doesn't die but prints the following error message
instead (the chunk# varies):

ERROR:  AllocSetFree: cannot find block containing chunk 0x8337444

Test case: init the db for LATIN2, createdb -E LATIN2 testdb (that's
how I'm running; can't say if the oops happens for C too), then
"psql -f <file_attached_to_this_post> testdb" -- you may need to
run the command a couple dozen times before your postmaster blows
up.  The SIGSEGV strikes at the SELECT.

Anyone with a fix? :)

--
Tomas Szepe <szepe@pinerecords.com>

create table customer (
    ikey integer NOT NULL,
    nick character varying(32),
    flags smallint,
    instdate date,
    ts timestamp without time zone
    DEFAULT ('now'::text)::timestamp(0) with time zone NOT NULL
);

copy customer from stdin;
592	Vondruska Jiri	7	2003-01-01	2003-05-16 21:43:17
592	Datatrans CZ	7	2003-01-01	2003-05-16 21:43:17
592	Stechova Marcela	7	2003-01-01	2003-05-16 21:43:17
\.

alter table customer add column _flags text;
update customer set _flags='flag 1, flag 2, flag 3' where flags=7;
alter table customer drop column flags;
alter table customer rename column _flags to flags;

select *
 from customer
 where (lower(to_ascii(ikey)) like '%hor%')
 or (lower(to_ascii(nick)) like '%hor%')
 or (lower(to_ascii(flags)) like '%hor%')
 or (lower(to_ascii(instdate)) like '%hor%')
 or (lower(to_ascii(ts)) like '%hor%')
 order by ikey;

drop table customer;

Re: segfault at aset.c:539

From: Tom Lane
Tomas Szepe <szepe@pinerecords.com> writes:
> The resolution of this bug is of critical importance to me, will
> somebody help?

Postgres version?  Platform?

            regards, tom lane

Re: segfault at aset.c:539

From: Tomas Szepe
> [tgl@sss.pgh.pa.us]
>
> > The resolution of this bug is of critical importance to me, will
> > somebody help?
>
> Postgres version?  Platform?

Ouch, sorry.  PostgreSQL 7.3.3 on x86 Linux.

--
Tomas Szepe <szepe@pinerecords.com>

Re: segfault at aset.c:539

From: Tom Lane
Tomas Szepe <szepe@pinerecords.com> writes:
> I'm getting an ugly non-deterministic segfault in postmaster
> at aset.c:539.
> ...
> Anyone with a fix? :)

Yech.  This is the *second* buffer-overrun bug we've found in to_ascii()
in the last couple months.  I've now taken a close look at that whole
file and I think the rest of it is okay, but ... :-(

Patch against 7.3.3 is attached.

            regards, tom lane

*** src/backend/utils/adt/ascii.c.orig    Wed Apr  2 16:08:07 2003
--- src/backend/utils/adt/ascii.c    Mon Jul 14 12:37:33 2003
***************
*** 94,100 ****
  {
      pg_to_ascii(
                  (unsigned char *) VARDATA(data),        /* src */
!                 VARDATA(data) + VARSIZE(data),    /* src end */
                  (unsigned char *) VARDATA(data),        /* desc */
                  enc);            /* encoding */

--- 94,100 ----
  {
      pg_to_ascii(
                  (unsigned char *) VARDATA(data),        /* src */
!                 (unsigned char *) (data) + VARSIZE(data),    /* src end */
                  (unsigned char *) VARDATA(data),        /* desc */
                  enc);            /* encoding */
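
The arithmetic behind the one-line change can be sketched in isolation. This is an illustration under stated assumptions, not code from ascii.c: the mock_* names are hypothetical, and the sketch assumes the 7.3-era layout in which a variable-length value starts with a 4-byte length word holding the total size, header included. It works with byte offsets from the start of the value instead of real pointers, so nothing is ever computed past the end of a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for VARHDRSZ: a 4-byte length word. */
#define MOCK_VARHDRSZ ((size_t) sizeof(int32_t))

/* Stand-in for VARSIZE(): total size in bytes, header included. */
static size_t
mock_varsize(const unsigned char *v)
{
    int32_t len;

    memcpy(&len, v, sizeof len);    /* no alignment assumptions */
    assert(len >= (int32_t) MOCK_VARHDRSZ);
    return (size_t) len;
}

/*
 * Offset of "src end" before the patch: VARDATA(data) + VARSIZE(data).
 * The start point already skips the header, yet the full size (which
 * counts the header too) is added on top of it.
 */
static size_t
mock_src_end_before_patch(const unsigned char *v)
{
    return MOCK_VARHDRSZ + mock_varsize(v);
}

/* Offset of "src end" after the patch: (data) + VARSIZE(data). */
static size_t
mock_src_end_after_patch(const unsigned char *v)
{
    return mock_varsize(v);
}
```

For a value holding a 4-byte string (total size 8), the patched expression ends at offset 8, while the old one ends at offset 12: four bytes past the end of the value.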

Re: segfault at aset.c:539

From: Tomas Szepe
> [tgl@sss.pgh.pa.us]
>
> > I'm getting an ugly non-deterministic segfault in postmaster
> > at aset.c:539.
> > ...
> > Anyone with a fix? :)
>
> Yech.  This is the *second* buffer-overrun bug we've found in to_ascii()
> in the last couple months.  I've now taken a close look at that whole
> file and I think the rest of it is okay, but ... :-(

Tom, you've saved the day again!  (Thanks thanks thanks.)
(BTW, it seems the bug can't be triggered on Linux/sparc32).

--
Tomas Szepe <szepe@pinerecords.com>

Re: segfault at aset.c:539

From: Tom Lane
Tomas Szepe <szepe@pinerecords.com> writes:
> (BTW, it seems the bug can't be triggered on Linux/sparc32).

You'd be less likely to see it on a machine where MAXALIGN is 8,
since there would be more pad bytes on the average ... but depending
on the string length fed to to_ascii(), I think it could be made
to happen on any platform.  Strings whose length is an odd multiple
of four (4, 12, etc) would have no pad bytes on any platform.

I'm surprised we did not notice this case when we were testing the fix
for the other bug.  That bug was only an off-by-one; this was an
off-by-four :-(

            regards, tom lane
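
The pad-byte argument above can be checked with a small calculation. This is a simplified model and not the allocator's real sizing logic: it assumes a 4-byte header in front of the string and considers only rounding up to MAXALIGN, and the mock_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size of the length word in front of the string data. */
#define MOCK_VARHDRSZ 4u

/* Round len up to the next multiple of align (a power of two). */
static size_t
mock_align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Number of pad bytes that follow a value whose string part is
 * str_len bytes long, if its total size is rounded up to maxalign.
 */
static size_t
mock_pad_bytes(size_t str_len, size_t maxalign)
{
    size_t total = MOCK_VARHDRSZ + str_len;

    return mock_align_up(total, maxalign) - total;
}
```

Under this model, string lengths 4 and 12 leave zero pad bytes with MAXALIGN 8 (and also with MAXALIGN 4), so an overrun of four bytes has nothing harmless to land in. A length of 8 leaves four pad bytes with MAXALIGN 8 but none with MAXALIGN 4, which fits the observation that the problem is harder, though not impossible, to hit on 8-byte-aligned platforms.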