Re: [HACKERS] SIGSEGV in BRIN autosummarize - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] SIGSEGV in BRIN autosummarize
Date
Msg-id efefda33-5fd9-0a77-6ae5-ca21dbd163aa@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] SIGSEGV in BRIN autosummarize  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: [HACKERS] SIGSEGV in BRIN autosummarize
List pgsql-hackers
Hi,

On 10/15/2017 03:56 AM, Justin Pryzby wrote:
> On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
...
>> It's a bit difficult to guess what went wrong from this backtrace. For
>> me gdb typically prints a bunch of lines immediately before the frames,
>> explaining what went wrong - not sure why it's missing here.
> 
> Do you mean this ?
> 
> ...
> Loaded symbols for /lib64/libnss_files-2.12.so
> Core was generated by `postgres: autovacuum worker process   gtt             '.
> Program terminated with signal 11, Segmentation fault.
> #0  pfree (pointer=0x298c740) at mcxt.c:954
> 954             (*context->methods->free_p) (context, pointer);
> 

Yes. So either 'context' is bogus. Or 'context->methods' is bogus. Or
'context->methods->free_p' is bogus.

>> Perhaps some of those pointers are bogus, the memory was already pfree-d
>> or something like that. You'll have to poke around and try dereferencing
>> the pointers to find what works and what does not.
>>
>> For example what do these gdb commands do in the #0 frame?
>>
>> (gdb) p *(MemoryContext)context
> 
> (gdb) p *(MemoryContext)context
> Cannot access memory at address 0x7474617261763a20
> 

OK, this means the memory context pointer (tracked in the header of a
chunk) is bogus. There are multiple common ways how that could happen:

* Something corrupts memory (typically out-of-bounds write).

* The pointer got allocated in an incorrect memory context (which then
was released, and the memory was reused for new stuff).

* It's a use-after-free.

* ... various other possibilities ...

> 
> I uploaded the corefile:
> http://telsasoft.com/tmp/coredump-postgres-autovacuum-brin-summarize.gz
> 

Thanks, but I'm not sure that'll help, at this point. We already know
what happened (corrupted memory), we don't know "how". And core files
are mostly just "snapshots" so are not very useful in answering that :-(

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: [HACKERS] oversight in EphemeralNamedRelation support
Next
From: Vik Fearing
Date:
Subject: Re: [HACKERS] [PATCH] pageinspect function to decode infomasks