PostgreSQL crashes with SIGSEGV

From: Bernd Helmle
A customer recently reported a crash in a Postgres backend: a SIGSEGV
during SELECTs from a fairly complicated view using a grouping set
directive. I've managed to reproduce it and track it down to a
specific SELECT, but unfortunately haven't yet managed to strip it
down to a small, repeatable test case that doesn't involve the whole
(sensitive) dataset. I'm reporting my findings so far; maybe they
already help to track it down.

The crashed backend gives the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x000000000099be84 in pfree (pointer=0x17194e8) at mcxt.c:1007
1007        context = ((StandardChunkHeader *)
(gdb) bt
#0  0x000000000099be84 in pfree (pointer=0x17194e8) at mcxt.c:1007
#1  0x00000000004798d8 in heap_free_minimal_tuple (mtup=0x17194e8) at
heaptuple.c:1403
#2  0x0000000000687202 in ExecClearTuple (slot=0x10a3f88) at
execTuples.c:455
#3  0x0000000000686b89 in ExecResetTupleTable (tupleTable=0x10a06a0,
shouldFree=0 '\000') at execTuples.c:169
#4  0x00000000006773ff in ExecEndPlan (planstate=0x10a0398,
estate=0x10a01f0) at execMain.c:1469
#5  0x0000000000675927 in standard_ExecutorEnd (queryDesc=0x10910c0) at
execMain.c:468
#6  0x0000000000675865 in ExecutorEnd (queryDesc=0x10910c0) at
execMain.c:439
#7  0x000000000062c71c in PortalCleanup (portal=0xf79ae0) at
portalcmds.c:280
#8  0x000000000099ce97 in PortalDrop (portal=0xf79ae0, isTopCommit=0
'\000') at portalmem.c:510
#9  0x0000000000813639 in exec_simple_query (
    query_string=0xf3dbe0 "SELECT * FROM \"xxxx\".\"xxxx\" WHERE (xxxx
= 'xxxx') LIMIT 1000;") at postgres.c:1095
#10 0x00000000008177ea in PostgresMain (argc=1, argv=0xee9e10,
dbname=0xeb9a78 "xxxx", username=0xee9c78 "bernd")
    at postgres.c:4072
#11 0x000000000078cc27 in BackendRun (port=0xedc700) at
postmaster.c:4342
#12 0x000000000078c377 in BackendStartup (port=0xedc700) at
postmaster.c:4016
#13 0x0000000000788983 in ServerLoop () at postmaster.c:1721
#14 0x0000000000787fb8 in PostmasterMain (argc=1, argv=0xeb7960) at
postmaster.c:1329
#15 0x00000000006d1b52 in main (argc=1, argv=0xeb7960) at main.c:228

I've tested this so far against very current REL9_6_STABLE and
REL9_5_STABLE and got them to crash with the same backtrace. The crash
depends on the chosen plan; experiments with work_mem show that it
seems to happen only if external sorts end up in the execution plan.

REL10_STABLE seems unaffected: my extracted application query doesn't
crash there.

Running the query against REL9_6_STABLE with valgrind shows the
following results:

==00:00:01:33.067 13158== Invalid write of size 8
==00:00:01:33.067 13158==    at 0x93D53B: AllocSetFree (aset.c:998)
==00:00:01:33.067 13158==    by 0x93EC03: pfree (mcxt.c:1012)
==00:00:01:33.067 13158==    by 0x476E34: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:01:33.067 13158==    by 0x6521A3: ExecClearTuple
(execTuples.c:455)
==00:00:01:33.067 13158==    by 0x651D4B: ExecResetTupleTable
(execTuples.c:169)

==00:00:03:34.540 14381== Invalid read of size 8
==00:00:03:34.540 14381==    at 0x99E29C: pfree (mcxt.c:1007)
==00:00:03:34.540 14381==    by 0x4798D7: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:03:34.540 14381==    by 0x68753E: ExecClearTuple
(execTuples.c:455)
==00:00:03:34.540 14381==    by 0x686EC5: ExecResetTupleTable
(execTuples.c:169)
==00:00:03:34.540 14381==    by 0x67773B: ExecEndPlan (execMain.c:1469)
==00:00:03:34.540 14381==    by 0x675C63: standard_ExecutorEnd
(execMain.c:468)
==00:00:03:34.540 14381==    by 0x675BA1: ExecutorEnd (execMain.c:439)
==00:00:03:34.540 14381==    by 0x62CA58: PortalCleanup
(portalcmds.c:280)
==00:00:03:34.540 14381==    by 0x99F412: PortalDrop (portalmem.c:510)
==00:00:03:34.540 14381==    by 0x8142A4: exec_simple_query
(postgres.c:1095)
==00:00:03:34.540 14381==    by 0x818455: PostgresMain
(postgres.c:4072)
==00:00:03:34.540 14381==    by 0x78CF63: BackendRun
(postmaster.c:4342)
==00:00:03:34.540 14381==  Address 0x188cd220 is 474,080 bytes inside a
block of size 1,048,576 free'd
==00:00:03:34.540 14381==    at 0x4C2FD18: free
(vg_replace_malloc.c:530)
==00:00:03:34.540 14381==    by 0x99A8DB: AllocSetDelete (aset.c:653)
==00:00:03:34.540 14381==    by 0x99C7A4: MemoryContextDelete
(mcxt.c:225)
==00:00:03:34.540 14381==    by 0x99C855: MemoryContextDeleteChildren
(mcxt.c:245)
==00:00:03:34.540 14381==    by 0x99C772: MemoryContextDelete
(mcxt.c:208)
==00:00:03:34.540 14381==    by 0x9A5F10: tuplesort_end
(tuplesort.c:1198)
==00:00:03:34.540 14381==    by 0x692568: ExecEndAgg (nodeAgg.c:3449)
==00:00:03:34.540 14381==    by 0x67BB46: ExecEndNode
(execProcnode.c:755)
==00:00:03:34.540 14381==    by 0x6AD652: ExecEndSubqueryScan
(nodeSubqueryscan.c:181)
==00:00:03:34.540 14381==    by 0x67BA69: ExecEndNode
(execProcnode.c:697)
==00:00:03:34.540 14381==    by 0x69DF6E: ExecEndLimit
(nodeLimit.c:438)
==00:00:03:34.540 14381==    by 0x67BB9D: ExecEndNode
(execProcnode.c:779)
==00:00:03:34.540 14381==  Block was alloc'd at
==00:00:03:34.540 14381==    at 0x4C2EB6B: malloc
(vg_replace_malloc.c:299)
==00:00:03:34.540 14381==    by 0x99AED4: AllocSetAlloc (aset.c:866)
==00:00:03:34.540 14381==    by 0x99DC7A: palloc (mcxt.c:904)
==00:00:03:34.540 14381==    by 0x4798F6: heap_copy_minimal_tuple
(heaptuple.c:1417)
==00:00:03:34.540 14381==    by 0x6878D7: ExecCopySlotMinimalTuple
(execTuples.c:593)
==00:00:03:34.540 14381==    by 0x9AC1D2: copytup_heap
(tuplesort.c:3998)
==00:00:03:34.540 14381==    by 0x9A616C: tuplesort_puttupleslot
(tuplesort.c:1345)
==00:00:03:34.540 14381==    by 0x68D7D8: fetch_input_tuple
(nodeAgg.c:601)
==00:00:03:34.540 14381==    by 0x68FEC1: agg_retrieve_direct
(nodeAgg.c:2168)
==00:00:03:34.540 14381==    by 0x68F9AB: ExecAgg (nodeAgg.c:1903)
==00:00:03:34.540 14381==    by 0x67B70A: ExecProcNode
(execProcnode.c:503)
==00:00:03:34.540 14381==    by 0x6AD40F: SubqueryNext
(nodeSubqueryscan.c:53)
==00:00:03:34.540 14381== 
{
   <insert_a_suppression_name_here>
   Memcheck:Addr8
   fun:pfree
   fun:heap_free_minimal_tuple
   fun:ExecClearTuple
   fun:ExecResetTupleTable
   fun:ExecEndPlan
   fun:standard_ExecutorEnd
   fun:ExecutorEnd
   fun:PortalCleanup
   fun:PortalDrop
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
}
==00:00:03:34.548 14381== Invalid read of size 4
==00:00:03:34.548 14381==    at 0x99E2AE: pfree (mcxt.c:1010)
==00:00:03:34.548 14381==    by 0x4798D7: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:03:34.548 14381==    by 0x68753E: ExecClearTuple
(execTuples.c:455)
==00:00:03:34.548 14381==    by 0x686EC5: ExecResetTupleTable
(execTuples.c:169)
==00:00:03:34.548 14381==    by 0x67773B: ExecEndPlan (execMain.c:1469)
==00:00:03:34.548 14381==    by 0x675C63: standard_ExecutorEnd
(execMain.c:468)
==00:00:03:34.548 14381==    by 0x675BA1: ExecutorEnd (execMain.c:439)
==00:00:03:34.548 14381==    by 0x62CA58: PortalCleanup
(portalcmds.c:280)
==00:00:03:34.548 14381==    by 0x99F412: PortalDrop (portalmem.c:510)
==00:00:03:34.548 14381==    by 0x8142A4: exec_simple_query
(postgres.c:1095)
==00:00:03:34.548 14381==    by 0x818455: PostgresMain
(postgres.c:4072)
==00:00:03:34.548 14381==    by 0x78CF63: BackendRun
(postmaster.c:4342)
==00:00:03:34.548 14381==  Address 0x7f7f7f7f7f7f7f7f is not stack'd,
malloc'd or (recently) free'd
==00:00:03:34.548 14381== 
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:pfree
   fun:heap_free_minimal_tuple
   fun:ExecClearTuple
   fun:ExecResetTupleTable
   fun:ExecEndPlan
   fun:standard_ExecutorEnd
   fun:ExecutorEnd
   fun:PortalCleanup
   fun:PortalDrop
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
}
==00:00:03:34.548 14381== 
==00:00:03:34.548 14381== Process terminating with default action of
signal 11 (SIGSEGV): dumping core
==00:00:03:34.548 14381==  General Protection Fault
==00:00:03:34.548 14381==    at 0x99E2AE: pfree (mcxt.c:1010)
==00:00:03:34.548 14381==    by 0x4798D7: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:03:34.548 14381==    by 0x68753E: ExecClearTuple
(execTuples.c:455)
==00:00:03:34.548 14381==    by 0x686EC5: ExecResetTupleTable
(execTuples.c:169)
==00:00:03:34.548 14381==    by 0x67773B: ExecEndPlan (execMain.c:1469)
==00:00:03:34.548 14381==    by 0x675C63: standard_ExecutorEnd
(execMain.c:468)
==00:00:03:34.548 14381==    by 0x675BA1: ExecutorEnd (execMain.c:439)
==00:00:03:34.548 14381==    by 0x62CA58: PortalCleanup
(portalcmds.c:280)
==00:00:03:34.548 14381==    by 0x99F412: PortalDrop (portalmem.c:510)
==00:00:03:34.548 14381==    by 0x8142A4: exec_simple_query
(postgres.c:1095)
==00:00:03:34.548 14381==    by 0x818455: PostgresMain
(postgres.c:4072)
==00:00:03:34.548 14381==    by 0x78CF63: BackendRun
(postmaster.c:4342)
==00:00:03:35.088 14381== 
==00:00:03:35.088 14381== Process terminating with default action of
signal 11 (SIGSEGV)
==00:00:03:35.088 14381==  General Protection Fault
==00:00:03:35.088 14381==    at 0x632016C: _dl_catch_error (in
/usr/lib64/libc-2.25.so)
==00:00:03:35.088 14381==    by 0x631F8E6: __libc_dlclose (in
/usr/lib64/libc-2.25.so)
==00:00:03:35.088 14381==    by 0x634B5E4: free_mem (in
/usr/lib64/libc-2.25.so)
==00:00:03:35.088 14381==    by 0x634B1E1: __libc_freeres (in
/usr/lib64/libc-2.25.so)
==00:00:03:35.088 14381==    by 0x4A296DB: _vgnU_freeres
(vg_preloaded.c:77)
==00:00:03:35.088 14381==    by 0x18118EA7: ???
==00:00:03:35.088 14381==    by 0x4798D7: heap_free_minimal_tuple
(heaptuple.c:1403)
==00:00:03:35.088 14381==    by 0x68753E: ExecClearTuple
(execTuples.c:455)
==00:00:03:35.088 14381==    by 0x686EC5: ExecResetTupleTable
(execTuples.c:169)
==00:00:03:35.088 14381==    by 0x67773B: ExecEndPlan (execMain.c:1469)
==00:00:03:35.088 14381==    by 0x675C63: standard_ExecutorEnd
(execMain.c:468)
==00:00:03:35.088 14381==    by 0x675BA1: ExecutorEnd (execMain.c:439)
==00:00:03:35.088 14381== 
==00:00:03:35.088 14381== HEAP SUMMARY:
==00:00:03:35.088 14381==     in use at exit: 6,833,625 bytes in 531
blocks
==00:00:03:35.088 14381==   total heap usage: 814,111 allocs, 1,264
frees, 118,761,978 bytes allocated
==00:00:03:35.088 14381== 
==00:00:03:35.088 14381== For a detailed leak analysis, rerun with: --leak-check=full
==00:00:03:35.088 14381== 
==00:00:03:35.088 14381== For counts of detected and suppressed errors,
rerun with: -v
==00:00:03:35.088 14381== ERROR SUMMARY: 2 errors from 2 contexts
(suppressed: 35 from 6)

It seems the backend tries to free a minimal tuple at executor end
that is already gone: the memory context it was allocated in was
deleted beforehand. ExecResetTupleTable() loops through a list with 70
entries; after six or seven rounds it encounters the first tuple slot
with tts_shouldFreeMin set (none of the slots before it have it set):

(gdb) p *slot
$6 = {type = T_TupleTableSlot, tts_isempty = 0 '\000', tts_shouldFree =
0 '\000', tts_shouldFreeMin = 1 '\001', tts_slow = 1 '\001', 
  tts_tuple = 0x10a6c48, tts_tupleDescriptor = 0x10d1be8, tts_mcxt =
0xf59668, tts_buffer = 0, tts_nvalid = 6, tts_values = 0x10d2640, 
  tts_isnull = 0x10d26d8 "", tts_mintuple = 0x171c168, tts_minhdr =
{t_len = 162, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, 
      ip_posid = 0}, t_tableOid = 0, t_data = 0x171c160}, tts_off = 88}
(gdb) p list_length(tupleTable) 

tts_mintuple is invalid, though:

(gdb) p *slot->tts_mintuple
Cannot access memory at address 0x171c168

The subsequent attempt to clear it in ExecClearTuple() then crashes
the backend.
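
To make the suspected mechanism concrete, here is a toy sketch in
plain C -- not PostgreSQL source, the names merely mimic the slot
fields above -- of what the cleanup appears to do:

#include <stdbool.h>
#include <stdlib.h>

typedef struct ToySlot
{
    bool    tts_shouldFreeMin;  /* "this slot owns tts_mintuple" */
    void   *tts_mintuple;       /* may point into a deleted context */
} ToySlot;

/* analog of ExecClearTuple(): free whatever the flag says we own */
static void
toy_clear_tuple(ToySlot *slot)
{
    if (slot->tts_shouldFreeMin)
        free(slot->tts_mintuple);   /* pfree() of a dangling pointer */
    slot->tts_shouldFreeMin = false;
    slot->tts_mintuple = NULL;
}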

    Bernd



Re: PostgreSQL crashes with SIGSEGV

From: Michael Paquier
On Fri, Dec 8, 2017 at 12:47 AM, Bernd Helmle <mailings@oopsware.de> wrote:
> A customer recently reported a crash in a Postgres backend: a SIGSEGV
> during SELECTs from a fairly complicated view using a grouping set
> directive. I've managed to reproduce it and track it down to a
> specific SELECT, but unfortunately haven't yet managed to strip it
> down to a small, repeatable test case that doesn't involve the whole
> (sensitive) dataset. I'm reporting my findings so far; maybe they
> already help to track it down.

Hmm. Even if you cannot reproduce an isolated test case, would it be
possible to get an idea of the shape of the SELECT query involved, and
of the schema plus the view? No need for sensitive data here; this
would help in reproducing a test case. Also, what sizes are involved?
Should even a small data set with low work_mem trigger the problem?

> I've tested this so far against very current REL9_6_STABLE and
> REL9_5_STABLE and got them to crash with the same backtrace. The crash
> depends on the chosen plan; experiments with work_mem show that it
> seems to happen only if external sorts end up in the execution plan.
> REL10_STABLE seems unaffected: my extracted application query doesn't
> crash there.

That's one thing to begin with. So HEAD is not affected either?
-- 
Michael


Re: PostgreSQL crashes with SIGSEGV

From: Peter Geoghegan
On Thu, Dec 7, 2017 at 7:47 AM, Bernd Helmle <mailings@oopsware.de> wrote:
> I've tested this so far against very current REL9_6_STABLE and
> REL9_5_STABLE and got them to crash with the same backtrace. The crash
> depends on the chosen plan; experiments with work_mem show that it
> seems to happen only if external sorts end up in the execution plan.
>
> REL10_STABLE seems unaffected: my extracted application query doesn't
> crash there.

Does "set replacement_sort_tuples = 0" change anything on
REL9_6_STABLE? If you only get a crash when there is very little
work_mem, then that might be a good avenue of investigation. Note that
my changes to external sorting started in REL9_6_STABLE, so they can't
be involved here.

Are you aware of commit 512f67c8d02cc558f9c269cc848b0f0f788c4fe1,
which fixed a bug affecting external sorts? Are you sure that you have
that fix on REL9_5_STABLE + REL9_6_STABLE?

-- 
Peter Geoghegan


Re: PostgreSQL crashes with SIGSEGV

From: Peter Geoghegan
On Thu, Dec 7, 2017 at 7:47 AM, Bernd Helmle <mailings@oopsware.de> wrote:
> It seems the backend tries to free a minimal tuple at executor end
> that is already gone: the memory context it was allocated in was
> deleted beforehand. ExecResetTupleTable() loops through a list with 70
> entries; after six or seven rounds it encounters the first tuple slot
> with tts_shouldFreeMin set (none of the slots before it have it set):

On second thought, it seems more likely that the reason
REL10_STABLE is unaffected is commit
3856cf9607f41245ec9462519c53f1109e781fc5. As of that commit (which
built on earlier v10 work), there is no question of memory for tuples
retrieved via tuplesort_gettupleslot() not belonging to the caller --
it must belong to the caller. The old interface already resulted in
bugs in early 9.6 point releases that looked similar to this one, so I
was glad to remove it. (Note also that this picture was slightly
complicated by the performance optimization commit
fa117ee40330db401da776e7b003f047098a7d4c that followed, which made
some callers opt out of copying when that was clearly safe, but that
probably isn't important.)

So, as you said, the question that we probably need to answer is: just
how did the grouping sets/nodeAgg.c code end up getting tuple memory
lifetime wrong? One good way to get more information is to rerun
Valgrind, but this time with track origins enabled. I myself run
Valgrind like this when I want to see the origin of memory involved in
an error. I specify:

$ valgrind --leak-check=no --gen-suppressions=all --trace-children=yes \
    --track-origins=yes --read-var-info=yes \
    --suppressions=/home/pg/postgresql/root/source/src/tools/valgrind.supp \
    -v postgres --log_line_prefix="%m %p " --log_statement=all \
    --shared_buffers=64MB 2>&1 | tee postmaster.log

(Probably the only change that you'll need to make is to run Valgrind
with the extra "--track-origins=yes".)

--track-origins=yes is usually something I use when I already know
that Valgrind will complain, but want more information about the
nature of the problem.

-- 
Peter Geoghegan


Re: PostgreSQL crashes with SIGSEGV

From: Bernd Helmle
On Thursday, 2017-12-07 at 18:23 -0800, Peter Geoghegan wrote:
> Does "set replacement_sort_tuples = 0" change anything on
> REL9_6_STABLE? If you only get a crash when there is very little
> work_mem, then that might be a good avenue of investigation. Note that
> my changes to external sorting started in REL9_6_STABLE, so they can't
> be involved here.
> 

replacement_sort_tuples = 0 changes the picture, indeed. With that
setting, the query runs without problems in REL9_6_STABLE.

> Are you aware of commit 512f67c8d02cc558f9c269cc848b0f0f788c4fe1,
> which fixed a bug affecting external sorts? Are you sure that you have
> that fix on REL9_5_STABLE + REL9_6_STABLE?

My test instances are built against a fresh pull from yesterday.


Re: PostgreSQL crashes with SIGSEGV

From: Bernd Helmle
On Thursday, 2017-12-07 at 18:54 -0800, Peter Geoghegan wrote:
> So, as you said, the question that we probably need to answer is: just
> how did the grouping sets/nodeAgg.c code end up getting tuple memory
> lifetime wrong? One good way to get more information is to rerun
> Valgrind, but this time with track origins enabled. I myself run
> Valgrind like this when I want to see the origin of memory involved in
> an error. I specify:
> 
> $ valgrind --leak-check=no --gen-suppressions=all --trace-children=yes \
>     --track-origins=yes --read-var-info=yes \
>     --suppressions=/home/pg/postgresql/root/source/src/tools/valgrind.supp \
>     -v postgres --log_line_prefix="%m %p " --log_statement=all \
>     --shared_buffers=64MB 2>&1 | tee postmaster.log
> 
> (Probably the only change that you'll need to make is to run Valgrind
> with the extra "--track-origins=yes".)
> 
> --track-origins=yes is usually something I use when I already know
> that Valgrind will complain, but want more information about the
> nature of the problem.

That's what I already did. My valgrind invocation was this:

valgrind --leak-check=no --gen-suppressions=all \
--track-origins=yes --suppressions=src/tools/valgrind.supp \
--time-stamp=yes --trace-children=yes postgres



Re: PostgreSQL crashes with SIGSEGV

From: Michael Paquier
On Fri, Dec 8, 2017 at 6:19 PM, Bernd Helmle <mailings@oopsware.de> wrote:
> valgrind --leak-check=no --gen-suppressions=all \
> --track-origins=yes --suppressions=src/tools/valgrind.supp \
> --time-stamp=yes --trace-children=yes postgres

I have been trying for a couple of hours to reproduce a failure using
views with grouping sets, enforcing an external sort with something
like this:
create table aa (a1 int, a2 int, a3 int, a4 int, a5 int, a6 int);
insert into aa SELECT (random() * 2000000000)::int,
  (random() * 2000000000)::int,
  (random() * 2000000000)::int,
  (random() * 2000000000)::int,
  (random() * 2000000000)::int,
  (random() * 2000000000)::int FROM generate_series(1,1000000);
set work_mem = '128kB';
create or replace view aav as SELECT a1, a2, a3, a4, a5, avg(a6)
  FROM aa GROUP BY GROUPING SETS (a1, a2, a3, a4, a5);
explain analyze select * from aav order by a1, a2, a3, a4, a5;

Not sure if you can provide this information, but what does the plan
of the query look like?
-- 
Michael


Re: PostgreSQL crashes with SIGSEGV

From: Andreas Seltenreich
Michael Paquier writes:

> Not sure if you can provide this information, but what does the plan
> of the query look like?

We did some more reducing work on the original query and data.  The
following testcase reproduces the double free reported by valgrind for
me when run against a vanilla REL9_6_STABLE build.

regards,
Andreas

--8<---------------cut here---------------start------------->8---
drop table if exists bug;
create table bug (n text, v text, b text, t text);
insert into bug
   select i%9, i%9, i%16 ,i%4096 from generate_series(1,100000) g(i);
analyze bug;

explain select * from (
select thecube.nv
   from ( select
            (n || ' ') || coalesce(v, '') as nv
           from bug
           group by ((n || ' ') || coalesce(v, '')) ,cube(b, t)
      ) thecube
     where nv = '8 8'
) sub limit 7000;
--8<---------------cut here---------------end--------------->8---

                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Limit  (cost=13422.32..15747.28 rows=7000 width=32)
   ->  Subquery Scan on thecube  (cost=13422.32..27622.49 rows=42754 width=32)
         ->  GroupAggregate  (cost=13422.32..27194.95 rows=42754 width=38)
               Group Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.b, bug.t
               Group Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.b
               Group Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text)))
               Sort Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.t
                 Group Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.t
               Filter: ((((bug.n || ' '::text) || COALESCE(bug.v, ''::text))) = '8 8'::text)
               ->  Sort  (cost=13422.32..13672.32 rows=100000 width=42)
                     Sort Key: (((bug.n || ' '::text) || COALESCE(bug.v, ''::text))), bug.b, bug.t
                     ->  Seq Scan on bug  (cost=0.00..2041.00 rows=100000 width=42)



Re: PostgreSQL crashes with SIGSEGV

From: Peter Geoghegan
On Thu, Dec 14, 2017 at 10:47 AM, Andreas Seltenreich
<andreas.seltenreich@credativ.de> wrote:
> We did some more reducing work on the original query and data.  The
> following testcase reproduces the double free reported by valgrind for
> me when run against a vanilla REL9_6_STABLE build.

> --8<---------------cut here---------------start------------->8---
> drop table if exists bug;
> create table bug (n text, v text, b text, t text);
> insert into bug
>    select i%9, i%9, i%16 ,i%4096 from generate_series(1,100000) g(i);
> analyze bug;
>
> explain select * from (
> select thecube.nv
>    from ( select
>             (n || ' ') || coalesce(v, '') as nv
>            from bug
>            group by ((n || ' ') || coalesce(v, '')) ,cube(b, t)
>           ) thecube
>          where nv = '8 8'
> ) sub limit 7000;
> --8<---------------cut here---------------end--------------->8---

I can reproduce this against REL9_6_STABLE, once work_mem is set to
4MB, and replacement_sort_tuples is set to 150000.

-- 
Peter Geoghegan


Re: PostgreSQL crashes with SIGSEGV

From: Peter Geoghegan
On Thu, Dec 14, 2017 at 12:06 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I can reproduce this against REL9_6_STABLE, once work_mem is set to
> 4MB, and replacement_sort_tuples is set to 150000.

I've figured this one out. It's a double free. Consider this code:

(Although REL9_5_STABLE is also affected, the following remarks
reference REL9_6_STABLE-only code.)

 * The slot receives a copied tuple (sometimes allocated in caller memory
 * context) that will stay valid regardless of future manipulations of the
 * tuplesort's state.
 */
bool
tuplesort_gettupleslot(/* SNIP */)

I wrote that comment block in commit a5f0bd77a, an early bugfix for
9.6 (I actually mentioned this specific commit to Bernd in passing
already). This bug is very similar to the one fixed by a5f0bd77a, and
is arguably a broader version of that same bug. Note that I didn't do
anything with external sorts until 9.6, so the bug predates my work
and I expect that it affects all stable branches. That said, it may
not be possible to produce a hard crash on every version; whether you
can is more or less an accident.

The function opens with this code:

    MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
    SortTuple stup;
    bool should_free;

    if (!tuplesort_gettuple_common(state, forward, &stup, &should_free))
        stup.tuple = NULL;

    MemoryContextSwitchTo(oldcontext);
    /* SNIP */
    ...

As long as the slot receives a copied tuple that may or may not be in
the caller's memory context (it may instead get allocated in a
tuplesort-private memory context), and as long as destroying a
tuplesort via tuplesort_end() counts as a "future manipulation", then
it's possible to crash with a double-pfree() while not violating the
tuplesort_gettupleslot() contract. The memory may or may not be freed
as part of tuplesort_end() teardown, based only on accidental
implementation factors (that's the first time the memory gets freed
when we do crash). Specifically, if state->status == TSS_SORTEDONTAPE,
we risk this crash on versions before Postgres 10 (more on why 10
appears to be okay tomorrow), when tuplesort_end() just so happens to
have been called (destroying the 2 memory contexts) before the
executor's superfluous pfree() within ExecResetTupleTable().
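
To see the shape of the bug in isolation, consider the following
self-contained toy program (plain C, no PostgreSQL code; every name is
merely a stand-in for its tuplesort/executor analog). On a typical
glibc system it aborts on the second free, which is the same
double-free the backtrace shows:

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct Arena        /* stand-in for a private memory context */
{
    void   *chunks[8];
    int     nchunks;
} Arena;

static void *
arena_alloc(Arena *a, size_t size)
{
    void *p = malloc(size);

    a->chunks[a->nchunks++] = p;    /* the arena owns this chunk */
    return p;
}

static void
arena_destroy(Arena *a)             /* analog of tuplesort_end() */
{
    for (int i = 0; i < a->nchunks; i++)
        free(a->chunks[i]);         /* first free */
    a->nchunks = 0;
}

int
main(void)
{
    Arena   sortcontext = {{NULL}, 0};
    bool    tts_shouldFreeMin;
    char   *tuple;

    /* analog of tuplesort_gettupleslot(): the tuple is a "copy", but
     * it was copied into the sort's own context, not the caller's... */
    tuple = arena_alloc(&sortcontext, 64);
    strcpy(tuple, "minimal tuple");

    /* ...while the slot is nevertheless told that it owns the memory */
    tts_shouldFreeMin = true;

    /* at executor shutdown, the sort's contexts die first... */
    arena_destroy(&sortcontext);

    /* ...and the ExecResetTupleTable() analog then frees it again */
    if (tts_shouldFreeMin)
        free(tuple);                /* second free: the reported crash */

    return 0;
}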

The ambiguity about who owns the tuple memory, and when, is the basic
problem, once again. One could argue that it's the caller's fault for
not knowing not to pfree() the tuple after tuplesort_end() is called,
but I don't buy that, because it seems like an undue burden to place
on callers. It seems okay to provide either a very weak, simple
guarantee about tuple memory lifetime, or a very strong, simple
guarantee about tuple memory lifetime. That's what routines like
tuplesort_gettupleslot() and tuplesort_getindextuple() should limit
their contracts to -- we've had enough problems in this area over the
years that that seems pretty clear to me.

ISTM that an appropriate fix is one that results in
tuplesort_gettupleslot() tuple memory being *consistently* allocated
within the caller's own memory context, at least on versions prior to
Postgres 10 (it's a bit more complicated in Postgres 10). I'll begin
work on a patch for this tomorrow. It's not obvious what approach to
take, because on 9.6+, memory is allocated in tuplecontext, not
sortcontext. A well-targeted fix will not burden low-level READTUP()
routines (that use tuplecontext) with direct knowledge of the problem.
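
Schematically, the direction I have in mind looks like this (a toy
sketch under stand-in names, not the actual patch):

#include <stdlib.h>
#include <string.h>

/*
 * Hand the caller its own copy, allocated in memory the caller owns
 * outright. The sort's contexts never track it, so tearing the sort
 * down cannot free it behind the caller's back; the caller frees it
 * exactly once, whenever it likes.
 */
static char *
fetch_tuple_copy(const char *src, size_t len)
{
    char *copy = malloc(len);       /* caller's memory from the start */

    if (copy != NULL)
        memcpy(copy, src, len);
    return copy;
}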

-- 
Peter Geoghegan


Re: PostgreSQL crashes with SIGSEGV

From: Peter Geoghegan
On Thu, Dec 14, 2017 at 5:58 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> The ambiguity about who owns the tuple memory, and when, is the basic
> problem, once again. One could argue that it's the caller's fault for
> not knowing not to pfree() the tuple after tuplesort_end() is called,
> but I don't buy that, because it seems like an undue burden to place
> on callers. It seems okay to provide either a very weak, simple
> guarantee about tuple memory lifetime, or a very strong, simple
> guarantee about tuple memory lifetime. That's what routines like
> tuplesort_gettupleslot() and tuplesort_getindextuple() should limit
> their contracts to -- we've had enough problems in this area over the
> years that that seems pretty clear to me.
>
> ISTM that an appropriate fix is one that results in
> tuplesort_gettupleslot() tuple memory being *consistently* allocated
> within the caller's own memory context, at least on versions prior to
> Postgres 10 (it's a bit more complicated in Postgres 10).

This took longer than expected. The attached patch, which targets 9.6,
shows what I have in mind. I'm going on vacation shortly, but wanted
to post this patch before I leave. It could definitely use some more
testing, since it was written under time pressure. I expect to be back
early in the new year, so if someone feels like taking this off my
hands, they're more than welcome to do so.

Postgres v10 isn't affected because there is no such thing as
should_free from commit f1f5ec1ef onwards (commit fa117ee40
reintroduced a no-copy optimization a while later, which looks like it
made things unsafe again, but it's actually fine -- the
ExecClearTuple() pfree() won't happen, so no crash). Everything is
fine on v10 because the design was improved there. On v10,
tuplesort_gettupleslot() calls ExecStoreMinimalTuple() with arguments
that indicate that the memory still belongs to the tuplesort, unless
it was an explicit copy made with heap_copy_minimal_tuple() there and
then, in which case it belongs to the caller. This works because on
v10 we're always managing some temp buffers, regardless of
state->status. There is no question of a routine like readtup_alloc()
allocating or using memory without knowing about its lifetime, because
it's always using the smallish slab allocator buffers.

To put it another way: on v10, everything is fine because callers of
routines like tuplesort_gettupleslot() either have very limited
guarantees about fetched tuple memory that is owned by the tuplesort
(it can go away almost immediately, on the next call to
tuplesort_gettupleslot()), or infinitely flexible guarantees (it's
unambiguously the caller's copy, and lives in the caller's memory
context). There is no awkward mix of these two situations, as is the
case in earlier versions. That awkward mix is: "You (caller) should
free this memory, and can rely on this memory sticking around until
tuplesort_end() is called, at which point I (tuplesort) will free it
if you haven't -- though I might *not* free it then, based on whether
or not I felt like using your memory context!". This is an accident
waiting to happen. Remember, the crash isn't a use-after-free; it's a
double-free caused by conflated responsibilities across two low-level,
refcount-like mechanisms. The idea of a tts_shouldFree=true tuple
whose memory exists in a context that isn't under the executor's
direct control is fundamentally bogus. It kinda works most of the
time, but only to the extent that the lifetime of a tuplesort is
accidentally tied to the lifetime of an executor node, without the
tuple/slot making it into something like estate->es_tupleTable (as was
the case in Andreas' example).

An alternative approach to fixing this issue, which I also want to
throw out there, is to perform a simple change within
ExecResetTupleTable() to make ExecClearTuple()'d slots force
"tts_shouldFree = false" -- a second patch shows what I mean here. All
ExecResetTupleTable() callers pass "shouldFree = false" anyway,
because all memory is about to go away (within FreeExecutorState()).
This second approach seems like a real kludge to me, but it's also a
fix that requires a tiny tweak rather than non-trivial tuplesort.c
memory management changes. I would feel worse about the idea of
applying this kludge if we had to rely on it forever, but since v10 is
unaffected anyway, we really don't. It's still an awful kludge, though.
I find it hard to know what to do here (the vacation is overdue, it
seems).
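
In toy form (stand-in names again, not the actual executor code), the
tweak amounts to this:

#include <stdbool.h>
#include <stdlib.h>

typedef struct ToySlot
{
    bool    shouldFreeMin;
    void   *mintuple;
} ToySlot;

/*
 * When the caller promises that all memory is about to be destroyed
 * wholesale anyway (shouldFree = false), strip each slot's ownership
 * flag first, so the per-slot cleanup cannot pfree anything --
 * including memory that is already gone.
 */
static void
toy_reset_tuple_table(ToySlot *slots, int n, bool shouldFree)
{
    for (int i = 0; i < n; i++)
    {
        if (!shouldFree)
            slots[i].shouldFreeMin = false;     /* the proposed tweak */
        if (slots[i].shouldFreeMin)
            free(slots[i].mintuple);
        slots[i].shouldFreeMin = false;
        slots[i].mintuple = NULL;
    }
}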

(Note that v10 does have a minor, related bug, noted in the commit
messages of both patches.)

-- 
Peter Geoghegan


Re: PostgreSQL crashes with SIGSEGV

From: Michael Paquier
On Sun, Dec 17, 2017 at 10:40 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Dec 14, 2017 at 5:58 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> The ambiguity about who owns the tuple memory, and when, is the basic
>> problem, once again. One could argue that it's the caller's fault for
>> not knowing not to pfree() the tuple after tuplesort_end() is called,
>> but I don't buy that, because it seems like an undue burden to place
>> on callers. It seems okay to provide either a very weak, simple
>> guarantee about tuple memory lifetime, or a very strong, simple
>> guarantee about tuple memory lifetime. That's what routines like
>> tuplesort_gettupleslot() and tuplesort_getindextuple() should limit
>> their contracts to -- we've had enough problems in this area over the
>> years that that seems pretty clear to me.
>>
>> ISTM that an appropriate fix is one that results in
>> tuplesort_gettupleslot() tuple memory being *consistently* allocated
>> within the caller's own memory context, at least on versions prior to
>> Postgres 10 (it's a bit more complicated in Postgres 10).
>
> This took longer than expected. The attached patch, which targets
> 9.6, shows what I have in mind. I'm going on vacation shortly, but
> wanted to post this patch before I leave. It could definitely use some
> more testing, since it was written under time pressure. I expect to be
> back early in the new year, so if someone feels like taking this off
> my hands, they're more than welcome to do so.

Could you add that to the next commit fest, registered as a bug fix?
We don't want to lose track of that, and this should be reviewed.
-- 
Michael


Re: PostgreSQL crashes with SIGSEGV

From: Peter Geoghegan
On Sun, Dec 17, 2017 at 2:46 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Could you add that to the next commit fest, registered as a bug fix?
> We don't want to lose track of that, and this should be reviewed.

Done.

-- 
Peter Geoghegan