Re: 9.5: Better memory accounting, towards memory-bounded HashAgg - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Date
Msg-id 54971D52.7070903@fuzzy.cz
Whole thread Raw
In response to Re: 9.5: Better memory accounting, towards memory-bounded HashAgg  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: 9.5: Better memory accounting, towards memory-bounded HashAgg  (Tomas Vondra <tv@fuzzy.cz>)
Re: 9.5: Better memory accounting, towards memory-bounded HashAgg  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 2.12.2014 06:14, Jeff Davis wrote:
> On Sun, 2014-11-30 at 17:49 -0800, Peter Geoghegan wrote:
>> On Mon, Nov 17, 2014 at 11:39 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>>> I can also just move isReset there, and keep mem_allocated as a uint64.
>>> That way, if I find later that I want to track the aggregated value for
>>> the child contexts as well, I can split it into two uint32s. I'll hold
>>> off any any such optimizations until I see some numbers from HashAgg
>>> though.
>>
>> I took a quick look at memory-accounting-v8.patch.
>>
>> Is there some reason why mem_allocated is a uint64? All other things
>> being equal, I'd follow the example of tuplesort.c's
>> MemoryContextAllocHuge() API, which (following bugfix commit
>> 79e0f87a1) uses int64 variables to track available memory and so on.
>
> No reason. New version attached; that's the only change.

I've started reviewing this today. It does not apply cleanly on current
head, because of 4a14f13a committed a few days ago. That commit changed
the constants in  src/include/utils/hsearch.h, so the patch needs to use
this:

#define HASH_NOCHILDCXT 0x4000 /* ... */

That's the only conflict, and after fixing it it compiles OK. However, I
got a segfault on the very first query I tried :-(

  create table test_hash_agg as select i AS a, i AS b, i AS c, i AS d
    from generate_series(1,10000000) s(i);

  analyze test_hash_agg;

  select a, count(*) from test_hash_agg where a < 100000 group by a;

With a fresh cluster (default config), I get this:

  server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
  The connection to the server was lost. Attempting reset: Failed.

The backtrace looks like this (full attached):

Program received signal SIGSEGV, Segmentation fault.
advance_transition_function (aggstate=aggstate@entry=0x2b5c5f0,
peraggstate=peraggstate@entry=0x2b5efd0,
pergroupstate=pergroupstate@entry=0x8) at nodeAgg.c:468
468                     if (pergroupstate->noTransValue)
(gdb) bt
#0  advance_transition_function at nodeAgg.c:468
#1  0x00000000005c3494 in advance_aggregates at nodeAgg.c:624
#2  0x00000000005c3dc2 in agg_fill_hash_table at nodeAgg.c:1640
#3  ExecAgg (node=node@entry=0x2b5c5f0) at nodeAgg.c:1338

(gdb) print pergroupstate->noTransValue
Cannot access memory at address 0x11
(gdb) print pergroupstate
$1 = (AggStatePerGroup) 0x8

My guess is something is scribbling over the pergroup memory, or maybe
the memory context gets released, or something. In any case, it's easily
reproducible, and apparently deterministic (always exactly the same
values, no randomness).

regards
Tomas

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Proposal "VACUUM SCHEMA"
Next
From: Andrew Dunstan
Date:
Subject: Re: controlling psql's use of the pager a bit more