Re: Should pg 11 use a lot more memory building an spgist index? - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Should pg 11 use a lot more memory building an spgist index?
Date
Msg-id f7da983e-e489-3f05-5446-48812c8e35fc@lab.ntt.co.jp
Whole thread Raw
In response to Re: Should pg 11 use a lot more memory building an spgist index?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Should pg 11 use a lot more memory building an spgist index?  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
On 2018/10/30 4:48, Tom Lane wrote:
> I wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> How about modifying SysScanDescData to have a memory context member,
>>> which is created by systable_beginscan and destroyed by endscan?
> 
>> I think it would still have problems, in that it would affect in which
>> context index_getnext delivers its output.  We could reasonably make
>> these sorts of changes in places where the entire index_beginscan/
>> index_getnext/index_endscan call structure is in one place, but that
>> doesn't apply for the systable functions.
> 
> Actually, after looking more closely, index_getnext doesn't return a
> palloc'd object anyway, or at least not one that the caller is responsible
> for managing.  So maybe that could work.

Instead of SysScanDescData, could we add one to IndexScanData?  It's
somewhat clear that catalog scans don't leak much, but user index scans
can, as seen in this case.

In this particular case, I see that it's IndexCheckExclusion() that
invokes check_unique_or_exclusion_constraint() repeatedly for each heap
tuple after finishing building index on the heap.  The latter performs
index_beginscan/endscan() for every heap tuple, but doesn't bother to
release the memory allocated for IndexScanDesc and its members.

I've tried to fix that with the attached patches.

0001 adds the ability for the callers of index_beginscan to specify a
memory context.  index_beginscan_internals switches to that context before
calling ambeginscan and stores into a new field xs_scan_cxt of
IndexScanData, but maybe storing it is unnecessary.

0002 builds on that and adds the ability for the callers of
check_exclusion_or_unique_constraints to specify a memory context.  Using
that, it fixes the leak by teaching IndexCheckExclusion and
ExecIndexInsertTuples to pass a memory context that's reset before calling
check_exclusion_or_unique_constraints() for the next tuple.

The following example shows that the patch works.

With HEAD:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- this consumes about 1GB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- this too
alter table foo add exclude using spgist (a with &&);

Patched:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- memory consumption doesn't grow above 37MB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- ditto
alter table foo add exclude using spgist (a with &&);

Thoughts?

Thanks,
Amit

Attachment

pgsql-hackers by date:

Previous
From: Krzysztof Nienartowicz
Date:
Subject: Re: Speeding up INSERTs and UPDATEs to partitioned tables
Next
From: Andreas 'ads' Scherbaum
Date:
Subject: Re: INSTALL file