Home > mailing lists

Re: A potential memory leak on Merge Join when Sort node is not below Materialize node - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Date	September 29, 2022 01:12:44
Msg-id	CAApHDvpxQsTX0rYceTQYeSqUZzpyFQkFVKOV=UUB68=NLcJAFg@mail.gmail.com Whole thread Raw
In response to	Re: A potential memory leak on Merge Join when Sort node is not below Materialize node (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
List	pgsql-hackers

Tree view

On Thu, 29 Sept 2022 at 12:07, Peter Geoghegan <pg@bowt.ie> wrote:
> Also potentially relevant: the 2017 commit fa117ee4 anticipated adding
> a "copy" argument to tuplesort_getdatum() (the same commit added such
> a "copy" argument to tuplesort_gettupleslot()). I see that that still
> hasn't happened to tuplesort_getdatum() all these years later. Might
> be a good idea to do it in the next year or two, though.
>
> If David is interested in pursuing this now then I certainly won't object.

Just while this is fresh in my head, I wrote some code to make this
happen.  My preference would be not to add the "copy" param to the
existing function and instead just add a new function to prevent
additional branching.

The attached puts back the datum sort in nodeSort.c for byref types
and adjusts process_ordered_aggregate_single() to make use of this
function.

I did a quick benchmark to see if this help DISTINCT aggregate any:

create table t1 (a varchar(32) not null, b varchar(32) not null);
insert into t1 select md5((x%10)::text),md5((x%10)::text) from
generate_Series(1,1000000)x;
vacuum freeze t1;
create index on t1(a);

With a work_mem of 256MBs I get:

query = select max(distinct a), max(distinct b) from t1;

Master:
latency average = 313.197 ms

Patched:
latency average = 304.335 ms

So not a very impressive speedup there (about 3%)

Some excerpts from perf top show:

Master:
   1.40%  postgres          [.] palloc
   1.13%  postgres          [.] tuplesort_getdatum
   0.77%  postgres          [.] datumCopy

Patched:
   0.91%  postgres          [.] tuplesort_getdatum_nocopy
   0.65%  postgres          [.] palloc

I stared for a while at the mode_final() function and thought maybe we
could use the nocopy variant there. I just didn't quite pluck up the
motivation to write any code to see if it could be made faster.

David

Attachment

add_tuplesort_getdatum_nocopy_function.patch

pgsql-hackers by date:

From: Zhang Mingli
Date: 29 September 2022, 01:05:18
Subject: Re: Refactor UnpinBuffer()

From: Peter Geoghegan
Date: 29 September 2022, 01:31:42
Subject: Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

Re: A potential memory leak on Merge Join when Sort node is not below Materialize node - Mailing list pgsql-hackers

Attachment

Previous

Next