Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20190625203207.lvmmat7d2ilnr64t@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Tue, Jun 25, 2019 at 12:13:01PM -0700, Peter Geoghegan wrote:
>On Tue, Jun 25, 2019 at 11:03 AM James Coleman <jtc331@gmail.com> wrote:
>> No, I haven't confirmed that it's called less frequently, and I'd be
>> extremely surprised if it were given the diff doesn't suggest any
>> changes to that at all.
>
>I must have misunderstood, then. I thought that you were suggesting
>that that might have happened.
>
>> If you think it's important enough to do so, I can instrument it to
>> confirm, but I was mostly wanting to know if there were any other
>> plausible explanations, and I think you've provided one: there *are*
>> changes in the patch to memory contexts in tuplesort.c, so if memory
>> fragmentation is a real concern this patch could definitely notice
>> changes in that regard.
>
>Sounds like it's probably fragmentation. That's generally hard to measure.
>

I'm not sure I'm really conviced this explains the difference, because
the changes in tuplesort.c are actually fairly small - we do split the
tuplesort context into two, but vast majority of the stuff is allocated
in one of the contexts (essentially just the tuplesort state gets moved
to a new context). I wouldn't expect this to have such strong impact on
locality/fragmentation.

But maybe it does - in that case it seems it might be worthwile to do it
separately, irrespectedly of the incremental sort patch. I wonder if
perf would show that as cache hits/misses, or something?

It shouldn't be that difficult to separate this change into a separate
patch, and benchmark it on it's own, though.


FWIW while looking at the tuplesort.c changes, I've noticed some
inaccurate comments in tuplesort_free. Firstly, the top-level comment
says:

/*
 * tuplesort_free
 *
 *    Internal routine for freeing resources of tuplesort.
 */

without mentioning which resources it actually releases, so it kinda
suggests it releases everything. But that's not true - AFAICS it only
releases the per-sort resources. IMO this is a poor function name, and
people will easily keep resources longer than they think - we should
rename it to something like tuplesort_free_batch().

And then at the end tuplesort_free() does this:

    /*
     * Free the per-sort memory context, thereby releasing all working memory,
     * including the Tuplesortstate struct itself.
     */
    MemoryContextReset(state->sortcontext);

But that's clearly not true, because the tuplesortstate is allocated in
the maincontext, not sortcontext.

In general, the comments seem to be a bit confused by what 'sort' means.
Sometimes it means the whole sort operation, sometimes it means one of
the batches, etc.  And the fact that the per-batch context is called
sortcontext does not really improve the situation.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Don't allocate IndexAmRoutine dynamically?
Next
From: James Coleman
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)