Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From James Coleman
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id CAAaqYe8KUqeybHdAJJ=V0jS3JztS0wiNmNu9i1VY3m-uoMXkzw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
On Thu, Apr 2, 2020 at 8:46 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > Hi,
> >
> > Thanks, the v52 looks almost ready. I've been looking at the two or
> > three things I mentioned, and I have a couple of comments.
> >
> >
> > 1) /* XXX comparison_cost shouldn't be 0? */
> >
> > I'm not worried about this, because this is not really intriduced by
> > this patch - create_sort_path has the same comment/issue, so I think
> > it's acceptable to do the same thing for incremental sort.
>
> Sounds good.
>
> > 2) INITIAL_MEMTUPSIZE
> >
> > tuplesort.c does this:
> >
> >    #define INITIAL_MEMTUPSIZE Max(1024, \
> >        ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1)
> >
> > supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD.
> > But I think it fails to do that, for a couple of reasons.
> >
> > Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at
> > minimum (without padding), so with 1024 elements it's guaranteed to be
> > at least 21kB - so exceeding the threshold. The maximum value is
> > something like 256.
> >
> > Secondly, the second part of the formula is guaranteed to get us over
> > the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then
> >
> >    ALLOCSET_SEPARATE_THRESHOLD / 30 = 273
> >
> > but we end up using 274, resulting in 8220B array. :-(
> >
> > So I guess the formula should be more like
> >
> >    Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple))
> >
> > or something like that.
> >
> > FWIW I think the whole hypothesis that selecting the array size below
> > ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we
> > allocate this only once (or very few times), and if we need to grow the
> > array we'll hit the threshold anyway.
> >
> > I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024
> > may be OK too.
>
> That was a part of the patch I haven't touched since I inherited it,
> and I didn't feel like I knew enough about the Postgres memory
> management to make a determination on whether the reasoning made
> sense.
>
> So I' happy to use a constant as suggested.

I take that back. That code hasn't changed, it's just moved. Here's
the current code in tuplesort_begin_common on master:

/*
* Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;
* see comments in grow_memtuples().
*/
state->memtupsize = Max(1024,
ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1);

I'm not sure we ought to change that in this patch...

James



pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Next
From: Amit Kapila
Date:
Subject: Re: WAL usage calculation patch