Re: Polyphase merge is obsolete - Mailing list pgsql-hackers

From John Naylor
Subject Re: Polyphase merge is obsolete
Date
Msg-id CAFBsxsEbWy=imDgm_HpQg2r98p5GnKBR0pkv4xBBbSNTNM8fRQ@mail.gmail.com
Whole thread Raw
In response to Re: Polyphase merge is obsolete  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Polyphase merge is obsolete  (Peter Geoghegan <pg@bowt.ie>)
Re: Polyphase merge is obsolete  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers

On Wed, Sep 15, 2021 at 5:35 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Thanks, here's another rebase.
>
> - Heikki

I've had a chance to review and test out the v5 patches.

0001 is a useful simplification. Nothing in 0002 stood out as needing comment.

I've done some performance testing of master versus both patches applied. The full results and test script are attached, but I'll give a summary here. A variety of value distributions were tested, with work_mem from 1MB to 16MB, plus 2GB which will not use external sort at all. I settled on 2 million records for the sort, to have something large enough to work with but also keep the test time reasonable. That works out to about 130MB on disk. We have recent improvements to datum sort, so I used both single values and all values in the SELECT list.

The system was on a Westmere-era Xeon with gcc 4.8. pg_prewarm was run on the input tables. The raw measurements were reduced to the minimum of five runs.

I can confirm that sort performance is improved with small values of work_mem. That was not the motivating reason for the patch, but it's a nice bonus. Even as high as 16MB work_mem, it's possible some of the 4-6% differences represent real improvement and not just noise or binary effects, but it's much more convincing at 4MB and below, with 25-30% faster with non-datum integer sorts at 1MB work_mem. The nominal regressions seem within the noise level, with one exception that only showed up in one set of measurements (-10.89% in the spreadsheet). I'm not sure what to make of that since it only happens in one combination of factors and nowhere else.

On Sat, Sep 11, 2021 at 4:28 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> + * Before PostgreSQL 14, we used the polyphase merge algorithm (Knuth's
> + * Algorithm 5.4.2D),
>
> I think the above 'Before PostgreSQL 14' should be 'Before PostgreSQL 15' now that PostgreSQL 14 has been released.
>
> +static int64
> +merge_read_buffer_size(int64 avail_mem, int nInputTapes, int nInputRuns,
> +                      int maxOutputTapes)
>
> For memory to allocate, I think uint64 can be used (instead of int64).

int64 is used elsewhere in this file, and I see now reason to do otherwise.

Aside from s/14/15/ for the version as noted above, I've marked it Ready for Committer.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment

pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: How to retain lesser paths at add_path()?
Next
From: Bharath Rupireddy
Date:
Subject: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?