Re: 9.5 release notes - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: 9.5 release notes
Msg-id CAM3SWZQa8sQ94bNWyL4YWq1m6uyf+-A3+KTqYcVTN5pe=58kSA@mail.gmail.com
In response to 9.5 release notes  (Bruce Momjian <bruce@momjian.us>)
Responses Re: 9.5 release notes  (Peter Geoghegan <pg@heroku.com>)
Re: 9.5 release notes  (Robert Haas <robertmhaas@gmail.com>)
Re: 9.5 release notes  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Wed, Jun 10, 2015 at 9:15 PM, Bruce Momjian <bruce@momjian.us> wrote:
> I have committed the first draft of the 9.5 release notes.  You can view
> the output here:

+      <listitem>
+       <para>
+        Improve the speed of sorting character and numeric fields (Robert
+        Haas, Peter Geoghegan, Andrew Gierth)
+       </para>
+      </listitem>

A few comments on this.

First of all, I think it should be separately noted that the
sortsupport infrastructure is now used in virtually all places where
it's useful (see commit 5ea86e6e6). So for example, CREATE INDEX on
integer columns ought to be notably faster (and CLUSTER, too). The
9.2-era sortsupport stuff was simply never adapted to do that until now.
That has nothing to do with abbreviated keys, except that the idea of
abbreviated keys gave me a strong reason to care about sortsupport a
lot more. But commit 5ea86e6e6 predates abbreviated keys, and is
certainly independently useful (this really should have made it into
9.2).

Secondly, Robert didn't credit himself as an author in his commit
message for the abbreviated keys infrastructure + text opclass support
*at all*. However, I think that Robert should be listed as a secondary
author of the abbreviated keys infrastructure, and that he would agree
that I am clearly the primary author. Andrew Gierth did work on the
datum case for sortsupport + abbreviation, so I agree he should be
listed as a secondary author of the infrastructure too, after Robert.

I think there should be a total of 4 items related to sorting. The
wording I've come up with may not be appropriate, but it will give you
an idea:

* Allow sorting to be performed by inlined, non-SQL-callable
comparison functions for CREATE INDEX, REINDEX and CLUSTER operations
based on a B-Tree operator class. (5ea86e6e6 -- Geoghegan)

* Add abbreviated key sorting infrastructure. This allows B-Tree
operator classes to provide compact abbreviated representations of
pass-by-reference types which are sorted with inexpensive comparisons.
This makes sort operations with support for the infrastructure very
significantly faster in the common case where most comparisons can be
resolved with the abbreviated representation alone. (4ea51cdfe85 --
Geoghegan, Haas, Gierth, with Gierth's contribution coming from
78efd5c1 alone)

* Add sortsupport (support for a non-SQL-callable comparator
interface) with abbreviation capability to text/varlena operator
class. This significantly accelerates sorting on text columns.
(4ea51cdfe85 too, but also b34e37bf. Worth noting separately IMV.
Geoghegan, Haas).

* Add sortsupport (support for a non-SQL-callable comparator
interface) with abbreviation capability to numeric operator class.
This significantly accelerates sorting on numeric columns. (abd94bcac,
Gierth)
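
To make the abbreviated key idea in the second item concrete, here is a
minimal standalone sketch. It is not the code from 4ea51cdfe85, and every
name in it (SortItem, abbreviate, compare_items, sort_strings) is
hypothetical. It assumes NUL-terminated strings and plain byte order, so
it says nothing about how locale-aware collations are handled. Each
string gets a fixed-size integer built from its first 8 bytes; most
comparisons are resolved with one integer comparison, and only ties fall
back to comparing the full strings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sort tuple: the full key plus a fixed-size abbreviation. */
typedef struct
{
    const char *full;    /* pass-by-reference key, NUL-terminated */
    uint64_t    abbrev;  /* first 8 bytes of the key, packed big-endian */
} SortItem;

/*
 * Pack the first 8 bytes of s into an integer, most significant byte
 * first, so that unsigned integer order matches byte order. Keys shorter
 * than 8 bytes are padded with zero bytes.
 */
uint64_t
abbreviate(const char *s)
{
    uint64_t key = 0;
    size_t   len = strlen(s);

    for (size_t i = 0; i < 8; i++)
    {
        unsigned char c = (i < len) ? (unsigned char) s[i] : 0;

        key = (key << 8) | c;
    }
    return key;
}

/* Compare abbreviations first; consult the full keys only on a tie. */
static int
compare_items(const void *a, const void *b)
{
    const SortItem *x = a;
    const SortItem *y = b;

    if (x->abbrev != y->abbrev)
        return (x->abbrev < y->abbrev) ? -1 : 1;
    return strcmp(x->full, y->full);    /* authoritative tie-breaker */
}

/* Sort an array of strings in place; returns 0 on success, -1 on OOM. */
int
sort_strings(const char **strings, size_t n)
{
    SortItem *items;

    assert(strings != NULL);
    if (n == 0)
        return 0;

    items = malloc(n * sizeof(SortItem));
    if (items == NULL)
        return -1;

    /* Abbreviations are computed once per key, not once per comparison. */
    for (size_t i = 0; i < n; i++)
    {
        items[i].full = strings[i];
        items[i].abbrev = abbreviate(strings[i]);
    }

    qsort(items, n, sizeof(SortItem), compare_items);

    for (size_t i = 0; i < n; i++)
        strings[i] = items[i].full;

    free(items);
    return 0;
}
```

In this sketch the win comes from the comparator touching only the
fixed-size abbrev field in the common case, rather than chasing the
pointer to the full key on every comparison.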

I'm not sure if it's worth mentioning the "cheap equality for text"
commit (e246b3d6eac09). I guess that it probably is, because it will
help with things like index scans, too. Arguably that isn't a sorting
thing (it's certainly not *just* a sorting thing).
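
The mechanism of that commit is not described here, so the following is
only an assumption about what a "cheap equality" check for text can look
like, not a rendering of e246b3d6eac09. The function name is
hypothetical. The idea in this sketch is that byte-identical strings must
compare as equal under any collation, so a bytewise test can avoid the
more expensive locale-aware comparison whenever the inputs are identical.

```c
#include <string.h>

/*
 * Hypothetical comparator: try a cheap bytewise equality test before
 * falling back to the locale-aware strcoll(). This sketch calls strlen()
 * to get the lengths; a system that stores lengths alongside the data
 * would not need that extra pass.
 */
int
text_cmp_with_fast_equality(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);

    /* Identical bytes are equal in every collation: skip strcoll(). */
    if (la == lb && memcmp(a, b, la) == 0)
        return 0;

    return strcoll(a, b);
}
```

A check like this pays off wherever equal values are compared often,
which is not limited to sorting.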

I've blogged on the abbreviated key stuff quite a bit, which may be
useful should you require additional background information:

http://pgeoghegan.blogspot.com/2015/01/abbreviated-keys-exploiting-locality-to.html

http://pgeoghegan.blogspot.com/2015/04/abbreviated-keys-for-numeric-to.html

Thanks
-- 
Peter Geoghegan


