Thread: More work on SortSupport for text - strcoll() and strxfrm() caching

More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

04 July 2015, 00:33:45

Since apparently we're back to development work, I thought it was time
to share a patch implementing a few additional simple tricks to make
sorting text under a non-C locale even faster than in 9.5. These
techniques are mostly effective when values are physically clustered
together. This might be because there is a physical/logical
correlation, but cases involving any kind of clustering of values are
helped significantly.

Caching
======

The basic idea is that we cache strxfrm() blobs. Separately, we
exploit temporal locality and clustering of values by caching the
result of the most recent strcoll()-resolved comparison performed. The
strxfrm() technique helps a lot with low cardinality single attribute
sorts if we can avoid most strxfrm() work. On the other hand,
strcoll() comparison caching particularly helps with multi-attribute
sorts where there is a low to moderate cardinality secondary attribute
and low cardinality leading attribute. The master branch will still
opportunistically take the equality memcmp() fastpath plenty of times
for that second attribute, but there are no abbreviated keys to help
when that doesn't work out (because it isn't the leading attribute).

Regressions
==========

The patch still helps with strcoll() avoidance when the ordering of a
moderate cardinality attribute is totally random, but it helps much
less there. I have not seen a regression for any case. I'm expecting
someone to ask me to do something with the program I wrote last year,
to prove the opportunistic memcmp() equality fastpath for text is
"free" [1]. This patch has exactly the same tension as last year's
memcmp() equality one [2]: I add something opportunistic, that in
general might consistently not work out at all in some cases, and on
the face of it implies extra costs for those cases -- costs which must
be paid every single time. So as with the opportunistic memcmp()
equality thing, the *actual* overhead for cases that do not benefit
must be virtually zero for the patch to be worthwhile. That is the
standard that I expect that this patch will be held to, too.

Benchmark
=========

The query that I've been trying this out with is a typical rollup
query, using my "cities" sample data [3] (this is somewhat, although
not perfectly correlated on (country, province) before sorting):

postgres=# select country, province, count(*) from cities group by
rollup (country, province);
               country               |               province
      | count
-------------------------------------+---------------------------------------+--------
 Afghanistan                         | Badaẖšan
       |      5
 Afghanistan                         | Bādgīs
      |      2
 Afghanistan                         | Baġlān
      |      5
 Afghanistan                         | Balẖ
       |      6
 Afghanistan                         | Bāmiyān
      |      3
 Afghanistan                         | Farāh
      |      3
 Afghanistan                         | Fāryāb
      |      4
 Afghanistan                         | Ġawr
      |      3

*** SNIP *****
...

 Zimbabwe                            | Manicaland
      |     22
 Zimbabwe                            | Mashonaland Central
      |     13
 Zimbabwe                            | Mashonaland East
      |      9
 Zimbabwe                            | Mashonaland West
      |     21
 Zimbabwe                            | Masvingo
      |     11
 Zimbabwe                            | Matabeleland North
      |      8
 Zimbabwe                            | Matabeleland South
      |     14
 Zimbabwe                            | Midlands
      |     14
 Zimbabwe                            | [null]
      |    116
 [null]                              | [null]
      | 317102
(3529 rows)

With master, this takes about 525ms when it stabilizes after a few
runs on my laptop. With the patch, it takes about 405ms. That's almost
a 25% reduction in total run time. If I perform a more direct test of
sort performance against this data with minimal non-sorting overhead,
I see a reduction of as much as 30% in total query runtime (I chose
this rollup query because it is obviously representative of the real
world).

If this data is *perfectly* correlated (e.g. because someone ran
CLUSTER) and some sort can use the dubious "bubble sort best case"
path [4] that we added to qsort back in 2006, the improvement still
hold up at ~20%, I've found.

Performance of the "C" locale
---------------------------------------

For this particular rollup query, my 25% improvement leaves the
collated text sort perhaps marginally faster than an equivalent query
that uses the "C" locale (with or without the patch applied). It's
hard to be sure that that effect is real -- many trials are needed --
but it's reasonable to speculate that it's possible to sometimes beat
the "C" locale because of factors like final abbreviated key
cardinality.

It's easy to *contrive* a case where the "C" locale is beaten even
with 9.5 -- just sort a bunch of strings (that are abbreviated), that
look something like this:

"``..,,``..AAA"
"``..,,``..CCC"
"``..,,``..ZZZ"
"``..,,``..BBB"

Anyway, this avoidance of strxfrm() work I've introduced makes it
possible that abbreviated keys could make a strxfrm() locale-based
sort beat the "C" locale fair-and-square with a realistic dataset and
specific realistic query. That would be pretty nice, because that
can't be too far from optimal, and these cases are not uncommon.

A further idea -- unsigned integer comparisons
===================================

I've also changed text abbreviated keys to compare as unsigned
integers. On my Thinkpad laptop (which, of course, has an Intel CPU),
this makes a noticeable difference. memcmp() may be fast, but an
unsigned integer comparison is even faster (not sure if a big-endian
machine can have the existing memcmp() call optimized away, so that
effectively the same thing happens automatically).

Maybe other platforms benefit less, but it's very hard to imagine it
ever costing us anything.

[1] http://www.postgresql.org/message-id/5415A843.3070602@vmware.com
[2] Commit e246b3d6
[3] http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/data/cities.dump
[4] Commit a3f0b3d6
--
Peter Geoghegan

Attachment

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

04 August 2015, 19:41:38

On Fri, Jul 3, 2015 at 8:33 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Since apparently we're back to development work, I thought it was time
> to share a patch implementing a few additional simple tricks to make
> sorting text under a non-C locale even faster than in 9.5. These
> techniques are mostly effective when values are physically clustered
> together. This might be because there is a physical/logical
> correlation, but cases involving any kind of clustering of values are
> helped significantly.

Interesting work.

Some comments:

1. My biggest gripe with this patch is that the comments are not easy
to understand.  For example:

+     * Attempt to re-use buffers across calls.  Also, avoid work in the event
+     * of clustered together identical items by exploiting temporal locality.
+     * This works well with divide-and-conquer, comparison-based sorts like
+     * quicksort and mergesort.
+     *
+     * With quicksort, there is, in general, a pretty strong chance that the
+     * same buffer contents can be used repeatedly for pivot items -- early
+     * pivot items will account for large number of total comparisons, since
+     * they must be compared against many (possibly all other) items.

Well, what I would have written is something like: "We're likely to be
asked to compare the same strings repeatedly, and memcmp() is so much
cheaper than memcpy() that it pays to attempt a memcmp() in the hopes
of avoiding a memcpy().  This doesn't seem to slow things down
measurably even if it doesn't work out very often."

+     * While it is worth going to the trouble of trying to re-use buffer
+     * contents across calls, ideally that will lead to entirely avoiding a
+     * strcoll() call by using a cached return value.
+     *
+     * This optimization can work well again and again for the same set of
+     * clustered together identical attributes;  as they're relocated to new
+     * subpartitions, only one strcoll() is required for each pivot (in respect
+     * of that clump of identical values, at least).  Similarly, the final
+     * N-way merge of a mergesort can be effectively accelerated if each run
+     * has its own locally clustered values.

And here I would have written something like: "If we're comparing the
same two strings that we compared last time, we can return the same
answer without calling strcoll() again.  This is more likely than it
seems, because quicksort compares the same pivot against many values,
and some of those values might be duplicates."

Of course everybody may prefer something different here; I'm just
telling you what I think.

2. I believe the change to bttextcmp_abbrev() should be pulled out
into a separate patch and committed separately.  That part  seems like
a slam dunk.

3. What is the worst case for the strxfrm()-reuse stuff?  I suppose
it's the case where we have many strings that are long, all
equal-length, and all different, but only in the last few characters.
Then the memcmp() is as expensive as possible but never works out.
How does the patch hold up in that case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

04 August 2015, 20:31:02

On Tue, Aug 4, 2015 at 12:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Interesting work.

Thanks.

> 1. My biggest gripe with this patch is that the comments are not easy
> to understand.

> Of course everybody may prefer something different here; I'm just
> telling you what I think.

I have struggled with trying to put just the right amount of
exposition on the theory behind a particular optimization in source
code comments, and things like that. Since no one is telling me that I
need to write more, clearly I don't have the balance right yet. To a
certain extent, it is a matter of personal style, but I'll try and be
more terse.

> 2. I believe the change to bttextcmp_abbrev() should be pulled out
> into a separate patch and committed separately.  That part  seems like
> a slam dunk.

Makes sense.

> 3. What is the worst case for the strxfrm()-reuse stuff?  I suppose
> it's the case where we have many strings that are long, all
> equal-length, and all different, but only in the last few characters.
> Then the memcmp() is as expensive as possible but never works out.
> How does the patch hold up in that case?

I haven't tested it. I'll get around to it at some point in the next
couple of weeks. I imagine that it's exactly the same as the memcmp()
equality thing because of factors like speculative execution, and the
fact that we need both strings in cache anyway. It's almost exactly
the same story, although unlike the memcmp() opportunistic equality
pre-check thing, this check happens only n times, not n log n times.

I'm quite sure that the cost needs to be virtually zero to go ahead
with the idea. I think it probably is. Note that like the memcmp()
thing, we check string length first, before a memcmp().

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

04 August 2015, 20:43:22

On Tue, Aug 4, 2015 at 1:30 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> 2. I believe the change to bttextcmp_abbrev() should be pulled out
>> into a separate patch and committed separately.  That part  seems like
>> a slam dunk.
>
> Makes sense.

BTW, I want to put the string_uint() macro in a common header now. It
can be used for other types. I've written a SortSupport + abbreviated
keys patch for the UUID type which will use it, too, so that it too
can use simple unsigned integer comparisons within its abbreviated
comparator. I haven't posted the UUID patch yet only because everyone
is up to their ears in my sorting patches.

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

04 October 2015, 06:18:09

On Tue, Aug 4, 2015 at 12:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Some comments:

I attach a new version of the patch series that incorporates all your
feedback. The patch series is now made cumulative in a way that makes
it easy for someone to independently commit the unsigned integer
comparison optimization for text, and nothing else. The macro that
uses is in a dedicated header now, because I have another patch
(SortSupport for the UUID type) that uses the same optimization for
the same reason. It seems like something that will probably end up
with a third or forth client before too long, so I think the byte swap
macro wrapper belongs in sortsupport.h.

BTW, I think that in practice the merge phase of a tape sort isn't
much helped by comparison caching, contrary to comments appearing in
the original version. The heap data structure used by polyphase merge
has bad properties around locality (both temporal and spatial). I'm
thinking about independently addressing that problem. I now make no
claims about it in this patch.

--
Peter Geoghegan

Attachment

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

06 October 2015, 20:07:16

On Sun, Oct 4, 2015 at 2:17 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Aug 4, 2015 at 12:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Some comments:
>
> I attach a new version of the patch series that incorporates all your
> feedback. The patch series is now made cumulative in a way that makes
> it easy for someone to independently commit the unsigned integer
> comparison optimization for text, and nothing else. The macro that
> uses is in a dedicated header now, because I have another patch
> (SortSupport for the UUID type) that uses the same optimization for
> the same reason. It seems like something that will probably end up
> with a third or forth client before too long, so I think the byte swap
> macro wrapper belongs in sortsupport.h.

Reviewing 0001, I'm happy to see us add bswap64, but I'm not sure we
should put it in c.h, because that's included by absolutely
everything.  How about putting it in a new #include inside src/port,
like src/port/pg_bswap.h?  Then pg_crc.h can include that, but other
things can, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

06 October 2015, 20:09:47

On Tue, Oct 6, 2015 at 1:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Reviewing 0001, I'm happy to see us add bswap64, but I'm not sure we
> should put it in c.h, because that's included by absolutely
> everything.  How about putting it in a new #include inside src/port,
> like src/port/pg_bswap.h?  Then pg_crc.h can include that, but other
> things can, too.

I guess I imagined that bswap64() was fundamental infrastructure, but
on second thought that's not actually in evidence -- it is not already
needed in plenty of other places. So yeah, works for me.

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

06 October 2015, 20:16:16

On Tue, Oct 6, 2015 at 4:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Oct 6, 2015 at 1:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Reviewing 0001, I'm happy to see us add bswap64, but I'm not sure we
>> should put it in c.h, because that's included by absolutely
>> everything.  How about putting it in a new #include inside src/port,
>> like src/port/pg_bswap.h?  Then pg_crc.h can include that, but other
>> things can, too.
>
> I guess I imagined that bswap64() was fundamental infrastructure, but
> on second thought that's not actually in evidence -- it is not already
> needed in plenty of other places. So yeah, works for me.

If you would care to revise the patch accordingly, I will commit it
(barring objections from others, of course).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

06 October 2015, 20:26:20

On Tue, Oct 6, 2015 at 1:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I guess I imagined that bswap64() was fundamental infrastructure, but
>> on second thought that's not actually in evidence -- it is not already
>> needed in plenty of other places. So yeah, works for me.
>
> If you would care to revise the patch accordingly, I will commit it
> (barring objections from others, of course).

Sure. It might take me a couple of days to get around to it, though --
things are a bit hectic here.

Thanks
-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

06 October 2015, 20:38:26

On Tue, Oct 6, 2015 at 4:26 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Oct 6, 2015 at 1:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I guess I imagined that bswap64() was fundamental infrastructure, but
>>> on second thought that's not actually in evidence -- it is not already
>>> needed in plenty of other places. So yeah, works for me.
>>
>> If you would care to revise the patch accordingly, I will commit it
>> (barring objections from others, of course).
>
> Sure. It might take me a couple of days to get around to it, though --
> things are a bit hectic here.

I know the feeling.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

08 October 2015, 00:09:38

On Tue, Oct 6, 2015 at 1:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If you would care to revise the patch accordingly, I will commit it
> (barring objections from others, of course).

Here is a revision of 0001-*, with both BSWAP32() and BSWAP64() in a
new header, src/port/pg_bswap.h.

No revisions were required to any other patch in the patch series to
make this work, and so I only include a revised 0001-*.

--
Peter Geoghegan

Attachment

0001-Provide-for-unsigned-comparisons-of-abbreviated-keys.patch

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

08 October 2015, 17:13:42

On Wed, Oct 7, 2015 at 8:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Oct 6, 2015 at 1:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> If you would care to revise the patch accordingly, I will commit it
>> (barring objections from others, of course).
>
> Here is a revision of 0001-*, with both BSWAP32() and BSWAP64() in a
> new header, src/port/pg_bswap.h.
>
> No revisions were required to any other patch in the patch series to
> make this work, and so I only include a revised 0001-*.

Great.  I've committed that, minus the sortsupport.h changes which I
think should be part of 0002, and which in any case I'd like to
discuss a bit more.  It seems to me that (1) ABBREV_STRING_UINT isn't
a great name for this and (2) the comment is awfully long for the
thing to which it refers.  I suggest that we instead call it
DatumToBigEndian(), put it pg_bswap.h, and change the comments to
something like this:

/** Rearrange the bytes of a Datum into big-endian order.** One possible application of this macro is to make
comparisons
cheaper.  An integer* comparison of the new Datums will return the same result as a memcmp() on the* original Datums,
butthe integer comparison should be much cheaper.*/

The specific way that this is used by various sortsupport routines can
be adequately explained in the comments for those routines.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

08 October 2015, 18:07:23

On Thu, Oct 8, 2015 at 10:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> It seems to me that (1) ABBREV_STRING_UINT isn't
> a great name for this and (2) the comment is awfully long for the
> thing to which it refers.  I suggest that we instead call it
> DatumToBigEndian(), put it pg_bswap.h, and change the comments to
> something like this:
>
> /*
>  * Rearrange the bytes of a Datum into big-endian order.
>  *
>  * One possible application of this macro is to make comparisons
> cheaper.  An integer
>  * comparison of the new Datums will return the same result as a memcmp() on the
>  * original Datums, but the integer comparison should be much cheaper.
>  */
>
> The specific way that this is used by various sortsupport routines can
> be adequately explained in the comments for those routines.

This is pretty clearly something specific to SortSupport. I'm not
opposed to changing the name and making the comments more terse along
those lines, but I think it should live in sortsupport.h. The macro
byteswaps datums on little-endian platforms only, which seems very
specific.

I think that we're going to have SortSupport with abbreviation for
UUIDs and bytea at some point, and maybe character(n). Centralizing
information about this to sortsupport.h makes sense to me.

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

08 October 2015, 18:37:07

On Thu, Oct 8, 2015 at 2:07 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Oct 8, 2015 at 10:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It seems to me that (1) ABBREV_STRING_UINT isn't
>> a great name for this and (2) the comment is awfully long for the
>> thing to which it refers.  I suggest that we instead call it
>> DatumToBigEndian(), put it pg_bswap.h, and change the comments to
>> something like this:
>>
>> /*
>>  * Rearrange the bytes of a Datum into big-endian order.
>>  *
>>  * One possible application of this macro is to make comparisons
>> cheaper.  An integer
>>  * comparison of the new Datums will return the same result as a memcmp() on the
>>  * original Datums, but the integer comparison should be much cheaper.
>>  */
>>
>> The specific way that this is used by various sortsupport routines can
>> be adequately explained in the comments for those routines.
>
> This is pretty clearly something specific to SortSupport. I'm not
> opposed to changing the name and making the comments more terse along
> those lines, but I think it should live in sortsupport.h. The macro
> byteswaps datums on little-endian platforms only, which seems very
> specific.
>
> I think that we're going to have SortSupport with abbreviation for
> UUIDs and bytea at some point, and maybe character(n). Centralizing
> information about this to sortsupport.h makes sense to me.

I'm not convinced.  Doesn't this exact same concept get used for
over-the-wire communication between BE and LE machines?  There, this
operation is spelled htonl/ntohl.  Some systems even have htonll, but
I'm sure there are still a bunch that don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

09 October 2015, 00:20:09

On Thu, Oct 8, 2015 at 11:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm not convinced.  Doesn't this exact same concept get used for
> over-the-wire communication between BE and LE machines?  There, this
> operation is spelled htonl/ntohl.  Some systems even have htonll, but
> I'm sure there are still a bunch that don't.

I continue to disagree with that. The spelling of the macro that you
propose suggests that this process occurs at a relatively high level
of abstraction, which is misleading. Datums that have abbreviated key
bytes packed into them are in general kind of special. All the same,
here is a revision of the patch series along those lines. I'll also
have to update the UUID patch to independently note the same issues.

I should point out that I did not call the macro DatumToBigEndian(),
because it's actually the other way around. I called it
DatumToLittleEndian(), since the unsigned integer comparator would
work correctly on big-endian systems without calling any new macro
(which is of course why the new macro does nothing on big-endian
systems). We start off with a big endian Datum/unsigned integer on all
platforms, and then we byteswap it to make it a little-endian unsigned
integer if and when that's required (i.e. only on little-endian
systems).

--
Peter Geoghegan

Attachment

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

09 October 2015, 18:44:17

On Thu, Oct 8, 2015 at 8:20 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I should point out that I did not call the macro DatumToBigEndian(),
> because it's actually the other way around. I called it
> DatumToLittleEndian(), since the unsigned integer comparator would
> work correctly on big-endian systems without calling any new macro
> (which is of course why the new macro does nothing on big-endian
> systems). We start off with a big endian Datum/unsigned integer on all
> platforms, and then we byteswap it to make it a little-endian unsigned
> integer if and when that's required (i.e. only on little-endian
> systems).

Hmm.  But then this doesn't seem to make much sense:

+ * Rearrange the bytes of a Datum into little-endian order from big-endian
+ * order.  On big-endian machines, this does nothing at all.

Rearranging bytes into little-endian order ought to be a no-op on a
little-endian machine; and rearranging them into big-endian order
ought to be a no-op on a big-endian machine.

Thinking about this a bit more, it seems like the situation we're in
here is that the input datum is always going to be big-endian.
Regardless of what the machine's integer format is, the sortsupport
abbreviator is going to output a Datum where the most significant byte
is the first one stored in the datum.  We want to convert that Datum
to one that has *native* endianness.  So maybe we should call this
DatumBigEndianToNative or something like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

09 October 2015, 18:48:58

On Fri, Oct 9, 2015 at 11:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm.  But then this doesn't seem to make much sense:
>
> + * Rearrange the bytes of a Datum into little-endian order from big-endian
> + * order.  On big-endian machines, this does nothing at all.
>
> Rearranging bytes into little-endian order ought to be a no-op on a
> little-endian machine; and rearranging them into big-endian order
> ought to be a no-op on a big-endian machine.

I think that that's very clearly implied anyway.

> Thinking about this a bit more, it seems like the situation we're in
> here is that the input datum is always going to be big-endian.
> Regardless of what the machine's integer format is, the sortsupport
> abbreviator is going to output a Datum where the most significant byte
> is the first one stored in the datum.  We want to convert that Datum
> to one that has *native* endianness.  So maybe we should call this
> DatumBigEndianToNative or something like that.

I'd be fine with DatumBigEndianToNative() -- I agree that that's
slightly better.

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

09 October 2015, 19:11:45

On Fri, Oct 9, 2015 at 2:48 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 9, 2015 at 11:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Hmm.  But then this doesn't seem to make much sense:
>>
>> + * Rearrange the bytes of a Datum into little-endian order from big-endian
>> + * order.  On big-endian machines, this does nothing at all.
>>
>> Rearranging bytes into little-endian order ought to be a no-op on a
>> little-endian machine; and rearranging them into big-endian order
>> ought to be a no-op on a big-endian machine.
>
> I think that that's very clearly implied anyway.
>
>> Thinking about this a bit more, it seems like the situation we're in
>> here is that the input datum is always going to be big-endian.
>> Regardless of what the machine's integer format is, the sortsupport
>> abbreviator is going to output a Datum where the most significant byte
>> is the first one stored in the datum.  We want to convert that Datum
>> to one that has *native* endianness.  So maybe we should call this
>> DatumBigEndianToNative or something like that.
>
> I'd be fine with DatumBigEndianToNative() -- I agree that that's
> slightly better.

OK, committed that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

09 October 2015, 19:33:57

On Fri, Oct 9, 2015 at 12:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> OK, committed that way.

Thank you.


-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

09 October 2015, 21:32:03

On Fri, Oct 9, 2015 at 12:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> OK, committed that way.

Just for the record, with the same "cities" table as my original post
to this thread, this query:

select count(distinct(city)) from cities;

Goes from taking about 296ms (once it stabilizes), to about 265ms
(once it stabilizes) following today's commit of just the unsigned
integer comparison patch. I've shaved just over 10% off the duration
of this representative sort-heavy query (against a 9.5 baseline),
which is nice.

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

09 October 2015, 23:39:59

On Fri, Oct 9, 2015 at 3:33 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 9, 2015 at 12:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> OK, committed that way.
>
> Thank you.

You're welcome.  After some study and experimentation, I've committed
the other part also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

09 October 2015, 23:50:06

On Fri, Oct 9, 2015 at 4:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> You're welcome.  After some study and experimentation, I've committed
> the other part also.

Fantastic. I guess the precedent of the 9.5 text equality fast path
made this discussion way smoother than last time, since essentially
the same principle applies.

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

10 October 2015, 00:54:43

On Fri, Oct 9, 2015 at 7:49 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 9, 2015 at 4:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> You're welcome.  After some study and experimentation, I've committed
>> the other part also.
>
> Fantastic. I guess the precedent of the 9.5 text equality fast path
> made this discussion way smoother than last time, since essentially
> the same principle applies.

I think that is true.  I spent some time thinking about whether the
way you used INT_MIN as a sentinel value should be changed around
somehow, but ultimately I decided that it wasn't too bad and that
suggesting something else would be pointless kibitzing.  I also tried
to think of scenarios in which this would lose, and I'm not totally
convinced that there aren't any, but I'm convinced that, if they
exist, I don't know what they are.  Since the patch did deliver a
small improvement on my test cases and on yours, I think we might as
well have it in the tree.  If some pathological scenario shows up
where it turns out to hurt, we can always fix it then, or revert if it
need be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

10 October 2015, 19:32:41

On Fri, Oct 9, 2015 at 5:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that is true.  I spent some time thinking about whether the
> way you used INT_MIN as a sentinel value should be changed around
> somehow, but ultimately I decided that it wasn't too bad and that
> suggesting something else would be pointless kibitzing.  I also tried
> to think of scenarios in which this would lose, and I'm not totally
> convinced that there aren't any, but I'm convinced that, if they
> exist, I don't know what they are.  Since the patch did deliver a
> small improvement on my test cases and on yours, I think we might as
> well have it in the tree.  If some pathological scenario shows up
> where it turns out to hurt, we can always fix it then, or revert if it
> need be.

That seems very reasonable.

I noticed that there is still one comment that I really should have
removed as part of this work. The comment didn't actually add any new
information for 9.5, but is now obsolete. Attached patch removes it
entirely.

--
Peter Geoghegan

Attachment

remove_text_comment.patch

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

12 October 2015, 07:47:19

On Sat, Oct 10, 2015 at 12:32 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I noticed that there is still one comment that I really should have
> removed as part of this work.

I also noticed that I failed to reset the last_returned strcoll()
cache variable as part of an abbreviation call, despite the fact that
tapesort may freely interleave conversions with comparisons, while
reusing buf1 and buf2 both as scratch space for strxfrm() blobs, as
well as for storing strings to be compared with strcoll(). I suggest
that the attached patch also be applied to fix this issue.

--
Peter Geoghegan

Attachment

reset_last_returned.patch

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

12 October 2015, 19:31:42

On Mon, Oct 12, 2015 at 12:47 AM, Peter Geoghegan <pg@heroku.com> wrote:
> I also noticed that I failed to reset the last_returned strcoll()
> cache variable as part of an abbreviation call, despite the fact that
> tapesort may freely interleave conversions with comparisons, while
> reusing buf1 and buf2 both as scratch space for strxfrm() blobs, as
> well as for storing strings to be compared with strcoll(). I suggest
> that the attached patch also be applied to fix this issue.

I think that I jumped the gun with this fix, because theoretically you
can still get the same problem in the opposite direction -- an
original string treated as a strxfrm() blob when the cache is
consulted.

I'll consider a more comprehensive fix.

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

12 October 2015, 23:15:34

On Mon, Oct 12, 2015 at 3:31 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Oct 12, 2015 at 12:47 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> I also noticed that I failed to reset the last_returned strcoll()
>> cache variable as part of an abbreviation call, despite the fact that
>> tapesort may freely interleave conversions with comparisons, while
>> reusing buf1 and buf2 both as scratch space for strxfrm() blobs, as
>> well as for storing strings to be compared with strcoll(). I suggest
>> that the attached patch also be applied to fix this issue.
>
> I think that I jumped the gun with this fix, because theoretically you
> can still get the same problem in the opposite direction -- an
> original string treated as a strxfrm() blob when the cache is
> consulted.
>
> I'll consider a more comprehensive fix.

This is not the first time I've committed one of your patches and
promptly received a whole series of emails with fixes for what went
into that commit, despite the fact that I have not and did not change
the relevant parts of the patch while committing.  Since that makes
more work for me, I am not a huge fan, and the practical effect is to
subtract from the amount of time that could otherwise have been spent
reviewing your next patch (or someone else's).  In this case, I think
the best thing for me to do right now is wait to commit anything
further until you have had a chance to go over this and come up with a
fix or set of fixes that you think are completely, 100% ready to go;
or else until you get to the point of wanting feedback before
proceeding further.  In general, I would appreciate if this sort of
post-commit self-review could be done pre-commit.

I apologize for the fact that this email probably sounds grouchy.
Please try to read it in the most positive light possible (and don't
shoot me).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

12 October 2015, 23:54:36

On Mon, Oct 12, 2015 at 4:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> In this case, I think
> the best thing for me to do right now is wait to commit anything
> further until you have had a chance to go over this and come up with a
> fix or set of fixes that you think are completely, 100% ready to go;
> or else until you get to the point of wanting feedback before
> proceeding further.  In general, I would appreciate if this sort of
> post-commit self-review could be done pre-commit.

This came to me while I was making dinner last night - I was not
actually poring over the code on my computer. I don't know why I
thought of it then and not at any earlier point, but it seems
reasonable to suppose it had something to do with perspective, or
being able to see a bigger picture that is difficult to keep in mind
during initial development and self review. I was mostly thinking
about external sorting at the time, and not this patch.

I cannot do that on demand. It isn't a matter of effort. When I come
back from vacation at the end of the month, I may be able to do
somewhat better for a few months.

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

14 October 2015, 23:05:14

On Mon, Oct 12, 2015 at 12:31 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I'll consider a more comprehensive fix.

I attach a revised fix that considers the problem of misinterpreting
the contents of the buffers in both directions.

Thanks
--
Peter Geoghegan

Attachment

0001-Correctly-distinguish-the-contents-of-text-buffers.patch

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

14 October 2015, 23:10:04

On Wed, Oct 14, 2015 at 7:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Oct 12, 2015 at 12:31 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> I'll consider a more comprehensive fix.
>
> I attach a revised fix that considers the problem of misinterpreting
> the contents of the buffers in both directions.

I wonder if it wouldn't be better to just add a separate Boolean
indicating exactly the thing we care about.  This doesn't seem
particularly easy to understand and verify.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

14 October 2015, 23:11:19

On Wed, Oct 14, 2015 at 4:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I wonder if it wouldn't be better to just add a separate Boolean
> indicating exactly the thing we care about.  This doesn't seem
> particularly easy to understand and verify.

I'm not really sure that that's an improvement. But I defer to you.

-- 
Peter Geoghegan

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

15 October 2015, 16:07:33

On Wed, Oct 14, 2015 at 7:11 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Oct 14, 2015 at 4:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I wonder if it wouldn't be better to just add a separate Boolean
>> indicating exactly the thing we care about.  This doesn't seem
>> particularly easy to understand and verify.
>
> I'm not really sure that that's an improvement. But I defer to you.

Would you be willing to try coding it up that way so we can see how it looks?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Peter Geoghegan

Date:

18 October 2015, 18:52:42

On Thu, Oct 15, 2015 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Would you be willing to try coding it up that way so we can see how it looks?

I attach a patch that does it that way.

Note that I will be away for until late this month. Do not expect a
response from me before then, unless it's truly urgent.

Thanks

--
Peter Geoghegan

Attachment

0001-Correctly-distinguish-the-contents-of-text-buffers.patch

Re: More work on SortSupport for text - strcoll() and strxfrm() caching

From

Robert Haas

Date:

20 October 2015, 13:41:54

On Sun, Oct 18, 2015 at 2:52 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Oct 15, 2015 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Would you be willing to try coding it up that way so we can see how it looks?
>
> I attach a patch that does it that way.

That looks good to me.  I've committed this, and the other patch to
remove the obsolete comment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company