Re: Reverse collations (initially for making keyset pagination cover more cases) - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Reverse collations (initially for making keyset pagination cover more cases)
Date
Msg-id 87v9ri14pw.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: Reverse collations (initially for making keyset pagination covermore cases)  (David Fetter <david@fetter.org>)
Responses Re: Reverse collations (initially for making keyset pagination covermore cases)  (David Fetter <david@fetter.org>)
List pgsql-hackers
>>>>> "David" == David Fetter <david@fetter.org> writes:

First, in testing the patch I found there were indeed some missing
cases: the sortsupport version of the comparator needs to be fixed too.
I attach a draft addition to your patch, you should probably look at
adding test cases that need this to work.

 David> (a, b, c) < ($1, $2 COLLATE "C_backwards", $3)
 David> ...
 David> ORDER BY a, b DESC, c

That would have to be:

 WHERE (a, b COLLATE "C_backwards", c) < ($1, $2, $3)
 ...
 ORDER BY a, b COLLATE "C_backwards", c

Adding the below patch to yours, I can get this on the regression test
db (note that this is a -O0 asserts build, timings may be slow relative
to a production build):

create collation "C_rev" ( LOCALE = "C", REVERSE = true );
create index on tenk1 (hundred, (stringu1::text collate "C_rev"), string4);

explain analyze
  select hundred, stringu1::text, string4
    from tenk1
   where (hundred, stringu1::text COLLATE "C_rev", string4)
           > (10, 'WKAAAA', 'VVVVxx')
   order by hundred, (stringu1::text collate "C_rev"), string4
   limit 5;
                                                                       QUERY PLAN
                               
 

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.28 rows=5 width=132) (actual time=0.029..0.038 rows=5 loops=1)
   ->  Index Scan using tenk1_hundred_stringu1_string4_idx on tenk1  (cost=0.29..1768.49 rows=8900 width=132) (actual
time=0.028..0.036rows=5 loops=1)
 
         Index Cond: (ROW(hundred, ((stringu1)::text)::text, string4) > ROW(10, 'WKAAAA'::text, 'VVVVxx'::name))
 Planning Time: 0.225 ms
 Execution Time: 0.072 ms
(5 rows)

and I checked the results, and they look correct now.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 02cbcbd23d..61ab9720c5 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -84,6 +84,7 @@ typedef struct
     int            last_returned;    /* Last comparison result (cache) */
     bool        cache_blob;        /* Does buf2 contain strxfrm() blob, etc? */
     bool        collate_c;
+    bool        reverse;
     Oid            typid;            /* Actual datatype (text/bpchar/bytea/name) */
     hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
     hyperLogLogState full_card; /* Full key cardinality state */
@@ -2090,6 +2091,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
         /* Initialize */
         sss->last_returned = 0;
         sss->locale = locale;
+        sss->reverse = (locale != 0) && locale->reverse;
 
         /*
          * To avoid somehow confusing a strxfrm() blob and an original string,
@@ -2401,6 +2403,9 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
         (!sss->locale || sss->locale->deterministic))
         result = strcmp(sss->buf1, sss->buf2);
 
+    if (sss->reverse)
+        INVERT_COMPARE_RESULT(result);
+
     /* Cache result, perhaps saving an expensive strcoll() call next time */
     sss->cache_blob = false;
     sss->last_returned = result;
@@ -2663,6 +2668,13 @@ done:
      */
     res = DatumBigEndianToNative(res);
 
+    /*
+     * Account for reverse-ordering locales by flipping the bits. Note that
+     * Datum is an unsigned type (uintptr_t).
+     */
+    if (sss->reverse)
+        res ^= ~(Datum)0;
+
     /* Don't leak memory here */
     if (PointerGetDatum(authoritative) != original)
         pfree(authoritative);

pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: Reverse collations (initially for making keyset pagination covermore cases)
Next
From: Alvaro Herrera
Date:
Subject: Re: Invisible PROMPT2