Re: The dangers of streaming across versions of glibc: A cautionary tale - Mailing list pgsql-general

From Matthew Kelly
Subject Re: The dangers of streaming across versions of glibc: A cautionary tale
Date
Msg-id 1A97C54F-1AF3-4C11-8EEA-F53EEFD5B91F@tripadvisor.com
Whole thread Raw
In response to Re: The dangers of streaming across versions of glibc: A cautionary tale  (Peter Geoghegan <peter.geoghegan86@gmail.com>)
Responses Re: The dangers of streaming across versions of glibc: A cautionary tale  (Bruce Momjian <bruce@momjian.us>)
List pgsql-general
We are currently running with the en_US.UTF-8 collation.  It was a decision made long ago, and seeing as we never
actuallyrely on the sort order of internationalized strings (other than for stability, apparently), we have never had
anymotivation to change this practice. 

Some way of versioning collations, which is not tied to glibc seems immensely appealing.  Without a good way of testing
thedifferences between glibc sort versions, it seems the only safe thing to do at the moment is to guarantee all
streamingreplica's run from the exact same OS image.  Which is fine until you want to upgrade your OS, and need to do a
dump-restoreinstead of being able to do that in a rolling fashion. 



To Bruce's point the way I was able to test for this issue in a particular index was (approximately):
--Assuming textfield is what the index is on, this causes the query planner to scan the index and give the position in
theindex. 
CREATE TABLE index_order (SELECT textfield, dense_rank() OVER (ORDER BY textfield) as i_order FROM table);
--No index here, postgres must sort
CREATE TABLE both_order as (SELECT textfield, i_order, dense_rank() OVER (ORDER BY textfield) as sort_order FROM
index_order);
-- If this doesn't return zero, you have a problem
SELECT count(*) FROM both_orders WHERE i_order <> sort_order;

This method is really slow on a big table, and I'm not going to promise it always works, but that is how we found the
rootcause. 

- Matt K

On Aug 6, 2014, at 9:46 PM, Peter Geoghegan <peter.geoghegan86@gmail.com>
 wrote:

> On Wed, Aug 6, 2014 at 6:30 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Another idea could be having our own collation data to isolate any
>> changes from outside world. I vaguley recall this had been discussed
>> before.
>
> That's probably the best solution. It would not be the first time that
> we decided to stop relying on the operating system's facilities due to
> various problems (e.g. we used to use the C standard library qsort()
> until about 2006). The only problem is that it's a lot of work. One
> possible solution that has been proposed is to adopt ICU [1]. That
> might allow us to say "this is the official way that PostgreSQL 9.6
> sorts Japanese; you may use the old way if you want, but it's
> incompatible with the new way". ICU would give us a standard
> versioning interface [2]. They seem to take this seriously, and are
> aware of our considerations around B-Tree indexes on text.
>
> [1] https://wiki.postgresql.org/wiki/Todo:ICU
> [2] http://userguide.icu-project.org/collation/architecture#TOC-Versioning
> --
> Regards,
> Peter Geoghegan



pgsql-general by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: order by question
Next
From: Steve Clark
Date:
Subject: Re: order by question