Re: The dangers of streaming across versions of glibc: A cautionary tale - Mailing list pgsql-general

From Bruce Momjian
Subject Re: The dangers of streaming across versions of glibc: A cautionary tale
Date
Msg-id 20140807164614.GC14724@momjian.us
In response to Re: The dangers of streaming across versions of glibc: A cautionary tale  (Matthew Kelly <mkelly@tripadvisor.com>)
Responses Re: The dangers of streaming across versions of glibc: A cautionary tale  (Peter Geoghegan <peter.geoghegan86@gmail.com>)
List pgsql-general
On Thu, Aug  7, 2014 at 03:07:04PM +0000, Matthew Kelly wrote:
> We are currently running with the en_US.UTF-8 collation.  It was a decision made long ago, and seeing as we never
> actually rely on the sort order of internationalized strings (other than for stability, apparently), we have never had
> any motivation to change this practice.
>
> Some way of versioning collations that is not tied to glibc seems immensely appealing.  Without a good way of
> testing the differences between glibc sort versions, it seems the only safe thing to do at the moment is to guarantee
> that all streaming replicas run from the exact same OS image.  That is fine until you want to upgrade your OS and need to
> do a dump-restore instead of being able to do that in a rolling fashion.
>
>
>
> To Bruce's point, the way I was able to test for this issue in a particular index was (approximately):
> -- Assuming textfield is the indexed column, this causes the query planner to scan the index and give each
> -- row's position in the index.  (my_table is a placeholder for the real table name.)
> CREATE TABLE index_order AS (SELECT textfield, dense_rank() OVER (ORDER BY textfield) AS i_order FROM my_table);
> -- No index here, so Postgres must sort
> CREATE TABLE both_order AS (SELECT textfield, i_order, dense_rank() OVER (ORDER BY textfield) AS sort_order FROM
> index_order);
> -- If this doesn't return zero, you have a problem
> SELECT count(*) FROM both_order WHERE i_order <> sort_order;
>
> This method is really slow on a big table, and I'm not going to promise it always works, but that is how we found the
> root cause.

We could walk the index looking for inconsistent btree splits, e.g. a
split that doesn't match the ordering returned by the existing collation
functions.
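
As a rough illustration of that idea (not an actual btree page walker),
the sketch below takes keys in the order an index stores them and flags
every adjacent pair that the current comparison function says is out of
order.  Everything in it is an assumption made for illustration: the
names find_order_violations, old_collation and new_collation are
hypothetical, and the two comparators are toy stand-ins for the collation
rules before and after a library upgrade, not glibc's real behavior.

```python
from functools import cmp_to_key


def find_order_violations(keys, compare):
    """Return each position i where keys[i] sorts after keys[i + 1] under compare."""
    return [i for i in range(len(keys) - 1) if compare(keys[i], keys[i + 1]) > 0]


def old_collation(a, b):
    # Toy stand-in for the rules the index was built under: plain code-point order.
    return (a > b) - (a < b)


def new_collation(a, b):
    # Toy stand-in for the rules after an upgrade: case-insensitive,
    # with code-point order as the tie-break.
    ka, kb = (a.lower(), a), (b.lower(), b)
    return (ka > kb) - (ka < kb)


# Keys as the index would hold them: sorted once, under the old rules.
index_keys = sorted(["apple", "Banana", "cherry", "Date"], key=cmp_to_key(old_collation))

violations_old = find_order_violations(index_keys, old_collation)
violations_new = find_order_violations(index_keys, new_collation)
print("violations under old rules:", violations_old)
print("violations under new rules:", violations_new)
```

A check like this is linear in the number of keys, since it only compares
neighbors, whereas re-sorting the whole table costs a full sort.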

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

