Re: Reducing tuple overhead - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Reducing tuple overhead
Date	June 7, 2015 11:05:01
Msg-id	CAM3SWZS0GyUaiFx97oYJuirmcW1MsojmEAtoEF7WCgxdppNOXg@mail.gmail.com
In response to	Re: Reducing tuple overhead (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On Thu, Apr 30, 2015 at 6:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The other, related problem is that the ordering operator might start
> to return different results than it did at index creation time.  For
> example, consider a btree index built on a text column.  Now consider
> 'yum update'.  glibc gets updated, collation ordering of various
> strings change, and now you've got tuples that are in the "wrong
> place" in the index, because when the index was built, we thought A <
> B, but now we think B < A.  You would think the glibc maintainers
> might avoid such changes in minor releases, or that the Red Hat guys
> would avoid packaging and shipping those changes in minor releases,
> but you'd be wrong.

I would not think that. Unicode Technical Standard #10 states:

"""
Collation order is not fixed.

Over time, collation order will vary: there may be fixes needed as
more information becomes available about languages; there may be new
government or industry standards for the language that require
changes; and finally, new characters added to the Unicode Standard
will interleave with the previously-defined ones. This means that
collations must be carefully versioned.
"""

Also, the paper "Modern B-Tree Techniques" by Goetz Graefe states on
page 238:

"""
In many operating systems, appropriate functions are provided to
compute a normalized key from a localized string value, date value, or
time value. This functionality is used, for example, to list files in
a directory as appropriate for the local language. Adding
normalization for numeric data types is relatively straightforward, as
is concatenation of multiple normalized values. Database code must not
rely on such operating system code, however. The problem with relying
on operating systems support for database indexes is the update
frequency. An operating system might update its normalization code due
to an error or extension in the code or in the definition of a local
sort order; it is unacceptable, however, if such an update silently
renders existing large database indexes incorrect.
"""

Unfortunately, it is simply not the case that we can rely on OS
collations being immutable. We have *no* contract with any C standard
library concerning collation stability whatsoever. I'm surprised that
we don't hear more about this kind of corruption.
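
As a rough illustration of the failure mode, the sketch below builds a
sorted list under one ordering and then searches it under another. The
two "collation versions" are stand-ins I made up (plain code point
order versus a case-insensitive order); they do not correspond to any
real glibc change, and a sorted list with binary search is only a toy
model of a btree descent.

```python
import bisect


def collate_v1(s):
    # Hypothetical "old" collation: plain code point order.
    return s


def collate_v2(s):
    # Hypothetical "new" collation: case-insensitive, ties broken by code point.
    return (s.casefold(), s)


def index_lookup(index, key, sort_key):
    """Binary search that trusts `index` to be sorted by `sort_key`."""
    keys = [sort_key(k) for k in index]
    pos = bisect.bisect_left(keys, sort_key(key))
    return pos < len(index) and index[pos] == key


words = ["Zebra", "apple", "Mango", "banana"]

# The "index" is built once, under the old collation.
index = sorted(words, key=collate_v1)

# Lookups with the same collation find every key.
found_old = [w for w in words if index_lookup(index, w, collate_v1)]

# After the "library update", lookups compare with the new collation,
# but the stored order is still the old one.
found_new = [w for w in words if index_lookup(index, w, collate_v2)]

print(found_old)
print(found_new)
```

Every key is still physically present in the list, yet most lookups
under the new ordering miss it, and nothing raises an error. That is
the sense in which the tuples are in the "wrong place".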
-- 
Peter Geoghegan
