Thread: Collation order for btree-indexable datatypes

Collation order for btree-indexable datatypes

From
Tom Lane
Date:
To avoid getting into states where a btree index is corrupt (or appears
that way), it is absolutely critical that the datatype provide a unique,
consistent sort order.  In particular, the operators = <> < <= > >= had
better all agree with each other and with the 3-way-comparison support
function about the ordering of any two non-NULL data values.

After tracing some Assert failures in the new planner statistics code
I'm working on, I have realized that several of our existing datatypes
fail to meet this fundamental requirement, and therefore are prone to
serious misbehavior when trying to index "weird" values.  In particular,
type NUMERIC does not return consistent results for comparisons
involving "NaN" values, and several of the date/time types do not return
consistent results for comparisons involving "INVALID" values.
(Example: numeric_cmp will assert that two NaNs are equal, whereas
numeric_eq will assert that they aren't.  Worse, numeric_cmp will assert
that a NaN is equal to any non-NaN, too.  The date/time routines avoid
the latter mistake but make the former one.)

I am planning to fix this by ensuring that all these operations agree
on an (arbitrarily chosen) sort order for the "weird" values of these
types.  What I'm wondering about is whether to insert the fixes into
7.1.1 or wait for 7.2.  In theory changing the sort order might break
existing user indexes, and should therefore be avoided until an initdb
is needed.  But: any indexes that contain these values are likely broken
already, since in fact we don't have a well-defined sort order right now
for these values.

A closely related problem is that the "current time" special value
supported by several of the date/time datatypes is inherently not
compatible with being indexed, since its sort order relative to
ordinary time values keeps changing.  We had discussed removing this
special case, and I think agreed to do so, but it hasn't happened yet.

What I'm inclined to do is force consistency of the comparison operators
now (for 7.1.1) and then remove "current time" for 7.2, but perhaps it'd
be better to leave the whole can of worms alone until 7.2.  Comments
anyone?
        regards, tom lane


Re: Collation order for btree-indexable datatypes

From
Stephan Szabo
Date:
> I am planning to fix this by ensuring that all these operations agree
> on an (arbitrarily chosen) sort order for the "weird" values of these
> types.  What I'm wondering about is whether to insert the fixes into
> 7.1.1 or wait for 7.2.  In theory changing the sort order might break
> existing user indexes, and should therefore be avoided until an initdb
> is needed.  But: any indexes that contain these values are likely broken
> already, since in fact we don't have a well-defined sort order right now
> for these values.

> What I'm inclined to do is force consistency of the comparison operators
> now (for 7.1.1) and then remove "current time" for 7.2, but perhaps it'd
> be better to leave the whole can of worms alone until 7.2.  Comments
> anyone?

Assuming that the changes are reasonably safe, I think you're
right.  What parts of the changes would require an initdb, would new
functions need to be added or the index ops need to change or would
it be fixes to the existing functions (if the latter, wouldn't a recompile
and dropping/recreating the indexes be enough?)



Re: Collation order for btree-indexable datatypes

From
Bruce Momjian
Date:
> A closely related problem is that the "current time" special value
> supported by several of the date/time datatypes is inherently not
> compatible with being indexed, since its sort order relative to
> ordinary time values keeps changing.  We had discussed removing this
> special case, and I think agreed to do so, but it hasn't happened yet.
> 
> What I'm inclined to do is force consistency of the comparison operators
> now (for 7.1.1) and then remove "current time" for 7.2, but perhaps it'd
> be better to leave the whole can of worms alone until 7.2.  Comments
> anyone?

Comparing NaN/Invalid seems so off the beaten path that we would just
wait for 7.2.  That and no one has reported a problem with it so far.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Collation order for btree-indexable datatypes

From
Stephan Szabo
Date:
On Wed, 2 May 2001, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > What parts of the changes would require an initdb, would new
> > functions need to be added or the index ops need to change or would
> > it be fixes to the existing functions (if the latter, wouldn't a recompile
> > and dropping/recreating the indexes be enough?)
> 
> Yes, dropping and recreating any user indexes that contain the problematic
> values would be sufficient to get you out of trouble.  We don't need any
> system catalog changes for this, AFAICS.

Looking back, I misread the original message.  I thought you were saying
that it needed to wait for an initdb and so would be bad in a dot release,
but it was just the breaking of indexes thing, but since they're already
pretty broken, I don't see much of a loss by fixing it.




Re: Collation order for btree-indexable datatypes

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> What parts of the changes would require an initdb, would new
> functions need to be added or the index ops need to change or would
> it be fixes to the existing functions (if the latter, wouldn't a recompile
> and dropping/recreating the indexes be enough?)

Yes, dropping and recreating any user indexes that contain the problematic
values would be sufficient to get you out of trouble.  We don't need any
system catalog changes for this, AFAICS.
        regards, tom lane


Re: Collation order for btree-indexable datatypes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Comparing NaN/Invalid seems so off the beaten path that we would just
> wait for 7.2.  That and no one has reported a problem with it so far.

Do you consider "vacuum analyze" on the regression database to be
off the beaten path?  How about creating an index on a numeric column
that contains NaNs, or a timestamp column that contains Invalid?

Unless you believe these values are not being used in the field at all,
there's a problem.  (And if you do believe that, you shouldn't be
worried about my changing their behavior ;-))
        regards, tom lane


Re: Collation order for btree-indexable datatypes

From
Bruce Momjian
Date:
If you feel strongly about it, go ahead.  I didn't see any problem
reports on it, and it seemed kind of iffy, so I thought we should hold
it.

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Comparing NaN/Invalid seems so off the beaten path that we would just
> > wait for 7.2.  That and no one has reported a problem with it so far.
> 
> Do you consider "vacuum analyze" on the regression database to be
> off the beaten path?  How about creating an index on a numeric column
> that contains NaNs, or a timestamp column that contains Invalid?
> 
> Unless you believe these values are not being used in the field at all,
> there's a problem.  (And if you do believe that, you shouldn't be
> worried about my changing their behavior ;-))
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026