Thread: numeric_normalize() is a few bricks shy of a load

numeric_normalize() is a few bricks shy of a load

From
Tom Lane
Date:
9.4 introduced a function numeric_normalize() whose charter is to
produce a string representing a numeric, such that two numerics' string
representations will compare equal if and only if the numeric values
compare equal.  (This is used, so far, only by GIN indexes on JSONB.)
Thomas Fanghaenel pointed out to me that this function is utterly,
completely broken: it will reduce distinct values such as 1 and 1000
to the same string, and what's much worse for the GIN use-case,
it can reduce values that should compare equal to distinct strings.
Here's a test case demonstrating the latter assertion:

regression=# create table t1 (f1 jsonb);
CREATE TABLE
regression=# insert into t1 values('{"x": 12000}');
INSERT 0 1
regression=# insert into t1 values('{"x": 12000.00}');
INSERT 0 1
regression=# select * from t1 where f1 @> '{"x": 12000}';      f1        
-----------------{"x": 12000}{"x": 12000.00}
(2 rows)

regression=# create index on t1 using gin (f1);
CREATE INDEX
regression=# set enable_seqscan TO 0;
SET
regression=# select * from t1 where f1 @> '{"x": 12000}';     f1      
--------------{"x": 12000}
(1 row)

regression=# select * from t1 where f1 @> '{"x": 12000.00}';      f1        
-----------------{"x": 12000.00}
(1 row)

Since JSONB GIN indexes are always considered lossy, the
reduction-to-same-string case is masked by index rechecks, resulting
only in an inefficient index that fails to distinguish keys it could
distinguish.  However, the reduction-to-different-strings case results
in wrong answers, as illustrated above.

I think there's no choice but to fix this in 9.4.  The only reason
it even needs discussion is that this would invalidate index entries
in beta testers' JSONB GIN indexes, such that queries would not find
entries that they did find before.  I'm not sure we can do anything
about this except recommend that beta testers REINDEX such indexes
after updating to 9.4next.

Thoughts?
        regards, tom lane



Re: numeric_normalize() is a few bricks shy of a load

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I think there's no choice but to fix this in 9.4.  The only reason
> it even needs discussion is that this would invalidate index entries
> in beta testers' JSONB GIN indexes, such that queries would not find
> entries that they did find before.  I'm not sure we can do anything
> about this except recommend that beta testers REINDEX such indexes
> after updating to 9.4next.
>
> Thoughts?

I'm not happy about it, but I agree with your assessment.  Beta testers
will survive (not exaclty the first time they've had to deal with such
changes..).  We surely can't ship it as-is.
Thanks,
    Stephen