numeric_normalize() is a few bricks shy of a load - Mailing list pgsql-hackers

From Tom Lane
Subject numeric_normalize() is a few bricks shy of a load
Date
Msg-id 24216.1415229566@sss.pgh.pa.us
Whole thread Raw
Responses Re: numeric_normalize() is a few bricks shy of a load  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
9.4 introduced a function numeric_normalize() whose charter is to
produce a string representing a numeric, such that two numerics' string
representations will compare equal if and only if the numeric values
compare equal.  (This is used, so far, only by GIN indexes on JSONB.)
Thomas Fanghaenel pointed out to me that this function is utterly,
completely broken: it will reduce distinct values such as 1 and 1000
to the same string, and what's much worse for the GIN use-case,
it can reduce values that should compare equal to distinct strings.
Here's a test case demonstrating the latter assertion:

regression=# create table t1 (f1 jsonb);
CREATE TABLE
regression=# insert into t1 values('{"x": 12000}');
INSERT 0 1
regression=# insert into t1 values('{"x": 12000.00}');
INSERT 0 1
regression=# select * from t1 where f1 @> '{"x": 12000}';      f1        
-----------------{"x": 12000}{"x": 12000.00}
(2 rows)

regression=# create index on t1 using gin (f1);
CREATE INDEX
regression=# set enable_seqscan TO 0;
SET
regression=# select * from t1 where f1 @> '{"x": 12000}';     f1      
--------------{"x": 12000}
(1 row)

regression=# select * from t1 where f1 @> '{"x": 12000.00}';      f1        
-----------------{"x": 12000.00}
(1 row)

Since JSONB GIN indexes are always considered lossy, the
reduction-to-same-string case is masked by index rechecks, resulting
only in an inefficient index that fails to distinguish keys it could
distinguish.  However, the reduction-to-different-strings case results
in wrong answers, as illustrated above.

I think there's no choice but to fix this in 9.4.  The only reason
it even needs discussion is that this would invalidate index entries
in beta testers' JSONB GIN indexes, such that queries would not find
entries that they did find before.  I'm not sure we can do anything
about this except recommend that beta testers REINDEX such indexes
after updating to 9.4next.

Thoughts?
        regards, tom lane



pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Amazon Redshift
Next
From: Andres Freund
Date:
Subject: Re: WIP: dynahash replacement for buffer table