Home > mailing lists

numeric_normalize() is a few bricks shy of a load - Mailing list pgsql-hackers

From	Tom Lane
Subject	numeric_normalize() is a few bricks shy of a load
Date	November 5, 2014 23:19:31
Msg-id	24216.1415229566@sss.pgh.pa.us Whole thread Raw
Responses	Re: numeric_normalize() is a few bricks shy of a load
List	pgsql-hackers

Tree view

9.4 introduced a function numeric_normalize() whose charter is to
produce a string representing a numeric, such that two numerics' string
representations will compare equal if and only if the numeric values
compare equal.  (This is used, so far, only by GIN indexes on JSONB.)
Thomas Fanghaenel pointed out to me that this function is utterly,
completely broken: it will reduce distinct values such as 1 and 1000
to the same string, and what's much worse for the GIN use-case,
it can reduce values that should compare equal to distinct strings.
Here's a test case demonstrating the latter assertion:

regression=# create table t1 (f1 jsonb);
CREATE TABLE
regression=# insert into t1 values('{"x": 12000}');
INSERT 0 1
regression=# insert into t1 values('{"x": 12000.00}');
INSERT 0 1
regression=# select * from t1 where f1 @> '{"x": 12000}';      f1        
-----------------{"x": 12000}{"x": 12000.00}
(2 rows)

regression=# create index on t1 using gin (f1);
CREATE INDEX
regression=# set enable_seqscan TO 0;
SET
regression=# select * from t1 where f1 @> '{"x": 12000}';     f1      
--------------{"x": 12000}
(1 row)

regression=# select * from t1 where f1 @> '{"x": 12000.00}';      f1        
-----------------{"x": 12000.00}
(1 row)

Since JSONB GIN indexes are always considered lossy, the
reduction-to-same-string case is masked by index rechecks, resulting
only in an inefficient index that fails to distinguish keys it could
distinguish.  However, the reduction-to-different-strings case results
in wrong answers, as illustrated above.

I think there's no choice but to fix this in 9.4.  The only reason
it even needs discussion is that this would invalidate index entries
in beta testers' JSONB GIN indexes, such that queries would not find
entries that they did find before.  I'm not sure we can do anything
about this except recommend that beta testers REINDEX such indexes
after updating to 9.4next.

Thoughts?
        regards, tom lane

pgsql-hackers by date:

From: Josh Berkus
Date: 05 November 2014, 23:15:37
Subject: Re: Amazon Redshift

From: Andres Freund
Date: 05 November 2014, 23:20:17
Subject: Re: WIP: dynahash replacement for buffer table

numeric_normalize() is a few bricks shy of a load - Mailing list pgsql-hackers

Previous

Next