Thread: Re: [COMMITTERS] pgsql: Allow GiST distance function to return merely a lower-bound.
Re: [COMMITTERS] pgsql: Allow GiST distance function to return merely a lower-bound.
From
Heikki Linnakangas
Date:
On 05/15/2015 03:17 PM, Heikki Linnakangas wrote: > On 05/15/2015 03:05 PM, Fujii Masao wrote: >> Seems this patch causes the regression test of pg_trgm fail. >> The regression diff that I got is: >> >> *** /home/postgres/pgsql/head/contrib/pg_trgm/expected/pg_trgm.out >> 2013-07-23 16:46:22.212488785 +0900 >> --- /home/postgres/pgsql/head/contrib/pg_trgm/results/pg_trgm.out >> 2015-05-15 20:59:16.574926732 +0900 >> *************** >> *** 2332,2343 **** >> (3 rows) >> >> select t <-> 'q0987wertyu0988', t from test_trgm order by t <-> >> 'q0987wertyu0988' limit 2; >> ! ?column? | t >> ! ----------+------------- >> ! 0.411765 | qwertyu0988 >> ! 0.5 | qwertyu0987 >> ! (2 rows) >> ! >> drop index trgm_idx; >> create index trgm_idx on test_trgm using gin (t gin_trgm_ops); >> set enable_seqscan=off; >> --- 2332,2338 ---- >> (3 rows) >> >> select t <-> 'q0987wertyu0988', t from test_trgm order by t <-> >> 'q0987wertyu0988' limit 2; >> ! ERROR: index returned tuples in wrong order >> drop index trgm_idx; >> create index trgm_idx on test_trgm using gin (t gin_trgm_ops); >> set enable_seqscan=off; > > Hmm, OK. pg_trgm works for me, but I'll take a look. (rover_firefly also > went red, due to rounding differences in the regression test) There are two issues here: 1. I forgot to initialize the "recheck" variable before calling the distance function. pg_trgm's distance function is never lossy, and it doesn't set recheck to anything. If 'recheck' happens to be true, because it's uninitialized, the executor node will try to reorder the tuples. 2. There is confusion on the datatype of the distance. pg_trgm's <-> operator returns float4, but the GIST code and pg_trgm's GIST distance function always returns a float8. Gist thus returns the distance as a float8, but the executor node re-calculates it as a float4, and then tries to compare the two using float8 < operator. The immediate problem goes away if we fix 1. and initialize the variable, as the executor won't try to re-calculate or compare the ORDER BY expressions if the index never returns a lossy tuple, but the confusion on the datatypes is still real. It's not very nice that gist has a hardcoded assumption that the distance is always measured as a float8. It would be better to support whatever datatype the distance function returns, and use the type's comparison functions. But as a quick fix, I think we should add a special case for float4. Gist should check if the datatype of the original ordering operator is float4, and convert the float8 used internally to float4 before returning it. If the datatype is anything other than float4 or float8, throw an error. - Heikki