Re: Remove lossy-operator RECHECK flag? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Remove lossy-operator RECHECK flag?
Date
Msg-id 10447.1208194075@sss.pgh.pa.us
In response to Re: Remove lossy-operator RECHECK flag?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Remove lossy-operator RECHECK flag?  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
I've committed the runtime-recheck changes.  Oleg had mentioned that
GIST text search could be improved by using runtime rechecking, but
I'll leave any refinements of that sort up to you.

One thing I was wondering about is that GIN and GIST are set up to
preinitialize the recheck flag to TRUE; this means that if someone
uses an old consistent() function that doesn't know it should set
the flag, a recheck will be forced.  But it seems to me that there's
an argument for preinitializing to FALSE instead.  There are four
possibilities for what will happen with an un-updated consistent()
function:

1. If we set the flag TRUE, and that's correct, everything is fine.

2. If we set the flag TRUE, and that's wrong (i.e., the query is really
exact), then a useless recheck occurs when we arrive at the heap.
Nothing visibly goes wrong, but the query is slower than it should be.

3. If we set the flag FALSE, and that's correct, everything is fine.

4. If we set the flag FALSE, and that's wrong (i.e., the query is really
inexact), then rows that don't match the query may get returned.

By the argument that it's better to break things obviously than to
break them subtly, risking case 4 (visibly wrong answers) seems more
attractive than risking case 2 (a silent slowdown).
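
To make the four cases concrete, here is a minimal, self-contained C
sketch. It is a toy model, not PostgreSQL code: consistent_fn, ScanStats
and run_scan are hypothetical names invented for illustration, and the
"index" is just an array of integers. The two legacy functions never
touch the recheck flag, so the caller's preinitialized value decides
whether the heap recheck happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an opclass consistent() support function.
 * It reports whether an entry may match and MAY set *recheck. */
typedef bool (*consistent_fn)(int key, int query, bool *recheck);

/* Legacy lossy function: matches on a coarse bucket (key / 10) and,
 * predating the flag, never writes to *recheck. */
bool legacy_lossy_consistent(int key, int query, bool *recheck)
{
    (void) recheck;             /* un-updated code ignores the flag */
    return key / 10 == query / 10;
}

/* Legacy exact function: its answer is already precise, but it also
 * never writes to *recheck. */
bool legacy_exact_consistent(int key, int query, bool *recheck)
{
    (void) recheck;
    return key == query;
}

typedef struct ScanStats
{
    int returned;               /* rows handed back to the caller */
    int wrong;                  /* returned rows that do not match */
    int rechecks;               /* heap rechecks performed */
} ScanStats;

/* Toy scan: preinitialize the flag, call consistent(), and recheck the
 * real qualification (value == query) only if the flag is still set. */
ScanStats run_scan(consistent_fn consistent, bool preinit,
                   const int *heap, size_t n, int query)
{
    ScanStats st = {0, 0, 0};

    assert(consistent != NULL);
    for (size_t i = 0; i < n; i++)
    {
        bool recheck = preinit; /* the value under discussion */

        if (!consistent(heap[i], query, &recheck))
            continue;
        if (recheck)
        {
            st.rechecks++;
            if (heap[i] != query)
                continue;       /* recheck filters the false match */
        }
        st.returned++;
        if (heap[i] != query)
            st.wrong++;
    }
    return st;
}
```

Scanning the values {40, 41, 42, 49, 50} for 42 in this model: the
lossy function with the flag preset TRUE returns one correct row after
four rechecks (case 1); the exact function with TRUE returns the same
row but pays for one needless recheck (case 2); the exact function with
FALSE returns it with no recheck (case 3); and the lossy function with
FALSE returns four rows, three of them wrong (case 4).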

This also ties into my previous question about what 8.4 pg_dump should
do when seeing amopreqcheck = TRUE while dumping from an old server.
I'm now convinced that the committed behavior (print RECHECK anyway)
is the best choice, for a couple of reasons:
* It avoids silent breakage if the dump is reloaded into an old server.
* You'll have to deal with the issue anyhow if you made your dump with the older version's pg_dump.

What this means is that, if we make the preinitialization value FALSE,
then an existing GIST/GIN opclass that doesn't use RECHECK will load
just fine into 8.4 and everything will work as expected, even without
touching the C code.  An opclass that does use RECHECK will fail to
load from the dump, and if you're stubborn and edit the dump instead
of getting a newer version of the module, you'll start getting wrong
query answers.  This means that all the pain is concentrated on the
RECHECK-using case.  And you can hardly maintain that you weren't
warned about compatibility problems, if the dump didn't load ...

On the other hand, if we make the preinitialization value TRUE,
there's some pain for users whether they used RECHECK or not,
and there won't be any obvious notification of the problem when
they didn't use it (their queries just get slower, as in case 2).

So I'm thinking it might be better to switch to the other
preinitialization setting.  Comments?
        regards, tom lane

