Re: record identical operator - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: record identical operator
Msg-id 20131003151233.GO2706@tamriel.snowman.net
In response to Re: record identical operator  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: record identical operator  (Robert Haas <robertmhaas@gmail.com>)
Re: record identical operator  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
* Robert Haas (robertmhaas@gmail.com) wrote:
> I'm wary of inventing a completely new way of doing this.  I don't
> think that there's any guarantee that the send/recv functions won't
> expose exactly the same implementation details as a direct check for
> binary equality.

I don't follow this thought.  Changing the binary representation which
is returned when users use the binary-mode protocol would be a pretty
massive user-impacting change, while adding a new way of storing NUMERIC
internally wouldn't even be visible to end users.
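
A toy model of the distinction drawn here, written in Python purely for illustration: one value can be stored internally in more than one byte layout, while a send-style function always emits a single canonical wire format. The layouts and the helpers `store_short()`, `store_long()`, `decode()` and `send()` are hypothetical. They only loosely mimic the idea of short and long length headers and are not PostgreSQL's actual varlena or protocol encodings.

```python
import struct

# Hypothetical internal layouts for the same text value (NOT real varlena).

def store_short(payload: bytes) -> bytes:
    # 1-byte length header; 0xFF is reserved as the long-format marker
    assert len(payload) < 0xFF
    return struct.pack("!B", len(payload)) + payload

def store_long(payload: bytes) -> bytes:
    # Marker byte 0xFF, then a 4-byte big-endian length header
    return b"\xff" + struct.pack("!I", len(payload)) + payload

def decode(stored: bytes) -> bytes:
    # Recover the payload from either internal layout
    if stored[0] == 0xFF:
        (n,) = struct.unpack("!I", stored[1:5])
        return stored[5:5 + n]
    n = stored[0]
    return stored[1:1 + n]

def send(stored: bytes) -> bytes:
    # Hypothetical "send" function: always one canonical wire format
    payload = decode(stored)
    return struct.pack("!I", len(payload)) + payload

a = store_short(b"hello")
b = store_long(b"hello")
print(a == b)              # False: the internal images differ
print(send(a) == send(b))  # True: the wire format is identical
```

In this model, a byte-for-byte comparison of the stored images reports a difference that no client could ever observe, while comparing the send output does not.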

> For example, array_send() seems to directly reveal
> the presence or absence of a NULL bitmap.

That's part of the definition of what the binary protocol for array *is*
though, so that's simply a fact of life for our users.  That doesn't
mean we can't, say, change the array header to remove the internal type
OID and use a mapping from the type OID that's on the tuple to the type
OID inside the array, as long as array_send() still produces the same
binary structure for the end user.
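
The array case can be sketched the same way. In this hypothetical model (`ToyArray` and `toy_array_send()` are invented for illustration and are not PostgreSQL code), loosely following the behavior Robert describes for array_send(), the header carries a has-nulls flag taken from whether a null bitmap is present rather than from whether any element is actually NULL. Two arrays with identical elements therefore produce different send output.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ToyArray:
    # Hypothetical in-memory array: int4 elements plus an optional null
    # bitmap (True = element is NULL). A bitmap may exist even when no
    # element is actually NULL.
    elems: List[int]
    null_bitmap: Optional[List[bool]] = None

def toy_array_send(arr: ToyArray) -> bytes:
    # Header: element count, then a "has nulls" flag derived from the
    # *presence* of the bitmap, not from the element values.
    has_bitmap = 1 if arr.null_bitmap is not None else 0
    out = struct.pack("!ii", len(arr.elems), has_bitmap)
    for i, v in enumerate(arr.elems):
        if arr.null_bitmap is not None and arr.null_bitmap[i]:
            out += struct.pack("!i", -1)      # NULL marker
        else:
            out += struct.pack("!ii", 4, v)   # length, then value
    return out

plain = ToyArray([1, 2, 3])
with_bitmap = ToyArray([1, 2, 3], null_bitmap=[False, False, False])
print(plain.elems == with_bitmap.elems)                      # True
print(toy_array_send(plain) == toy_array_send(with_bitmap))  # False
```

Once a flag like this is part of the documented wire format, it is fixed for clients no matter how the array is stored internally, which is the distinction being made above.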

> Even if there were no such
> anomalies today, it feels fragile to rely on a fairly-unrelated
> concept to have exactly the semantics we want here, and it will surely
> be much slower.

I agree that it would be slower, but performance should be a
consideration once correctness is accomplished, and this distinction
feels a great deal more "correct", in my view.

> Binary equality has existing precedent and is used in
> numerous places in the code for good reason.  Users might be confused
> about the use of those semantics in those places also, but AFAICT
> nobody is.

You've stated that a few times and I've simply not had time to run down
the validity of it, so: where does internal-to-PG binary equality end up
being visible to our users?  Independent of that, are there places in
the backend which could actually be refactored to use these new
operators where it would reduce code complexity?

> On the other hand, if you are *replicating* those data types, then you
> don't want that tolerance.  If you're checking whether two boxes are
> equal, you may indeed want the small amount of fuzziness that our
> comparison operators allow.  But if you're copying a box or a float
> from one table to another, or from one database to another, you want
> the values copied exactly, including all of those low-order bits that
> tend to foul up your comparisons.  That's why float8out() normally
> doesn't display any extra_float_digits - because you as the user
> shouldn't be relying on them - but pg_dump does back them up because
> not doing so would allow errors to propagate.  Similarly here.
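
To illustrate the extra_float_digits point: Python's float is an IEEE 754 double, like float8, so it shows why a value displayed with 15 significant digits cannot be trusted for copying, while 17 significant digits always round-trip. This is an analogy using Python's own formatting, not a call into float8out().

```python
# 0.1 + 0.2 is not exactly representable; it lands just above 0.3
x = 0.1 + 0.2

short_text = format(x, ".15g")  # 15 significant digits, like a rounded display
full_text = format(x, ".17g")   # 17 digits always round-trip a double

print(short_text)               # 0.3
print(full_text)                # 0.30000000000000004
print(float(short_text) == x)   # False: the low-order bits were lost
print(float(full_text) == x)    # True: the value survives a dump/restore
```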

I agree that we should be copying the values exactly, and I think we're
already good there when it comes to doing a *copy*.  I further agree
that updating the matview should be a copy, but the manner in which
we're doing that uses an equality check to decide whether the value needs
to be updated, which is where things get a bit fuzzy.  If we were
consistently copying and updating the value based on some external
knowledge that the value has changed (similar to how Slony works w/
triggers that dump change sets into a table: it doesn't ask "has
any value on this row changed?"; the user did an update, presumably for
some purpose, so the change gets recorded and propagated), I'd be
perfectly happy.
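
The matview concern can be sketched with Python's `Decimal`, which, like PostgreSQL's numeric, treats 1.0 and 1.00 as equal while keeping their different scales. The helpers `rows_to_update()`, `equal()` and `identical()` are hypothetical stand-ins for a refresh's diff step and for the two kinds of comparison under discussion, not the actual REFRESH code. Rows are matched by position only to keep the sketch short.

```python
from decimal import Decimal
from typing import Callable, Dict, List

Row = Dict[str, Decimal]

def rows_to_update(old: List[Row], new: List[Row],
                   same: Callable[[Decimal, Decimal], bool]) -> List[int]:
    # Hypothetical diff step: a row is rewritten only if `same` reports
    # that its value differs between the matview and the source.
    return [i for i, (o, n) in enumerate(zip(old, new))
            if not same(o["amount"], n["amount"])]

def equal(a: Decimal, b: Decimal) -> bool:
    # Ordinary equality: 1.0 and 1.00 compare equal
    return a == b

def identical(a: Decimal, b: Decimal) -> bool:
    # Stricter test: sign, digits and exponent (scale) must all match
    return a.as_tuple() == b.as_tuple()

matview = [{"amount": Decimal("1.0")}]
source = [{"amount": Decimal("1.00")}]

print(rows_to_update(matview, source, equal))      # [] -> stale "1.0" is kept
print(rows_to_update(matview, source, identical))  # [0] -> row becomes "1.00"
```

With plain equality the diff step concludes nothing changed, so the matview keeps displaying 1.0 while the source says 1.00; the stricter comparison catches the difference.
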
Thanks,
    Stephen
