Re: record identical operator - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: record identical operator
Msg-id 20131003222231.GT2706@tamriel.snowman.net
In response to Re: record identical operator  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: record identical operator  (Hannu Krosing <hannu@krosing.net>)
List pgsql-hackers

* Robert Haas (robertmhaas@gmail.com) wrote:
> You could argue that HOT isn't user-visible, but we certainly advise
> people to think about structuring their indexing in a fashion that
> does not defeat HOT, so I think to some extent it is user-visible.

I do think saying HOT is user-visible is really stretching things.  Do
we really say somewhere "please be careful to make sure your updates to
key fields are *BINARY IDENTICAL* to what's stored or HOT won't be
used"?  If anything, that should be listed on a PG 'gotchas' page.

> Also, in xfunc.sgml, we have this note:
>
>         The planner also sometimes relies on comparing constants via
>         bitwise equality, so you can get undesirable planning results if
>         logically-equivalent values aren't bitwise equal.

And that would be under "C-Language Functions", which also includes
things like "take care to zero out any alignment padding bytes that
might be present in structs".

> There are other places as well.  If two node trees are compared using
> equal(), any Const nodes in that tree will be compared for binary
> equality.

These would primarily be cases where we've created a Const out of a
string, or similar, which the *user provided*, no?  It strikes me as at
least unlikely that we'd end up storing a given string from the user in
different ways in memory and so this consideration, again, makes sense
for people writing C code but not for your general SQL user.

> So for example MergeWithExistingConstraint() will error out
> if the constraints are equal under btree equality operators but not
> binary equal.  equal() is also used in various places in the planner,
> which may be the reason for the above warning.

I wonder if this would need to be changed because you could actually
define constraints that operate at a binary level and therefore don't
overlap even though they look like they overlap based on btree equality.

> The point I want to make here is that we have an existing precedent to
> use bitwise equality when we want to make sure that values are
> equivalent for all purposes, regardless of what opclass or whatever is
> in use.  There are not a ton of those places but there are some.

I agree that there are some cases and further that these operators
provide a way of saying "are these definitely the same?" but they fall
down on "are these definitely different?"  That makes these operators
useful for these kinds of optimizations, but that's it.  Providing SQL
level optimization-only operators like this is akin to the SQL standard
defining indexes.
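
A minimal sketch of that asymmetry, assuming IEEE 754 doubles.  The
helper names are hypothetical and are not PostgreSQL functions; `==`
merely stands in for a type's ordinary equality operator:

```c
#include <stdbool.h>
#include <string.h>

/* Bitwise comparison: true only when both doubles have identical bytes. */
static bool bits_identical(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}

/* Logical comparison, standing in for the type's normal equality operator. */
static bool logically_equal(double a, double b)
{
    return a == b;
}
```

With these, bits_identical(0.0, -0.0) is false while
logically_equal(0.0, -0.0) is true: a bitwise mismatch does not show
that two values are actually different.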

> Sure, that'd work, but it doesn't explain what's wrong with Kevin's
> proposal.  You're basically saying that memcpy(a, b, len) is OK with
> you but if (memcmp(a, b, len) != 0) memcpy(a, b, len) is not OK with
> you.  I don't understand how you can endorse copying the value
> exactly, but not be OK with the optimization that says, well if it
> already matches exactly, then we don't need to copy it.

Adding new operators isn't all about what happens at the C-code level.
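
For reference, the C-level pattern quoted above can be sketched as a
single helper.  This is only an illustration; `copy_if_different` is a
hypothetical name, not anything in the PostgreSQL source:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src over dst only when the two buffers differ byte for byte.
 * Returns true if a copy was performed, false if it was skipped.
 */
static bool copy_if_different(void *dst, const void *src, size_t len)
{
    if (memcmp(dst, src, len) == 0)
        return false;           /* already bitwise identical: skip the write */
    memcpy(dst, src, len);
    return true;
}
```

Either way dst ends up bitwise equal to src; the only thing the
comparison changes is whether the write happens.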

That said, I agree that PG, in general, is more 'open' to exposing
implementation details than is perhaps ideal, but it can also be quite
useful in some instances.  I don't really like doing that in top-level
operators like this, but it doesn't seem like there's a whole lot of
help for it.  I'm not convinced that using the send/recv approach would
be all that big of a performance hit but I've not tested it.

> We can certainly rip out the current implementation of REFRESH
> MATERIALIZED VIEW CONCURRENTLY and replace it with something that
> deletes every row in the view and reinserts them all, but it will be
> far less efficient than what we have now.  All that is anybody is
> asking for here is the ability to skip deleting and reinserting rows
> that are absolutely identical in the old and new versions of the view.

If this were an entirely internal thing, it'd be different, but it's not.

> Your send/recv proposal would let us also skip deleting and
> reinserting rows that are ALMOST identical except for
> not-normally-user-visible binary format differences... but since we
> haven't worried about allowing such cases for e.g. HOT updates, I
> don't think we need to worry about them here, either.  In practice,
> such changes are rare as hen's teeth anyway.

I'm not entirely convinced that what was done for HOT in this regard is
a precedent we should be building on top of.

Thanks,
    Stephen
