Re: record identical operator - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: record identical operator
Msg-id CAHyXU0x3fijRVDJhNf9+9o5+BLZu2Orqas9dY+Gq1z+SESjz_g@mail.gmail.com
In response to Re: record identical operator  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Tue, Sep 24, 2013 at 2:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> Now I admit that's an arguable point.  We could certainly define an
>> intermediate notion of equality that is more equal than whatever =
>> does, but not as equal as exact binary equality.
>
> I suggested it up-thread and don't recall seeing a response, so here it
> is again- passing the data through the binary-out function for the type
> and comparing *that* would allow us to change the internal binary
> representation of data types and would be something which we could at
> least explain to users as to why X isn't the same as Y according to this
> binary operator.
>
>> I think the conservative (and therefore correct) approach is to decide
>> that we're always going to update if we detect any difference at all.
>
> And there may be users who are surprised that a refresh changed parts of
> the table that have nothing to do with what was changed in the
> underlying relation, because a different plan was used and the result
> ended up being binary-different.  It's easy to explain to users why that
> would be when we're doing a wholesale replace but I don't think it'll be
> nearly as clear why that happened when we're not replacing the whole
> table and why REFRESH can basically end up changing anything (but
> doesn't consistently).  If we're paying attention to the records changed
> and only updating the matview's records when they're involved, that
> becomes pretty clear.  What's happening here feels very much like
> unintended consequences.

FWIW you make some interesting points (I did a triple take on the plan
dependent changes) but I'm 100% ok with the proposed behavior.
Matviews satisfy 'truth' as *defined by the underlying query only*.
This is key: there may be N candidate 'truths' for that query: it's
not IMNSHO reasonable to expect the post-refresh truth to be
approximately based in the pre-refresh truth.  Even if the
implementation happened to do what you're asking for it would only be
demonstrating undefined but superficially useful behavior...a good
analogy would be the old scan behavior where an unordered scan would
come up in 'last update order'.  That (again, superficially useful)
behavior was undefined and we reserved the right to change it.  And we
did.  Unnecessarily defined behaviors defeat future performance
optimizations.

So Kevin's patch AIUI defines a hitherto non-user accessible (except
in the very special case of row-wise comparison) mechanic to try and
cut down the number of rows that *must* be refreshed.  It may or may
not do a good job at that on a situational basis -- if it was always
better we'd probably be overriding the default behavior.  I don't
think it's astonishing at all for a matview to pseudo-randomly adjust
case over a citext column; that's just part of the deal with
equality-ambiguous types.  As long as the matview doesn't expose a dataset that
was impossible to have been generated by the underlying query, I'm
good.
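
To make the citext point concrete, here is a rough sketch in Python (not
PostgreSQL code, and not how the patch is implemented).  Everything in it
is an assumption made up for illustration: the (key, value) row shape, the
helper names, and the use of lower() and UTF-8 bytes as stand-ins for
citext equality and the "identical" comparison.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical row shape: (key, citext-like value).
Row = Tuple[int, str]


def citext_equal(a: Row, b: Row) -> bool:
    """Stand-in for '=' on a case-insensitive type: case is ignored."""
    return a[0] == b[0] and a[1].lower() == b[1].lower()


def identical(a: Row, b: Row) -> bool:
    """Stand-in for the 'identical' test: compare the encoded bytes."""
    return a[0] == b[0] and a[1].encode("utf-8") == b[1].encode("utf-8")


def rows_to_update(old: Dict[int, Row], new: Dict[int, Row],
                   same: Callable[[Row, Row], bool]) -> List[int]:
    """Keys present in both snapshots whose rows differ under `same`."""
    return sorted(k for k in old.keys() & new.keys()
                  if not same(old[k], new[k]))


# Matview contents before the refresh, and what the query returns now.
old_snapshot = {1: (1, "Alice"), 2: (2, "bob")}
new_snapshot = {1: (1, "ALICE"), 2: (2, "bob")}

print(rows_to_update(old_snapshot, new_snapshot, citext_equal))  # []
print(rows_to_update(old_snapshot, new_snapshot, identical))     # [1]
```

Under plain equality nothing is updated and the matview keeps "Alice";
under the identical test row 1 is rewritten to "ALICE".  Either result
is something the underlying query could have produced.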

merlin


