Re: record identical operator - Mailing list pgsql-hackers

From Robert Haas
Subject Re: record identical operator
Date
Msg-id CA+Tgmoar70E+oQGnQvJUBUP37_0dKCaz4nm-us9PNu6cbDRGNA@mail.gmail.com
Whole thread Raw
In response to Re: record identical operator  (Stephen Frost <sfrost@snowman.net>)
Responses Re: record identical operator
Re: record identical operator
List pgsql-hackers
On Mon, Sep 23, 2013 at 1:19 PM, Stephen Frost <sfrost@snowman.net> wrote:
> So I've gathered and I'm not terribly surprised that it's broken.  It
> was unfortunate that we committed it as such, imv.  I'm *not* convinced
> that means we should build on that in a direction that doesn't appear to
> be getting us any closer to real matviews.  Being in master doesn't make
> it released.
>
>> This is an improvement
>> over what is available in 9.3, because even though you still have to
>> rerun the full query, you don't have to lock out access to the table
>> in order to apply the changes.
>
> I see how you can view it as an improvement, but consider what you'd say
> if a user showed up on IRC wanting to implement *exactly* this in their
> DB using a cronjob and a bit of plpgsql.  Would we encourage him/her and
> say "gee, that's a great approach, just compare the whole row!"  Heck
> no, we'd say "uhh, you need a key there that you can depend on when
> you're doing your updates."  We'd probably even suggest that they, I
> dunno, make that key their *primary key* for the matview.  If they asked
> what would happen with their little plpgsql code when using citext or
> other, similar, type, we'd tell them exactly what would happen when
> using such a type with regular updates, and maybe even describe how they
> could get themselves into trouble if they tried to issue updates which
> changed those key fields and therefore run into possible lock contention
> from the unique index on values-that-are-not-really-unique-to-them.
>
> I can't see anyone encouraging that and here we are building a major new
> feature on it!  To be honest, I'm amazed that I appear to be alone with
> this.

I've written various scripts over the years for this kind of thing,
and they all compared the whole row.  I used a PK comparison to
determine which row needed to be updated and then compared the
remaining fields to figure out which values needed to be updated.  I
had assumed that's how most people did it; what do you do?

Anyway, that is exactly what Kevin is proposing to do here and, to be
clear, he's NOT proposing to use the binary-identical semantics to
identify the row to be updated.  That will happen using the semantics
of whatever index the user chooses to create on the PK column.
Rather, he's only using the binary-identical to decide which rows are
completely unchanged.  I might be wrong here, but it seems to me that
that renders most of your argument here moot. Under Kevin's proposal,
changing a citext column that acts as a PK for the matview WON'T cause
the row to be deleted and reinserted, but what it will do is say, oh,
it's the same row (the values are equal) but the case is different
(the rows are not binary-equal), let me go update the PK column with
the new value.  From where I stand, that seems like exactly the right
behavior.  What do you think should happen instead?

>> Now, the next project Kevin's going to work on, and that he was
>> working on when he discovered this problem, is incremental
>> maintenance: that is, allowing us to update the view *without* needing
>> to rerun the entire query.  This record comparison operator will be
>> just as important in that context.
>
> You state this but I don't see where you justify this claim..

The next sentence was intended as justification.

>> The *only* strategy refreshing a
>> materialized view that *doesn't* need this operator is the only we
>> have in 9.3: through out all the old data and replace it with the
>> result of re-executing the query.
>
> I really don't agree with this, but allow me to play along a bit.  What
> is going to happen with this incremental updates when you want to use an
> index to figure out what to update?

We already do use an index to figure out what to update with
concurrent refresh; or at least we can, when the query planner thinks
that is the cheapest strategy.  I don't imagine that that part will
change significantly for incremental updates.

> Are you going to build the index
> using your binary equality operator, which could allow *duplicates that
> are not really duplicates*, even in the externally-visible world? And
> then we'll be returning multiple rows to the user when we shouldn't be?

No, see above.  There's no intention that the user should need to use
this new opclass to make materialized views work correctly.  The user
is required to define a unique index in order to use REFRESH
MATERIALIZED VIEW CONCURRENTLY, and that unique index defines the
semantics used to compare existing rows to new candidate rows.  Only
after we've matched up the old and new rows do we use the
exact-equality semantics to decide whether to update the existing row
or do nothing.

> There's a whole slew of evil examples of this- it's called unnecessary
> DISTINCT's.  Or perhaps you'll simply look up based on the actual
> equality answer and then *pick one* to use, perhaps the most recent,

Sure, all of that would suck, but nobody's proposing anything like that.

> but
> then that may or may not equal what the actual query would generate
> *anyway* because of costing, join ordering, etc, etc, as I already
> pointed out.

Most queries I write produce the same results every time.  The order
of the rows may vary when I don't care, but I take care to control the
query so that the contents of the rows don't vary.  For example, if I
use string_agg() to group rows, I make sure the rows always arrive in
the same order, using ORDER BY.  The reason I do this is because, even
today, failing to do so causes too many user-visible artifacts.
However, I admit that people who don't adhere to that practice will
see additional anomalies with materialized views just as they do when
they run the same query repeatedly.  However, that will be true in any
materialized view implementation and has little or nothing to do with
the present proposal.

> You're trying to guarantee something here that you can't.
> Trying to fake it and tell the users "oh, you'll be ok" is actually
> worse because it means they'll try and depend on it and then get all
> kinds of upset when it ends up *not* working.

I don't know what this is referring to.

> I'd much rather tell users "look, if you want to use these for *data*
> fields, that's fine, because our smart matview will figure out the keys
> associated with a GROUP BY and will update based on those, but if you
> GROUP BY a column with a type whose equality operator allows things to
> be different, you'll get *the same behavior you would get from the
> query*, which is to say, it won't be deterministic."

Again, Kevin is proposing something strictly BETTER than this.  We
*will* figure out the keys associated with the GROUP BY and we *will*
update based on those, assuming you define the unique index on the
materialized view with the same opclass used for the GROUP BY.
However, if you GROUP BY a column with a type whose equality operator
allows things to be different, then the materialized view will
reference the actual value produced by the last execution of the
query, rather than possibly producing something else.

>> If you want anything smarter than
>> that, you MUST compare old and new rows for equality so that you can
>> update only those rows that have been changed.  And if you compare
>> them *any strategy other than the one Kevin is proposing*, namely
>> binary equality, then you may sometimes decide that a row has not been
>> changed when it really has, and then you won't update the row, and
>> then incremental maintenance will be enabled to produce *wrong
>> answers*.  So to me this has come to seem pretty much open and shut.
>
> The incremental maintenance approach, I'd hope, would be keeping track
> of what changed and what rows in the view need to be regenerated due to
> those underlying rows changing.  Doing that, you *would* know what
> needed to be updated, and how.  If you're not keeping track of the
> specific rows that changed underneath, then I'm curious how these
> incremental matviews will work differently from what's being described
> here, where we're trying to make the whole freakin' row the key.

Kevin posted an email discussing the algorithm that he was proposing
to implement back in May.

http://www.postgresql.org/message-id/1368561126.64093.YahooMailNeo@web162904.mail.bf1.yahoo.com

I would suggest that you read the referenced papers for details as to
how the algorithm works.  To make a long story short, they do need to
keep track of what's changed, and how.  However, that still seems
largely orthogonal to the present discussion.

>> We all know that materialized views are not going to always match the
>> data returned by the underlying query.  Perhaps the canonical example
>> is SELECT random().  As you pointed out upthread, any query that is
>> nondeterministic is a potential problem for materialized views.  When
>> you write a query that can return different output based on the order
>> in which input rows are scanned, or based on any sort of external
>> state such as a random-number generator, each refresh of a
>> materialized view based on that query may produce different answers.
>
> Of course.
>
>> There's not a lot we can do about that, except tell people to avoid
>> using such queries in materialized views that they expect to be
>> stable.  However, what we're talking about here is a completely
>> separate problem.  Even if the query is 100% deterministic, without
>> this patch, the materialized view can get out of sync with the query
>> results.
>
> Only a subset of the queries involving these types are 100%
> deterministic and those are the stupid simple cases where the view is
> just being used as a synonym.  As soon as you start doing anything that
> actual uses the equality operator (and we have a *lot* of things that
> do!), you're going to get nondeterministic behavior.  You're trying to
> paper over this and I'm not impressed.

This does not match my experience.  I have written many complex views
over the years which it would have been beneficial to materialize.
For example, I had a 34-table join involving multiple aggregates in
one web app I deployed.  I would not consider that to be using the
view "just as a synonym", but I can assure you that the results were
quite deterministic.  That view involved both text and numeric data,
as I recall.

> Or are you telling me that a matview using the binary-equality technique
> will be deterministic when your GROUP BY a citext column in the view?  I
> don't buy it.

No, I'm not telling you that.  That would be a stupid thing to say, so
I'm glad I'm not saying it.

>> Granted, most of the ways in which it can get out of sync are fairly
>> subtle: failing to preserve case in a data type where comparisons are
>> text-insensitive; gaining or loosing an all-zeroes null bitmap on an
>> array type; adding or removing trailing zeroes after the decimal point
>> in a numeric.  If the materialized view sometimes said "1" when the
>> query was returning "0", we'd presumably all say "that's a bug, let's
>> fix it".  But what we're actually talking about is that the query
>> returns "0.00" and the view still says zero.  So we're doing a lot of
>> soul-searching about whether that's unequal enough to justify updating
>> the row.  Personally, though, there's not a lot of doubt in my mind.
>
> If it was as open-and-shut as that, and we *could* be 100%
> deterministic in *all* cases with these special types when going through
> views, I'd be a lot more on-board.  However, we can't; not without
> insisting that the users use some new set of "binary-aggregation"
> operators which are able to *pick* which of the equal-but-not-really
> options they want; and by-the-way, you can *bet* that's going to need to
> be type-specific.

We don't have to control which operators users can select.  The
existing implementation, as committed, lets the user control that by
choosing which unique indexes to define.  It's actually a little bit
clunky at the moment; if you define multiple unique indexes on the
materialized view, it seems that we consider a row to be "the same
row" only if the equality operators for all columns in all such
indexes think that the rows hasn't been updated.  We might want to
revisit that; I can see insisting that the user pick exactly one
unique index to define the comparison semantics, and giving them
explicit syntax for doing so, precisely to eliminate some of the lock
contention and other problems that might occur from excess updates.  I
guess the importance of such a change depends largely on how likely
you think it is that someone might put a unique index on a
materialized view for some reason other than to define the comparison
semantics.  But whether we change that or not, the basic point you are
concerned about here is already covered.

>> If I create a table and I put 0 into a column of that table and then
>> create a materialized view and that 0 ends up in the materialized
>> view, and then I update the 0 to 0.00 and refresh the view, I expect
>> that change to propagate through to the materialized view.  It works
>> that way if I select from a regular, non-materialized view; and it
>> also works that way if I select from the table itself.
>
> Sure- and then you try to do the same thing with a join or a group by.
> Just because it works in the simple SELECT case doesn't mean we get to
> tell users "you can depend on this!"

No, but I think the case you're concerned about will work
mostly-acceptably today and even better after the proposed patch.  If
you disagree, perhaps you could prepare a test case demonstrating the
problem.  I don't think that either Kevin or myself are trying to ram
through a feature that is half-baked here, and I certainly agree that
it is very important to get as much of this possible worked out before
9.4 arrives in stores everywhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: record identical operator
Next
From: Noah Misch
Date:
Subject: Re: dynamic shared memory